key: cord-0428856-7stmm0ds authors: Šulc, Petr; Solovyov, Alexander; Marhon, Sajid A.; Sun, Siyu; LaCava, John; Abdel-Wahab, Omar; Vabret, Nicolas; De Carvalho, Daniel D.; Monasson, Rémi; Cocco, Simona; Greenbaum, Benjamin D. title: Repeats Mimic Immunostimulatory Viral Features Across a Vast Evolutionary Landscape date: 2021-11-04 journal: bioRxiv DOI: 10.1101/2021.11.04.467016 sha: 8dedcde5df9266104e60557aa453b48b968fb874 doc_id: 428856 cord_uid: 7stmm0ds An emerging hallmark across many human diseases - such as cancer, autoimmune and neurodegenerative disorders – is the aberrant transcription of typically silenced repetitive elements. Once transcribed they can mimic pathogen-associated molecular patterns and bind pattern recognition receptors, thereby engaging the innate immune system and triggering inflammation in a process known as “viral mimicry”. Yet how to quantify pathogen mimicry, and the degree to which it is shaped by natural selection, remains a gap in our understanding of both genome evolution and the immunological basis of disease. Here we propose a theoretical framework that combines recent biological observations with statistical physics and population genetics to quantify the selective forces on virus-like features generated by repeats and integrate these forces into predictive evolutionary models. We establish that many repeat families have evolutionarily maintained specific classes of viral mimicry. We show that for HSATII and intact LINE-1 selective forces maintain CpG motifs, while for a set of SINE and LINE elements the formation of long double-stranded RNA is more prevalent than expected from a neutral evolutionary model. We validate our models by showing predicted immunostimulatory inverted SINE elements bind the MDA5 receptor under conditions of epigenetic dysregulation and that they are disproportionately present during intron retention when RNA splicing is pharmacologically inhibited. 
We conclude viral mimicry is a general evolutionary mechanism whereby genomes co-opt features generated by repetitive sequences to trigger the immune system, acting as a quality control system to flag genome dysregulation. We demonstrate these evolutionary principles can be learned and applied to predictive models. Our work therefore serves as a resource to identify repeats with candidate immunostimulatory features and leverage them therapeutically.

The ability to predict the presence of patterns sensed by the innate immune system is of considerable theoretical and practical interest 1. For instance, mathematical models of the evolution of human H1N1 influenza since the 1918 pandemic showed an attenuation of CpG motifs, leading to the prediction that such motifs are targeted by pattern recognition receptors (PRRs) 2, 3 and trigger pro-inflammatory responses. It was subsequently discovered that the protein ZAP (ZC3HAV1) is a PRR targeting CpG motifs, indicating that inferences drawn from genome evolution can predict new receptor specificities relevant to emerging and adapting viruses 4, 5, including SARS-CoV-2 6. It has been more difficult to predict PRR specificities from structure prediction. There are multiple receptors known to recognize long and short double-stranded RNAs (dsRNAs). For example, MDA-5 (IFIH1) recognizes long dsRNA segments present during RNA virus replication and TLR-3 recognizes shorter segments, on the order of tens of base pairs 7. Surprisingly, it recently became clear that repetitive elements, which represent most of the human genome and may derive from integrated viruses, can display "non-self" pathogen-associated molecular patterns (PAMPs). Under aberrant conditions such as in cancer 8, repeats are frequently overexpressed, where they may display PAMPs, such as anomalous CpG content and dsRNA 9-14.
Consistently, a growing body of literature has demonstrated the aberrant expression of immunostimulatory repeats across an array of human diseases, such as in aging 15 and autoimmunity 16, implying viral mimicry may be a fundamental feature of inflammatory diseases. Moreover, viral mimicry can be leveraged therapeutically: the expression of immunostimulatory repeats is inducible by epigenetic drugs, leading to the triggering of innate sensors and induction of an interferon response 10-14. Several fundamental questions remain, such as which human sensors can be activated by which repeats, whether viral mimicry serves a functional role in the genome as an evolved checkpoint for loss of epigenetic regulation or genome fidelity, and whether tumors and pathogens have learned to manipulate mimicry to their own selective advantage 17, 18. In one evolutionary scenario, repeats which form features in somatically silenced, low-complexity regions can create PAMPs that offer a fitness advantage to cells due to their ability to trigger PRRs under epigenetic stress, eliminating dysregulated cells and maintaining tissue homeostasis 17, 18. Such features would then be maintained by natural selection. Alternatively, in a neutral scenario, it may be that high RNA concentration resulting from dysregulation can engage PRRs non-specifically, and their sensing is a convenient byproduct of dysregulation rather than selection acting on specific sequence features. Discriminating between these scenarios is key to understanding how non-self mimicry by the self-genome has evolved, and how it can be leveraged for emerging therapies and honed for existing ones. There is therefore a pressing need for new approaches to quantify the presence of viral mimics, infer parameters defining their immunological features, and quantify their evolutionary dynamics in this reduced feature space.
We propose a theoretical approach to quantifying immunostimulatory nucleic acid motifs and double-stranded structures under selection, and present two models for describing the evolutionary dynamics of an immunological feature generated by repeats. In doing so we define specific categories of repeat families that most likely were retained by natural selection to trigger specific receptors of the innate immune system under aberrant conditions. We generalize the framework of selective and entropic forces to infer anomalous sequence features 3. In our approach, genome segments, subject to constraints such as local nucleic acid content, are randomized by entropic forces to resemble, on average, self-genomic material and are ordered by selective forces acting on sequence features to oppose such randomization. Rather than using p-values to compare the strength of avoidance or enhancement of a certain candidate immunostimulatory feature, we use the selective force, an intensive parameter that can be readily compared between sequences and is easily interpretable as the information-theoretic cost of avoiding or enhancing specific features in a sequence. To calculate selective forces, one uses exact transfer matrix methods from statistical physics which, unlike previous approaches 2, are computationally efficient (scaling linearly with the length of the sequence) and facilitate the analysis of longer sequences and large databases. We calculate the degree to which any sequence displays a feature bias (as defined in Methods). To apply this formalism to the evolutionary dynamics of immunostimulatory features, we use this parameter in two approaches to study the population genetics of immunostimulatory features in an ensemble of genome sequences. The first approach uses relaxation dynamics for the evolution of repeats in the genome.
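The transfer-matrix calculation of a selective force can be illustrated with a minimal sketch. It is not the implementation from Methods (which is not reproduced in this text); it assumes a model in which a sequence has weight prod_i f(s_i) * exp(x * N_motif), with f the nucleotide frequencies of the sequence itself, and defines the force x as the value at which the expected motif count equals the observed count. The function names, the bisection bounds of -10 and 10, and the finite-difference step are illustrative choices.

```python
import math

BASES = "ACGT"

def log_partition(freqs, length, x, motif="CG"):
    """log Z(x): sum over all sequences of prod f(s_i) * exp(x * motif count).

    Computed with a 4x4 transfer matrix, so the cost is linear in `length`.
    """
    first, second = BASES.index(motif[0]), BASES.index(motif[1])
    vec = list(freqs)          # weight carried by each possible last base
    log_z = 0.0
    for _ in range(length - 1):
        new = [0.0] * 4
        for prev in range(4):
            for nxt in range(4):
                weight = freqs[nxt]
                if prev == first and nxt == second:
                    weight *= math.exp(x)   # reward or penalty per motif
                new[nxt] += vec[prev] * weight
        norm = sum(new)
        log_z += math.log(norm)             # renormalize to avoid overflow
        vec = [v / norm for v in new]
    return log_z

def expected_count(freqs, length, x, motif="CG", h=1e-5):
    """Expected motif count = d log Z / dx (central finite difference)."""
    up = log_partition(freqs, length, x + h, motif)
    down = log_partition(freqs, length, x - h, motif)
    return (up - down) / (2 * h)

def motif_force(seq, motif="CG", lo=-10.0, hi=10.0):
    """Force x matching expected and observed counts (clamped to [lo, hi]).

    Assumes `seq` contains only A, C, G and T.
    """
    seq = seq.upper()
    freqs = [seq.count(b) / len(seq) for b in BASES]
    observed = sum(seq[i:i + 2] == motif for i in range(len(seq) - 1))
    for _ in range(60):                     # bisection: count grows with x
        mid = 0.5 * (lo + hi)
        if expected_count(freqs, len(seq), mid, motif) < observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions a force near zero means the motif occurs about as often as the nucleotide composition alone predicts, a negative force indicates depletion, and a positive force indicates enrichment.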
In this formalism, a new repeat with a force on an immunostimulatory feature will evolve until its force value reaches an equilibrium determined by the specificity of PRRs in its host. By analogy, the 1918 H1N1 influenza virus had one set of features in its original avian host, and then evolved towards a new equilibrium in humans, where PRRs target CpG with greater affinity and therefore exert a greater selective force 3. The second approach uses a Wright-Fisher (WF) model that considers the evolution of the probability of a sequence with given immunostimulatory feature content 19. While relaxation dynamics was applied to the evolution of dinucleotide motifs under selective pressure in viral genomes 3, here we connect selective forces to intrinsic molecular mutational processes in human genomes by use of population genetics (Methods). We implement the WF model numerically and, assuming haploid reproduction of sequences, evolve a set of sequences according to a neutral mutation model without a selection term to provide a null model of repeat evolution in the human genome. For each simulation step, we pick a random base for each sequence in the ensemble and mutate it to a randomly chosen different base with a given probability. We consider different possible mutation probabilities depending on the type of base being mutated into, as well as on the local nucleotide context. Additionally, in vertebrates and plants, mutations in CpG context are known to be more common due to methylation-induced hypermutability 20. Hence, we use different ratios of mutation rates corresponding to nucleotide transitions and transversions in a CpG context and to transitions and transversions in a non-CpG context 20. We calculated the stationary dinucleotide distribution, obtained as the stationary vector of the stochastic matrix with entries corresponding to probabilities of mutating from one dinucleotide to another dinucleotide (see Methods and Table 1).
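The neutral simulation step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the four acceptance probabilities in `RATES` are hypothetical placeholders (the actual ratios are given in Methods and Table 1), and the neutral resampling of the population between steps is one common way to realize haploid Wright-Fisher reproduction.

```python
import random

TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Hypothetical acceptance probabilities per attempted mutation, keyed by
# (context, mutation type); CpG sites are made hypermutable.
RATES = {
    ("cpg", "ts"): 0.5, ("cpg", "tv"): 0.1,
    ("other", "ts"): 0.05, ("other", "tv"): 0.02,
}

def in_cpg_context(seq, i):
    """True if position i is the C or the G of a CpG dinucleotide."""
    if seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G":
        return True
    return seq[i] == "G" and i > 0 and seq[i - 1] == "C"

def mutate_once(seq, rng, rates=RATES):
    """Pick one random base and mutate it with a context-dependent probability."""
    i = rng.randrange(len(seq))
    old = seq[i]
    new = rng.choice([b for b in "ACGT" if b != old])
    kind = "ts" if TRANSITION_PARTNER[old] == new else "tv"
    context = "cpg" if in_cpg_context(seq, i) else "other"
    if rng.random() < rates[(context, kind)]:
        seq[i] = new

def evolve(population, steps, seed=0, resample=True):
    """Neutral evolution: one mutation attempt per sequence per step,
    optionally followed by neutral (selection-free) resampling."""
    rng = random.Random(seed)
    pop = [list(s) for s in population]
    for _ in range(steps):
        for seq in pop:
            mutate_once(seq, rng)
        if resample:
            pop = [list(rng.choice(pop)) for _ in range(len(pop))]
    return ["".join(s) for s in pop]

def cpg_count(seq):
    return sum(seq[i:i + 2] == "CG" for i in range(len(seq) - 1))
```

Starting from a CpG-rich ensemble, for example `evolve(["CG" * 50] * 20, steps=2000)`, the mean CpG count drifts downward, which is the qualitative behavior a neutral null model with CpG hypermutability is expected to show.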
A repetitive element is primarily defined by the presence of multiple copies (inserts) of its sequence. We compare the evolution of dinucleotide motifs (quantified by calculating the selective force on a dinucleotide motif, as defined in Methods) between the original consensus sequence, representing the sequence most likely to be close to the founding ancestral insertion, and its subsequent copies in the genome (Fig. 1). We analyzed all repeat families annotated in the DFAM database and calculated the dinucleotide forces for their consensus sequences as well as the mean force on all inserts from a given family 21, finding outliers such as a set of Alu repeats and HSATII, the latter consistent with previous results 9 (Fig. 1). The greatest differences between the forces on dinucleotides for a consensus sequence and its subsequent inserts were observed for CpG (Fig. 1A). For all other dinucleotides, the force change with respect to the consensus is approximately 0, as illustrated in Supplementary Figure 1. Typically, CpG content in the human genome is highly underrepresented (Extended Data Fig. 1) and CpG sites mutate at a much faster rate than the rest of the genome due to their aforementioned hypermutability 20, 22, 23. As a result, understanding whether the CpG content of a repeat has "relaxed" to a typical level or is held fixed by selection can indicate whether a repeat transcript can be recognized by a PRR that senses CpG motifs. We evaluated the mean force for all other annotated repeats longer than 150 bases. We plot the mean difference in CpG force per repeat family versus the CpG force of the consensus ancestral insert (Fig. 1B). Consistently, we see that families where the selective force on CpG dinucleotides for the progenitor insert was greater than -1.9 have decreased their force to this value, while those less than -1.9 have increased their value.
We therefore establish a genome-wide equilibrium in line with equilibria observed for human-adapted viruses such as influenza and SARS-CoV-2 2, 6. If a repeat is not subject to selection, one would expect its insertion to evolve according to a WF model with respective mutation rates for transitions and transversions. This approach has been used in several sequence evolution models to explain lower CpG content in vertebrate genomes 24-26. However, CpG motifs are also functional. Methylation of CpGs in DNA is an important regulator of gene expression 27, 28, and CpG-rich RNA can have immunostimulatory properties 9. Therefore, one could expect selection to act against depletion of functional CpG motifs, as observed in CpG islands located in gene promoters of vertebrates 29. Indeed, most repeat families show relaxation to the mean genome force expected from the neutral model, further implying HSATII and Alu repeats may be specifically under selection to trigger PRRs (Fig. 1C). As LINE-1 elements have the most copies in the genome, they are most amenable to our approach. They are estimated to constitute about 20% of the human genome 30. Here we only consider full-length inserts, as annotated in L1Base2, and contrast those designated as fully intact (denoted FLI) with those full-length sequences designated as non-intact (FLnI) 31. Fully functional LINE-1 DNA sequences are regulated by promoter hyper-methylation, which occurs at CpGs, to inhibit their transcription 32. Indeed, we find FLI LINE-1 have higher CpG content than FLnI (Fig. 2A). We calculated the mean Kimura distance 33 to all FLI sequences for each of the FLnI sequences as a proxy for time since insertion, finding that as a LINE-1 genome insertion ceases to contain an intact copy, its CpG content decays to the genome mean in a predictable way (Fig. 2B), reaching a plateau of -2.0, within the margin of error for the equilibrium of -1.9.
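As an illustration of the distance used as a proxy for time since insertion, the sketch below computes the Kimura two-parameter distance, K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the fractions of sites differing by a transition and by a transversion. It assumes the two sequences are already aligned to equal length; the helper `mean_distance_to_set` and the choice to skip gaps and ambiguous bases are illustrative assumptions, not details taken from Methods.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def kimura_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G, T are skipped.
    Raises ValueError for saturated pairs where the logarithm is undefined.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

def mean_distance_to_set(query, references):
    """Mean distance from one aligned insert to a set of aligned references."""
    return sum(kimura_distance(query, r) for r in references) / len(references)
```

For example, `mean_distance_to_set(flni_seq, fli_seqs)` (with hypothetical, pre-aligned inputs) would give one value per non-intact insert to plot against its CpG force.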
We would expect the most recent inserts into the human genome to not have equilibrated. It is important to identify all such cases because the families that have not saturated are candidates for viral mimicry such as, for example, when overexpressed in tumors 8, 35-37. The clearest instance is HSATII. The evolutionary dynamics of the force relaxation fit for HSATII (Fig. 2B) corresponds to saturation at a force approximately equal to 0.4, well above the equilibrium distribution given by the WF model simulations (Fig. 2B, green line), implying its ability to stimulate PRRs is maintained by selection. Other outliers comprise repeat families that are still close in age to the original CpG-rich insert or families whose CpG force decreases at a lower rate than observed for other repeat families, implying their features are maintained by selection. For most families the data points are scarce and noisy, making a relaxation fit such as the one shown for HSATII and LINE-1 difficult. The full genome atlas of CpG-rich repeat families is listed in Supplementary Table 1 (Fig. 2C), showing an enhancement in introns and depletion in intergenic regions (Fig. 2D). Most hotspot loci have a Kimura distance from the consensus of less than 0.1 and belong to Alu subfamilies; these species likely maintain their anomalous sequence features due to being evolutionarily young compared to the founding member of their repeat family. Other families besides HSATII with higher-than-average Kimura distance from the consensus are the MER21, TAR-1, and LTR6B families, which may have CpG dinucleotides maintained by selection to trigger PRRs in a dysregulated state. We extend our approach to the evolution of repeats that can trigger PRRs via double-stranded RNA (dsRNA) formation. Known dsRNA receptors include TLR-3, RIG-I, and MDA-5 7.
While the detailed mechanisms of dsRNA motif recognition and receptor activation are still a subject of active research, it is generally accepted that TLR-3 is activated by short (approx. 30 bp) endosomal dsRNA and RIG-I (DDX58) by short (tens of bases) cytoplasmic dsRNA accompanied by a triphosphate 37, while MDA-5 recognizes longer cytoplasmic dsRNA 38. We study the distribution of double-stranded segments in annotated regions in the human genome, quantified by the double-stranded force (Methods). It is analogous to forces on dinucleotide motifs, where the force is 0 if, for a given sequence, the length of its longest complementary segments corresponds to what one would expect from a neutral model of a random sequence with the same nucleotide distribution and length. We quantified this force for repetitive families as well as ncRNA and mRNA sequences. The histogram of observed double-stranded forces is shown in Fig. 3A, along with a histogram of randomly generated sequences of different lengths. While the mean value and standard deviation of functional mRNA and ncRNA sequences are essentially those of random sequences, the consensus sequences of repeats contain multiple families with long complementary segments, contributing to an increased average value (Fig. 3A). Such repeats therefore entered the genome with the potential ability to form dsRNA segments and, as with CpG motifs, typically lost that ability over time due to mutations. While the general trend is to relax the double-stranded force towards zero (Fig. 3B), there are several repeat families with large values, indicating a possible reservoir of double-stranded segments being maintained by selection (Fig. 3C, Extended Data Fig. 2A). Many of these families were not detected by the selective force on CpG dinucleotides, implying the selective forces on dinucleotides and RNA structures are largely independent and detected by distinct PRRs. Several outliers have high positive values, including the species Tigger4a and HSMAR (Extended Data Fig.
2B). While they are DNA transposons, we also found their RNA transcripts in The Cancer Genome Atlas (TCGA, https://www.cancer.gov/tcga), and hence their RNA may still be immunostimulatory when transcribed. To locate possible sources of double-stranded segments originating from the same transcript, we scan the entire genome (HG38 assembly), using a window of transcripts of length 3000 bp, comparable to typical lengths of long ncRNAs 39. We scan these windows for two fully complementary segments (through Watson-Crick or wobble base pairs). We quantified the sequence complexity of such complementary segments (based on Kolmogorov complexity, as described in Methods), as shown in Fig. 3D. The segments close to the low complexity limit typically contain a repeating motif of only a few nucleotides (such as poly(AT)) while the longest segments have higher complexity, i.e. the long dsRNAs are not exclusively formed by simple repeats. The longest inserts with high complexity correspond to segments that do not overlap with any known insert, annotated gene or ncRNA. An atlas of all families of repeats analyzed is summarized in Supplementary Tables 2 and 3. We explored which specific genome loci, as opposed to consensus repeats, can stimulate MDA-5 receptors by forming long dsRNA segments, as their transcription has been implicated as a response to genome-wide DNA demethylation 10. Using a sliding window across the entire human genome, with a transcript length of 3000 bp, we observed two peaks in the double-stranded force, a major one close to 0 and a smaller one around 0.5 (Fig. 4A), consistent with the results for consensus repeats found in Fig. 3. We found that for the majority (74%) of regions with double-stranded force larger than 0.5 the complementary segments in the 3000-base-long regions overlap with known repeats.
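The search for two fully complementary segments within a window, and the comparison with a composition-matched random expectation, can be sketched as below. This is a simplified illustration rather than the method from Methods: `excess_over_shuffled` reports a plain difference in segment length against shuffled sequences, not the double-stranded force itself, and `compression_complexity` uses a zlib compression ratio as a stand-in for the Kolmogorov-complexity estimate. All function names and default parameters are assumptions.

```python
import random
import zlib

PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}
WOBBLE = {("G", "T"), ("T", "G")}          # G-U wobble, written with T for U

def longest_complementary(seq, allow_wobble=True):
    """Length of the longest pair of fully complementary, antiparallel,
    non-overlapping segments in `seq` (O(n^2) dynamic programming)."""
    seq = seq.upper().replace("U", "T")
    allowed = PAIRS | WOBBLE if allow_wobble else PAIRS
    n = len(seq)
    best = 0
    prev = [0] * n                          # run lengths for row i + 1
    for i in range(n - 1, -1, -1):
        cur = [0] * n
        for j in range(i + 1, n):
            if (seq[i], seq[j]) in allowed:
                # extend the run formed by the pair (i + 1, j - 1), if any
                cur[j] = 1 + prev[j - 1]
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best

def excess_over_shuffled(seq, n_shuffles=20, seed=0):
    """Observed longest segment minus the mean over shuffled sequences
    with the same length and nucleotide composition."""
    rng = random.Random(seed)
    chars = list(seq)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        total += longest_complementary("".join(chars))
    return longest_complementary(seq) - total / n_shuffles

def compression_complexity(segment):
    """Compressed size divided by raw size; lower means more repetitive."""
    raw = segment.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)
```

A window would be flagged as a candidate dsRNA source when its excess is clearly positive, and `compression_complexity` can then separate simple repeats such as poly(AT) from higher-complexity complementary segments. The quadratic scan is adequate for 3000-base windows but would need optimization for a genome-wide pass.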
Greater than 90% of identified complementary segments correspond to AluS and AluY, two inserts from Alu families, where a copy has inserted in a positive orientation close to one in a negative orientation (inverted-repeat Alus, IR-Alus) (Fig. 4B). These results, based solely on evolutionary analysis using our framework, are strikingly predictive of the experimental observations that IR-Alus are the major source of self-RNA that form MDA-5 agonists 10. To test this hypothesis, we plotted a histogram of the transcripts found experimentally in Ref. 10 to bind MDA-5, both at baseline and after treatment with a DNA demethylating agent (Fig. 4A). Those experimentally validated MDA-5 agonist dsRNAs indeed have a clear peak at 0.5 (Fig. 4A), providing strong experimental support for the predictive power of our evolutionary model and, in turn, the hypothesis that evolution selected this feature as an epigenetic checkpoint 17, 18. The mean length of the longest complementary segments found in the dataset with double-stranded force > 0.5 is 40 base pairs. We further investigated a subset consisting of regions that can form double-stranded segments of 100 base pairs or longer. In this subset, only 20% of the complementary segments overlap with known annotated repeat segments. Besides the Alu subfamilies (which constitute about 40% of long complementary segments that overlap with known inserts), we also identified complementary fragments inserted in their positive and negative orientation from the ORF2 open reading frame of LINE-1, which is lowly expressed compared to ORF1, the other LINE-1 open reading frame, in human cancers 40. In addition, we observed most regions (56.9%) with double-stranded force larger than 0.5 were over-represented at intronic regions (Fig. 4C). These results are consistent with a recent hypothesis that intronic repeats can form dsRNA and induce viral mimicry as a checkpoint against intron retention 41, 42.
We therefore hypothesized that predicted repeats with high dsRNA force would be disproportionately present when introns are retained, acting as a checkpoint against splicing abnormalities 18. To test this hypothesis, we analyzed the effects of a class of inhibitors of RNA splicing which induce intron retention and exhibit synthetic lethal interactions in cancers with mutations in RNA splicing factors such as SF3B1 43. We examined RNA sequencing data from SF3B inhibitors (including the drugs E7107 and H3B-8800) which cause the retention of introns in SF3B1 K700E mutant cells. Consistent with our model, we found that splicing agents which lead to intron retention overexpress the high double-stranded force intronic repeats we predicted (Fig. 4D-E), simultaneously supporting the evolutionary role of inverted SINE elements in guarding against intron retention and the potential ability to manipulate this feature using a cancer therapeutic targeting RNA splicing. Consistently, for inhibitors less associated with intron retention the effect was either weakened or not present (Extended Data Figs. 3 and 4). Finally, we annotated long dsRNA segments formed by bidirectional transcription, which have been implicated as potentially forming dsRNA due to their perfect complementarity 12. To find plausible sources of regions that can be transcribed in both directions, we analyzed available transcription datasets from TCGA (Methods), finding multiple long regions (over one hundred base pairs) that are transcribed bidirectionally, indicating a possible source of antagonists (Extended Data Fig. 5). We found different inserts of MIR, Alus and LINE-1, i.e., some of the most abundant repeat families, to be the most represented among such transcripts. The respective loci for the top 1% highest bidirectional transcript counts, along with the number of reads transcribed from either the negative or positive strand, are listed in Supplementary Table 4.
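Selecting the most strongly bidirectionally transcribed loci from per-strand read counts can be sketched as follows. The input layout (a mapping from locus to plus-strand and minus-strand read counts), the ranking by the smaller of the two strand counts, and the `min_reads` threshold are illustrative assumptions; the criteria actually used are described in Methods.

```python
def bidirectional_loci(counts, min_reads=1, top_fraction=0.01):
    """Return loci transcribed from both strands, ranked by support.

    counts: dict mapping a locus label to (plus_reads, minus_reads).
    A locus qualifies only if both strands reach `min_reads`; qualifying
    loci are ranked by the weaker strand, and the top fraction is kept
    (at least one locus when any qualifies).
    """
    both = {
        locus: reads
        for locus, reads in counts.items()
        if reads[0] >= min_reads and reads[1] >= min_reads
    }
    ranked = sorted(both, key=lambda locus: min(both[locus]), reverse=True)
    if not ranked:
        return []
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]
```

Ranking by the weaker strand favors loci where both transcripts are abundant enough to pair, which is one plausible reading of "bidirectional transcript count".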
We quantify the evolution of non-self, pathogen-associated patterns, based on competition between selective and entropic forces, within repeat families in the human genome. In doing so we find the high-copy satellite RNA HSATII is likely under selection to maintain its pathogen-associated CpG dinucleotide content and functional LINE-1 inserts maintain higher CpG content than expected. LINE-1 promoters are controlled at the DNA level by CpG methylation, and LINE-1 has an internal, bi-directional promoter transcribed with the 5'-UTR of the RNA 44, 45, ensuring the promoter co-mobilizes with the protein coding regions. HSATII may have a DNA regulatory function as well, as its DNA sequences can sequester chromatin regulatory proteins and trigger epigenetic change 46. However, at the RNA level, CpGs can function evolutionarily as a danger signal to maintain fitness of tissues under epigenetic stress for both LINE-1 and HSATII, whose immunostimulatory properties have been documented. Furthermore, we incorporate RNA secondary structure into evolutionary models and identify a reservoir of anomalous repeats with likely immunostimulatory dsRNAs. We attempt to exhaustively annotate regions where repeats evolutionarily maintain the ability to form long dsRNAs or present anomalous CpG motifs, providing an atlas for mapping transcriptomes of cells which exhibit stimulation of PRRs so one can identify the potential source of causal immunostimulatory self-transcripts. As strong validation of our approach, repeats predicted through evolutionary analysis to be dsRNA-forming were found to be MDA-5 agonists in a recently published MDA-5 protection assay that profiled ligands induced upon response to epigenetic cancer therapy by DNA demethylating agents 10.
The repeats that are induced by epigenetic therapy come from regions of the genome which may selectively maintain the ability to form dsRNA, implying the therapeutic condition mimics the evolutionary role of these RNA species to safeguard tissue homeostasis by killing dysregulated cells. Moreover, we find such repeats disproportionately arise within introns and can be disproportionately induced by intron-retaining splice inhibitors 43, where they may be localized as a checkpoint against intron retention 18, 41, 42. Furthermore, CpG sequences may make intronic repeats better targets for RNA-binding proteins; without such insulation, repetitive elements within the introns of protein-coding genes could lead to deleterious RNA processing, which is ultimately relieved as the elements age by (presumably neutral) mutational decay 47, 48. Our work therefore has several implications for how we understand self versus non-self discrimination. When one quantifies pathogen-associated features, specific repeats in the genome not only display PAMPs capable of stimulating PRRs but, in some instances, seemingly maintain such features under selection. For multicellular organisms with a high degree of epigenetic regulation and chromosomal organization, this offers an opportunity to maintain stimulatory features to release a danger signal when epigenetic control is lost, such as during the release of repeats after p53 mutations, where immunostimulatory repeats may offer a back-up for p53 functions such as senescence 12, 49. Our work supports the hypothesis that repeats are selected to maintain "non-self" PAMPs to act as sensors for loss of heterochromatin, as an epigenetic checkpoint or quality control system, and to avoid genome instability generally 17, 18. With our framework one may learn how to identify which pathogen-associated features the genome maintains, which receptors they ligate, and, thereby, learn what pathways the genome has evolved to agonize and when.
Specific genome repeats, such as HSATII and inverted SINE elements, have been disproportionately implicated in the ability to stimulate non-self detection pathways and we predict that they are maintained under natural selection to do so. Each repeat likely engages a different receptor family. For CpG motifs the ZAP receptor and TLR7/8 have been implicated, and inverted SINE elements are likely detected by long dsRNA sensors such as MDA-5. Decoding viral mimicry by repeats using a combination of physically interpretable machine learning and predictive evolutionary models may therefore shed light on the function of genomic dark matter across disease indications, in a manner which may be further exploited therapeutically. For instance, it had been observed that early-stage melanoma may manipulate epigenetic regulators to suppress immunostimulatory repeat expression, and recent work has shown the possibility of targeting those proteins to reinvigorate the immune response 50, 51. Furthermore, viruses and late-stage tumors may have learned to manipulate viral mimicry to their own advantage: Y-RNAs have been implicated in RIG-I sensing during RNA virus infection 52 and herpesviruses derive a fitness advantage from induction of HSATII, which is also often overexpressed in tumors 53. The implication is that we can learn a repeat code of self-agonists within our genome held by selection to stimulate receptors under specific circumstances. We provide both an annotated atlas of predicted repeats under selection (Supplementary Tables) and software for building predictive models for this purpose. The lack of unbiased sequencing of repeats, which can easily be missed in RNA sequencing that focuses only on mRNA or in whole exome or short read whole genome DNA sequencing, is therefore a critical bottleneck. Once decoded we can better understand the evolution of these surprisingly non-self features encoded within families of repeats in our genome.
Figure 1 | Landscape of forces on CpG dinucleotides in the genome. A, Histogram of changes in the force on CpG motifs across all repetitive elements in the human genome. B, Change in the CpG force as a function of the force on the original (consensus) repeat insert over its evolutionary history. Each point represents a family of repetitive elements, along with a linear fit. All repeats whose consensus is above the mean force on CpG dinucleotide (-1.9) have decreased their CpG content. Alu repeats (green) and HSATII (red) are highlighted as exceptions to the general trend. C, The mean CpG force of all inserts in a repeat family as a function of the Kimura distance from the consensus sequence for each family.

We only considered transcripts that come from genome regions of length 3000 that have double-stranded RNA force larger than 0.5. Only transcripts with one or more occurrences are shown.

1. Sequence-specific sensing of nucleic acids
2. Patterns of evolution and host gene mimicry in influenza and other RNA viruses
3. Quantitative theory of entropic forces acting on constrained nucleotide sequences applied to viruses
4. CG dinucleotide suppression enables antiviral defence targeting non-self RNA
5. The evolutionary pathway to virulence of an RNA virus
6. The heterogeneous landscape and early evolution of pathogen-associated CpG dinucleotides in SARS-CoV-2
7. Sensing of RNA viruses: a review of innate immune receptors involved in recognizing RNA virus invasion
8. Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers
9. Distinguishing the immunostimulatory properties of noncoding RNAs expressed in cancer cells
10. Epigenetic therapy induces transcription of inverted SINEs and ADAR1 dependency
11. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts
12. p53 cooperates with DNA methylation and a suicidal interferon response to maintain epigenetic silencing of repeats and noncoding RNAs
13. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses
14. LSD1 Ablation stimulates anti-tumor immunity and enables checkpoint blockade
15. L1 drives IFN in senescent cells and promotes age-associated inflammation
16. Reverse-Transcriptase Inhibitors in the Aicardi Goutières Syndrome
17. Reactivation of endogenous retroelements in cancer development and therapy
18. Endogenous retroelements and the viral mimicry response in cancer therapy and cellular homeostasis
19. An introduction to the mathematical structure of the Wright Fisher model of population genetics
20. The expected equilibrium of the CpG dinucleotide in vertebrate genomes under a mutation model
21. The Dfam database of repetitive DNA families
22. The rate of hydrolytic deamination of 5-methylcytosine in double-stranded DNA
23. DNA methylation and the frequency of CpG in animal DNA
24. Modelling the ancestral sequence distribution and model frequencies in context-dependent models for primate non-coding sequences
25. Reconstruction of ancestral nucleotide sequences and estimation of substitution frequencies in a star phylogeny
26. Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context
27. DNA methylation in health and disease
28. Charting a dynamic DNA methylation landscape of the human genome
29. CpG islands and the regulation of transcription
30. Initial sequencing and analysis of the human genome
31. L1Base 2: more retrotransposition-active LINE-1s, more mammalian genomes
32. Identification [...] 189
33. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences
34. Landscape of somatic retrotransposition in human cancers
35. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes
36. Global cancer transcriptome quantifies repeat element polarization between immunotherapy responsive and T cell suppressive classes
37. RIG-I-mediated antiviral responses to single-stranded RNA bearing 5'-phosphates
38. Structural basis for dsRNA recognition, filament formation, and antiviral signal activation by MDA5
39. Sizing up long non-coding RNAs: do lncRNAs have secondary and tertiary structure?
40. LINE-1 ORF2p expression is nearly imperceptible in human cancers
41. Spliceosome-targeted therapies trigger an antiviral immune response in triple-negative breast cancer
42. Altered RNA splicing initiates the viral mimicry response from inverted SINEs following type I PRMT inhibition in Triple-Negative Breast Cancer
43. H3B-8800, an orally available small-molecule splicing modulator, induces lethality in spliceosome-mutant cancers
44. Identification of critical CpG sites for repression of L1 transcription by DNA methylation
45. Antisense promoter of human L1 retrotransposon drives transcription of adjacent cellular genes
46. Demethylated HSATII DNA and HSATII RNA foci sequester PRC1 and MeCP2 into cancer-specific nuclear bodies
47. Heteromeric RNP assembly at LINEs controls lineage-specific RNA processing
48. RNA Binding proteins as regulators of retrotransposon-Induced exonization
49. The maintenance of epigenetic states by p53: the guardian of the epigenome
50. Transcriptional dissection of melanoma identifies a high-risk subtype underlying TP53 family genes and epigenome deregulation
51. KDM5B promotes immune evasion by recruiting SETDB1 to silence retroelements
52. Y-RNAs lead an endogenous program of RIG-I agonism mobilized upon RNA virus infection and targeted by HIV
53. A tumor-specific endogenous repetitive element is induced by herpesviruses

Data Availability Statement: Original data will be made available upon reasonable request.