key: cord-0068716-t7qry794
authors: Kolesnik, Matvey V.; Fedorova, Iana; Karneyeva, Karyna A.; Artamonova, Daria N.; Severinov, Konstantin V.
title: Type III CRISPR-Cas Systems: Deciphering the Most Complex Prokaryotic Immune System
date: 2021-10-20
journal: Biochemistry (Mosc)
DOI: 10.1134/s0006297921100114
sha: 1417790b9a85553a11b051f9e25c31c1a6aecd4a
doc_id: 68716
cord_uid: t7qry794

The emergence and persistence of selfish genetic elements is an intrinsic feature of all living systems. Cellular organisms have evolved a plethora of elaborate defense systems that limit the spread of such genetic parasites. CRISPR-Cas are RNA-guided defense systems used by prokaryotes to recognize and destroy foreign nucleic acids. These systems acquire and store fragments of foreign nucleic acids and utilize the stored sequences as guides to recognize and destroy genetic invaders. CRISPR-Cas systems have been extensively studied, as some of them are used in various genome editing technologies. Although Type III CRISPR-Cas systems are among the most common CRISPR-Cas systems, they are also some of the least investigated ones, mostly due to the complexity of their action compared to other CRISPR-Cas system types. Type III effector complexes specifically recognize and cleave RNA molecules. The recognition of the target RNA activates the effector large subunit – the so-called CRISPR polymerase – which cleaves DNA and produces small cyclic oligonucleotides that act as signaling molecules to activate auxiliary effectors, notably non-specific RNases. In this review, we provide a historical overview of the sometimes meandering pathway of the Type III CRISPR research. We also review the current data on the structures and activities of Type III CRISPR-Cas systems components, their biological roles, and evolutionary history. Finally, using structural modeling with AlphaFold2, we show that the archaeal HRAMP signature protein, which heretofore has had no assigned function, is a degenerate relative of Type III CRISPR-Cas signature protein Cas10, suggesting that HRAMP systems have descended from Type III CRISPR-Cas systems or their ancestors.

All known life forms on Earth share a universal feature: almost all their biological traits are encoded in nucleic acid sequences that are replicated through a template-based principle. While changes in a nucleic acid sequence may have a great influence on biological traits (e.g., the structures of the proteins it encodes), these changes usually have little effect on the chemical and physical properties of the nucleic acids themselves, allowing the existence of nucleic acids with every possible sequence. Since an increase in replication fidelity can be achieved only at the price of increased energy consumption [1], replication of nucleic acids is inevitably error-prone. This necessarily implies the existence of genetic heredity with variation of the biological traits encoded by nucleic acids, which, in turn, provides the necessary factors for evolution by natural selection and/or genetic drift.

BIOCHEMISTRY (Moscow) Vol. 86 No. 10 2021

Any community of evolving self-replicating living systems inevitably gives rise to genetic parasites and corresponding defense systems [2], so the eternal arms race between selfish elements and their hosts apparently goes back to the origin of life. This competition leads to the development of anti-parasite defense systems that target different mechanisms involved in the parasite's life cycle. In turn, parasites evolve to dodge the defense systems. The hosts are apparently incapable of completely getting rid of genetic parasites, since this would entail a reduction of the horizontal gene transfer essential for long-term genome stability and evolution [2]. The evolution of defense systems sometimes takes very peculiar paths, including the shuffling of components of different defense systems and, most strikingly, the adoption of components of genetic parasites themselves for host defense.
From this point of view, CRISPR-Cas systems are especially fascinating, as they appear to be bizarre mosaics of "tamed" transposons, toxin-antitoxin systems, and other components of unclear origin. In this review, we describe the structural and mechanistic features of Type III CRISPR-Cas systems, probably the most complex defense systems of prokaryotes.

The diversity of CRISPR-Cas loci and the main features of the adaptive immunity mediated by different CRISPR-Cas systems have already been discussed in dozens of reviews (see, for example, [3, 4]). Most CRISPR-Cas loci include CRISPR arrays, which consist of two or more repeats separated by unique spacers, and adjacent clusters of cas genes. The CRISPR-Cas immune response can be divided into three stages: (a) adaptation, (b) expression, and (c) interference (as shown for Type III CRISPR-Cas systems in Fig. 1). At the adaptation stage, short fragments of DNA are inserted into a CRISPR array, forming a new spacer. Adaptation is mediated by the Cas1-Cas2 integration complex. Although this complex is highly conserved, the exact mechanism of spacer acquisition depends on the CRISPR-Cas system type. At the expression stage, CRISPR arrays are transcribed into pre-CRISPR RNA molecules that are further processed into mature small CRISPR RNAs (crRNAs). The processing proceeds via mechanisms that depend on the CRISPR-Cas system type. At the interference stage, crRNAs are incorporated into effector complexes of Cas proteins and used as guides for the recognition of foreign nucleic acid sequences (protospacers), which are then destroyed by Cas nucleases. CRISPR-Cas systems have been found in ∼90% of sequenced archaeal and ∼40% of sequenced eubacterial genomes [5]. Based on the effector complex composition, CRISPR-Cas systems can be divided into two classes. Class 1 includes systems with multisubunit effector complexes; Class 2 effectors are single multidomain proteins. CRISPR-Cas systems are further subdivided into 6 types and multiple subtypes based on the composition and organization of cas loci: Types I, III, and IV belong to Class 1, and Types II, V, and VI belong to Class 2 [4]. In this review, we focus only on the Type III systems.

Fig. 1. Type III adaptive immunity. a) Adaptation: insertion of small fragments of invader-derived DNA into the host CRISPR array with the formation of a new spacer-repeat unit. In some systems, the spacers can be acquired from RNA through the activity of the RT domain fused to Cas1. b) Expression: maturation of crRNAs and assembly of effector complexes. c) Interference: triggering of the immune response by specific recognition of foreign RNA.
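The repeat-spacer layout of a CRISPR array described above can be illustrated with a minimal Python sketch that splits a toy array sequence on its repeat to recover the spacers. The function name `extract_spacers`, the 6-nt repeat, and the toy sequences are assumptions made purely for illustration; natural repeats and spacers are longer, and the repeats within one array are not always perfectly identical.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of `repeat`.

    Simplification (assumption): all repeats are identical, so a plain
    string split is enough to separate them from the spacers.
    """
    parts = array_seq.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the flank
    # after the last repeat; everything in between is bounded by two repeats.
    return parts[1:-1]


# Hypothetical toy array: flank, then repeat-spacer-repeat-spacer-repeat, then flank.
toy_repeat = "GTTTCC"
toy_array = "AAAA" + toy_repeat + "ACGTAC" + toy_repeat + "TTGCAA" + toy_repeat + "CC"

spacers = extract_spacers(toy_array, toy_repeat)
print(spacers)
```

An array with N repeats yields N - 1 spacers in this representation, which matches the description of spacers as the unique sequences separating the repeats.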

Type III CRISPR-Cas systems are widespread in both archaeal and bacterial genomes (34 and 25% of all CRISPR-cas loci, respectively) [5]. The genes currently known as Type III cas genes were first identified during a search for conserved gene clusters in the known genomes of hyperthermophilic archaea. At that time, it was hypothesized that these genes belonged to previously unknown DNA repair systems [6]. Among the genes present in such clusters, Makarova et al. identified a subset encoding large conserved multidomain proteins. These proteins were shown to contain a domain similar to the Palm domain, a component of various enzymes, such as the A, B, and Y superfamilies of DNA-dependent DNA polymerases, viral RNA-dependent RNA polymerases, DNA-dependent RNA polymerases of some viruses and mitochondria, reverse transcriptases, and a large group of cyclases and nucleotidyltransferases [7, 8]. On these grounds, the large conserved multidomain proteins were predicted to be polymerases/cyclases. Several families of other protein-coding genes were found to be associated with the predicted polymerase/cyclase genes, but their roles remained unclear. Believing that the discovered loci belonged to a new DNA repair system, Makarova et al. named these proteins RAMPs (Repair-Associated Mysterious Proteins). Besides the loci encoding the polymerase/cyclase and RAMPs, Makarova et al. identified another kind of conserved gene cluster, which was later recognized as Type I CRISPR-Cas systems. However, the linkage of both kinds of clusters to CRISPR arrays was missed at the time. A few years later, Haft et al. performed an extensive search and classification of protein-coding genes located in the vicinity of CRISPR arrays and delineated the organization of these genes in specific loci [9]. Among the currently recognized CRISPR-Cas types, Haft et al. also described Csm (CRISPR-Cas Subtype Mycobacterium tuberculosis) and Cmr (CRISPR-Cas Module RAMP). Some products of the csm and cmr genes were found to be homologous to each other. Both the csm and cmr loci contain genes for Palm domain proteins (csm1 and cmr2) and encode at least two homologous RAMP proteins (csm3 and cmr4). The csm and cmr loci were later designated as Type III-A and Type III-B CRISPR-Cas systems, respectively [10]. Haft et al. noticed that the cmr loci never occur as the only CRISPR-Cas system in prokaryotic genomes. Consistently, it was shown that Type III-B CRISPR-Cas systems often lack the adaptation module and therefore must rely on the spacer acquisition machinery of other CRISPR-Cas systems [5].

Type III CRISPR-Cas systems from different organisms were experimentally studied by several independent research groups; however, the results of these studies were rather puzzling. Based on in vivo experiments, Marraffini et al. characterized a Type III-A CRISPR-Cas system from Staphylococcus epidermidis as a DNA-targeting system [11]. On the other hand, Type III effector complexes were shown to specifically recognize and cleave RNA targets in vitro [12, 13]. Although we now know that the conclusions drawn from the in vivo results were incorrect, this work is worth discussing in detail, since it is important not only in terms of CRISPR-Cas research but also from a methodological point of view. In their experimental system, Marraffini et al. used a conjugative plasmid and two strains of S. epidermidis. One of these strains lacked CRISPR-Cas systems, while the other harbored a Type III-A system with a spacer matching the sequence of the plasmid-borne nes gene encoding a nickase (a component of the conjugation machinery). It was shown that the Type III-A system suppressed the conjugal transfer of nes-harboring plasmids. Since the spacer matched the coding strand of the nes gene (i.e., the resulting crRNA did not recognize the sense nes transcripts), it was expected that the system would target DNA. Furthermore, the fact that expression of the nickase is needed only in the donor strain to initiate DNA transfer, but not in the recipient strain to maintain the plasmid, also supported the targeting of DNA. The authors considered that the CRISPR array might be transcribed from both strands, thus producing crRNAs targeting the nes transcript, yet no antisense CRISPR transcripts were found. Finally, the authors performed an ingenious experiment, disrupting the nes protospacer sequence with a self-splicing intron and showing that such a plasmid evaded Type III-A-mediated immunity, which provided the strongest argument in favor of DNA targeting.
However, they did not consider the possibilities that (i) there is a considerable level of antisense nes transcription and/or (ii) Type III immunity is triggered by the recognition of nascent RNA by the effectors. A few years later, the same group established that plasmid protospacers are transcribed in both directions and proved the in vivo specificity of Type III-A systems towards RNA [14].

Despite primarily targeting RNA, Type III CRISPR-Cas systems protect cells from viruses with DNA genomes and interfere with plasmid transformation, as long as the viral or plasmid DNA is transcribed to produce RNA molecules complementary to the crRNA spacers. Transcripts complementary to the protective crRNAs do not have to be essential for the viral or plasmid life cycle/maintenance [14, 15]. The first insights explaining this puzzling observation were obtained from the in vitro characterization of the activities of the Csm and Cmr effector complexes. In addition to the Palm domains, the large subunits of Type III effector complexes (the Csm1 and Cmr2 proteins, hereafter designated as Cas10) also contain HD domains (named after the conserved histidine and aspartate residues) [5]. HD domain proteins are also associated with other CRISPR-Cas systems. For example, in Type I systems, the Cas3 protein destroys target DNA via the single-strand DNase activity of its HD domain [16]. The binding of target RNA activates the single-strand DNase activity of the Cas10 HD domain in both Type III-A [17] and Type III-B [18, 19] effector complexes. Therefore, a model of co-transcriptional DNA cleavage was proposed to explain the mechanism of Type III immunity. According to this model, when a Type III effector recognizes a nascent transcript, the HD nuclease domain of the Cas10 subunit is activated and cleaves the single-stranded DNA within the transcription bubble (Fig. 2a) [17].
Some support for this model was provided by the in vitro experiments of the Marraffini group [20], although these results were later called into question [21]. Be that as it may, mutations of the catalytic residues in the Cas10 nuclease domain do not affect interference with plasmid transformation. In contrast, mutations of the catalytic residues of the Cas10 Palm domain significantly attenuate Type III anti-plasmid immunity [22]. These results clearly imply that the Type III immune response cannot be reduced to co-transcriptional DNA cleavage alone.

CRISPR-Cas loci often contain genes coding for proteins that are not directly involved in spacer acquisition, crRNA maturation, or formation of effector complexes. The role of most of these genes (usually referred to as auxiliary) is still poorly understood. Among such genes, there is a family coding for proteins with a specific variant of the Rossmann fold domain (the so-called CRISPR-Cas-associated Rossmann fold, or CARF). In these proteins, CARF domains are frequently linked to various domains with predicted RNase, DNase, or DNA-binding activities. The Rossmann fold is a common motif in nucleotide-binding proteins; it was suggested that CARF domain proteins participate in the CRISPR-Cas-mediated immune response by sensing some nucleotide ligands with subsequent activation of their effector nuclease domains [23]. The disruption of the csx1 and csm6 genes, which encode CARF domain proteins and are frequently associated with the Type III cas operons, greatly hinders the ability of Type III CRISPR-Cas systems to interfere with plasmid transformation [15, 22], adding another layer of complexity to Type III-mediated immunity.

Fig. 2. b) cOAs activate the auxiliary nucleases that target DNA or RNA molecules. The activity of the auxiliary effectors is regulated through the degradation of cOAs by ring nucleases or, in some cases, by the auxiliary proteins themselves. c) The avoidance of self-targeting in Type III CRISPR-Cas systems: the complementarity between the target and the repeat-derived 5′-tag of crRNA prevents the activation of both the HD and the Palm domains of the Cas10 subunit.

In the Csx1 and Csm6 proteins, the CARF domains are fused to HEPN (Higher Eukaryotes and Prokaryotes Nucleotide-binding) domains [23]. Proteins with HEPN domains exhibit RNase activity and are frequently involved in various defense systems [24]. Some Csx1 and Csm6 proteins cleave single-stranded RNA in vitro through their HEPN domains [25, 26]. Yet, the observed RNase activities are relatively weak, suggesting that the activity of these proteins in the Type III-mediated immune response in vivo may be somehow upregulated [25, 26]. This hypothesis was confirmed by experiments showing that, upon recognition of the target RNA, Type III effector complexes convert ATP molecules into a range of cyclic oligoadenylates (cOAs) using the polymerization activity of the Cas10 Palm domains [27, 28]. These cOAs function as second messengers that trigger the non-specific RNase activity of the CARF-HEPN proteins (Fig. 2, a and b) [27-29]. The exact role of the non-specific RNase activity of the Csx1/Csm6 proteins is still rather speculative. Inhibition of viral transcription, cell dormancy, or even cell death were proposed as possible outcomes of the non-specific RNA degradation by activated Csx1/Csm6 [30, 31]. In the latter case, one can envision that inhibition of infection by DNA viruses, as detected by standard plaque assays, can be achieved even without activation of the DNase activity of Type III effectors. This scenario, however, is inconsistent with the observations that cells mounting Type III interference against lytic viruses clear the infection and survive [32].

The fact that individual cells mounting Type III interference remain viable implies that the cOA-mediated activation of cellular RNases must be transient. There are two obvious ways to control the activity of the Csm6/Csx1 nucleases: regulation of cOA synthesis or degradation of cOAs. Both mechanisms have been experimentally confirmed. First, the cOA-synthesizing activity of Cas10, which is activated upon the binding of target RNA, is abolished upon target cleavage [33, 34]. Second, several cOA-degrading nucleases have been characterized. In some organisms, cOAs are degraded by dedicated ring nucleases [35]; other organisms contain CARF-HEPN proteins capable of degrading the cOAs that activate them [36, 37]. Interestingly, a highly efficient ring nuclease encoded by a virus infecting Sulfolobus was shown to counter the Type III CRISPR-Cas immunity of the host [38].

Strikingly, the cOA-dependent arm of prokaryotic immunity is similar to one of the pathways of mammalian innate immunity. In the latter case, the presence of double-stranded RNAs in the cytoplasm activates oligoadenylate synthetase (OAS), which converts ATP to 2′-5′-oligoadenylates, which, in turn, activate RNase L. RNase L non-specifically degrades RNA in the cytoplasm [39]. This resemblance turns out to be even more exciting considering that the catalytic core of OAS shares similarity with the Palm domain [40], while the activity of RNase L relies on a distinct variant of the HEPN domain [24]. Recently, the OAS-RNase L pathway has drawn particular attention, since it was discovered that nucleotide polymorphisms in the locus encoding OAS genes are associated with COVID-19-induced respiratory failure, suggesting that this pathway is important in the immune response against SARS-CoV-2 [41].

While the role of the cOA pathway in Type III CRISPR-Cas immunity is relatively clear, the significance of the single-strand DNase activity of the Cas10 HD domain remains obscure. Originally, it was shown that mutations of the Cas10 HD domain catalytic residues have no effect on the Type III-A-mediated interference with plasmid conjugation in S. epidermidis cells [22] (in this work, the interference was registered as a decreased number of transconjugant colonies). However, it was discovered later that, unlike the wild-type system, the Type III-A system with an inactivated Cas10 HD domain did not prevent the formation of transconjugant colonies harboring the targeted plasmid, but rather severely retarded their growth, so that the colonies became visible only after a prolonged incubation [31]. Interestingly, when both the HD and the Palm domains were inactivated, no interference was observed, and the number and appearance of transconjugant colonies were the same under targeting and non-targeting conditions [31]. Thus, it appears that both Cas10 domains are needed for full interference. In fact, the requirement for the cOA-dependent arm of the Type III-A interference may depend on the abundance of the RNA target. When the transcript recognized by the Type III-A effector is abundant, the activity of the HD domain alone suffices for full interference; when the target abundance is low, the activities of both the HD and the Palm domains become essential. It is possible that the Type III interference is kinetically controlled: the cOA-dependent non-specific RNase inhibits propagation of the targeted genetic elements, thus giving sufficient time for their degradation by the slower activity of the HD domain [31]. This scenario seemingly requires co-transcriptional DNA cleavage. However, despite the appeal of the co-transcriptional DNA cleavage model, there is little experimental support for it.
Co-transcriptional DNA cleavage was observed only in the in vitro experiments described in a single paper [20], in which the double-stranded transcription templates were created by combining complementary template-strand and non-template-strand DNA oligonucleotides. However, further studies showed that what had been considered co-transcriptional DNA cleavage in these experiments could be observed only in the presence of an excess of the "targeted" non-template-strand DNA oligonucleotides over the non-targeted ones, whose sequence was complementary to the RNA transcript [21]. These results suggest that in the experiments of Samai et al. [20], Type III effectors cleaved free single-stranded DNA rather than the DNA within the transcription bubble [21]. As an alternative hypothesis explaining the ability of Type III systems to protect from DNA invaders while leaving the cells that mount the interference alive, it was suggested that the HD domain is specific toward single-stranded DNA, including the replication intermediates of some viruses and plasmids [21]. In summary, while the importance of the Cas10 HD nuclease activity for Type III interference can be considered established, the mechanism of its involvement remains uncertain due to the lack of knowledge about its in vivo targets. Recently, it was observed that the Cas10 HD nuclease activity increases the rate of mutagenesis in cells harboring Type III CRISPR-Cas systems, implying that Cas10 HD could non-specifically target cellular DNA [42].

While the above results suggest that Type III-mediated interference against DNA invaders requires the orchestrated activities of the Cas10 HD and Palm domains along with cOA-activated RNases, confusingly, many Type III systems lack some of these apparently essential components. This implies that such "incomplete" systems either recruit other mechanisms to compensate for the missing components or function in a completely different way.

An example of such an apparently incomplete system is the Type III-B CRISPR-Cas system of Thermus thermophilus, whose Cas10 protein lacks the HD domain [43]. However, since this Cas10 possesses an intact Palm domain, it is still capable of activating auxiliary effectors via the cOA pathway [44]. The Type III-B locus of T. thermophilus encodes a cOA-activated DNA nickase, which might be responsible for plasmid degradation in the absence of the Cas10 HD DNase activity [45].

In Type III-F CRISPR-Cas systems, the Palm domain of Cas10 is predicted to be inactive due to mutations in the catalytic site. Indeed, the Type III-F loci lack genes coding for CARF domain proteins [4]. One can speculate that the Type III-F systems provide at least partial protection against mobile genetic elements by relying on the activity of the Cas10 HD domain only. Consistent with this view, it has been shown that some "complete" Type III CRISPR-Cas systems protect cells from viral infections even when the genes for the auxiliary CARF domain RNases are deleted from the host genome [30].

Type III-E systems are even more peculiar, as they completely lack the genes coding for the Cas10 subunits. Even more curiously, in these systems, the several Cas7 subunits that form the multisubunit crRNA-binding filament in other Type III systems are fused into a single large multidomain protein. Type III-E loci lack genes coding for CARF domain proteins, but some of them encode putative caspase-like auxiliary effectors [4]. The mechanisms of action of these enigmatic systems are still obscure and await future investigation.

Bioinformatically predicted HRAMP systems of Halobacteria provide yet another example of "incomplete" Type III-related loci. HRAMP (Halobacterial RAMP) systems lack CRISPR arrays and consist of the so-called HRAMP signature gene and the cas7 and cas5 genes; they are often associated with nucleases containing DEDDy and HNH domains. The function of the HRAMP systems is unknown, although it was proposed that they are dsDNA-targeting immune systems. The HRAMP signature protein does not display any sequence similarity with known protein families and is consequently considered to be a protein of unknown function [46]. The recently developed protein structure prediction tool AlphaFold2 allowed us to detect a high structural similarity between the HRAMP signature proteins and Csm1. Modelling was done using ColabFold AlphaFold2_mmseqs2 with default parameters, and the structure homology search was performed using the Dali server [47-49]; the Dali Z-score for the model of the HRAMP signature protein WP_013440547.1 from Halogeometricum borinquense and Csm1 from Streptococcus thermophilus (PDB 6IFR A) is 10.6. It becomes apparent from the model that the HRAMP signature protein possesses an HD domain that is highly similar to the HD domain of Csm1, with the characteristic His-Asp active site (Fig. 3). The HRAMP signature protein is half the size of Csm1 and lacks the domains responsible for the interaction with Csm2 and Csm4 (proteins present in Type III-A CRISPR-Cas systems and absent from the HRAMP systems). The Palm domain of the HRAMP signature protein is also significantly reduced compared to that of Csm1 and is likely non-functional. The inability of the HRAMP signature protein to perform cyclic oligonucleotide synthesis is supported by the absence of HRAMP-associated CARF-coding genes. Yet, the final judgement must await experimental testing of the cyclase activity of the HRAMP signature proteins.

The relatedness between the HRAMP and Type III CRISPR-Cas systems was first suggested based on a distant similarity of the HRAMP Cas7 proteins to the Type III Cas7 proteins [46]. The structural homology between the HRAMP signature proteins and the Type III Cas10 proteins provides further support for the relatedness of these systems. If HRAMP systems evolved from complete Type III systems, CRISPR arrays and adaptation modules must have been lost during their evolution. However, one could speculate that HRAMP systems originated from ancestral Type III systems that existed before the emergence of CRISPR arrays and adaptation modules. In the latter case, the HRAMP systems may be molecular relics that could shed light on the functions of ancient prototypical Type III systems.

Although the most common auxiliary effectors in Type III systems are cOA-activated CARF domain RNases of the HEPN family [4], in some cases, Type III systems employ other cOA-sensing effectors. One such unusual Type III-associated auxiliary effector was recently characterized. This cOA-activated protein, Card1, from the Type III-A locus of Treponema succinifaciens exhibits nuclease activity towards both single-stranded DNA and RNA in vitro. However, Card1 activation in S. aureus cells did not produce detectable changes in the transcriptome [50], suggesting that its RNase activity is either inhibited in vivo or is highly specific and does not cause massive transcript degradation. Expression of Card1 compensates for the lack of the csm6 gene in the Type III-A system of S. epidermidis and restores the ability of the cells to resist phage infection. However, cells harboring a cas10 gene with mutated catalytic residues of the HD domain could not clear a phage infection even in the presence of Card1 [50], suggesting that the single-strand DNase activity of Card1 is not sufficient for interference. The clearance of the targeted plasmid from the cells required the activities of both Cas10 HD and Card1, which suggests that the Cas10 HD nuclease activity is specific towards the protospacer-containing DNA [50].

The spacer acquisition machinery of almost all CRISPR-Cas systems, including Type III, employs a complex composed of the conserved Cas1 and Cas2 proteins (Fig. 1a) [4]. The Cas1-Cas2 integration complexes are able to capture short DNA fragments and insert them into CRISPR arrays [51]. The origin of the spacer integration intermediates is still rather obscure; however, it was shown for the Type I and II systems that they can be produced via DNA degradation by the RecBCD or AddAB complexes [52, 53] or, in the case of the so-called primed adaptation in Type I systems, by the processive Cas3 nuclease [54]. Since the Type III immune response against plasmids and viruses with DNA genomes requires transcription of the targeted protospacer, mechanisms that allow preferential acquisition of spacers targeting transcriptionally active sites of plasmids or viral genomes should be beneficial for the cell. Indeed, some Type III CRISPR-Cas loci encode Cas1 proteins fused to reverse transcriptase (RT) domains, suggesting that spacers could be acquired from RNA molecules (Fig. 1a) [55, 56]. Accordingly, acquisition of spacers derived from RNA, particularly from abundant cellular transcripts, was demonstrated under conditions of overexpression of the cas1::RT-cas2 adaptation modules [57, 58]. Yet, the spacers were acquired in both orientations with equal efficiency, which suggests that only half of them would be functional in interference.
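The orientation argument above can be made concrete with a small Python sketch: a crRNA can base-pair with a transcript only if the reverse complement of its spacer occurs in that transcript, so a transcript-derived fragment inserted in the "wrong" orientation yields a crRNA with the same sense as the transcript and cannot guide its recognition. The function names, the toy transcript, and the use of the DNA alphabet for both molecules are illustrative assumptions, not data from the cited studies.

```python
# Watson-Crick complement table for the DNA alphabet (toy simplification:
# transcript and crRNA are both written with T instead of U).
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def crrna_can_pair(spacer_in_crrna: str, transcript: str) -> bool:
    """True if a crRNA carrying this spacer is complementary to the transcript."""
    return reverse_complement(spacer_in_crrna) in transcript


# Hypothetical transcript and a fragment of it captured as a prespacer.
toy_transcript = "ATGGCTAGCTTACGGATCCA"
fragment = toy_transcript[4:14]

# The same fragment can end up in the array in either orientation.
same_sense_spacer = fragment                     # crRNA identical to the transcript
antisense_spacer = reverse_complement(fragment)  # crRNA complementary to the transcript

for name, spacer in [("same sense", same_sense_spacer), ("antisense", antisense_spacer)]:
    print(name, crrna_can_pair(spacer, toy_transcript))
```

With unbiased insertion, each orientation is expected equally often, which is the basis for the estimate that only half of the RNA-derived spacers would support interference.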

The mechanisms responsible for the acquisition of RNA-derived spacers are not fully understood. In vitro, the Cas1::RT-Cas2 complex from Marinomonas mediterranea ligated the 3′-ends of RNA molecules to the 5′-ends of repeats in DNA molecules containing a cognate CRISPR array. The resulting RNA-DNA junction intermediates, once formed, served as substrates for reverse transcription [57]. Interestingly, the RNA molecules could be ligated to either side of the first repeat of the array [57], which is consistent with the observed lack of orientation bias in the acquired spacers. On the other hand, the Cas1::RT-Cas2 complex from Thiomicrospira ligated RNA molecules to CRISPR-containing DNA fragments in vitro but was unable to convert such intermediates into extended CRISPR arrays, suggesting either that such conversion requires additional factors in vivo, or that cDNA synthesis must occur before the integration of prespacers into the CRISPR array, which should proceed in both orientations to explain the lack of spacer orientation bias [59].

The presence of the RT domain appears to be a derived feature, since the Cas1 part of the Cas1-RT fusion is able to integrate spacers from DNA only [57]. Moreover, most Type III loci encode standard Cas1 proteins without RT domains [57]. In such systems, the adaptation machinery should be directed to DNA and thus be indifferent to protospacer transcription, unless these systems employ other, yet to be described mechanisms that allow acquisition of spacers from transcriptionally active sites. The adaptation module in T. thermophilus has a standard cas1-cas2 configuration, and no RT domain proteins are encoded in the genome. Yet, an extreme bias in the spacers acquired in the course of infection by a lytic phage was observed: only cells that expressed crRNAs targeting viral transcripts were detected in the infected cultures. However, this bias was due to purifying selection for protective crRNAs and was not caused by intrinsic biases of the adaptation machinery [32].

CRISPR-Cas effector complexes recognize nucleic acids that are complementary to the spacer part of the bound crRNAs. Since crRNAs are derived from CRISPR arrays, the effectors might target genomic DNA (or, in the case of RNA-targeting systems, the "antisense" CRISPR array transcripts). Such targeting would be detrimental and thus must be avoided. The immune response mediated by the DNA-targeting CRISPR-Cas systems (Types I, II, and V) requires protospacer-adjacent motifs (PAMs), short (few-nucleotide) degenerate sequences that are located in the target DNA near the protospacer but are not present near the spacers in the CRISPR arrays [60]. The effector complexes scan the double-stranded DNA for PAM sequences and, upon PAM recognition, initiate the melting of the DNA duplex followed by the formation of the R-loop complex with the complementary targets [61-63]. In the Type III systems, self-targeting is avoided via a completely different mechanism. While Type III effectors recognize and cleave any RNA molecule complementary to the crRNA spacer part, activation of the HD and Palm catalytic domains of the Cas10 subunit requires the absence of complementarity between the target and the repeat-derived 5′ tag of crRNA (Fig. 2c) [17, 64, 65]. Strikingly, a similar "tag-antitag" principle of "self versus non-self" discrimination is employed by the Type VI effectors [66, 67]. While being evolutionarily and structurally unrelated to each other, both Type III and Type VI effectors target RNA. The recognition of the target triggers the non-specific nuclease activity [68, 69].
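The tag-antitag rule described above can be illustrated with a toy classifier. This is only a sketch under stated assumptions, not a model of any real effector: the tag and spacer sequences are made up, only Watson-Crick pairs are counted (no G-U wobble), and the all-or-nothing threshold (every tag position must pair for a target to count as "self") is an assumption introduced for illustration. The function names are hypothetical.

```python
# Toy model of tag-antitag discrimination (illustrative; sequences are made up).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def classify_target(tag: str, spacer: str, target: str) -> str:
    """Classify a target RNA (all sequences are written 5'->3').

    Returns "ignored" (no protospacer), "self" (protospacer followed by a
    fully complementary antitag) or "non-self" (protospacer without one).
    """
    protospacer = reverse_complement(spacer)
    start = target.find(protospacer)
    if start == -1:
        return "ignored"
    # The crRNA 5' tag faces the region immediately 3' of the protospacer.
    flank_start = start + len(protospacer)
    flank = target[flank_start:flank_start + len(tag)]
    antitag = reverse_complement(tag)
    paired = sum(a == b for a, b in zip(flank, antitag))
    # Assumption: only full tag-antitag pairing counts as "self".
    return "self" if paired == len(tag) else "non-self"


TAG = "ACGUGAAC"                 # hypothetical 8-nt repeat-derived 5' tag
SPACER = "GGAUCCUUAGCAAGUCCAGU"  # hypothetical 20-nt spacer

# An antisense transcript of the array carries both protospacer and antitag.
antisense_array_transcript = "AA" + reverse_complement(TAG + SPACER) + "AA"
# A foreign transcript carries the protospacer but an unrelated 3' flank.
viral_transcript = "AA" + reverse_complement(SPACER) + "UUUUUUUUAA"

for name, rna in [("antisense CRISPR transcript", antisense_array_transcript),
                  ("viral transcript", viral_transcript)]:
    print(name, "->", classify_target(TAG, SPACER, rna))
```

In terms of the text, both "self" and "non-self" targets would be bound and cleaved as RNA, whereas only the "non-self" outcome corresponds to activation of the Cas10 HD and Palm domains.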

Why do DNA-targeting CRISPR-Cas systems require PAMs, whereas RNA-targeting systems rely on different mechanisms? In principle, the tag-antitag strategy should be suitable for discriminative DNA targeting (in fact, the tag-antitag strategy of Type III systems was elaborated when these systems were believed to target DNA [64]). Yet, only the PAM-dependent mechanisms have evolved, convergently, in the DNA-targeting systems of Types I, II, and V [70]. To complicate matters, it was suggested that Type III systems are ancestral to all Class 1 systems [70], implying that the derived Type I systems have switched from the initial tag-antitag to the PAM-dependent self-avoidance.

The apparent preference for the PAM-dependent mechanisms in the DNA-targeting systems may be explained by the kinetics of target recognition by the effectors. The nitrogenous bases of nucleotides in single-stranded RNA molecules are exposed, allowing direct interaction with the effector-bound crRNA. In contrast, in double-stranded DNA, complementary interactions with crRNA are impossible, at least initially, and the recognition requires local melting of the DNA duplex, which must dramatically slow down the process of target recognition. The binding energy of the interaction between the effector protein components and the double-stranded PAM sequence may provide the driving force for the initial target melting. In addition, the requirement for a PAM limits the number of available targets by at least an order of magnitude, decreasing the time needed to locate a matching protospacer. One can thus speculate that the primary reason for the use of PAMs is not so much to ensure self versus non-self discrimination as to make the search for matching protospacers in double-stranded DNA fast enough to provide immunity.
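The order-of-magnitude estimate above can be checked with simple arithmetic. The sketch below assumes a random sequence with uniform base composition and scans one strand only; the motifs are illustrative placeholders rather than the PAMs of any particular system, and the function names are hypothetical. Under these assumptions, a motif with two fully specified positions is expected at 1/16 of all positions, and one with three at 1/64.

```python
import random

# Allowed bases for a few IUPAC codes (enough for this illustration).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def expected_pam_fraction(pam: str) -> float:
    """Expected fraction of positions matching `pam` in uniform random DNA."""
    frac = 1.0
    for code in pam:
        frac *= len(IUPAC[code]) / 4
    return frac


def observed_pam_fraction(seq: str, pam: str) -> float:
    """Fraction of windows in `seq` (one strand) that match `pam`."""
    k = len(pam)
    windows = len(seq) - k + 1
    hits = sum(
        all(seq[i + j] in IUPAC[pam[j]] for j in range(k))
        for i in range(windows)
    )
    return hits / windows


random.seed(0)  # fixed seed so the simulated sequence is reproducible
genome = "".join(random.choice("ACGT") for _ in range(100_000))

for pam in ("GG", "NGG", "AAG"):  # illustrative motifs only
    print(pam, expected_pam_fraction(pam),
          round(observed_pam_fraction(genome, pam), 4))
```

Real genomes deviate from uniform composition and both strands are scanned, so these fractions are rough guides only; they do show why even a two-nucleotide motif already removes more than 90% of candidate sites from the search.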

The immune response mediated by Type III CRISPR-Cas systems relies on an interplay between several complex mechanisms, including acquisition of new spacers, maturation of crRNA, assembly of effector complexes, target recognition, synthesis of signaling compounds, activation of auxiliary effectors, and regulation of all aforementioned processes. Such baroque complexity inevitably raises a question about the origin and evolution of Type III systems.

The first glimpses into this puzzle came from the analysis of cas genes decoupled from the CRISPR-Cas systems. Some of the "solo" cas1 genes found outside of the CRISPR-Cas loci [71] were shown to belong to a new family of mobile genetic elements, named casposons, which employ Cas1 proteins for integration/excision [72, 73]. The mechanism of casposon integration clearly resembles the mechanism of insertion of new spacers into the CRISPR arrays, suggesting that CRISPR-associated Cas1 proteins originated from a casposon integrase, while CRISPR repeats could be derived from the terminal repeats flanking the casposons [70, 73]. Along with the Cas1 proteins, casposons also encode Cas4-like proteins [72], which are components of the spacer acquisition machinery in some CRISPR-Cas systems [51].

Another important component of the spacer integration complex is Cas2, a protein homologous to the VapD family nuclease toxins and thus presumably derived from the corresponding toxin-antitoxin systems [74]. Although Cas2 proteins retain nuclease activity in many cases, this activity is not essential for spacer integration [75], in which Cas2 plays a structural role by tethering the Cas1 dimers.

The origins of the effector complexes are less clear. Here, we focus only on a putative evolutionary history of Class 1 CRISPR-Cas systems, which include Types I, III, and IV. These multisubunit complexes have similar architectures and share several homologous key subunits, suggesting their common origin (Fig. 4). The backbone of the Type I and Type III effectors consists of a crRNA molecule covered by several Cas7 monomers in a complex with small Cas11 subunits. The 5′ end of crRNA is bound to a Cas5 family protein. The large subunits (Cas8 in Type I and Cas10 in Type III) are located in the vicinity of the 5′ end of crRNA [76, 77]. Type I and Type III systems also share a common mechanism of crRNA maturation via the action of Cas6 proteins that recognize the stem-loop structures formed by the repeat sequences in pre-crRNA [78]. Although less is known about the Type IV effectors, it is clear that their backbone is also formed by several Cas7 subunits bound to crRNA [79]; Cas5 and Cas6 homologs are also encoded within the Type IV loci [4]. The key components of Type I and Type III effectors, such as Cas10, Cas5, Cas7, and Cas6, share structural similarity and contain domains with the RRM fold [71, 80]. In addition, Type I Cas11 proteins share structural similarity with the C-terminal domain of Cas10 from the Type III systems [71].

Based on these observations, Koonin and Makarova suggested a scenario for the origin of Class 1 CRISPR-Cas systems (Fig. 5). According to this scenario, despite their enormous complexity, Type III systems appear to be ancestral to all known Class 1 systems [4, 70, 71]. It is envisioned that the origin of Class 1 effectors goes back to a putative signaling system that included a Cas10-like polymerase and a CARF-HEPN effector protein. In such a system, the polymerase could produce nucleotide-based secondary messengers (likely, cOAs) in response to stress/environmental signals, followed by the activation of the RNase activity of the CARF-HEPN protein [4]. Indeed, loci encoding Cas10-like polymerases fused with the CARF-HEPN domains were detected through bioinformatic analysis [80]; however, these systems have not yet been characterized. The duplication of the ancestral Cas10-like protein gene followed by fragmentation could have given rise to the Cas7-like and Cas11-like subunits. Genes encoding Cas5 and Cas6 could have originated through a fusion of two Cas7-like genes, since all these genes share a specific structural motif missing from the Cas10 proteins [70]. Finally, the acquisition of the adaptation module components from casposons and toxin-antitoxin systems could have given rise to a functional adaptive immune system [70].

We can only speculate about the original functions of ancestral Cas10-like CARF-HEPN signaling systems and the stimuli that activated the polymerase/cyclase activity of the Cas10 ancestors. Likewise, the functions of the prototypical Type III systems that must have existed before the acquisition of the adaptation module are unknown. However, distinct variants of Class 1 systems that are not linked to CRISPR arrays are known [46]. Experimental characterization of these systems may shed light on the functions of ancient Class 1 effectors. Interestingly, it was shown that Type IV effector complexes heterologously expressed in Escherichia coli preferentially associate with small RNAs transcribed from plasmids despite the presence of a transcribed cognate CRISPR array and functional crRNA maturation components [81]. One could envision that ancient prototypical Type III effectors could also bind RNAs derived from mobile genetic elements, employing them as guides for target recognition in the absence of CRISPR arrays and functioning as a primitive and inefficient immunity system similar to prokaryotic Argonaute proteins [82, 83]. Additionally, the Palm domain of Cas10 shares similarity with the catalytic core of the Thg1 enzyme, an unusual 3′-5′ RNA polymerase essential for the maturation of histidine tRNA [8]. Given this fact and assuming that the RNA-binding Cas7 proteins have originated from the Cas10 RRM domain, one could propose that the ancestor of the present-day Cas10 proteins possessed RNA-binding and RNA polymerase activities and was a part of systems involved in the synthesis, repair, and/or maturation of RNA molecules. Next, the RNA polymerase activity could have specialized/degenerated for the production of signaling compounds. The ligands for the ancestral Cas10 polymerase are unknown, but given that the ancient Cas10 might have possessed RNA-binding activity, it is possible that such ligands were RNA molecules.
While it is hard to envision which RNA molecules activated the immune response in such a system, it is noteworthy that many transposable elements, including some casposons, use tRNA genes as targets for their integration [72, 84]. Since Type IV effectors bind small RNAs, including tRNAs [85], one can speculate that before the acquisition of adaptation modules, ancestral Type III systems recognized the transcripts produced from tRNA genes corrupted by the insertions of mobile genetic elements.

Defense systems that employ sensors producing nucleotide-based messengers to activate the downstream effectors are exceedingly diverse and widespread across all cellular organisms. Besides the described cOA pathway in Type III systems and eukaryotic OAS, one could mention a large family of cGAS/DncV-like nucleotidyltransferases. In animals, cyclic GMP-AMP synthase (cGAS) produces cyclic GMP-AMP in response to cytosolic DNA. Cyclic GMP-AMP activates the pathway leading to the upregulation of numerous immune response genes [86]. Multiple cGAS homologs have been identified in the genomes of prokaryotes; many of them are associated with the genes for various effectors, including nucleases, phospholipases, and transmembrane proteins, that comprise the so-called cyclic oligonucleotide-based antiphage signaling systems (CBASS) [87-90]. Since signal transduction between the sensors (cyclases) and effectors in such defense systems is mediated by small diffusing molecules, numerous sensor/effector pairs and interconnections between different systems have become possible. Interestingly, some cGAS/DncV-like cyclases produce cOAs [90]; moreover, some of such cOA-activated CBASS effectors are also employed by the Type III CRISPR-Cas systems [91, 92].

Although extremely complex, Type III systems appear to be ancestral to all Class 1 systems. Given the prevalence of Class 1 systems, Type III systems could in fact be the most ancient among all CRISPR-Cas systems. The mechanisms of Type III immunity have remained enigmatic for a long time. The discovery of the cOA pathway solved a big part of the puzzle; however, several aspects of the Type III immunity remain unclear. While the DNase activity of the Cas10 HD domain is essential for the immune response, its targets are still unknown, and we can only speculate about the biological role of this activity. Although auxiliary RNases (Csm6 and Csx1) were characterized as non-specific in in vitro experiments, the data on their in vivo specificity are lacking. There are a number of genes associated with the Type III CRISPR-Cas systems, but only a few of such genes have been experimentally characterized. Several types of membrane-associated effectors have not been studied at all [93]. Besides complete Type III systems, simplified RAMP-containing loci were predicted by bioinformatics analysis. Such systems could be intermediates in the evolution of the present-day Type III CRISPR-Cas systems. In particular, the structural homology between the HRAMP signature proteins and the Cas10 proteins of Type III systems discovered by us suggests that the HRAMP systems have originated either from the existing Type III systems or from their ancestors. Finally, the activity and biological functions of the Cas10 CARF-HEPN signaling systems remain to be characterized. To summarize, our current understanding covers (incompletely) only a small part of the mechanisms behind the action of the Type III CRISPR-Cas systems, and lots of exciting research remains to be done.

Dissipation-error tradeoff in proofreading

Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states

A decade of discovery: CRISPR functions and applications

Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants

An updated evolutionary classification of CRISPR-Cas systems

A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis

DNA polymerase beta-like nucleotidyltransferase superfamily: identification of three new families, classification and evolutionary history

Presence of a classical RRM-fold palm domain in Thg1-type 3′-5′ nucleic acid polymerases and the origin of the GGDEF and CRISPR polymerase domains

A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes

Evolution and classification of the CRISPR-Cas systems

CRISPR interference limits horizontal gene transfer in staphylococci by targeting DNA

RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex

Structure and mechanism of the CMR complex for CRISPR-mediated antiviral immunity

Conditional tolerance of temperate phages via transcription-dependent CRISPR-Cas targeting

A novel interference mechanism by a type IIIB CRISPR-Cmr module in Sulfolobus

Molecular insights into DNA interference by CRISPR-associated nuclease-helicase Cas3

Spatiotemporal control of Type III-A CRISPR-Cas immunity: coupling DNA degradation with the target RNA recognition

Bipartite recognition of target RNAs activates DNA cleavage by the type III-B CRISPR-Cas system

RNA-activated DNA cleavage by the Type III-B CRISPR-Cas effector complex

Co-transcriptional DNA and RNA cleavage during Type III CRISPR-Cas immunity

Target preference of Type III-A CRISPR-Cas complexes at the transcription bubble

Genetic characterization of antiplasmid immunity through a type III-A CRISPR-Cas system

CARF and WYL domains: ligand-binding regulators of prokaryotic defense systems

Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing

The CRISPR-associated Csx1 protein of Pyrococcus furiosus is an adenosine-specific endoribonuclease

Structural basis for the endoribonuclease activity of the type III-A CRISPR-associated protein Csm6

Type III CRISPR-Cas systems produce cyclic oligoadenylate second messengers

A cyclic oligonucleotide signaling pathway in type III CRISPR-Cas systems

A Type III-B Cmr effector complex catalyzes the synthesis of cyclic oligoadenylate second messengers by cooperative substrate binding

Degradation of phage transcripts by CRISPR-associated RNases enables Type III CRISPR-Cas immunity

Non-specific degradation of transcripts promotes plasmid clearance during type III-A CRISPR-Cas immunity

Spacer acquisition by Type III CRISPR-Cas system during bacteriophage infection of Thermus thermophilus

Regulation of cyclic oligoadenylate synthesis by the Cas10-Csm complex

Control of cyclic oligoadenylate synthesis in a type III CRISPR system

Ring nucleases deactivate type III CRISPR ribonucleases by degrading cyclic oligoadenylate

Activation and self-inactivation mechanisms of the cyclic oligoadenylate-dependent CRISPR ribonuclease Csm6

A Type III CRISPR ancillary ribonuclease degrades its cyclic oligoadenylate activator

An anti-CRISPR viral ring nuclease subverts type III CRISPR immunity

OAS proteins and cGAS: unifying concepts in sensing and responding to cytosolic nucleic acids

The nature of the catalytic domain of 2′-5′-oligoadenylate synthetases

Genetic mechanisms of critical illness in COVID-19

Type III-A CRISPR immunity promotes mutagenesis of staphylococci

Structure and activity of the RNA-targeting Type III-B CRISPR-Cas complex of Thermus thermophilus

SCOPE enables type III CRISPR-Cas diagnostics using flexible targeting and stringent CARF ribonuclease activation

Structure and mechanism of a Type III CRISPR defence DNA nuclease activated by cyclic oligoadenylate

Predicted highly derived class 1 CRISPR-Cas system in Haloarchaea containing diverged Cas5 and Cas7 homologs but no CRISPR array

Highly accurate protein structure prediction with AlphaFold

Structure studies of the CRISPR-Csm complex reveal mechanism of co-transcriptional interference

ColabFold - Making protein folding accessible to all, bioRxiv

The Card1 nuclease provides defence during type III CRISPR immunity

CRISPR-Cas: adapting to change

CRISPR adaptation biases explain preference for acquisition of foreign DNA

CRISPR-Cas systems exploit viral DNA injection to establish and maintain adaptive immunity

Detection of spacer precursors formed in vivo during primed CRISPR adaptation

Multiple origins of reverse transcriptases linked to CRISPR-Cas systems

On the origin of reverse transcriptase-using CRISPR-Cas systems and their hyperdiverse, enigmatic spacer repertoires

Direct CRISPR spacer acquisition from RNA by a natural reverse transcriptase-Cas1 fusion protein

Spacer acquisition from RNA mediated by a natural reverse transcriptase-Cas1 fusion protein associated with a type III-D CRISPR-Cas system in Vibrio vulnificus

Structural coordination between active sites of a CRISPR reverse transcriptase-integrase complex

PAM identification by CRISPR-Cas effector complexes: diversified mechanisms and structures

Structure basis for directional R-loop formation and substrate handover mechanisms in Type I CRISPR-Cas system

DNA interrogation by the CRISPR RNA-guided endonuclease Cas9

Real-time observation of DNA target interrogation and product release by the RNA-guided endonuclease CRISPR Cpf1 (Cas12a)

Self versus non-self discrimination during CRISPR RNA-directed immunity

Programmable RNA shredding by the type III-A CRISPR-Cas system of Streptococcus thermophilus

The molecular architecture for RNA-guided RNA cleavage by Cas13a

RNA guide complementarity prevents self-targeting in Type VI CRISPR systems

C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector

Cas13-induced cellular dormancy prevents the rise of CRISPR-resistant bacteriophage

Origins and evolution of CRISPR-Cas systems

The basic building blocks and evolution of CRISPR-CAS systems

Casposons: a new superfamily of self-synthesizing DNA transposons at the origin of prokaryotic CRISPR-Cas immunity

Casposon integration shows strong target site preference and recapitulates protospacer integration by CRISPR-Cas systems

A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action

Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity

Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli

Structural biology. Structures of the CRISPR-Cmr complex reveal mode of RNA target positioning

Biogenesis pathways of RNA guides in archaeal and bacterial CRISPR-Cas adaptive immunity

Type IV CRISPR RNA processing and effector complex formation in Aromatoleum aromaticum

Comparative genomic analyses reveal a vast, novel network of nucleotide-centric systems in biological conflicts, immunity and signaling

Structure of a type IV CRISPR-Cas ribonucleoprotein complex, iScience

DNA-guided DNA interference by a prokaryotic Argonaute

DNA targeting and interference by a bacterial Argonaute nuclease

Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery

Structure of a type IV CRISPR-Cas effector complex, bioRxiv

Molecular mechanisms and cellular functions of cGAS-STING signalling

Diversity and classification of cyclic-oligonucleotide-based anti-phage signalling systems

Cyclic GMP-AMP signalling protects bacteria against viral infection

CBASS immunity uses CARF-related effectors to sense 3′-5′- and 2′-5′-linked cyclic oligonucleotide signals and protect bacteria from phage infection

HORMA domain proteins and a Trip13-like ATPase regulate bacterial cGAS-like enzymes to mediate bacteriophage immunity

Structure and mechanism of a cyclic trinucleotide-activated bacterial endonuclease mediating bacteriophage immunity

A jumbo phage that forms a nucleus-like structure evades CRISPR-Cas DNA targeting but is vulnerable to type III RNA-based immunity

Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis

We would like to thank Svetlana Belukhina for her help during the preparation of the review.