key: cord-0010217-0m2f0eh4
authors: Caspi, Jonathan; Amitai, Gil; Belenkiy, Olga; Pietrokovski, Shmuel
title: Distribution of split DnaE inteins in cyanobacteria
date: 2003-11-11
journal: Mol Microbiol
DOI: 10.1046/j.1365-2958.2003.03825.x
sha: f67664515fb5ad87e828b47ce36e28778600f8c3
doc_id: 10217
cord_uid: 0m2f0eh4

Inteins are genetic elements found inside the coding regions of different host proteins and are translated in frame with them. The intein-encoded protein region is removed by an autocatalytic protein-splicing reaction that ligates the host protein flanks with a peptide bond. This reaction can also occur in trans with the intein and host protein split in two. After translation of the two genes, the two intein parts ligate their flanking protein parts to each other, producing the mature protein. Naturally split inteins are only known in the DNA polymerase III alpha subunit (polC or dnaE gene) of a few cyanobacteria. Analysing the phylogenetic distribution and probable genetic propagation mode of these split inteins, we conclude that they are genetically fixed in several large cyanobacterial lineages. To test our hypothesis, we sequenced parts of the dnaE genes from five diverse cyanobacteria and found all species to have the same type of split intein. Our results suggest the occurrence of a genetic rearrangement in the ancestor of a large division of cyanobacteria. This event fixed the dnaE gene in a unique two-genes one-protein configuration in the progenitor of many cyanobacteria. Our hypothesis, findings and the cloning procedure that we established allow the identification and acquisition of many naturally split inteins. Having a large and diverse repertoire of these unique inteins will enable studies of their distinct activity and enhance their use in biotechnology.

Inteins are genetic elements present in protein coding regions. The entire element codes for a protein that is translated together with the coding region of its host gene.
The intein protein is removed from the host protein by a protein-splicing reaction that joins the intein flanks with a peptide bond. This reaction is autocatalytic, fully catalysed by the intein and the residue C-terminal to it, with no need for other proteins, ATP or such molecules (Paulus, 2000). It is typically a cis intramolecular reaction but can also occur when the N- and C-terminal parts of the intein are split and encoded on separate protein chains, each with its own flank. A trans protein-splicing reaction then ligates the flanks of the two intein parts (Shingledecker et al., 1998; Southworth et al., 1998). A split intein is naturally present in the dnaE genes of a few cyanobacteria (Gorbalenya, 1998; Kaneko et al., 2001; Nakamura et al., 2002). Inteins are active with various flanks, in heterologous organisms and in vitro. Biochemical studies of the protein-splicing mechanism led to the use of typical and split inteins in diverse biotechnology applications (Perler and Adam, 2000; Ozawa et al., 2001; Mootz and Muir, 2002).

Inteins are present in a variety of protein genes from diverse bacteria and archaea and in several eukaryotes. However, intein distribution is extremely sporadic. Although inteins are widely distributed, present in the three domains of life, they are relatively rare, and about 160 are currently known (Perler, 2002). Moreover, their distribution is discontinuous and irregular, with even closely related species differing in intein presence at homologous integration sites (Liu, 2000; Pietrokovski, 2001). Consequently, intein presence is very difficult to predict. Intein distribution is the result of two parallel processes. The primary process seems to be the independent, gradual loss of intein elements from separate species during evolution (Pietrokovski, 2001). Inteins are not known to contribute any advantage to their host genes or species and are believed to be selfish genetic elements (Belfort et al., 1995).
Active selection against inteins seems to be relatively weak because of their apparent negligible disruption of their host genes, protein products and organisms. Intein removal from the genome requires a precise DNA excision event as they are inserted in highly conserved points of genes coding for essential proteins (Derbyshire and Belfort, 1998). Counteracting this slow extinction is horizontal transfer to specific integration points, e.g. homing (Belfort and Roberts, 1997). Most intein proteins include a homing endonuclease domain that can mediate insertion of their intein gene into homologous unoccupied intein integration points (Gimble and Thorner, 1992). Hence, homing activity of inteins can reinsert their gene into integration sites that they were lost from.

Cyanobacteria are a large and diverse group of bacteria. They can be clustered by their 16S rRNA sequences into seven monophyletic evolutionary groups (Honda et al., 1999). Although the exact relation between the groups is not fully certain, each group of species is highly likely to have diverged from a distinct species. The recent genome sequencing of various cyanobacteria prompted us to examine the distribution and origin of the cyanobacterial split DnaE inteins.

Here, we show that split inteins of the dnaE gene are common in several groups of cyanobacteria. This gene arrangement seems to be fixed in at least three large and probably related groups that include scores of known species. We cloned split inteins from five diverse species of these groups and thus show how to obtain split inteins from many different species. The distribution of these genes allows us to reconstruct their evolution. We discuss necessary steps in the transition from contiguous to split inteins and whether this transition is likely to be common.

We first screened for the presence of split inteins by database searches.
DNA polymerase III alpha subunit (PolC or DnaE protein) was found to be encoded on two genes in several cyanobacteria: Synechocystis species PCC 6803 (Ssp PCC6803) (Kaneko et al., 1996), Synechococcus species PCC 7002 (Ssp PCC7002) (Yu et al., 2002), Nostoc species PCC 7120 (Nsp PCC7120, previously called Anabaena species PCC 7120) (Kaneko et al., 2001), Nostoc punctiforme (Npu) (Meeks et al., 2001), Thermosynechococcus elongatus BP-1 (Tel) (Nakamura et al., 2002) and Trichodesmium erythraeum (Ter, http://genome.jgi-psf.org/draft_microbes/trier/trier.home.html). The gene organization is identical in all cases. The dnaE coding sequences are split in the same highly conserved point into two genes. The 5′ part of the dnaE gene (dnaE1) is followed by a 5′ part of the split intein, and the dnaE gene 3′ part (dnaE2) is preceded by a 3′ part of the split intein (Fig. 1). The Ter dnaE1 gene also includes other inteins and group II introns (not shown). The mature DnaE protein in each species is assumed to be ligated by the split intein in a trans protein-splicing reaction from the separately translated DnaE1 and DnaE2 proteins (Wu et al., 1998; Perler, 1999).

Fig. 1. Cyanobacterial dnaE gene loci. Each line shows the dnaE gene locus or the loci of dnaE1 and dnaE2 split genes from one species, except for the top locus that is identical in all three listed species. Gene protein-coding regions are shown as rectangles with an arrowhead at their 3′ ends. The dnaE genes are shown in dark grey with the split intein parts in black. Other homologous genes are indicated by similar patterns; there are only two such pairs, between Nostoc species PCC7120 and Prochlorococcus marinus MED4, and Nostoc species PCC7120 and Nostoc punctiforme. Gene names or functions are indicated where known. The distance between the split dnaE genes in each species is indicated where known.
DnaE genes were also found in three other species of cyanobacteria, Prochlorococcus marinus MED4 (Pma MED4) and MIT9313 (Pma MIT9313) (Hess et al., 2001) and Synechococcus species WH8102 (Ssp WH8102, http://genome.jgi-psf.org/finished_microbes/synw8/synw8.home.html). In all these species, the dnaE genes are contiguous, having no inteins, and are flanked by the same genes. These genes thus appear in the same genomic context (Fig. 1). Corresponding parts of known split dnaE genes are each flanked by different genes. Thus, split dnaE genes are in different genomic contexts (Fig. 1).

In addition to being integrated at the same point, the DnaE split intein amino acid sequences are more similar to each other than to those of other inteins (Fig. 2). The progenitor intein of all DnaE split inteins was probably a typical, contiguous intein. At some time point, this intein and its dnaE host gene were split by some genetic rearrangement event to form two genes (Perler, 1999). Although all known split dnaE genes underwent further genomic rearrangements, evidenced by their differing genomic contexts, they retained the split intein parts. This indicates the stability of the split intein organization in dynamic genomes.

Five of the six above-mentioned known split dnaE genes are present in species from three of the seven distinct groups of cyanobacteria; Thermosynechococcus elongatus BP-1 (previously named Synechococcus elongatus Toray) is not placed in any of the seven groups by 16S rRNA analysis and is considered as part of an early diverged group of cyanobacteria (Honda et al., 1999). Cyanobacteria with known contiguous dnaE genes are all from a fourth group, and there is no public data on cyanobacterial dnaE genes from the other three defined groups (Fig. 2A). Split intein genes are extremely unlikely to be transferred by homing. Not only do both intein genes need to be copied, but they have no endonuclease domains. Thus, the distribution of known split DnaE inteins is probably the result of regular vertical transmission. We hypothesized that, as split DnaE inteins are present in species from three diverse cyanobacterial groups, they might be very common, or even invariably present, in other species from these groups and maybe also in other related groups (Fig. 2A).

Fig. 2. ... Honda et al. (1999). Species with known DnaE sequences are marked by filled bullets for DnaE sequences split by inteins and by empty bullets for contiguous DnaE sequences. Underlined filled bullets denote split DnaE inteins identified and sequenced in this work (Aphanizomenon, Aphanothece, Oscillatoria and Thermosynechococcus vulcanus). The relations shown between the groups are the most probable ones but are not as certain as the group definitions (Honda et al., 1999). Listed for each group are species with known DnaE or DnaE intein sequences or a representative species chosen from the groups of Honda et al. (1999). The listed Aphanothece and Oscillatoria species 16S rRNA sequences were significantly most similar to each other (931/1000 bootstrap value). They were closest to group 2 species but could not be definitely clustered with it. Centre. A dendrogram calculated from conserved DnaE polymerase protein regions spanning 409 amino acids. The dendrogram is rooted by the position of all cyanobacterial DnaE protein regions within a larger dendrogram of DnaE proteins from various bacteria (not shown). Right. A dendrogram calculated from conserved split DnaE intein sequences spanning 132 amino acids (see Fig. 4A). The dendrogram is rooted by the position of all split DnaE inteins within a larger dendrogram of various inteins (not shown). Bootstrap confidence values for grouping of DnaE proteins and split inteins (values at the root) are from the larger dendrograms. Bootstrap confidence values for the DnaE polymerase and intein dendrograms were calculated from 1000 trials. Nodes below values of 700/1000 were collapsed. Thermosynechococcus vulcanus sequences are not included in the analysis as only a small part upstream of its DnaE1 intein was determined. Nevertheless, the determined sequences of the intein and polymerase regions are very similar to T. elongatus BP-1, and the two species are expected to cluster together.

To test our hypothesis, we set out to clone the DnaE genes from diverse cyanobacteria. Analysed species included Aphanizomenon ovalisporum (Aov), a freshwater cyanobacterium most similar by its 16S rRNA sequence to Nostoc species; Microcystis species (Vardi et al., 2002), a freshwater cyanobacterium belonging to group 5 (related to Ssp PCC6803 and Ssp PCC7002); Aphanothece halophytica (Aha) and Oscillatoria limnetica (Oli), unicellular and filamentous, respectively, facultative anaerobic photoautotrophic cyanobacteria that are most similar by their 16S rRNA sequences to group 2 (not shown); and Thermosynechococcus vulcanus (Tvu), a thermophilic cyanobacterial species closely related to T. elongatus BP-1 (Nakamura et al., 2002). Using sequential degenerate primer polymerase chain reaction (PCR) amplifications, we determined the sequences of conserved dnaE gene regions flanking one side of the known split intein integration point. We then amplified the region of the insertion site by single primer linear amplification and terminal transferase tailing (Rudi et al., 1999) (Fig. 3). The dnaE genes of all five tested species were found to be split in two and have a split intein at the same point as the other known cyanobacterial split DnaE inteins. We obtained complete sequences of the split inteins and the DnaE flanks from four of the species (Fig. 4A). Together with the previously publicly available sequences, there are now nine known complete split DnaE intein sequences. The length of the N′ split intein parts is between 101 and 123 amino acids, with the longest being from Ssp PCC6803 and the shortest from Aov.
The length variability results from the C′ ends of these intein parts. In contrast, all eight known C′ split intein parts are 35 or 36 amino acids in length (Fig. 4A). The combined length of the Aov split intein is 137 amino acids, only three residues longer than the length of the shortest known intein, from the archaeon Methanobacterium thermoautotrophicum (Smith et al., 1997). None of the split inteins has an endonuclease domain, like 18% of known inteins (http://bioinfo.weizmann.ac.il/~pietro/inteins). All nine split DnaE inteins have the six conserved sequence motifs that define the intein protein-splicing fold and active site (Duan et al., 1997; Pietrokovski, 1998). They are further conserved along their entire sequence, except for the C′ ends of the N′ parts, which differ from each other in sequence and length (Fig. 4A). The conserved minimal sequence regions of the N- and C-intein parts are those defined by computational and experimental analyses (Pietrokovski, 1998; Ghosh et al., 2001; Mootz and Muir, 2002).

Split DnaE inteins were found to be common in cyanobacteria. They are present in all examined dnaE genes of species from three groups of cyanobacteria and an unassigned species (Thermosynechococcus) and were probably fixed in them by one common event. We suggest that most, if not all, cyanobacteria originating from the last common ancestor of species with split DnaE inteins also have this split intein. The apparent fixation of split DnaE inteins in a subdivision of cyanobacteria contrasts with the discontinuous distribution of other inteins, including those present in cyanobacteria (Table 1). Split DnaE intein presence is a taxonomic trait that could be used in classifying cyanobacteria. We suggest it to be an ancient, highly persistent and vertically transmitted trait that separates cyanobacteria into two separate clades. Thermosynechococcus, classified by its 16S rRNA gene as an early diverged genus of cyanobacteria (Honda et al., 1999), is found by the presence of a split DnaE intein to be part of a larger assembly of cyanobacterial groups. This could be investigated further by analysing the now available complete genome of T. elongatus BP-1 (Nakamura et al., 2002).

Fig. 3. Split inteins primer-walking amplification strategy. The sequence of dnaE1 (the dnaE N′ part) and its N′ intein part are shown as boxes. Amplification reactions are shown as lines beneath the sequence region amplified, with the primer regions shown as triangles. Degenerate primers are shown as grey triangles, and specific primers are shown as black triangles. The 3′ tail added by terminal transferase is shown as a dotted line. The reaction order is from top to bottom. The first reaction amplified a region 5′ to the intein integration point using degenerate primers to two flanking conserved regions. The next reaction amplified the 3′ region of the previous reaction and a downstream region, using a specific 5′ primer and a degenerate 3′ primer. Typically, two or three such reactions sufficed to determine the sequence up to the insertion site. To amplify the less conserved N-intein region and its downstream 3′ untranslated region, a linear amplification using a single specific 5′ primer was followed by terminal transferase polycytosine tailing. Products from this reaction were then reamplified by a specific internal 5′ primer and a polyguanosine primer. Finally, the sequence was verified by amplifying from genomic DNA the intein part with upstream and downstream regions using specific primers. The C-intein part of the dnaE2 gene was cloned similarly, with the amplification reactions proceeding in the reverse orientation, advancing from the dnaE2 central region towards its 5′ end.

Split cyanobacterial dnaE genes are present in diverse loci, whereas all known cyanobacterial contiguous dnaE genes appear in stable genomic positions.
The original genomic rearrangement that split the dnaE gene thus seems to have been followed by further genomic rearrangements. Nostoc species PCC 7120 and related cyanobacteria undergo developmentally regulated genome rearrangements (Golden et al., 1987; Carrasco and Golden, 1995). These are species in which split dnaE genes also occur. It is possible that some cyanobacteria are more tolerant to genome rearrangements. Such events produce split genes that are reactivated by regulated rearrangements. One rearrangement may have split a dnaE gene in its intein region. However, the two dnaE gene parts could reassemble at the protein level by their intein regions.

Fig. 4. A. DnaE1 (top) and DnaE2 (bottom) intein parts. The start position of each intein is shown in their host proteins, except for the DnaE1 sequences determined in this work where only the 3′ parts of the genes were determined. Asterisks indicate stop codons. Underlined regions are intein conserved sequence motifs (Pietrokovski, 1998). Lower case sequence regions in the C′ end of the DnaE1 intein parts are not conserved in these sequences and should not be considered aligned. Below the sequence alignment is a graphical representation of it (Henikoff et al., 1995).

There is no biological assay for intein activity. New intein genes can be identified by amplifying genes known to include inteins in some species from other related species (Davis et al., 1994; Fsihi et al., 1996; Sander et al., 1998; Saves et al., 2000; Lazarevic, 2001). However, all known inteins are randomly distributed, and only one case was found of closely related species with consistent intein distribution (Sander et al., 1998). Usually only a few of the examined species were found to contain inteins. DnaE split inteins seem to be an exception, being present in all cyanobacteria species that we examined that belong to certain well-defined groups. Other species from these groups can thus serve as a reliable source of split inteins.
The continuous distribution of DnaE split inteins in cyanobacterial groups has theoretical and applied implications for intein studies. We can now obtain inteins that co-evolve with their host species. These inteins can be used to study the evolutionary change rate of inteins, for example to decide whether inteins are of ancient or recent origin (Pietrokovski, 2001). They could also be a valuable source in theoretical and experimental studies of protein-protein interaction and intein catalytic activity (Gorbalenya, 1998; Martin et al., 2001). From a practical aspect, our ability to obtain an apparently very large number of different DnaE split inteins will be enhanced. For example, the Pasteur Culture Collection of Cyanobacteria (PCC) maintains about 475 axenic strains, many belonging to the groups in which we believe DnaE split inteins are fixed. Of particular interest will be split inteins from species living in extreme conditions such as high temperature or salinity. The protein-splicing activity of these split inteins may depend on the conditions in which their host species are active.

Our hypothesis is that split inteins are very difficult to lose. This is based on the idea that split inteins cannot be lost by one, or even two, simple DNA excision events. Cyanobacterial DnaE proteins are split by inteins in a highly conserved motif appearing in all known DnaE proteins (Fig. 4B). This strongly suggests that the two split DnaE host protein parts would have to be attached by a peptide bond at the intein integration point for the motif to adopt the fold, and have the activity, found in other DnaE proteins. Loss of one or both intein parts will leave the products of their gene flanks split with no mechanism to ligate. To lose the intein genes and retain a functional dnaE gene requires parallel precise loss of the intein parts followed directly by precise fusion of their flanks into one gene; this is a highly unlikely event.
An alternative method for removal of split intein genes is to acquire a surrogate gene to replace the product of the split genes. However, DnaE is an essential and tightly regulated gene forming the core of the bacterial replicative DNA polymerase. It interacts with four other protein subunits in the polymerase holoenzyme and with various cofactors during replication (Kelman and O'Donnell, 1995). Thus, a surrogate gene must be precisely adapted to the species to replace its DnaE gene.

Table 1 footnotes. a. Species include fully and almost fully sequenced cyanobacteria. +, intein is present; -, intein is absent; \, species does not have the orthologue of the intein host protein.

There are more than 50 known intein alleles, but only the cyanobacterial dnaE intein allele is split. Nevertheless, these split inteins are common in cyanobacteria. If split inteins are common, at least in one group of species, why is there only one known type (allele) of them? The rarity of split intein organization probably results from the difficulty in switching from a single gene and single product state to a functional two genes and two products state. Gene splitting would require acquiring a promoter and translation initiation signal for the downstream part of the split gene, adapting co-regulation for the two formed genes and evolving amino acid sequences to stabilize the two new ends of the intein parts. Gene splitting of the intein might also require adaptation for trans-splicing and high-affinity interaction between the two intein parts. Artificially created split inteins were found to have some affinity between their two parts and a low level of trans-splicing (Shingledecker et al., 1998; Southworth et al., 1998; Mootz and Muir, 2002). Thus, adapting the protein function of a newly split intein gene might be the minor difficulty compared with adapting proper transcription and translation control and protein stability.
Once all these adaptations were complete, the unidirectional ratchet nature of the intein-splitting process would ensure the evolutionary persistence of the gene organization. However, still unknown are what initial circumstances led to the selection of the individual cyanobacterium in which the intein splitting occurred. Significant advantages are likely to have accompanied this event, to compensate for all the necessary genetic adjustments discussed above.

Analysed species

Sequence determination of the DnaE split inteins was done in two steps for each part of the DnaE gene. First, a conserved region flanking the N′ or C′ split intein part was amplified by degenerate primer PCRs. In some cases, a number of partially overlapping regions were amplified until the intein parts were approached. Next, linear (single primer) amplifications were done towards each split intein part using specific primers, designed from the previously amplified regions. The 3′ (intein) ends of the single-stranded amplification products were tailed by a homo-oligomer using terminal transferase. The tailed products were then PCR amplified by a primer complementary to the homo-oligomer tail and a second specific primer, nested to the first specific primer (Rudi et al., 1999). This PCR amplification was repeated twice, and the products were cloned and sequenced or sequenced directly. To confirm the resulting sequence, the intein-containing region, together with its adjacent untranslated and dnaE flanks, was amplified from genomic DNA using specific primers and sequenced on both strands (Fig. 3).

PCR amplifications included ~50 ng of genomic DNA, 25 pmol of each specific primer and 30-100 pmol of each degenerate primer, 2 nmol of dNTP mix, 2.5 units of Taq DNA polymerase (Sigma) and 5 µl of 10× Taq polymerase buffer (Sigma) in 50 µl reaction volumes. Linear amplifications were done in the same way except using ~500 ng of genomic DNA and 12.5 pmol of one specific primer.
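As a quick check on the reaction recipe, the stated amounts can be converted to final concentrations. The sketch below is only arithmetic on the numbers given in the text. It assumes the reaction volume is 50 µl (reading the volume unit as microlitres), and the helper name `final_conc_um` is hypothetical.

```python
def final_conc_um(amount_pmol, volume_ul):
    """Final concentration in µM (pmol per µl is numerically equal to µM)."""
    return amount_pmol / volume_ul

REACTION_UL = 50.0  # assumed reaction volume, in microlitres

# Amounts per reaction as stated in the text, expressed in pmol.
components_pmol = {
    "specific primer (PCR)": 25.0,
    "degenerate primer, low end": 30.0,
    "degenerate primer, high end": 100.0,
    "dNTP mix (2 nmol)": 2000.0,
    "specific primer (linear amplification)": 12.5,
}

concentrations = {
    name: final_conc_um(pmol, REACTION_UL)
    for name, pmol in components_pmol.items()
}

for name, conc in concentrations.items():
    print(f"{name}: {conc:g} µM")
```

Under that volume assumption, each specific primer is at 0.5 µM in the PCR and 0.25 µM in the linear amplification.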
Amplification schedules and temperatures are detailed in Supplementary material (Table S1). Primer sequences are listed in Supplementary material (Table S2). The linear, single-stranded amplification products were purified using a PCR purification kit (Qiagen), as recommended by the manufacturer. Cytosine tailing was done in 20 µl reactions with 5 µl of the purified linear amplification products, 200 pmol of dCTP, 30 units of terminal deoxynucleotidyl transferase (TDT) (Fermentas) and 4 µl of 5× TDT buffer (Fermentas). Reaction mixtures were incubated at 37°C for 20 min and stopped by a 10 min incubation at 72°C. PCR amplifications with the polyguanine primer (polyG) used 4 µl of the TDT reaction products as template with all the other components as listed above. Products were purified by a PCR purification kit (Qiagen) and sent directly for sequencing. The confirmed sequence data have been submitted to the GenBank database under accession numbers AY209003-AY209008 and AY311409-AY311410.

Phylogenetic analysis was done using programs from the PHYLIP package (Felsenstein, 1989), version 3.55. Trees were calculated using the SEQBOOT, PROTDIST and NEIGHBOR programs with 1000 bootstrap trials and default settings.
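The resampling idea behind the bootstrap values can be sketched in a few lines. This is a simplified illustration, not the PHYLIP pipeline itself. Alignment columns are resampled with replacement, as SEQBOOT does. A plain fraction-of-differences distance stands in for the model-based distances of PROTDIST, and a nearest-neighbour count stands in for NEIGHBOR tree building. The toy sequences and all function names are hypothetical.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    """Fraction of differing positions (a crude stand-in for a model-based distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nearest_neighbour_support(alignment, query, partner, trials=1000, seed=0):
    """Count replicates in which `partner` is the sequence closest to `query`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rep = bootstrap_columns(alignment, rng)
        others = [name for name in rep if name != query]
        closest = min(others, key=lambda name: p_distance(rep[query], rep[name]))
        hits += closest == partner
    return hits

# Hypothetical toy alignment; these are not sequences from the paper.
toy = {
    "sp_A": "CLSYDTEILTVEYG",
    "sp_B": "CLSYDTEVLTVEYG",
    "sp_C": "CLAGDTLVYTADGR",
    "sp_D": "CFSGDTLVALTDGR",
}

support = nearest_neighbour_support(toy, "sp_A", "sp_B")
print(f"sp_A/sp_B nearest-neighbour support: {support}/1000")
```

A count reported as, for example, 931/1000 is read the same way: the grouping was recovered in that many of the 1000 resampled data sets.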
Homing endonucleases: keeping the house in order
Prokaryotic introns and inteins: a panoply of form and function
Two heterocyst-specific DNA rearrangements of nif operons in Anabaena cylindrica and Nostoc sp
Evidence of selection for protein introns in the recAs of pathogenic mycobacteria
Lightning strikes twice - intron-intein coincidence
Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity
PHYLIP - phylogeny inference package, version 3.2
Homing events in the gyrA gene of some mycobacteria
Zinc inhibition of protein trans-splicing and identification of regions essential for splicing and association of a split intein
Homing of a DNA endonuclease gene by meiotic gene conversion in Saccharomyces cerevisiae
Different recombination site specificity of two developmentally regulated genome rearrangements
Non-canonical inteins
Automated construction and graphical presentation of protein blocks from unaligned sequences
The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics
Detection of seven major evolutionary lineages in cyanobacteria based on the 16S rRNA gene sequence analysis with new sequences of five marine Synechococcus strains
Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions
Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120
DNA polymerase III holoenzyme: structure and function of a chromosomal replicating machine
Ribonucleotide reductase genes of Bacillus prophages: a refuge to introns and intein coding sequences
Protein-splicing intein: genetic mobility, origin, and evolution
Characterization of a naturally occurring trans-splicing intein from Synechocystis sp
An overview of the genome of Nostoc punctiforme, a multicellular, symbiotic cyanobacterium
Protein splicing triggered by a small molecule
Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1
Split luciferase as an optical probe for detecting protein-protein interactions in mammalian cells based on protein splicing
Protein splicing and related forms of protein autoprocessing
A natural example of protein trans-splicing
InBase: the intein database
Protein splicing and its applications
Modular organization of inteins and C-terminal autocatalytic domains
Intein spread and extinction in evolution
Restriction cutting independent method for cloning genomic DNA segments outside the boundaries of known sequences
Inteins in mycobacterial GyrA are a taxonomic character
Inteins invading mycobacterial RecA proteins
Molecular dissection of the Mycobacterium tuberculosis RecA intein: design of a minimal intein and of a trans-splicing system involving two intein fragments
Complete genome sequence of Methanobacterium thermoautotrophicum strain ΔH: functional analysis and comparative genomics
Control of protein splicing by intein fragment reassembly
Dinoflagellate-cyanobacterium communication may determine the composition of phytoplankton assemblage in a mesotrophic lake
Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803
PGAAS: a prokaryotic genome assembly assistant system

We thank Z. Kelman

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mmi/mmi3825/mmi3825sm.htm
Table S1. DNA amplification conditions.
Table S2. PCR primers.