key: cord-0956895-wxbcjofa authors: Cantara, Alessio; Luo, Yu; Dobrovolná, Michaela; Bohalova, Natalia; Fojta, Miroslav; Verga, Daniela; Guittat, Lionel; Cucchiarini, Anne; Savrimoutou, Solène; Häberli, Cécile; Guillon, Jean; Keiser, Jennifer; Brázda, Václav; Mergny, Jean Louis title: G-quadruplexes in helminth parasites date: 2022-03-02 journal: Nucleic Acids Res DOI: 10.1093/nar/gkac129 sha: f95d1eae0a49a49163f2d552a361fd5c8b0e8c15 doc_id: 956895 cord_uid: wxbcjofa
Parasitic helminths that infect humans are highly prevalent, affecting ∼2 billion people worldwide and causing inflammatory responses, malnutrition and anemia, which are the primary causes of morbidity. In addition, helminth infections of cattle have a significant economic impact on livestock production, milk yield and fertility. The etiological agents of helminth infections are mainly Nematodes (roundworms) and Platyhelminths (flatworms). G-quadruplexes (G4) are unusual nucleic acid structures formed by G-rich sequences that can be recognized by specific G4 ligands. Here we used the G4Hunter Web Tool to identify and compare potential G4-forming sequences (PQS) in the nuclear and mitochondrial genomes of various helminths in order to identify targets for G4 ligands. PQS are nonrandomly distributed in these genomes and are often located in the proximity of genes. Unexpectedly, one Nematode, Ascaris lumbricoides, was found to be highly enriched in stable PQS. This species tolerates high-stability G4 structures, which are not counter-selected at all, in stark contrast to most other species. We experimentally confirmed G4 formation for sequences found in four different parasitic helminths. Small molecules able to selectively recognize G4 were found to bind to Schistosoma mansoni G4 motifs. Two of these ligands demonstrated potent activity against both larval and adult stages of this parasite.
Helminth infections caused by parasitic Nematodes (roundworms) and Platyhelminths (flatworms) are among the most prevalent afflictions of people living in poor areas of the world, with over a quarter of the total human population affected worldwide (1, 2). Parasitic worm infections cause anemia, malnutrition, allergies, bloody diarrhea, bowel cramps and inflammation associated with colonic polyposis (3); they also increase susceptibility to HIV and progression to AIDS, and result in many obstructive pathologies (3, 4). In addition, severe anemia in pregnancy is associated with neonatal prematurity and reduced birthweight (3, 5). Cattle infections have a significant economic impact on livestock production due to reduced growth, milk yield and fertility (6). Recent estimates suggest that Ascaris lumbricoides infects over a billion people, Trichuris trichiura 795 million (7) and Strongyloides stercoralis 30–100 million (8). Hookworms such as Necator americanus and Ancylostoma duodenale cause hookworm disease, which is associated with blood loss and anemia. Schistosomiasis, caused by trematode flukes of the genus Schistosoma, is the third most reported tropical disease globally (9). The three most common species of parasitic trematodes of the family Schistosomatidae are Schistosoma mansoni, Schistosoma japonicum and Schistosoma haematobium (1, 9). Most S. mansoni infections occur in Sub-Saharan Africa, the Middle East, the Caribbean, Brazil, Venezuela and Suriname. S. japonicum is found in Asia, primarily in the Philippines, Indonesia and China. S. haematobium is highly prevalent in Sub-Saharan Africa and the Middle East, and was recently reported in Corsica (France) (2). Schistosomiasis is estimated to affect ∼200 million people worldwide (10). Few treatments are available to fight worm infections.
In addition, widespread use of drugs such as benzimidazoles selects for drug-resistant parasite strains (11), and reduced praziquantel efficacy leads to low egg reduction rates (12). The recent sequencing of helminth genomes (e.g. S. mansoni, S. japonicum and S. haematobium (13–15)) offers new opportunities to identify novel targets in key genes of these parasites. Identifying potential G4-forming sequences (PQS) in key genes associated with the infectivity of parasitic worms should allow the prediction of new druggable targets for G4 ligands. G4 are noncanonical nucleic acid secondary structures (16) formed by G-rich sequences and built of stacked tetrads (also called 'G-quartets'), each made of four Hoogsteen hydrogen-bonded guanine bases. They can display a wide variety of topologies, resulting from the possible combinations of DNA/RNA strand directions as well as from variations in loop size and molecularity (17). G4 are further stabilized by monovalent cations such as potassium or sodium, which are abundant in cells (18). G4 motifs are found in a variety of genomes (19) and are overrepresented in gene promoter regions, especially those of regulatory genes involved in cell proliferation or survival (20, 21), as well as in regions that regulate important biological processes (22), including immune response, transcriptional activation, DNA damage repair (23) and telomere maintenance (24); these observations come mostly from mammalian cells. Interest in the therapeutic potential of G4-containing gene promoters has led to a rapidly growing number of studies over the past decade in which small molecules were used as G4 stabilizers, with reports of transcription inhibition in cell-based assays. G4 structures have been investigated in only a few Platyhelminth and Nematode species.
For instance, 1,500 PQS were found in Caenorhabditis elegans using the Quadparser algorithm, of which ∼500 are associated with transcription start site regions. The peak in PQS density coincides almost exactly with the nucleosome-depleted region, which is consistent with the hypothesis that functional PQS may be located outside nucleosome-bound regions (25). G4 that are stable under physiological conditions (K+-rich buffer) and stabilized by pyridostatin (PDS, a specific G4-targeting small molecule) were identified in C. elegans (26). In the flatworm Macrostomum lignano, the presence of G4 structures in heterochromatin and the difference in G4 staining between somatic and stem cells in germline DNA suggest that the resolution or suppression of G4 structures may be important for stem cells with regenerative potential (27). A study conducted on A. lumbricoides showed that the fluorescent G4 ligand Q1 binds selectively to the antiparallel telomeric G4 (28). Very recently, G4 motifs were identified in the genome of S. mansoni, and the authors confirmed the presence of G4 in adult worms using the G-quadruplex-specific antibody BG4 (29). Given the lack of information on the nature of G4s formed in infective species of Platyhelminths and Nematodes, we used the G4Hunter tool to analyze differences in the presence, frequency and localization of potential G4-forming sequences (PQS) in the genomes of six Platyhelminths and four Nematodes. Comparing the genomes of parasitic and non-parasitic worms from these two phyla may reveal details of the processes driving pathogenicity, as well as new possible drug targets. Finally, we show that the recently identified G4 motifs of S. mansoni (29) may be targeted by G4 ligands, leading to anti-parasitic effects.
The set of selected genomes (both mitochondrial and nuclear DNA) from two clades (Platyhelminths and Nematodes) belonging to the clade Nephrozoa (subgroup Protostomia) was downloaded from the NCBI Genome database. Six Platyhelminths (Schistosoma haematobium, Schistosoma japonicum, Schistosoma mansoni, Trichobilharzia regenti, Dibothriocephalus latus and Taenia asiatica) and four Nematodes (Strongyloides stercoralis, Trichuris trichiura, Ascaris lumbricoides and Caenorhabditis elegans) were chosen for an initial analysis. Four additional nematode species (Anisakis simplex, Ascaris suum, Parascaris equorum and Toxocara canis) were added for comparison with Ascaris lumbricoides. All accession numbers are provided in Supplementary Material 01. All sequences were analyzed using the G4Hunter Web tool (30) (http://bioinformatics.ibp.cz/#/analyse/quadruplex), which accepts National Center for Biotechnology Information (NCBI) IDs as input. Unless specified otherwise, G4Hunter parameters were set to a window size of 25 nucleotides and a threshold score of ≥1.2. This threshold is a reasonable compromise that yields few false positives (sequences that do not form a G4 despite a G4Hunter score above threshold) and few false negatives (sequences that form a stable G4 despite a G4Hunter score below threshold). Scores above 1.2 correspond to sequences with a higher guanine content that are likely to form stable G4s. To rank sequences by score, motifs were binned in five intervals covering the G4Hunter scores 1. To test whether PQS occurrence in chromosomal breakpoint regions differs significantly from that in randomly shuffled sequences, we generated 40 random sequences with the same length and nucleotide composition as the original A. suum breakpoints. All sequences were randomly shuffled using the Sequence Manipulation Suite (https://www.bioinformatics.org/sms2/shuffle_dna.html) and analyzed manually using the G4Hunter Web tool (30).
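G4Hunter's scoring scheme, on which all PQS counts in this study rest, can be sketched in a few lines. In this simplified sketch (our own re-implementation with hypothetical function names, not the G4Hunter Web tool), every G receives the length of the G run it belongs to, capped at 4, every C receives the negative of its C-run length, and A/T score 0; a window score is the mean over 25 nt, and overlapping windows of the same sign whose absolute score reaches 1.2 are merged into one PQS. The web tool's handling of merged regions may differ in detail.

```python
# Simplified G4Hunter-style scorer for illustration; hypothetical names,
# not the G4Hunter Web tool's own code.


def base_scores(seq: str) -> list[int]:
    """Per-base scores: G runs give +1..+4, C runs give -1..-4, A/T give 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1                      # j ends one past the current homopolymer run
        if seq[i] in "GC":
            value = min(j - i, 4)       # run length, capped at 4
            if seq[i] == "C":
                value = -value
            scores[i:j] = [value] * (j - i)
        i = j
    return scores


def window_scores(seq: str, window: int = 25) -> list[float]:
    """Mean base score of every window of `window` nt, sliding by one position."""
    s = base_scores(seq)
    if len(s) < window:
        return []
    total = sum(s[:window])
    means = [total / window]
    for k in range(window, len(s)):
        total += s[k] - s[k - window]   # slide: add the new base, drop the oldest
        means.append(total / window)
    return means


def find_pqs(seq: str, window: int = 25, threshold: float = 1.2) -> list[tuple[int, int, int]]:
    """Merge overlapping same-sign windows with |score| >= threshold.

    Returns (start, end, sign) with 0-based, end-exclusive coordinates;
    sign = -1 marks a C-rich stretch (a G-rich motif on the other strand).
    """
    regions: list[list[int]] = []
    for start, score in enumerate(window_scores(seq, window)):
        if abs(score) < threshold:
            continue
        sign = 1 if score > 0 else -1
        end = start + window
        if regions and regions[-1][2] == sign and start <= regions[-1][1]:
            regions[-1][1] = end        # extend the current region
        else:
            regions.append([start, end, sign])
    return [tuple(r) for r in regions]


telomeric = "A" * 30 + "GGGTTAGGGTTAGGGTTAGGG" + "A" * 30
print(find_pqs(telomeric))  # [(24, 57, 1)]
```

A positive sign marks a G-rich motif on the given strand; a negative sign marks a C-rich stretch, i.e. a G-rich motif on the complementary strand.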
The G4Hunter parameters were set to a window size of 25 nucleotides and thresholds of 1.2 or more (several thresholds were considered). Sequences were merged into a single Excel file, in which statistical evaluation was performed using a t-test. NCBI sequences in FASTA format were downloaded and the dataset was uploaded into the SnapGene program. For every PQS, we used the corresponding sequences from all analyzed genomes, and alignments were generated using the Clustal Omega tool. All PQS found were searched for in the aligned sequences, and WebLogo 3 was used to generate sequence LOGOs (Supplementary Material 04). Raw data were converted into .xlsx format and analyzed in Microsoft Excel. All data files are available in the Supplementary Materials. Correlations were evaluated using Spearman's rank correlation coefficient (r_s) and are presented in Supplementary Material 05. Oligonucleotides were purchased from Eurogentec (Belgium) and used without further purification. Stock solutions were prepared in ddH2O at 100 µM strand concentration for the unlabeled oligonucleotides and at 200 µM strand concentration for the double-labeled oligonucleotides. Sequences of the tested G4 motifs and of the control G4s, single strands and duplexes are shown in Supplementary Table S5. All oligonucleotides were annealed (95 °C for 5 min, then slowly cooled to room temperature) in the corresponding buffer before measurements. 3 µM oligonucleotide solutions were annealed in K100 buffer (100 mM KCl, 10 mM lithium cacodylate, 90 mM LiCl, pH 7.2). UV-melting profiles were recorded with a Cary 300 spectrophotometer (Agilent Technologies, France). Heating runs were performed between 10 °C and 95 °C at a rate of 0.2 °C/min, and absorbance was recorded at 260 and 295 nm (31). 3 µM oligonucleotide solutions were annealed in K100 buffer.
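The shuffle control and t-test described above can be sketched as follows. This is an illustration under stated assumptions: each shuffle is a simple random permutation that preserves length and base composition (the text used the Sequence Manipulation Suite for this step), PQS are counted with a simplified G4Hunter-style scorer, and Welch's unequal-variance t statistic stands in for the t-test variant, which the text does not specify. All function names are hypothetical.

```python
import random
import statistics

# Hypothetical helpers; simplified G4Hunter-style scoring, not the G4Hunter Web tool.


def window_scores(seq, window=25):
    """Per-base run scores (G +1..+4, C -1..-4, A/T 0) averaged over each window."""
    seq = seq.upper()
    base = []
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = min(j - i, 4) if seq[i] in "GC" else 0
        base.extend([-run if seq[i] == "C" else run] * (j - i))
        i = j
    return [sum(base[k:k + window]) / window for k in range(len(base) - window + 1)]


def count_pqs(seq, window=25, threshold=1.2):
    """Count blocks of consecutive windows with |score| >= threshold and the same sign."""
    count, previous = 0, 0              # previous = sign of the preceding window (0 if below threshold)
    for score in window_scores(seq, window):
        sign = 0 if abs(score) < threshold else (1 if score > 0 else -1)
        if sign != 0 and sign != previous:
            count += 1
        previous = sign
    return count


def shuffled_control(seq, rng):
    """One random permutation of seq: same length and same base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def welch_t(a, b):
    """Welch's t statistic (assumed test variant); needs >= 2 values per sample, not all equal."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def compare_to_shuffled(breakpoints, seed=0, threshold=1.2):
    """PQS counts per breakpoint vs. one composition-matched shuffle per breakpoint."""
    rng = random.Random(seed)
    observed = [count_pqs(s, threshold=threshold) for s in breakpoints]
    shuffled = [count_pqs(shuffled_control(s, rng), threshold=threshold) for s in breakpoints]
    return {
        "with_pqs": (sum(c > 0 for c in observed), sum(c > 0 for c in shuffled)),
        "mean_pqs": (statistics.mean(observed), statistics.mean(shuffled)),
        "t": welch_t(observed, shuffled),
    }
```

Raising `threshold` (e.g. to 1.6) mirrors the per-threshold comparison reported for the A. suum breakpoints; the real breakpoint sequences are not reproduced here.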
CD spectra were recorded on a J-1500 spectropolarimeter (JASCO, France) at room temperature (25 °C), using a scan range of 300–220 nm and a scan rate of 100 nm/min, and averaging four accumulations. The FRET-melting assay was performed in 96-well plates, and the fluorescence of dual-labeled G4-forming oligonucleotides (including F21T; sequences are shown in Supplementary Table S7) was recorded using a CFX96 qPCR instrument (Bio-Rad). The F21T sequence was annealed at 0.23 µM in K10 buffer (10 mM KCl, 10 mM lithium cacodylate, 90 mM LiCl, pH 7.2); the oligonucleotide was then added to each well (final strand concentration 0.2 µM) and incubated with or without the tested ligands at 2 and 5 µM final concentration, in a final volume of 25 µL. Competition experiments were performed in the presence of unlabeled sequences, including a self-complementary duplex (ds26), a parallel G4 (c-myc) and G4s from S. mansoni. The microplate was incubated at 25 °C for 5 min, after which the temperature was increased at 0.5 °C/min up to 95 °C. The collected signal was normalized to 1, and the melting temperature (Tm) was defined as the temperature at which the normalized signal was 0.5. ΔTm corresponds to the difference in Tm of the oligonucleotide with and without ligand. This FRET-melting assay was performed in duplicate. FRET-melting competition (FRET-MC) experiments were performed in 96-well plates using a HT7900 RT-PCR instrument (Applied Biosystems), as previously described (32). 50 µM oligonucleotide solutions were annealed in K10 buffer. Each well contained 3 µM competitor and 0.2 µM fluorescent oligonucleotide F21T, in the presence or absence of 0.4 µM G4 ligand (PhenDC3), in K10 buffer, for a final volume of 25 µL. Samples were kept at 25 °C for 5 min, then the temperature was increased by 0.5 °C per minute up to 95 °C, and the FAM channel was used to collect the fluorescence signal.
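The Tm and ΔTm definitions above translate directly into code. The sketch below (hypothetical helper names) normalizes a melting curve to the 0–1 range, takes Tm as the temperature at which the normalized signal crosses 0.5, using linear interpolation between neighboring readings, and subtracts the ligand-free Tm. The sigmoidal curves are synthetic and illustrative only, not measured data.

```python
import math

# Hypothetical helpers illustrating the Tm / ΔTm definitions; synthetic data only.


def normalize(signal):
    """Scale a melting curve to the 0-1 range."""
    lo, hi = min(signal), max(signal)
    return [(s - lo) / (hi - lo) for s in signal]


def melting_temperature(temps, signal):
    """Tm = temperature at which the normalized signal first crosses 0.5 (linear interpolation)."""
    y = normalize(signal)
    for k in range(len(temps) - 1):
        y0, y1 = y[k], y[k + 1]
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return temps[k] + (0.5 - y0) * (temps[k + 1] - temps[k]) / (y1 - y0)
    raise ValueError("normalized signal never crosses 0.5")


def delta_tm(temps, signal_with_ligand, signal_without_ligand):
    """ΔTm = Tm(with ligand) - Tm(without ligand)."""
    return melting_temperature(temps, signal_with_ligand) - melting_temperature(temps, signal_without_ligand)


# Synthetic FRET curves: 25 to 95 °C in 0.5 °C steps, donor signal rising on unfolding.
temps = [25 + 0.5 * i for i in range(141)]


def synthetic_curve(midpoint):
    return [1 / (1 + math.exp(-(t - midpoint) / 2)) for t in temps]


print(round(delta_tm(temps, synthetic_curve(66.0), synthetic_curve(58.0)), 2))
```

The crossing test works whether the normalized signal rises or falls with temperature, so the same helper applies to absorbance-based melting at 295 nm.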
The Tm of an oligonucleotide is defined as the temperature at which 50% of the oligonucleotide is unfolded. ΔTm is determined as the difference in Tm relative to the sample containing F21T in the absence of PhenDC3. Each experimental condition was tested in duplicate on two separate plates. Absorbance spectra were recorded on a Cary 300 spectrophotometer (Agilent Technologies, France) (scan range: 500–200 nm; scan rate: 600 nm/min; automatic baseline correction). -TDS: 3 µM oligonucleotide solutions were annealed in K100 buffer. After the first (folded) spectrum was recorded at 25 °C, the temperature was increased to 95 °C, and the second UV-absorbance spectrum was recorded after 15 min of equilibration at high temperature. TDS corresponds to the arithmetic difference between the initial (folded; 25 °C) and second (unfolded; 95 °C) spectra (33). -IDS: 3 µM oligonucleotide solutions were annealed in Li-Caco10 buffer (10 mM lithium cacodylate, pH 7.2). Absorbance spectra were first recorded at 25 °C in the absence of any stabilizing cation. After the first spectrum was recorded, 1 M KCl was added to a final potassium concentration of 100 mM. The second UV-absorbance spectrum was recorded after 15 min of equilibration. IDS corresponds to the arithmetic difference between the initial (unfolded) and final (folded, owing to the addition of K+) spectra, after correction for dilution. -ThT (Thioflavin T) was used as previously described (34). 7.5 µM oligonucleotide solutions were annealed in K100 buffer. Components were added in the following order: 10 µL K100 buffer, 10 µL oligonucleotide and 5 µL of 10 µM ThT (dissolved in Milli-Q water). The plate was shaken for 5 min and incubated for 10 min at room temperature. Fluorescence intensity was collected at 490 nm after excitation at 420 nm in a TECAN M1000 pro plate reader.
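The difference-spectrum calculations described above, including the dilution correction for IDS, can be sketched as follows. Assumptions: spectra are lists of absorbances at matching wavelengths; the difference is taken as unfolded minus folded, the sign convention that yields the negative band near 295 nm and the positive band near 273 nm described for G4s; and the dilution factor follows from a simple mass balance for spiking a 1 M KCl stock to 100 mM. Function names and the toy spectra are hypothetical.

```python
# Hypothetical helpers; assumed sign convention: difference = unfolded - folded.


def dilution_factor(stock_mM: float, final_mM: float, initial_mM: float = 0.0) -> float:
    """Volume ratio V_final / V_initial after spiking a salt stock to reach final_mM.

    Mass balance: initial*V0 + stock*v = final*(V0 + v)  =>  v/V0 = (final - initial) / (stock - final).
    """
    return 1 + (final_mM - initial_mM) / (stock_mM - final_mM)


def difference_spectrum(unfolded, folded):
    """Point-by-point difference, unfolded minus folded."""
    return [u - f for u, f in zip(unfolded, folded)]


def looks_like_g4(wavelengths, diff, neg_nm=295, pos_nm=273):
    """True if the point nearest 295 nm is negative and the point nearest 273 nm is positive."""
    def value_near(nm):
        k = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - nm))
        return diff[k]
    return value_near(neg_nm) < 0 < value_near(pos_nm)


# IDS example with toy numbers: rescale the post-KCl (folded) spectrum for dilution first.
wavelengths = [240, 273, 295, 320]
unfolded = [0.50, 0.60, 0.05, 0.0]          # before KCl addition
folded_measured = [0.45, 0.495, 0.09, 0.0]  # after KCl addition (diluted)
factor = dilution_factor(1000, 100)         # 1 M stock to 100 mM final, about 1.11
folded = [a * factor for a in folded_measured]
ids = difference_spectrum(unfolded, folded)
print(looks_like_g4(wavelengths, ids))
```

For TDS, no volume is added, so the 95 °C and 25 °C spectra would be subtracted directly without rescaling.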
-NMM (N-methyl mesoporphyrin IX) was used under the same conditions as ThT, except that fluorescence intensity was collected at 610 nm after excitation at 380 nm in a TECAN M1000 pro plate reader. The synthesis of the G4 ligands tested against S. mansoni was previously described (35–37). Stock solutions were prepared at 10 mM in DMSO. Newly transformed schistosomula (NTS) drug assay. S. mansoni cercariae were collected from infected snails and mechanically transformed into newly transformed schistosomula (NTS). 30–40 NTS per well were incubated with the drugs for 72 h at 37 °C and 5% CO2 in a final well volume of 200–250 µL. Compounds were tested in triplicate, and the highest concentration of DMSO (<1%) served as control. Evaluation was done by microscopic readout (Carl Zeiss, Germany; magnification 80×) as summarized in a previous publication (38). Adult S. mansoni drug assay. Animal studies were carried out following Swiss national and cantonal regulations on animal welfare at the Swiss Tropical and Public Health Institute (Swiss TPH; Basel, Switzerland; permission no. 2070). Female mice (NMRI; age 3 weeks; weight ca. 20–22 g) were purchased from Charles River, Germany. Mice were kept under environmentally controlled conditions (temperature ∼25 °C; humidity ∼70%; 12 h light/12 h dark cycle) with free access to water and rodent diet, and were acclimatized for 1 week before infection. Adult schistosomes were collected by mechanical picking from the hepatic portal system and mesenteric veins of mice 49 days after infection with 100 S. mansoni cercariae. Worms were incubated with the compounds for 72 h. Wells with 1% DMSO served as negative controls. IC50 values were calculated using CalcuSyn Version 2.0 (Biosoft, Cambridge, UK). Phenotypes were evaluated under an inverted microscope and viability scores were calculated (38). We selected 10 helminth organisms based on their impact on health or their relevance as model species.
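CalcuSyn is built on the Chou–Talalay median-effect equation, fa/fu = (D/Dm)^m, where fa and fu are the fractions affected and unaffected and Dm is the dose giving a 50% effect (the IC50). The sketch below fits that equation by least-squares regression on log-transformed data. It is a stand-in written for illustration, not CalcuSyn's code; the function name is hypothetical, and converting microscopy viability scores into a fraction affected is an assumption left to the caller.

```python
import math

# Hypothetical median-effect fit for illustration; not CalcuSyn's implementation.


def median_effect_fit(doses, fraction_affected):
    """Fit log10(fa/fu) = m*log10(D) - m*log10(Dm); return (m, Dm).

    Points with D <= 0 or fa outside (0, 1) are skipped because their logs are undefined.
    """
    xs, ys = [], []
    for d, fa in zip(doses, fraction_affected):
        if d > 0 and 0 < fa < 1:
            xs.append(math.log10(d))
            ys.append(math.log10(fa / (1 - fa)))
    if len(xs) < 2:
        raise ValueError("need at least two usable dose points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                      # slope: sigmoidicity coefficient
    intercept = mean_y - m * mean_x    # equals -m * log10(Dm)
    return m, 10 ** (-intercept / m)   # Dm is the IC50 when fa is the fraction inhibited


# Synthetic data generated with Dm = 5 and m = 1.5 (hypothetical concentrations, µM).
doses = [1.0, 2.0, 5.0, 10.0, 20.0]
fa = [1 / (1 + (5.0 / d) ** 1.5) for d in doses]
m, ic50 = median_effect_fit(doses, fa)
print(round(m, 3), round(ic50, 3))
```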
From the accessible genomes, three of the most pathogenic Platyhelminth species (Schistosoma haematobium, Schistosoma japonicum and Schistosoma mansoni) and three important Nematode species (Strongyloides stercoralis, Trichuris trichiura and Ascaris lumbricoides) were selected. As reference organisms, we additionally selected three Platyhelminths (Trichobilharzia regenti, Dibothriocephalus latus and Taenia asiatica) and Caenorhabditis elegans, one of the best-studied Nematodes. Both mitochondrial DNA (mtDNA) and nuclear genomes were analyzed with the G4Hunter algorithm for the presence of PQS. We first analyzed the mitochondrial genomes of the 10 helminths listed above (Supplementary Table S1). mtDNA length varied from 13,608 to 15,003 bp, with a GC content between 23% (S. stercoralis) and 32% (T. trichiura). GC content was poorly correlated with the number of detected PQS: we found only one PQS for the organism with the highest GC content (T. trichiura), whereas S. stercoralis, which has the mitochondrial genome with the lowest GC content in our dataset, has 6 PQS in its mtDNA. In total, we found 77 PQS with a G4Hunter score above 1.2, but none with a score above 1.4 (in other words, all motifs found fall in the 1.2–1.4 interval). A threshold of 1.2 is considered a reasonable compromise to identify G4-prone motifs (45); higher scores correspond to motifs capable of forming very stable quadruplexes. The majority of these sequences (40/77) were found in the mtDNA of a single species, T. regenti. To analyze PQS localization in mtDNA, we downloaded the mtDNA annotations from NCBI and overlaid PQS positions with these features (Supplementary Material 03). The organism with the highest number of PQS in mtDNA, T. regenti, has most of its PQS in the repeat region of its mtDNA.
However, PQS were also found in gene regions of this organism encoding cytochrome c oxidase subunit III, cytochrome b, NADH dehydrogenase subunits 4 and 2, and ATP synthase F0 subunit 6. For S. mansoni, all 11 PQS are located in the CDS of genes coding for cytochrome c oxidase and NADH dehydrogenase subunits (Supplementary Material 03 and Supplementary Table S1). These results reveal a nonrandom distribution of PQS in the mtDNA of the analyzed organisms and indicate that, in S. mansoni, PQS are mainly associated with genes coding for NADH dehydrogenase subunits. Supplementary Table S2 shows that S. mansoni is the only organism in which PQS overlap the genes coding for NADH dehydrogenase subunits 2, 5 and 4; its other two PQS are located in the regions coding for cytochrome c oxidase subunits I and II. To explore whether PQS are overrepresented in regions coding for NADH dehydrogenase subunits, we counted the features listed in the feature tables of the mtDNA sequences, then counted the features containing PQS. As mentioned above, most PQS are located inside mitochondrial genes and coding sequences (CDS). PQS are present in over 17% of all regions coding for NADH dehydrogenase subunits and in over 36% of all regions coding for cytochrome c oxidase subunits. In contrast, only 10% of rRNA and <2% of tRNA sequences contain PQS, suggesting an enrichment in regions coding for NADH dehydrogenase and cytochrome c oxidase subunits. Data are available in Supplementary Material 03. PQS within the region coding for cytochrome c oxidase subunits are present in each analyzed organism. The position of this PQS may vary: it is found either in the region coding for subunit I (S. haematobium, S. japonicum and S. mansoni) or in that coding for subunit II (S. mansoni). A complete table can be found in Supplementary Material 03.
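The feature-counting step described above (counting annotated mtDNA features, then those that contain PQS) amounts to an interval-overlap test per feature category. A minimal sketch, assuming features have already been parsed from the NCBI feature tables into (category, start, end) tuples with 0-based, end-exclusive coordinates; the function names, categories and coordinates in the example are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers; features are assumed to be pre-parsed (category, start, end) tuples.


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def fraction_with_pqs(features, pqs_regions):
    """For each feature category, the fraction of annotated features overlapping >= 1 PQS.

    Circular-genome wrap-around (features spanning the origin) is not handled.
    """
    total = defaultdict(int)
    with_pqs = defaultdict(int)
    for category, start, end in features:
        total[category] += 1
        if any(overlaps(start, end, ps, pe) for ps, pe in pqs_regions):
            with_pqs[category] += 1
    return {category: with_pqs[category] / total[category] for category in total}


features = [("NADH dehydrogenase", 0, 900), ("NADH dehydrogenase", 1000, 1500),
            ("cytochrome c oxidase", 2000, 3500), ("tRNA", 3600, 3670)]
pqs = [(100, 130), (2500, 2530)]
print(fraction_with_pqs(features, pqs))
# {'NADH dehydrogenase': 0.5, 'cytochrome c oxidase': 1.0, 'tRNA': 0.0}
```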
The PQS predicted in the mtDNA of the three analyzed Schistosoma species were compared to look for conserved motifs or sequences. We aligned the predicted PQS and generated their LOGOs using the WebLogo tool (39); the results for the most conserved sequences among the three Schistosoma species analyzed here (S. haematobium, S. japonicum and S. mansoni) are presented in Supplementary Material 04. These motifs show ∼70% conservation, and a similar level of conservation was seen among all six Platyhelminths. Evaluation of PQS positions shows that these most conserved sequences are located in the COX1 gene. Using standard G4Hunter values (i.e. a window size of 25 nucleotides and a threshold score of 1.2), we found over 1.3 million PQS in all 10 genomes. Overall, we do not advocate the use of a single threshold, but prefer to analyze data for various G4Hunter score windows; we therefore present the results for different threshold windows (Table 1). We have a reasonably good estimate of the false-positive (FP) rate as a function of threshold (<5% for scores >1.25, <2% for scores >1.5 and close to 0 for scores >1.75; (45) and unpublished data), false positives being sequences predicted to form a stable quadruplex whose formation cannot be confirmed experimentally. Our understanding of the false-negative (FN) rate as a function of threshold is less reliable, primarily because we have not explored as many sequences with relatively low G4Hunter scores (false negatives are sequences that form stable G4 but are missed by the algorithm). Our unpublished data with Dr Laurent Lacroix suggest that 1.2 is a reasonable threshold to maximize accuracy (i.e. to minimize the fraction of FP + FN). For this reason, we chose this value as the minimal threshold considered. Higher thresholds (up to 2.0) tend to select sequences with high stability and a high propensity to form G4 structures. Total PQS counts, GC percentages and PQS frequencies for each organism are summarized in Table 2.
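Because results are reported for several G4Hunter score windows rather than a single threshold, the binning and the relative number of PQS retained at higher thresholds (the quantity compared across species in Figure 2) can be sketched as follows. The interval edges used here (1.2, 1.4, 1.6, 1.8, 2.0, plus an open-ended top bin, giving five intervals) are an assumption, as are the function names.

```python
# Hypothetical helpers; the bin edges are an assumed choice giving five intervals.


def bin_by_score(scores, edges=(1.2, 1.4, 1.6, 1.8, 2.0)):
    """Count PQS per score interval [edges[i], edges[i+1]) plus an open-ended top bin.

    The absolute G4Hunter score is used, so C-rich (negative) motifs count too;
    scores below edges[0] are ignored.
    """
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])] + [f">={edges[-1]}"]
    counts = [0] * len(labels)
    for score in scores:
        a = abs(score)
        for k in range(len(edges) - 1, -1, -1):   # the highest matching lower edge wins
            if a >= edges[k]:
                counts[k] += 1
                break
    return dict(zip(labels, counts))


def retained_fraction(scores, thresholds=(1.2, 1.4, 1.6, 1.8, 2.0)):
    """Fraction of PQS at >= thresholds[0] that still pass each (higher) threshold."""
    base = sum(abs(s) >= thresholds[0] for s in scores)
    return {t: sum(abs(s) >= t for s in scores) / base for t in thresholds}


example_scores = [1.25, 1.3, 1.5, -2.1, 1.0]   # toy scores, not study data
print(bin_by_score(example_scores))
print(retained_fraction(example_scores, (1.2, 1.6)))
```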
The length of the analyzed nuclear genomes varied from 42 Mbp (S. stercoralis) to 701 Mbp (T. regenti). The mean GC content was 34.9%, with a minimum of 22.1% for S. stercoralis and a maximum of 42.2% for T. trichiura. The highest PQS frequencies were found in D. latus, in which 473 thousand PQS were present in a 531 Mb genome (a frequency of 0.89 PQS per 1000 nucleotides), and in T. asiatica, with 151,000 PQS in a 168 Mb genome, giving a similar frequency of 0.90 PQS per kb (exact values are provided in Table 2). In contrast, the parasite with the lowest PQS frequency was S. stercoralis, with only 3,037 PQS in a 42.7 Mbp nuclear genome (0.071 PQS per kb). We then compared PQS frequency between moderately and highly infective parasites (40, 41). PQS frequency partly depends on GC content, since a GC-poor genome is less likely to contain the local G-rich motifs necessary for G4 formation. Figure 1A shows the global PQS density (for all motifs with a G4Hunter score above 1.2) as a function of GC content for each organism. A detailed analysis for each G4Hunter score interval is provided in Supplementary Material 06. Spearman's rank correlation coefficient (r_s) was used to assess the association between PQS frequency and GC content (Supplementary Material 05). PQS frequency is correlated with GC% (r_s = 0.78). This correlation is, however, far from perfect: for example, T. trichiura and T. asiatica have almost the same GC content, but their PQS frequencies differ considerably (0.263 PQS per kb for T. trichiura vs 0.881 PQS per kb for T. asiatica). When all potential G4s are considered (threshold ≥1.2), Platyhelminths appear enriched in PQS (1,069,344 PQS in total) compared to Nematodes (251,854 PQS in total).
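The PQS frequencies and their correlation with GC content follow from two short calculations. The sketch below converts counts to PQS per kb, checked against the S. stercoralis and D. latus values quoted above, and computes Spearman's r_s as the Pearson correlation of average ranks, the standard way of handling ties. Function names are hypothetical, and the per-species GC/PQS table itself is not reproduced here.

```python
import math

# Hypothetical helpers for PQS density and Spearman's rank correlation.


def pqs_per_kb(n_pqs, genome_bp):
    """PQS density expressed per 1,000 nucleotides."""
    return n_pqs / (genome_bp / 1000)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s = Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


print(round(pqs_per_kb(3037, 42_700_000), 3))      # S. stercoralis: 0.071
print(round(pqs_per_kb(473_000, 531_000_000), 2))  # D. latus: 0.89
```

Given per-species lists of GC percentages and PQS densities, `spearman_rs(gc, density)` would return the coefficient reported in Supplementary Material 05.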
The results look completely different if the analysis of PQS frequency is restricted to motifs with a G4Hunter score above 1.6 (Figure 1B): in this case, most species exhibit very low PQS frequencies (<0.002/kb), with the striking exception of A. lumbricoides, for which the density is at least 10-fold higher than in any other species. In contrast with mitochondria (where no PQS with a score >1.4 was found), we still found a significant number of PQS with high G4Hunter scores in nuclear genomes (Table 1). However, as for other species, including bacteria (42) and archaea (43), the number of G4 motifs found drops significantly when higher thresholds are selected. This observation holds for all genomes tested so far, including viruses, bacteria, archaea and eukaryotes. The majority of PQS have a score in the range 1.2–1.4: most PQS sequences have a relatively low score (Figure 2). Compared to mtDNA, the nuclear DNAs of these species are still poorly annotated, with the exception of C. elegans. Unlike in mtDNA, where PQS are located in the repeat region and in gene regions, nuclear PQS are slightly enriched upstream and downstream of gene regions and within annotations for various RNAs such as ncRNA, tRNA and precursor RNA. The most significant PQS enrichment was found in the regions flanking rRNA and prim_transcript features (Supplementary Material 10: Elegans Annotations-result.xls). The comparison between phyla is informative; Homo sapiens, archaea and bacteria are provided for comparison (43). Notably, PQS with a high G4Hunter score are very rare in Platyhelminths, while low-score PQS are extremely abundant. Platyhelminths behave like archaea and bacteria, with a stronger counter-selection against very stable G4 than in Homo sapiens (Figure 2). Very stable G4 are therefore strongly counter-selected in Platyhelminths compared to humans, and G4 with high G4Hunter scores are extremely rare in this phylum. We then analyzed individual Nematode species in Figure 2B.
We plotted the data for each species (Homo sapiens and the group of three nematodes are again provided for comparison). As shown, A. lumbricoides is the organism that maintains the highest relative level of stable PQS at high thresholds, while the three other Nematodes exhibit a drop in the relative number of PQS that is comparable to (C. elegans) or even sharper than (S. stercoralis and T. trichiura) that of Homo sapiens. A. lumbricoides is therefore unique among nematodes, and indeed among all species we have studied so far: while its overall density of G4 motifs is not remarkable (Figure 2), it tolerates high-stability G4 structures, suggesting that these motifs are not counter-selected at all. The data are available in Supplementary Material 07. PQS in Ascaris lumbricoides genome. The ability of A. lumbricoides to maintain the highest relative level of stable PQS at high thresholds prompted us to analyze its G4 motifs in more detail. To this aim, we selected and analyzed the 2,313 sequences with a G4Hunter score above 2.0 (Supplementary Material 08).
Figure 1. PQS density as a function of GC content for each organism (data from Table 2; blue, Platyhelminths; yellow, Nematodes) at two G4Hunter score thresholds: (A) ≥1.2; (B) ≥1.6.
The most salient feature shared by nearly all of these A. lumbricoides motifs is that they are composed of poly-dG stretches (or poly-dC when the score is negative) rather than other repetitive motifs (repeats of GGGT, GGGA, GGGGT, GGGGA, GGGGC, GGGGTT or GGGGAA would all give G4Hunter scores above 2.0). Of the 2,313 A. lumbricoides sequences with a G4Hunter score above 2.0 (or below −2.0), 2,311 (99.9%) contain at least one run of at least 10 consecutive G (or C). The length distribution of these runs is presented in Figure 3. We then checked whether similar behavior was found in nematodes closely related to A. lumbricoides (40) (Supplementary Table S4). T. canis contained the highest number of PQS: 242,923, with a frequency of 0.77 PQS/kb and a GC content of 37.6% in the total genome. T. canis is followed by A. suum, with 182,227 PQS, a frequency of 0.61 PQS/kb and a GC content of 37.7% (Supplementary Table S5). The majority of PQS fell in the G4Hunter score interval 1.2–1.4 (476,476 PQS in total), with a frequency of 0.259 PQS/kb within this range. Programmed DNA elimination is a feature of nematodes such as A. lumbricoides and A. suum. This developmentally regulated process leads to the reproducible loss of specific genomic sequences in somatic cells, leaving the germline genome intact. In-depth analyses of DNA elimination in A. lumbricoides and T. canis have been performed recently (40). We analyzed G4 propensity in the breakpoint regions of A. suum and found that 39 out of 40 regions contained at least one PQS nearby (within 3 kb), suggesting a possible role of G-quadruplexes in this process. In terms of overall frequency, the majority of the PQS found in A. suum are located before or inside the chromosomal break regions. To test whether PQS occurrence in chromosomal breakpoint regions differs significantly from that in randomly shuffled sequences, we generated 40 random sequences with the same length and nucleotide composition as the original A. suum breakpoints. We found that 38 out of 40 randomly generated regions contained at least one PQS, compared to 39 out of 40 original breakpoint regions (Supplementary Table S3). However, the mean PQS count decreased from 9.25 PQS per breakpoint to 5.6 PQS per shuffled sequence, and the difference in PQS counts between breakpoints and shuffled sequences was highly significant (P < 0.005). We performed a similar analysis with various G4Hunter thresholds (1.2–1.6): the higher the threshold, the larger (and more significant) the difference between A. suum breakpoints and shuffled sequences.
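The poly-dG/poly-dC criterion used above (at least one run of ≥10 consecutive G or C) is easy to check with a regular expression. A minimal sketch with hypothetical function names; runs are found as maximal stretches of G or of C, so a mixed G/C stretch does not count as a single run.

```python
import re

# Hypothetical helpers; maximal runs of G or of C (input is upper-cased first).
RUN = re.compile(r"G+|C+")


def longest_gc_homopolymer(seq):
    """Length of the longest run of consecutive G or of consecutive C."""
    return max((len(m.group()) for m in RUN.finditer(seq.upper())), default=0)


def fraction_with_long_run(seqs, min_len=10):
    """Fraction of sequences containing at least one G or C run of >= min_len."""
    hits = sum(longest_gc_homopolymer(s) >= min_len for s in seqs)
    return hits / len(seqs)


print(longest_gc_homopolymer("AT" + "G" * 12 + "TA"))     # 12
print(fraction_with_long_run(["A" + "C" * 12, "GGGTTAGGG"]))  # 0.5
```

Applied to the 2,313 high-scoring A. lumbricoides motifs, this check corresponds to the 2,311/2,313 (99.9%) figure reported above.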
For a threshold of 1.6, 35 breakpoint sequences contain at least one PQS, whereas only 11 shuffled sequences do (Figure 4; data provided in Supplementary Material 09). To check whether this trend holds in other helminths, we performed additional analyses of the breakpoints in two related species, Toxocara canis and Parascaris equorum, using the data collected by Wang et al. (44). As in A. suum, we found a clear overrepresentation of G4 motifs in breakpoints compared to shuffled sequences, and this result held at all thresholds considered (Supplementary Figure S7). We identified a number of potential G4-forming motifs in helminth genomes using bioinformatics approaches. While G4Hunter's accuracy is reasonable, and indeed excellent for high-scoring motifs (45), we found it essential to confirm experimentally that some of these sequences actually form G4s, at least in vitro. To do so, we chose four se[…] For the biophysical characterization of A. lumbricoides motifs, note that poly-dG runs have already been characterized in previous articles (46). We therefore focused our efforts on A. lumbricoides motifs that do not correspond to pure homopolymeric runs of guanines. These 20 sequences, which are 22–39 nucleotides long, have G4Hunter scores between 1.46 and 2.48. G4 formation is considered extremely likely (>98.5%) for sequences with a score above 1.5 (45), but we wanted to provide compelling proof of G4 formation. To demonstrate G4 formation, we used a combination of techniques, starting with FRET-MC, a method we recently introduced (32) (Figure 5A). FRET-MC allows multiple sequences to be tested in parallel. A negative control (ds26, a sequence that forms a duplex rather than a G4) and two positive controls (Pu24T and c-myc, sequences known to adopt G4 structures) were used for comparison. The FRET-MC method measures the ability of a sequence to compete for binding to a well-known G4 ligand, PhenDC3.
This compound is highly selective for G4s: efficient competitors act as decoys for this G4 ligand, leading to a strong decrease in the Tm of a fluorescent G4-forming oligonucleotide (29); when added in large excess, these specific competitors can reduce the ligand-induced ΔTm to negligible values. As can be seen in Figure 5A, 19 out of 20 sequences considered here acted as efficient competitors (AL1 was found to be less efficient), arguing for G4 formation for most motifs. Concluding on G4 formation based on a single technique is not recommended (45). Therefore, we used an independent approach and investigated whether the fluorescence emission of G4 light-up probes (47), such as Thioflavin T (34) and NMM (48), increased in the presence of A. lumbricoides motifs. Two negative and four positive controls were tested for comparison (Figure 5B, C). As shown in these panels, most sequences (including AL1 with NMM) induced significant increases in fluorescence emission, to levels comparable to or higher than those of the positive controls. Finally, we performed additional spectroscopic experiments. G4s give specific signatures in isothermal and thermal difference spectra (IDS and TDS, respectively). The principle of these experiments is to compare the absorbance of the same oligonucleotide in its folded and unfolded states; the arithmetic difference between these two spectra gives a difference spectrum. Unfolding can be achieved by heating (for TDS) or by omitting stabilizing cations (for IDS). G4s exhibit a negative peak around 295 nm and a positive peak around 273 nm in both IDS and TDS (33). IDS and TDS of Ascaris motifs are shown in Supplementary Figure S3. Circular dichroism (CD) spectra of all AL sequences are presented in Supplementary Figure S4; interestingly, most spectra were indicative of parallel G4 formation (either exclusively or predominantly), except for AL14.
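The TDS/IDS logic reduces to a subtraction followed by a sign check at two wavelengths. The sketch below assumes the difference is taken as unfolded minus folded absorbance (high temperature minus low temperature for TDS, cation-free minus cation-containing buffer for IDS) and simply reads the value at the wavelength closest to 273 nm and to 295 nm; real analyses usually also normalize the spectrum and inspect its full shape. The function names and the absorbance numbers are illustrative only, not measurements from this study.

```python
def difference_spectrum(unfolded: list[float], folded: list[float]) -> list[float]:
    """Unfolded minus folded absorbance, wavelength by wavelength.
    TDS: high-temperature minus low-temperature spectrum.
    IDS: spectrum without stabilizing cation minus spectrum with it."""
    if len(unfolded) != len(folded):
        raise ValueError("spectra must be sampled at the same wavelengths")
    return [u - f for u, f in zip(unfolded, folded)]


def value_near(wavelengths: list[float], values: list[float], target_nm: float) -> float:
    """Return the value recorded at the wavelength closest to target_nm."""
    i = min(range(len(wavelengths)), key=lambda k: abs(wavelengths[k] - target_nm))
    return values[i]


def looks_like_g4(wavelengths: list[float], diff: list[float]) -> bool:
    """G4 signature: positive band near 273 nm and negative band near 295 nm."""
    return value_near(wavelengths, diff, 273) > 0 and value_near(wavelengths, diff, 295) < 0


# Illustrative numbers only (not data from this study).
wl = [240, 260, 273, 285, 295, 310]
folded = [0.50, 0.60, 0.55, 0.40, 0.30, 0.05]
unfolded = [0.55, 0.62, 0.60, 0.41, 0.25, 0.05]
print(looks_like_g4(wl, difference_spectrum(unfolded, folded)))  # prints True
```

A duplex or unstructured oligonucleotide would not show the negative band at 295 nm, so the same check returns False for it.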
Altogether, these experiments confirmed predominant G4 formation for 19 out of the 20 Ascaris motifs tested. Of note, the motif for which a G4 may not be the dominant species and/or is of marginal stability (AL1) is the one with the lowest G4Hunter score (1.4). We first verified that the S. mansoni G4 sequences recently described in (29) formed G4s, using the same techniques as for A. lumbricoides. As shown in Supplementary Figure S5, this is indeed the case for the five positive sequences described in that article. Interestingly, for the sixth one (smp-196840), although the CD spectra reported by Craven et al. could not be associated with any known G4 CD profile, we were able to conclude that this motif also forms a stable G4, as shown by a combination of three independent methods (FRET-MC and two fluorescent light-up probes). G4 formation by this motif is hardly surprising given its high G4Hunter score (2.0); the smp-196840 sequence is d(GGGAGGGGGAGAGAGAGAGGGGGAGGTAAAGGG). Overall, we conclude that all six sequences investigated form stable G4s. We next investigated whether recently synthesized G4 ligands (i.e. small compounds that selectively recognize this unusual nucleic acid structure; Figure 6A) (35,36) would bind to the S. mansoni G4s described in (29). Six compounds inducing variable levels of G4 stabilization were chosen. We performed a FRET-melting assay, in which we measured the melting temperature of a dual-labeled fluorescent G4-forming oligonucleotide (F21T, which corresponds to the human telomeric motif and also to S. mansoni telomeres). The tested compounds have variable affinities for the telomeric motif, with ΔTm values ranging from +0 to +14 °C. To verify that the active compounds were also able to recognize other S. mansoni quadruplexes, we added these sequences as unlabeled competitors.
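The competition readout of such a FRET-melting experiment can be expressed as a simple ratio. In this hedged sketch, ΔTm is the ligand-induced increase in the melting temperature of the labeled probe (e.g. F21T), and the residual stabilization is the ΔTm measured in the presence of an unlabeled competitor divided by the ΔTm measured without it: values near 1 mean the competitor does not capture the ligand, values near 0 point to an efficient decoy. The ratio, the 0.5 cutoff, the competitor names and all temperatures below are hypothetical choices for illustration, not the quantities reported in this study.

```python
def delta_tm(tm_with_ligand: float, tm_without_ligand: float) -> float:
    """Ligand-induced stabilization of the labeled G4 probe, in °C."""
    return tm_with_ligand - tm_without_ligand


def residual_stabilization(dtm_with_competitor: float, dtm_without_competitor: float) -> float:
    """Fraction of the ligand-induced stabilization left when a competitor is added:
    ~1 means no competition, ~0 means the competitor captured the ligand."""
    if dtm_without_competitor <= 0:
        raise ValueError("the ligand does not stabilize the probe; ratio is undefined")
    return dtm_with_competitor / dtm_without_competitor


def rank_decoys(
    tm_probe: float,
    tm_probe_ligand: float,
    tm_with_competitors: dict[str, float],
    cutoff: float = 0.5,  # hypothetical threshold for calling a competitor a decoy
) -> tuple[dict[str, float], list[str]]:
    """Return residual stabilization per competitor and the decoys, best first."""
    reference = delta_tm(tm_probe_ligand, tm_probe)
    ratios = {
        name: residual_stabilization(delta_tm(tm, tm_probe), reference)
        for name, tm in tm_with_competitors.items()
    }
    decoys = sorted((n for n, r in ratios.items() if r < cutoff), key=lambda n: ratios[n])
    return ratios, decoys


# Hypothetical melting temperatures (°C): probe alone, probe + ligand,
# and probe + ligand + each unlabeled competitor.
ratios, decoys = rank_decoys(55.0, 67.0, {"competitor_A": 58.0, "competitor_B": 60.0, "duplex_control": 66.5})
print(decoys)  # ['competitor_A', 'competitor_B']
```

A duplex control such as ds26 is expected to leave the stabilization almost untouched, which is why its ratio stays close to 1 in this toy example.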
As shown in Figure 6B, the addition of some, but not all, of these oligonucleotides led to a decrease in the stabilization induced by the G4 ligand considered here. This indicates that motifs such as smp163240 or smp319480 were able to act as efficient 'decoys' for the G4 ligands (even more efficient than the c-myc quadruplex used as a positive control), confirming that these molecules have an affinity for at least some of the G4 motifs (telomeric and non-telomeric) found in the S. mansoni genome. In contrast, the ds26 negative control (a double-stranded oligonucleotide) and two S. mansoni quadruplexes had little or no effect (Figure 6B), suggesting that JG1352 has little or no affinity for these structures. The FRET-melting assay presented in Figure 7A illustrates that the best ligands (e.g. JG1352) stabilize all five G-quadruplexes tested in this assay to a variable extent, but do not stabilize the double-stranded hairpin control (FdxT; ΔTm ≈ 0; sequences provided in Supplementary Table S7). The next step was to determine whether these G4 ligands have antiparasitic activity. We tested the biological activity of these six compounds against larval and adult S. mansoni. JG1057 and JG1352 showed high activity at 100 μM and 10 μM and moderate activity at 1 μM against the larval stages. Both compounds also showed high activity at 10 and 1 μM against adult S. mansoni. JG966 was less active, in particular against adult S. mansoni, which it affected only at the highest concentration tested (10 μM; Table 3). The IC50 values calculated for JG1057 and JG1352 are in the same range as those of praziquantel (a drug currently used to treat parasitic worm infections). Interestingly, the three inactive or weakly active compounds (ligands exhibiting low ΔTm values on G-quadruplexes) (Figure 7A) were also significantly less active towards the parasite (Figure 7B) than two of the three best ligands. This effect was particularly marked against adult S.
mansoni (at both 1 and 10 μM compound concentrations), suggesting that part of the antiparasitic effect of these compounds is mediated by a G4-related mechanism. Besides the classical B-DNA double helix, genomic DNA may adopt a variety of non-canonical structures that may play important roles (49). Repetitive sequences may form G4s (50) or i-DNA, inverted repeats can adopt cruciform structures (51), CAG/CTG triplet repeats form unusual duplexes (52), and homopurine-homopyrimidine repeats adopt triplex structures (53). Many of these structures are involved in human pathologies, such as neurological disorders or cancers (53-55). Recently, their existence has been confirmed in living cells by several methods, including structure-specific antibodies, synthetic compounds and structure-sensitive sequencing (55-57). G4 appears to be one of the most important local DNA structures, with a higher thermal stability than B-DNA (58, 59). The growing amount of full-genome sequencing data provides an excellent source of information for detailed G4 prediction in various organisms. G4s have been shown to exist in the archaea (43), bacteria (42, 60, 61) and eukaryote (19) domains, as well as in various viruses (62), where PQS propensity correlates with that of their host (63): viruses causing acute infections (including SARS-CoV-2) seem to be depleted in PQS (64, 65). Nucleic Acids Research, 2022, Vol. 50, No. 5 2731 G4s have been suggested to be valuable druggable targets for the development of therapies against various pathogens (60, 66). The recently published genome sequences of helminth species allowed us to perform comparative analyses of PQS in their genomes, including widespread pathogenic species (4). We provided compelling evidence that many of these sequences form G4 structures in vitro.
This does not necessarily imply that all of them adopt a G4 fold in vivo, as PQS were tested in the absence of flanking sequences, of their complementary strand, and of proteins that could promote or inhibit folding. Nevertheless, results previously published on S. mansoni with the BG4 antibody attest that at least some of the candidate motifs adopt a quadruplex fold in vivo (29). Far fewer potential G4 motifs are found in mitochondrial DNA (mtDNA). This is partly due to the much smaller size of the mitochondrial genome; however, even when considering the number of PQS per kb, a lower density is still found in mtDNA, for both platyhelminths and nematodes (Supplementary Figure S6). Overall, and in stark contrast to humans, where G4s are abundant in mtDNA, G4s seem relatively rare in helminth mtDNA. Differences in GC content partially account for this contrast. Nuclear DNA and mtDNA of S. mansoni, S. japonicum, S. haematobium and T. regenti have relatively similar GC contents, while for other helminths (D. latus, T. asiatica, T. trichiura, S. stercoralis, A. lumbricoides, C. elegans), the GC content of nuclear DNA exceeds that of mtDNA by 8% or more. Surprisingly, only one parasite (T. regenti) has a higher mitochondrial than nuclear GC content. This relative paucity of PQS in mtDNA prevents an in-depth comparison of mitochondrial G4 density between helminths. Nevertheless, some motifs appear conserved between species and are located in gene regions where they could regulate gene expression. On the other hand, numerous PQS are found in the genomes of all ten helminths considered here. Unfortunately, owing to the lack of annotation for most species, C. elegans is the only species for which a detailed analysis of the reference genome is possible (67, 68). Such analyses of PQS localization would help reveal putative G4 regulatory functions in translation, which have already been demonstrated in other organisms (69-71).
While Platyhelminths and humans share the same telomeric motif, (TTAGGG)n (72), Nematodes have a slightly different telomeric repeat, (TTAGGC)n (73). This one-nucleotide difference has a strong impact on G4 formation. The human telomeric DNA motif (TTAGGG)n, with a G4Hunter score of 1.5, forms a stable G4 both in vitro (74) and in vivo (75, 76). Interestingly, both S. haematobium and S. mansoni have telomeric-like sequences integrated at non-telomeric sites (72). In contrast, the Nematode telomeric motif (TTAGGC)n has a much lower G4Hunter score (0.5). Despite this low score, such a motif may form an antiparallel G4 (Tm ≈ 40 °C; Marquevielle, submitted), possibly in competition with a fundamentally different secondary structure called a foldback (77). DNA elimination occurs in a number of species, including nematodes. The eliminated DNA mostly corresponds to repetitive sequences and germline-specific genes. Previous analyses suggest that DNA elimination in nematodes silences germline-expressed genes (44), and their results pointed to a sequence-independent mechanism for DNA breakage. Interestingly, we found a clear overrepresentation of G4 motifs in breakpoints compared with shuffled sequences; this result held at all thresholds considered (see Figure 4 and Supplementary Figure S7; Supplementary Material 9) and for three nematode species: Ascaris suum, Toxocara canis and Parascaris equorum. This conservation suggests that G-quadruplex formation may be involved in this programmed elimination, perhaps by recruiting a DNA-cleaving complex (44). Telomere healing (which would lead to the insertion of a few telomeric motifs) would not explain this bias, given the low G4Hunter score of nematode telomeres (see above). Using the G4Hunter score as a proxy for G4 stability, we observe a strong counterselection against stable G4s in all Platyhelminths, similar to what is found in archaea and bacteria.
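The G4Hunter scores quoted above for the two telomeric repeats follow directly from the scoring rule. This sketch applies the rule to a whole sequence rather than to sliding windows, an assumption that is reasonable for short motifs. Changing the last G of each TTAGGG repeat to C both shortens the G-run from three to two (lowering the G contribution from 9 to 4 per repeat) and introduces a penalized C (−1 per repeat), which is why the score drops from 1.5 to 0.5. The function name `g4hunter_score` is hypothetical.

```python
import re


def g4hunter_score(seq: str) -> float:
    """Mean G4Hunter score over the whole sequence: each G in a run of n
    guanines scores min(n, 4), each C in a run of n cytosines scores
    -min(n, 4), and every other base scores 0."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    total = 0
    for run in re.finditer(r"G+|C+", seq):
        n = len(run.group())
        per_base = min(n, 4) if run.group()[0] == "G" else -min(n, 4)
        total += per_base * n
    return total / len(seq)


for name, motif in [("human/platyhelminth telomere", "TTAGGG" * 4),
                    ("nematode telomere", "TTAGGC" * 4)]:
    print(f"{name}: {g4hunter_score(motif):.2f}")
```

The same function returns 2.0 for the smp-196840 sequence given earlier, consistent with the score reported for that motif.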
G4 motifs with G4Hunter scores above 1.8 or 2.0 are very rare relative to the total number of motifs (Figure 2). This may suggest that Platyhelminths are unable to cope with very stable G4s, which can cause problems during replication or transcription. In contrast, nematodes follow a profile comparable to that of humans, where selection against stable G4s is not as strong as in prokaryotes. The frequency of poly-G runs is worth discussing in light of the analysis performed by Puig-Lombardi et al. (78). They analyzed the frequency of (GGGN)3GGG motifs (15-nt sequences), corresponding to four runs of three guanines separated by single N nucleotides. When N = G, this corresponds to 'pure' G15 runs. This analysis was performed on hundreds of genome assemblies. The authors found that the (GGGA)3GGG motif was largely predominant in placental mammals, including humans, while an excess of (GGGG)3GGG (pure poly-G runs) was found in amphibians, fish, plants, invertebrates and nematodes, including C. elegans (see Fig. S15 of reference (78)). The authors wrote that 'the extreme prevalence (98%) of the G-runs versus the other GGGX motifs in the C. elegans genome and the finding that these sequences are eliminated by complete deletion during development and in animals deficient for the dog-1 helicase suggest that different molecular mechanisms can play a role in handling the equilibrium between the maintenance and the inactivation of short-loop G4-L1 motifs'. What is unique about A. lumbricoides is therefore not the presence of pure poly-G runs as compared to other repeats, but the overall abundance of long PQS with high G4Hunter scores. A. lumbricoides and related species such as A. suum have an exceptional density of high-stability G4s (with high G4Hunter scores), far higher than in any other species. These results suggest that A. lumbricoides must have evolved very efficient mechanisms to cope with these stable secondary structures.
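A (GGGN)3GGG census of the kind described by Puig-Lombardi et al. can be approximated with regular expressions. This sketch assumes the same N at all three linker positions and counts overlapping occurrences with a zero-width lookahead, so a run of 16 guanines contains two G15 motifs; the original study may have handled overlaps and mixed linkers differently. The function names `count_gggn_motifs` and `pure_g_fraction` and the demo sequence are hypothetical.

```python
import re
from collections import Counter


def count_gggn_motifs(seq: str) -> Counter:
    """Count (GGGN)3GGG 15-mers for N in A, C, G, T.

    Assumes the same N at all three linker positions and counts overlapping
    occurrences, so a run of 16 Gs yields two (GGGG)3GGG = G15 motifs."""
    seq = seq.upper()
    counts: Counter = Counter()
    for n in "ACGT":
        # Zero-width lookahead: every start position is tested, so overlaps are kept.
        pattern = re.compile(f"(?=(?:GGG{n}){{3}}GGG)")
        counts[n] = sum(1 for _ in pattern.finditer(seq))
    return counts


def pure_g_fraction(counts: Counter) -> float:
    """Share of (GGGN)3GGG motifs that are pure G15 runs (N = G)."""
    total = sum(counts.values())
    return counts["G"] / total if total else 0.0


if __name__ == "__main__":
    demo = "G" * 16 + "TT" + "GGGAGGGAGGGAGGG"  # hypothetical toy sequence
    c = count_gggn_motifs(demo)
    print(dict(c), round(pure_g_fraction(c), 3))
```

`pure_g_fraction` is analogous to the share of pure G-runs among GGGX motifs quoted above for C. elegans, under the overlap-counting assumption stated here.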
Therefore, we checked whether the A. lumbricoides genome contains putative helicases able to unfold G4s. Dog-1 has been found to be involved in the genomic stability of poly-dG stretches in C. elegans. For this reason, we looked for orthologs of the Dog-1 helicase domain (aa 80-440) in A. lumbricoides. Interestingly, four genes showed homology, suggesting that they encode putative helicases involved in G4 resolution/unfolding (see Supplementary Table S8). Of note, one of them is an ortholog of a human ATP-dependent DNA helicase (also called DDX11, CHLR1 or KRG2) reported to act on G4 substrates (79). Further studies could evaluate the impact of G4 ligands on Ascaris spp. Helminth genomes contain multiple PQS. The comparison of various nematodes and Platyhelminths revealed interesting and marked differences between helminths. Experimental confirmation of G4 formation in vitro was obtained for four different species. Two of the G4 ligands able to bind to Schistosoma mansoni G4 motifs exhibited potent antiparasitic activity against both larval and adult forms of this parasite, suggesting that G4 ligands could be used to fight neglected tropical diseases. These results therefore open new perspectives for the development of novel therapeutic strategies against these widespread helminths.

Helminth Infections: soil-transmitted helminth infections and schistosomiasis
Soil-transmitted helminth infections: ascariasis, trichuriasis and hookworm. The Lancet
Schistosomiasis: still a cause of significant morbidity and mortality
Transformative tools for parasitic flatworms
Antenatal anthelmintic treatment, birthweight, and infant survival in rural Nepal
Decision making on helminths in cattle: diagnostics, economics and human behaviour
Soil-transmitted helminth infections: updating the global picture
Human infection with Strongyloides stercoralis and other related Strongyloides species
Human schistosomiasis
Schistosomiasis chemotherapy
Benzimidazole resistance in helminths: from problem to diagnosis
Elimination of schistosomiasis: the tools required
The genome of the blood fluke Schistosoma mansoni
The Schistosoma japonicum genome reveals features of host-parasite interplay
Whole-genome sequence of Schistosoma haematobium
Structure and function of multimeric G-quadruplexes
Quadruplex DNA: sequence, topology and structure
Stability and kinetics of G-quadruplex structures
Identification of G-quadruplex clusters by high-throughput sequencing of whole-genome amplified products with a G-quadruplex ligand
G-quadruplex, friend or foe: the role of the G-quartet in anticancer strategies (2020)
Targeting G-quadruplexes in gene promoters: a novel anticancer strategy?
Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription
Developing novel G-quadruplex ligands: from interaction with nucleic acids to interfering with nucleic acid-protein interaction
Non-B DNA structure-induced genetic instability and evolution
Stable G-quadruplexes are found outside nucleosome-bound regions
Whole genome experimental maps of DNA G-quadruplexes in multiple species
Guanine quadruplex structures localize to heterochromatin
Microenvironment-sensitive fluorescent ligand binds ascaris telomere antiparallel G-quadruplex DNA with blue-shift and enhanced emission
Identifying and validating the presence of guanine-quadruplexes (G4) within the blood fluke parasite Schistosoma mansoni
G4Hunter web application: a web server for G-quadruplex prediction
Following G-quartet formation by UV-spectroscopy
FRET-MC: a fluorescence melting competition assay for studying G4 structures in vitro
Thermal difference spectra: a specific signature for nucleic acid structures
Thioflavin T as a fluorescence light-up probe for G4 formation
Design, synthesis, and antiproliferative effect of 2,9-bis
Design, synthesis, and antiprotozoal evaluation of new 2,9-bis[(substituted-aminomethyl)phenyl]-1,10-phenanthroline derivatives
Design, synthesis, and antiprotozoal evaluation of new 2,4-bis
Life cycle maintenance and drug-sensitivity assays for early drug discovery in Schistosoma mansoni
WebLogo: a sequence logo generator
Comparative genomics of the major parasitic worms
Soil-transmitted helminth infections
The presence and localization of G-quadruplex forming sequences in the domain of bacteria
Quadruplexes in the archaea domain. Biomolecules, 10
Comparative genome analysis of programmed DNA elimination in nematodes
Re-evaluation of G-quadruplex propensity with G4Hunter
Formation of G-quadruplexes in poly-G sequences: structure of a propeller-type parallel-stranded G-quadruplex formed by a G15 stretch
Visualizing the quadruplex: from fluorescent ligands to light-up probes
N-methylmesoporphyrin IX fluorescence as a reporter of strand orientation in guanine quadruplexes
Distinct DNA repair pathways cause genomic instability at alternative DNA structures
The structure and function of DNA G-quadruplexes
Cruciform structures are a common DNA feature important for regulating biological processes
Stability of intrastrand hairpin structures formed by the CAG/CTG class of DNA triplet repeats associated with neurological diseases
binds preferentially to non-B DNA structures formed by the pyrimidine-rich strands of GAA·TTC trinucleotide repeats associated with Friedreich's ataxia
Emerging role of G-quadruplex DNA as target in anticancer therapy
Landscape of G-quadruplex DNA structural regions in breast cancer
Alternative DNA structures in vivo: molecular evidence and remaining questions. Microbiol
Visualization of parallel G-quadruplexes in cells with a series of new developed Bis(4-aminobenzylidene)acetone derivatives
DNA secondary structures: stability and function of G-quadruplex structures
Effect of pressure on thermal stability of G-quadruplex DNA and double-stranded DNA structures
G-Quadruplex structures in bacteria: biological relevance and potential as antimicrobial target (2021)
Structures and stability of simple DNA repeats from bacteria
Viral G-quadruplexes: new frontiers in virus pathogenesis and antiviral therapy
Tracing dsDNA virus-host coevolution through correlation of their G-quadruplex-forming sequences
Analyses of viral genomes for G-quadruplex forming sequences reveal their correlation with the type of infection
In-depth bioinformatic analyses of human SARS-CoV-2, SARS-CoV, MERS-CoV, and other nidovirales suggest important roles of noncanonical nucleic acid structures in their lifecycles
Viral G-quadruplexes: new frontiers in virus pathogenesis and antiviral therapy
Illumina synthetic long read sequencing allows recovery of missing sequences even in the Finished C. elegans genome
Mutagenic capacity of endogenous G4 DNA underlies genome instability in FANCJ-defective C. elegans
Where are G-quadruplexes located in the human transcriptome?
A 5' UTR GGN repeat controls localisation and translation of a potassium leak channel mRNA through G-quadruplex formation
A small molecule that represses translation of G-quadruplex-containing mRNA
Chromosomal differentiation of schistosomes: what is the message?
New telomere formation after developmentally regulated chromosomal breakage during the process of chromatin diminution in Ascaris lumbricoides
Human telomeric G-quadruplex: structures of DNA and RNA sequences
Evaluation of parameters critical for observing nucleic acids inside living Xenopus laevis oocytes by in-cell NMR spectroscopy
Quantitative visualization of DNA G-quadruplex structures in human cells
Unique C. elegans telomeric overhang structures reveal the evolutionarily conserved properties of telomeric DNA
Thermodynamically stable and genetically unstable G-quadruplexes are depleted in genomes across species
RNA helicase DDX1 converts RNA G-quadruplex structures into R-loops to promote IgH class switch recombination
Identification of antischistosomal leads by evaluating bridged 1,2,4,5-tetraoxanes, alphaperoxides, and tricyclic monoperoxides

We thank S. Amrane (IECB), L. Lacroix (ENS) and T. Mardivirin (LOB) for helpful discussions. Supplementary Data are available at NAR Online.