key: cord-0028204-s4gnprby authors: Bendifallah, Sofiane; Dabi, Yohann; Suisse, Stéphane; Jornea, Ludmila; Bouteiller, Delphine; Touboul, Cyril; Puchar, Anne; Daraï, Emile title: MicroRNome analysis generates a blood-based signature for endometriosis date: 2022-03-08 journal: Sci Rep DOI: 10.1038/s41598-022-07771-7 sha: b7d6cf0956cce3d0bacbf24fad6069904ff03f88 doc_id: 28204 cord_uid: s4gnprby
Endometriosis, characterized by endometrial-like tissue outside the uterus, is thought to affect 2–10% of women of reproductive age, representing about 190 million women worldwide. Numerous studies have evaluated the diagnostic value of blood biomarkers but with disappointing results. Thus, the gold standard for diagnosing endometriosis remains laparoscopy. We performed a prospective trial, the ENDO-miRNA study, using both Artificial Intelligence (AI) and Machine Learning (ML), to analyze the current human miRNome to differentiate between patients with and without endometriosis, and to develop a blood-based microRNA (miRNA) diagnostic signature for endometriosis. Here, we present the first blood-based diagnostic signature obtained from a combination of two robust and disruptive technologies merging the intrinsic quality of miRNAs to condense the endometriosis phenotype (and its heterogeneity) with the modeling power of AI. The most accurate signature provides a sensitivity, specificity, and Area Under the Curve (AUC) of 96.8%, 100%, and 98.4%, respectively, and is sufficiently robust and reproducible to replace the gold standard of diagnostic surgery. Such a diagnostic approach for this debilitating disorder could impact recommendations from national and international learned societies.
Using genome-wide miRNA expression profiling by small RNA sequencing from plasma available in a biobank, Vanhie et al. identified a set of 42 miRNAs with discriminative power to differentiate between patients with and without endometriosis.
Expression of 41 of these miRNAs was confirmed by RT-qPCR and three diagnostic models were built to discriminate between controls and all stages of endometriosis, minimal-mild endometriosis, and moderate to severe endometriosis. Only the model for minimal-mild endometriosis (miR-125b-5p, miR-28-5p and miR-29a-3p) exhibited an AUC of 60%, and while its sensitivity was acceptable at 78%, the specificity was only 37% 14. Selecting some miRNAs altered in endometriosis from a large screen, Moustafa et al. reported increased expression of four serum miRNAs (miR-125b-5p, miR-150-5p, miR-342-3p, miR-451a) and decreased expression of two (miR-3613-5p, let-7b). The authors concluded that their 6-miRNA signature was able to differentiate patients with endometriosis from those with other gynecologic disorders with an accuracy > 0.9 15. Overall, however, the studies in this field are based on small sample sizes, which limits the validation of the signatures. Furthermore, discrepancies in methodology (study design, collection, storage, sequencing techniques, and statistical approach) have a particularly strong influence on the results of small studies 4, 16, 17, 20, 26. In addition, miRNA selection based on the highest AUC is of low accuracy since the extreme variability of the endometriosis phenotypes has a major impact on the AUC. This may explain why signatures composed of a small selection of miRNAs are of low validity, stability, and reproducibility 4, 16, 17, 20, 26. Thus, despite the findings of these studies, no new blood-based biomarkers are currently used in clinical practice for the diagnosis of endometriosis. Therefore, the aim of the prospective ENDO-miRNA study, using both Artificial Intelligence (AI) and Machine Learning (ML), was to analyze the current human miRNome to differentiate between patients with and without endometriosis, and to develop a blood-based miRNA diagnostic signature for endometriosis with internal cross-validation.
Ethics statement.
Data and plasma collection were from the prospective ENDO-miRNA study (ClinicalTrials.gov Identifier: NCT04728152). The Research Protocol (n° ID RCB: 2020-A03297-32) was approved by the ethics committee "Comité de Protection des Personnes (C.P.P.) Sud-Ouest et Outre-Mer 1" (CPP 1-20-095 ID 10476). All participants included in the study gave their written and informed consent for the use of their data. All the procedures were performed in accordance with the relevant guidelines and regulations. The study and data analysis followed the STAndards for the Reporting of Diagnostic accuracy studies (STARD) guidelines 27 (Annex 1). The study consisted of two parts: (i) biomarker discovery based on genome-wide miRNA expression profiling by small RNA sequencing using next generation sequencing (NGS), and (ii) development of a miRNA diagnostic signature according to expression and accuracy profiling using an ML algorithm 28-38.
Study population. The prospective ENDO-miRNA study included 200 plasma samples obtained from women with chronic pelvic pain suggestive of endometriosis. All the plasma samples were collected from the participants between January and June 2021. All the patients underwent a laparoscopic procedure (operative or diagnostic) and/or MRI 9-12. The laparoscopic procedures were systematically videoed and then analyzed by two operators (CT, YD) who were blinded to the symptoms and imaging findings, to confirm the presence or absence of endometriosis. For the patients who underwent laparoscopy, diagnosis was confirmed by histology. Patients who were diagnosed with endometriosis without laparoscopic evaluation all had MRI findings with features of deep endometriosis with colorectal involvement, and/or endometrioma, confirmed by a multidisciplinary endometriosis committee.
Following exploration by laparoscopy or MRI, the women were classified into two groups: an endometriosis group, and a control group of women with various benign pathologies other than endometriosis or with symptoms suggestive of endometriosis but without clinical or MRI features and no endometriosis lesions found during laparoscopic inspection (complex patients). The study flow chart is reported in Fig. 1. The patients with endometriosis were stratified according to the revised American Society of Reproductive Medicine (rASRM) classification 39.
Plasma sample collection. The blood samples (4 mL) were collected in EDTA tubes (BD, Franklin Lakes, NJ, USA) before the surgery. The plasma was isolated from whole blood within 2 h after blood sampling by two successive centrifugations at 4 °C (first at 1900g (3000 rpm) for 10 min, followed by 13,000-14,000g for 10 min to remove all cell debris), then aliquoted, labeled and stored at −80 °C until analysis as previously described 40-42. The miRNAs were automatically extracted with a Promega Maxwell® Instrument to avoid cross-contamination. Extractions and quality control (QC) were conducted in an accredited biobank (NFS96-900) to guarantee good processes. The samples were anonymized. NGS library preparation was performed individually under ISO-9001-2015 certification. QC was performed before pooling the indexed samples. After sequencing, demultiplexing was done with Illumina bcl2fastq. To avoid mixing, exchanging or cross-contamination, each sample or preparation was tracked in a Laboratory Information Management System (LIMS).
RNA sample extraction, preparation and quality control.
Differential expression analysis of miRNA. Expression level quantification of the miRNAs was first determined by miRDeep2 47. Differential expression tests were then conducted in DESeq2 only for the miRNAs with read counts in ≥ 1 of the samples.
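As a minimal illustration of the filtering rule just described, the sketch below keeps only miRNAs that have a read count in at least one sample before any differential expression testing. It is written in plain Python rather than in R/DESeq2, it assumes that "read counts in ≥ 1 of the samples" means a non-zero count in at least one sample, and the count table, miRNA names and values are invented for illustration only.

```python
def filter_expressed(counts, min_samples=1):
    """Keep miRNAs with a non-zero read count in at least `min_samples` samples.

    `counts` maps a miRNA name to a list of raw read counts, one per sample.
    """
    return {
        mirna: reads
        for mirna, reads in counts.items()
        if sum(1 for r in reads if r > 0) >= min_samples
    }


# Hypothetical raw counts: one list per miRNA, one entry per plasma sample.
raw_counts = {
    "miR-A": [0, 12, 7],
    "miR-B": [0, 0, 0],   # never detected, so it is dropped
    "miR-C": [3, 0, 0],
}

kept = filter_expressed(raw_counts)
print(sorted(kept))
```

Raising `min_samples` gives a stricter filter; the value of 1 mirrors the threshold stated in the text.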
DESeq2 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data using shrinkage estimators for dispersion and fold change 48, 49. miRNAs were considered differentially expressed if the absolute value of the log2-fold change was > 1.5 (upregulated) or < 0.5 (downregulated) and the P value adjusted for multiple testing was < 0.05 48.
Description of the ENDO-miRNA cohort. The ENDO-miRNA study included 200 patients, of whom 76.5% (n = 153) were diagnosed with endometriosis and 23.5% (n = 47) were not (controls). Among the patients with endometriosis, 52% (n = 80) were rASRM stage I-II and 48% (n = 73) were stage III-IV. Just over half of the control group (51%, n = 24) were women with no abnormality at diagnostic laparoscopy. The clinical and demographic characteristics of the patients are summarized in Table 1. There were no significant differences in terms of age and body mass index (BMI) between the groups. Compared to the control group, the endometriosis group had higher rates of sciatica pain (p = 0.021), dyspareunia (p < 0.001), lower back pain outside menstruation (p = 0.049), and urinary pain during menstruation (p < 0.001).
Global overview of the miRNA transcriptome. The sequencing of the 200 plasma samples for small RNA-seq provided ~ 4228 M raw sequencing reads (from ~ 11.7 M to ~ 34.98 M reads/sample). After filtering steps, we retained 39% (~ 1639 M) of the initial raw reads. Among those, the majority were 20-23 nt in length, which corresponds to mature miRNA sequences. The identification of known miRNAs provided ~ 2588 M sequences which have been mapped to 2633 known miRNAs from miRBase (v22). The expressed miRNAs ranged from 666 to 1274 per blood sample. The overall composition of processed reads is shown in Annex 2.
Accuracy of the miRNAs to diagnose endometriosis.
Of the 2561 miRNAs known to be related to endometriosis, the feature selection generated a subset of 86 miRNAs. The F1-score, sensitivity, specificity and AUC values ranged from 0-88.2%, 0-99.4%, 4-100%, and 50-68%, respectively. Among the 86 miRNAs selected, 80% (n = 69) had an AUC value < 60%, and 20% (n = 17) a value ≥ 60%; for the F1-scores, 50% (n = 43) and 50% (n = 43) had a value ranging between 0-79%, and ≥ 80%, respectively; 51% (n = 44) and 49% (n = 42) had a sensitivity ranging between 0-79%, and ≥ 80%, respectively; and 77% (n = 66) and 23% (n = 20) had a specificity ranging between 0-79%, and ≥ 80%, respectively. Among these, 42% (n = 36) were identified as being downregulated, 6% (n = 5) as being upregulated, and 52% (n = 45) as being unregulated. Annex 3 summarizes the relative expression of a panel of the most accurate miRNAs for dysmenorrhea, hormonal treatment status, and rASRM stage (I-II vs III-IV). The signature composition and a summary of the diagnostic accuracy of each of the 86 miRNAs selected are reported in Table 2.
Diagnostic importance of the miRNAs for blood signature. Among the 86 miRNAs composing the blood signature, 10 have the greatest potential value: namely, miRNAs 124-3p, 6509-5p, 548l, 26a-2-3p, 3622a-3p, 3168, 29b-1-5p, 30e-3p, 3124-5p, 4511. The diagnostic importance of the miRNAs is reported in Fig. 2. Among these 10 miRNAs, one (miR-124-3p) has been previously reported in the setting of endometriosis.
miRNA blood-based diagnostic signature for endometriosis. The overall performance of the ML models against the 10 datasets is reported in Table 3. Against the 10 randomly generated datasets, the sensitivity, specificity, and AUC ranged from 80.6 to 96.8%, 77.8 to 100%, and 76.2 to 98.4%, respectively. The most accurate signature (n°3) after internal cross-validation provides a sensitivity, specificity, and AUC of 96.8%, 100%, and 98.4%, respectively (Table 3).
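The accuracy metrics reported above all derive from a confusion matrix, and the short sketch below shows the arithmetic. The counts used (31 patients with endometriosis and 9 controls in a held-out set) are hypothetical values chosen only because they reproduce the reported percentages; they are not taken from the study. The sketch also uses the fact that, for a classifier evaluated at a single decision threshold, the area under the ROC curve reduces to the mean of sensitivity and specificity. Whether the published AUC was computed this way is an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, F1-score and single-threshold AUC.

    tp/fn: diseased patients classified correctly/incorrectly.
    tn/fp: controls classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    # With one operating point, the ROC curve is two line segments and its
    # area equals the mean of sensitivity and specificity.
    auc_single_threshold = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "auc_single_threshold": auc_single_threshold,
    }


# Hypothetical held-out set: 31 endometriosis patients, 9 controls.
metrics = diagnostic_metrics(tp=30, fn=1, tn=9, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these hypothetical counts the sketch gives a sensitivity of 96.8%, a specificity of 100.0% and a single-threshold AUC of 98.4%, which matches the figures quoted for the most accurate signature.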
Of the miRNAs composing the diagnostic signature, 40.7% (35/86) have not been previously described in humans. The remaining have been described in both benign and malignant conditions (Table 4). Almost 30% of the 86 miRNAs are downregulated, and many of them are related to the PI3K/Akt and MAPK pathways. Figure 3 illustrates the network, pathways, and functions for the relevant miRNAs associated with these pathways 55, 56. Only miR-124-3p has previously been reported in patients with endometriosis. Details concerning the exhaustive signaling pathways and targeted regulators are summarized in Annex 4.
We present here a blood-based diagnostic signature combining a selected panel of 86 miRNAs extracted from patients with chronic pelvic pain suggestive of endometriosis participating in the prospective ENDO-miRNA study. To the best of our knowledge, this is the first blood-based diagnostic signature obtained from a combination of two robust and disruptive technologies merging the intrinsic quality of miRNAs to condense the endometriosis phenotype (and its heterogeneity) with the modeling power of AI. The most accurate signature provides a sensitivity, specificity, and AUC of 96.8%, 100%, and 98.4%, respectively, and is sufficiently robust and reproducible to replace the gold standard of diagnostic surgery. We hypothesize that this signature could have large implications for clinical practice in improving endometriosis care pathways by significantly reducing time to diagnosis and therapeutic wandering.
In the specific setting of endometriosis, multiple biomarkers 13, 18, 64, genomic analyses 32, 57, questionnaires 5, 58, 59, symptom-based algorithms 5, and imaging techniques 12 have been advocated as screening and triage tests for endometriosis. However, to date, none have demonstrated sufficient clinical accuracy, i.e., a sensitivity of 0.94 and specificity of 0.79 12, 13, 18.
The present signature composed of 86 miRNAs exceeds the required sensitivity and specificity metrics, suggesting high clinical value. In addition, as stated by Agrawal et al. 4, a relevant biomarker for clinical use is one that is (i) specific to the disorder, (ii) associated with the early stage of the disease, (iii) accessible and acceptable through a non-invasive procedure, (iv) biologically stable and clinically reproducible, and (v) associated with known or potential pathophysiological mechanisms. Therefore, to meet Agrawal et al.'s criteria and improve endometriosis diagnosis, the prospective ENDO-miRNA study was designed to analyze the entire human miRNome, especially for (i) complex women (women with chronic pelvic pain suggestive of endometriosis and both negative clinical examination and imaging findings), (ii) women with various phenotypes based on early and advanced stages (I-II vs III-IV rASRM) and (iii) women with other gynecologic disorders sharing the symptoms of endometriosis. The exhaustive analysis of all miRNAs (n = 2633) from 200 blood samples of patients with and without endometriosis makes it possible to capture the complexity of the disease and, ultimately, to illustrate its heterogeneity. The data that emerged from this analysis resulted in the combination of a large set of 86 miRNAs robustly selected by 10 reproducible statistical methods (and not only on the basis of the AUC, as in previous reports). miRNA selection based purely on the highest AUC is of low accuracy because the extreme variability of endometriosis has a major impact on AUC. This point may explain the low validity, stability and reproducibility of using a few miRNAs to design a signature. To date, only studies evaluating a limited number of miRNAs 14, 17, 20, 21, 26 using classic logistic regression have been published. These studies show that some miRNAs are deregulated in patients with endometriosis.
For example, in a retrospective study using blood samples from a biobank, Vanhie et al. 14 failed to build a signature based on 42 miRNAs divided into three models of three miRNAs each, mainly because the authors focused on the accuracy of each miRNA to design a signature. In agreement with Lopez-Rincon et al. 36-38, it would appear illusory that endometriosis, a highly heterogeneous multifactorial disorder with various phenotypes and characterized by incomplete knowledge of the various pathologic pathways, could be reflected by a few miRNAs. Therefore, we decided (i) to select specific miRNAs based on 10 statistical methods (resulting in a selection of 86 miRNAs), and (ii) to use several highly accurate ML models which support the value of AI technology as a disruptive approach. Such an approach has been previously validated in cancer, showing that a 100-miRNA signature was sufficiently stable to provide almost the same classification accuracy across different types of cancers and platforms 36, 37.
Numerous studies have evaluated blood or plasma miRNA expression as potential biomarkers for endometriosis but with discordant results, probably because of study design issues but also because of limitations inherent to the biological techniques used 17. For example, Yang et al. 60 found 61 miRNAs (36 downregulated and 25 upregulated) significantly expressed in the serum of patients with endometriosis by array analysis, but only five were validated by qRT-PCR. These data underline the importance of NGS platforms for miRNA profiling. Although considerable computational support is needed, these platforms are of high sensitivity and resolution, and of excellent reproducibility, allowing the analysis of millions of RNA fragments. As described by A C 't Hoen et al. 61, bioinformatics allows the exhaustive analysis of all RNA fragments that can be aligned and mapped, and their expression levels quantified, thus eliminating the need for sequence-specific hybridization probes or qRT-PCR, which are required in a microarray 62.
From a pathophysiologic point of view, a systematic review revealed that 45% of the 86 miRNAs composing our endometriosis signature have not previously been reported in humans. Only miR-124-3p has previously been reported in patients with endometriosis, and is involved in ectopic endometrial cell proliferation and invasion in both benign and malignant disorders 63. In addition, miR-124-3p has been found to be involved in various signaling pathways such as mTOR, STAT3, PI3K/Akt, NF-κB, ERK, PLGF-ROS, FGF2-FGFR, MAPK, GSK3B/β-catenin 64, 65. The remaining miRNAs of the signature have previously been identified as being involved in both benign and malignant disorders with the main signaling pathways being JAK/STAT, NF-κB, YAP/TAZ, PI3K/Akt, Wnt/β-catenin, FOXO, MAPK, p53, mTOR and TGF-β. All these data open new avenues to better understand the pathophysiology of endometriosis and to develop new therapeutic options already used in other pathologies.
Some limitations of the present study deserve to be discussed. First, some of our patients, in both the endometriosis and control groups, had a prior hormonal treatment that may have affected miRNA expression. However, Vanhie et al. reported that no miRNAs changed significantly with the menstrual cycle 14. Moreover, Moustafa et al. found that miRNAs remained unchanged both throughout the menstrual cycle and in response to sex steroid hormone treatment 15. Second, among the 10 miRNAs with the most important diagnostic value, only miR-124-3p has been previously reported in the setting of endometriosis, which suggests that external validation is required.
Third, our signature was based on patients aged between 18 and 43 years excluding adolescents with pelvic pain. Therefore, an additional study should be performed for adolescent patients. Fourth, although no difference was observed in miRNA expression between patients with dysmenorrhea under or over VAS 7, no attempt was made to correlate symptoms with the various locations of endometriosis. Finally, some patients with deep endometriosis and/or endometrioma were included in the endometriosis group without having undergone laparoscopy and this represents a potential bias. However, the meta-analysis by Nisenblat et al. demonstrated that MRI fulfills the criteria for a replacement and SnNout triage test for endometrioma, colorectal and pouch of Douglas obliteration related to endometriosis 12.
Clinical practice. Endometriosis The burden of endometriosis: Costs and quality of life of women with endometriosis and treated in referral centres The miRNA mirage: How close are we to finding a non-invasive diagnostic biomarker in endometriosis? 
A systematic review Patient-completed or symptom-based screening tools for endometriosis: A scoping review Barriers and facilitators to the timely diagnosis of endometriosis in primary care in the Netherlands Challenges in uncovering non-invasive biomarkers of endometriosis Endometriosis Priority Setting Partnership Steering Group (appendix) Diagnostic accuracy of physical examination, transvaginal sonography, rectal endoscopic sonography, and magnetic resonance imaging to diagnose deep infiltrating endometriosis Deep pelvic endometriosis: MR imaging for diagnosis and prediction of extension of disease Magnetic resonance imaging for deep infiltrating endometriosis: Current concepts, imaging technique and key findings Imaging modalities for the non-invasive diagnosis of endometriosis Combination of the non-invasive tests for the diagnosis of endometriosis Plasma miRNAs as biomarkers for endometriosis Accurate diagnosis of endometriosis using serum microRNAs miRNAs regulation and its role as biomarkers in endometriosis Overview of miRNAs for the non-invasive diagnosis of endometriosis: evidence, challenges and strategies. A systematic review. Einstein Sao Paulo Braz. 19, eRW5704 Blood biomarkers for the non-invasive diagnosis of endometriosis Biomarkers for the noninvasive diagnosis of endometriosis: State of the art and future perspectives Analysis of exosomal lncRNA, miRNA and mRNA expression profiles and ceRNA network construction in endometriosis A panel of plasma miRNAs 199b-3p, 224-5p and Let-7d-3p as non-invasive diagnostic biomarkers for endometriosis MicroRNAs: Genomics, biogenesis, mechanism, and function The C. 
elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14 Target recognition and regulatory functions RNAi, microRNAs, and human disease Role of non-coding RNAs in the pathogenesis of endometriosis An updated list of essential items for reporting diagnostic accuracy studies Logistic regression and artificial neural network classification models: A methodology review Potential application of machine learning in health outcomes research and some statistical cautions A review of challenges and opportunities in machine learning for health Artificial intelligence and deep learning: The future of medicine and medical practice GenomeForest: An ensemble machine learning classifier for endometriosis Machine learning predicts live-birth occurrence before in-vitro fertilization treatment Precision medicine in the era of artificial intelligence: Implications in chronic disease management Circulating microRNA profile as a potential biomarker for obstructive sleep apnea diagnosis Machine learning-based ensemble recursive feature selection of circulating miRNAs for cancer tumor classification Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection Recursive ensemble feature selection provides a robust mRNA expression signature for myalgic encephalomyelitis/chronic fatigue syndrome EQUSUM: Endometriosis QUality and grading instrument for SUrgical performance: Proof of concept study for automatic digital registration and classification scoring for r-ASRM, EFI and Enzian Micro-RNA signature of lymphovascular space involvement in type 1 endometrial cancer Identification of microRNA expression profile related to lymph node status in women with early-stage grade 1-2 endometrial cancer Identification of micro-RNA expression profile related to recurrence in women with ESMO low-risk endometrial cancer Integrating miRNA and gene expression profiling analysis revealed regulatory networks in gastrointestinal 
stromal tumors MiRNA profiling of gastrointestinal stromal tumors by next-generation sequencing Identification of long intergenic non-coding RNAs (lincRNAs) deregulated in gastrointestinal stromal tumors (GISTs) A bioinformatics approach to microRNA-sequencing analysis Evaluation and application of tools for the identification of known microRNAs in plants Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Consensus miRNA expression profiles derived from interplatform normalization of microarray data Random forest of perfect trees: Concept, performance, applications, and perspectives Status of surgical management of borderline ovarian tumors in France: Are recommendations being followed? Multicentric French Study by the FRANCOGYN Group Fertility preservation in women with malignant and borderline ovarian tumors: Experience of the French ESGOcertified center and pregnancy-associated cancer network (CALG) Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors Prognostic modelling with logistic regression analysis: A comparison of selection and estimation methods in small data sets Kyoto encyclopedia of genes and genomes Toward understanding the origin and evolution of cellular organisms Machine learning classifiers for endometriosis using transcriptomics and methylomics data Development and content validation of two new patient-reported outcome measures for endometriosis: The Endometriosis Symptom Diary (ESD) and Endometriosis Impact Scale (EIS) Development of a prediction model to aid primary care physicians in early identification of women at high risk of developing endometriosis: Cross-sectional study Microarray analysis of microRNA deregulation and angiogenesis-related proteins in endometriosis Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms Salivary microRNA for 
diagnosis of cancer and systemic diseases: A systematic review LncRNA-H19 regulates cell proliferation and invasion of ectopic endometrium by targeting ITGB3 via modulating miR-124-3p miR-124-3p inhibits microglial secondary inflammation after basal ganglia hemorrhage by targeting TRAF6 and repressing the activation of NLRP3 inflammasome miR-124-3p ameliorates isoflurane-induced learning and memory impairment via targeting STAT3 and inhibiting neuroinflammation
All authors would sincerely like to thank F. Neilson for revising the English of the manuscript. All authors would sincerely like to thank S. Boubat for her valuable help in conducting the study. Part of this work was carried out on the DNA and cell bank and the iGenSeq core facilities of ICM. We gratefully acknowledge their contribution for sample management and analysis.
S.B., Y.D., S.S., E.D. conceived and designed the study. S.B., Y.D., C.T., A.P., E.D. included patients and performed the surgical procedures. All authors analyzed the data and wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Part of this work was funded by a grant from the Conseil Régional d'Ile de France (Grant number EX024087) and from Ziwig, Inc.
S. Suisse is a former employee of Ziwig, Inc. The remaining authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-022-07771-7.
Correspondence and requests for materials should be addressed to S.B.
Reprints and permissions information is available at www.nature.com/reprints.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.