key: cord-0254093-39hcdj4t authors: Fitt, B.; Loy, G.; Christopher, E.; Brennan, P. M.; Poon, M. T. title: Analytic approaches to clinical validation of results from preclinical models of glioblastoma: a systematic review date: 2021-09-08 journal: nan DOI: 10.1101/2021.09.04.21263119 sha: db3eb5b2252d8edb2a736240a32298096bb44847 doc_id: 254093 cord_uid: 39hcdj4t
Background: Analytic approaches to clinical validation of results from preclinical models are important in assessing their relevance to human disease. This systematic review examined consistency in reporting of glioblastoma cohorts from The Cancer Genome Atlas (TCGA) and assessed whether studies included patient characteristics in their survival analyses.
Methods: We searched Embase and Medline on 2 February 2021 for studies using preclinical models of glioblastoma published after January 2008 that used data from TCGA to validate the association between at least one molecular marker and overall survival in adult patients with glioblastoma. Main data items included cohort characteristics, statistical significance of the survival analysis, and model covariates.
Results: There were 58 eligible studies from 1,751 non-duplicate records investigating 126 individual molecular markers. In 14 studies published between 2017 and 2020 using TCGA RNA microarray data, which should have yielded the same cohort, the median number of patients was 464.5 (interquartile range 220.5-525). Of the 15 molecular markers that underwent more than one univariable or multivariable survival analysis, five had discrepancies between studies. Covariates used in the 17 studies that used multivariable survival analyses were age (76.5%), pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).
Conclusions: Preclinical glioblastoma studies that used TCGA for validation did not provide sufficient information about their cohort selection, and their results were inconsistent. Transparency in reporting and the use of analytic approaches that adjust for clinical variables can improve reproducibility between studies.
• Despite using the same data from The Cancer Genome Atlas, translational preclinical studies in glioblastoma research included different numbers of patients in their analyses and their results were inconsistent.
• Fewer than a third of the studies used multivariable survival analysis to adjust for clinical variables, and most of these did not take treatment factors into account.
• Greater transparency in cohort selection from open access data and integration of clinical variables into analyses will help improve reproducibility in glioblastoma research.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Introduction
Glioblastoma, the most common brain tumour, 1,2 is lethal and therapeutic options have only a modest and temporary impact on survival. 3,4 Discovery science has advanced our understanding of cancer cell biology and is a step towards developing novel therapies. 5 These discoveries are usually based on preclinical models, whose relevance to human disease must be established. Demonstrating relevance requires high-quality clinical and biological data. The Cancer Genome Atlas (TCGA) 6 and the Chinese Glioma Genome Atlas (CGGA) 7 are two open-access resources in which laboratory scientists can interrogate human data to verify their findings in glioblastoma research. These resources are valuable for the molecular characterisation of glioblastoma and can also be used to examine the associations between molecular markers of interest and survival.
An association with survival might implicate a molecular marker as a potential drug target. Isolated analyses of genomic data are unlikely to provide an adequate assessment of the role of molecular features in patient outcomes. Univariable survival analyses that consider only one molecular marker do not account for other markers or clinical features. 8 The resulting associations are subject to confounding, which may render them unreliable. Multivariable analyses are preferable and should be facilitated by open access policies that permit researchers to use the same set of data for different analyses. 9 This is crucial for replicability and comparison of analyses, and to ensure that the science that progresses to clinical trials is well founded. This systematic review examined the consistency in reporting of cohorts from TCGA and CGGA and whether studies included patient characteristics in their survival analyses.
This review included studies that used data from TCGA or CGGA to examine the association between at least one molecular marker and overall survival in adult patients aged ≥ 18 years diagnosed with non-recurrent, histopathologically confirmed glioblastoma. Studies using both TCGA and CGGA were eligible if results were stratified by data resource. We only included studies that used cell or animal models to first identify molecular markers associated with tumour biology, then examined the association between these markers and overall survival in humans using TCGA or CGGA data. We excluded case reports, reviews, editorials and conference abstracts.
We searched Embase and Medline on 02 February 2021 for potentially eligible studies published after January 2008 using search terms relating to "glioma", "survival", "TCGA" and "CGGA" (Supplementary Materials). The lower limit of the search period was set because data from TCGA first became available in 2008. After removing duplicate studies, two independent reviewers (B.F. and G.L.) screened titles and abstracts and then assessed full texts for eligibility. Any disagreements at each stage were resolved through discussion with a third reviewer (M.T.C.P.).
Two reviewers (B.F. and G.L.) independently collected data from each study using the online systematic review management software Covidence. Disagreements were resolved by discussion between the two reviewers or by involving a third reviewer (M.T.C.P.). Data items included study characteristics, TCGA cohort characteristics, CGGA cohort characteristics, genomic data used, molecular markers, and details of survival analysis. Molecular markers included expression, variants, or methylation of genes, RNAs and microRNAs. A set of molecular markers was defined as more than one molecular marker grouped and analysed together. We categorised survival analysis into univariable and multivariable analysis, and we collected the covariates entered into the multivariable analysis. To describe the association between molecular markers and survival, we considered a reported p value of <0.05 to be statistically significant. If a study reported results from both TCGA and CGGA cohorts, we extracted the statistical significance of these results separately. Data on effect sizes and their corresponding 95% confidence intervals (CI) were not collected because studies using log-rank (Mantel-Cox) tests to compare survival between study-specific groups do not provide these data and there was no plan for meta-analysis.
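To illustrate why a log-rank comparison on its own yields no effect size, the following is a minimal pure-Python sketch of a two-group log-rank (Mantel-Cox) test. The function name `logrank_test` and the toy survival times are hypothetical and chosen only for illustration; they are not taken from any included study. The sketch returns a chi-square statistic and a p value, so a hazard ratio with a 95% CI would have to come from a separate model such as Cox regression.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*  : follow-up times for each patient in the group
    events_* : 1 if death was observed at that time, 0 if censored
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    # Pool both groups, tagging each record with its group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct times at which at least one death occurred.
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) deaths in A
    variance = 0.0       # sum of hypergeometric variances

    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk in A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk in B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups

    chi_square = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical survival times in months; every patient has an observed death.
    low_marker = [1, 2, 3, 4, 5]
    high_marker = [6, 7, 8, 9, 10]
    chi_square, p_value = logrank_test(low_marker, [1] * 5, high_marker, [1] * 5)
    print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The output carries a test statistic and a p value only, which matches the reason given above for extracting statistical significance rather than effect sizes.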
There was no risk of bias assessment tool directly relevant to studies in this review. However, we assessed components of the study design relating to risk of bias. These measures of quality included types and size of cohorts used for survival analysis, types of genomic data used from TCGA or CGGA, and the criteria used to select patients for survival analysis.
We presented study characteristics, results and quality measures using descriptive statistics, stratified by type of survival analysis (univariable and multivariable) where available. The availability of data in TCGA has increased over time, and the number of patients with available data differs by data type. To assess the reproducibility of cohort selection from TCGA, we summarised the number of patients in studies published between 2017 and 2020 using TCGA RNA microarray data, because these specifications should identify studies that used the same cohort of patients. There was no meta-analysis of any association between molecular markers and overall survival.
The pre-clinical glioblastoma models used were cell lines and orthotopic mouse models in 51.7% and 48.3% of studies, respectively. All studies used some form of data from TCGA, in various combinations with other data sources, and two studies used data from CGGA (Table 1). RNA microarray data were the most common data type, used in 45 (77.6%) studies. When investigating the association between their markers of interest from pre-clinical models and survival using genomic data, more studies used univariable survival analyses only (70.7%) than used multivariable analyses (29.3%).
All univariable analyses used the non-parametric log-rank (Mantel-Cox) method and all multivariable analyses used Cox proportional hazards regression. There were 16 (27.6%) studies that described additional criteria for patient inclusion within the selected TCGA cohort.
The date of a TCGA query and the data type requested can change the number of patients available for survival analysis. To assess reproducibility of cohort selection from TCGA in the included studies, we summarised the numbers of patients in studies with similar data specifications.
Among the 126 distinct molecular markers investigated in the included studies, 15 markers underwent more than one univariable or multivariable survival analysis (Table 2). The associations of these markers with outcomes were consistent between different analyses most of the time. However, there were discrepancies between results for C-X-C Motif Chemokine Ligand 14 (CXCL14), epidermal growth factor receptor (EGFR), netrin 4 (NTN4), SRY-Box transcription factor 2 (SOX2), serglycin (SRGN) and the microRNA miRNA-17-5p (Table 2). These discrepancies appear to relate to the type of survival analysis used (CXCL14, SOX2, SRGN) or the data type (EGFR, NTN4).
There were 17 studies that investigated the association between their molecular markers of interest and overall survival using a multivariable survival analysis. All these studies used TCGA data, which have clinical data available. The most frequently included clinical variable in the multivariable model was age (76.5%) (Figure 1).
Other variables included pre-operative functional status (35.3%), sex (29.4%), MGMT promoter methylation (29.4%), radiotherapy (23.5%), chemotherapy (17.6%), IDH mutation (17.6%) and extent of resection (5.9%).
Studies in glioblastoma research have used data from publicly available genomic repositories to correlate pre-clinical experimental findings with clinical survival benefit in humans. These studies often included different numbers of patients despite using the same data source and data type. Survival analyses often did not include other critical clinical variables associated with survival, such as extent of resection, 10 chemotherapy and radiotherapy. 3,11 Even in studies that performed a multivariable survival analysis, most clinical variables, such as extent of resection and oncological treatment, were not included. This yielded some inconsistent results between studies. Other results were subject to confounding by clinical variables that were not accounted for.
Development of novel cancer therapies relies on reproducible results from preclinical research. The need for improving reproducibility is not new. 12 In cancer research, there is a heavy reliance on the preclinical literature for drug development. 13 However, issues with reporting bias, suboptimal reporting quality, varying reproducibility and preclinical model representation of disease impede success in finding new therapies. 14 The availability of survival data in publicly available data from cancer genomics programmes presents an opportunity for researchers to assess the association between molecular markers and patient survival in a reproducible manner.
These open access data sources provide data on the same cohort of patients, which encourages reproducibility between studies. However, our findings demonstrate that patient selection was not adequately described, resulting in different numbers of patients between studies that supposedly used the same dataset. There are reproducible ways of querying TCGA data, for example, using the
Most studies did not consider clinical variables as potential confounders of the association between the molecular marker of interest and survival. There are nevertheless examples of associations that were no longer statistically significant after adjustment for clinical variables in a multivariable analysis (Table 2). Therefore, it is important to explore and consider confounders when assessing the effect of molecular markers on survival. 19 This is not a simple task because of missing data, the relatively small numbers of patients available, and correlations between clinical variables. A combination of data-driven and clinically informed choice of covariates would be a reasonable approach. 20
This systematic review assessed all pre-clinical studies that used data from TCGA or CGGA to validate findings from their laboratory experiments. Our data collection allowed comparison of findings between and within studies, which enabled our evaluation of replicability. Clinical studies that examined associations of previously investigated molecular markers with survival were not included in this review. These studies may provide more detailed descriptions of cohort selection and may be more likely to consider confounding effects from clinical variables.
This would mean an overestimation of inconsistencies and suboptimal analytic approaches in our review. However, any failure to consider that patients are more than their tumours should be highlighted to re-orientate research focus towards patient benefit. Collecting data on p values only to denote statistical significance was a pragmatic approach to describing associations reported in the included studies, since most studies did not report any effect sizes. This does not represent our views on the appropriate statistical approach and reporting of findings. We advocate reporting of effect sizes with their corresponding precision, adjusted for confounders. P values should not be used as a cut-off for the significance of an association. 21 There are other aspects of survival analyses that we did not assess, such as whether included studies tested the proportional hazards assumption when using a Cox regression. 22 While these analytic procedures are important, reporting of these would not affect our findings.
References
CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2013-2017
Might changes in diagnostic practice explain increasing incidence of brain and central nervous system tumors? A population-based study in Wales (United Kingdom) and the United States
Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma
Longer-term (≥ 2 years) survival in patients with glioblastoma in population-based studies pre- and post-2005: a systematic review and meta-analysis
Accelerating glioblastoma drug discovery: Convergence of patient-derived models, genome editing and phenotypic screening
The somatic genomic landscape of glioblastoma
Chinese Glioma Genome Atlas (CGGA): A Comprehensive Resource with Functional Genomic Data from Chinese Glioma Patients
Survival Analysis Part II: Multivariate data analysis - an introduction to concepts and methods
The FAIR Guiding Principles for scientific data management and stewardship
Association of Maximal Extent of Resection of Contrast-Enhanced and Non-Contrast-Enhanced Tumor With Survival Within Molecular Subgroups of Patients With Newly Diagnosed Glioblastoma
Short-Course Radiation plus Temozolomide in Elderly Patients with Glioblastoma
1,500 scientists lift the lid on reproducibility
Drug development and clinical trials - the path to an approved cancer drug
Raise standards for preclinical cancer research
TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data
Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement
Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK): An Abridged Explanation and Elaboration
Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: With application to major depressive disorder
Survival Analysis Part III: Multivariate data analysis - choosing a model and assessing its adequacy and fit
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations