key: cord-0272140-ru971eeq
authors: Wang, Bo; Law, Andy; Regan, Tim; Parkinson, Nicholas; Cole, Joby; Russell, Clark D.; Dockrell, David H.; Gutmann, Michael U.; Baillie, J. Kenneth
title: Systematic comparison of ranking aggregation methods for gene lists in experimental results
date: 2022-01-10
journal: bioRxiv
DOI: 10.1101/2022.01.09.475491
sha: 9f5a74318ea47955202ea44d430be95ce35e0aab
doc_id: 272140
cord_uid: ru971eeq

A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The results of a group of studies answering the same, or similar, questions can be combined by meta-analysis to find a consensus or a more reliable answer. Ranking aggregation methods can be used to combine gene lists from various sources in meta-analyses. Evaluating a ranking aggregation method on a specific type of dataset before using it is required to support the reliability of the result, since the properties of a dataset can influence the performance of an algorithm. Evaluation of aggregation methods is usually based on simulated data, especially for algorithms designed for gene lists, because real data lack a known truth. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists. In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods under scenarios that emulate common features of real genomics data, varying the heterogeneity of quality, the noise level, and the mix of unranked and ranked data, using 20000 possible entities.
In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (NSCLC), and bacteria (macrophage apoptosis) was performed. We summarise our evaluation results in terms of a simple flowchart to select a ranking aggregation method for genomics data.

• Implementation of investigated ranking aggregation methods and variations of them that are suitable for dealing with genomics data was carried out

[Figure: x-axis dataset labels recovered from the extraction: macrophage apoptosis, rankLarge lowHet, rankLarge midHet, rankLarge highHet, rankSmall lowHet, rankSmall midHet, rankSmall highHet.]

The name of each investigated method is shown along the y-axis. For simulated data, the figure shows top-1000 accuracy for scenarios with the default mean noise level M = 3 and absent gene rate γ = 0. The dataset label for simulated data shown along the x-axis is the combination of dataset type and heterogeneity. Simulated dataset types are shown as {mixLarge, mixSmall, rankLarge, rankSmall}, corresponding to S ∈ {0m, 1m, 0r, 1r}, to show whether unranked lists are included and the number of included lists. The quality heterogeneity is recorded as lowHet, midHet and highHet, corresponding to the small quality heterogeneity D = 0.1, the medium one D = 1 and the large one D = 3, respectively. For each simulated data setting, the mean value of the results from 100 repeated experiments is plotted. Top-1000 recall for 3 experiments using real datasets (SARS-CoV-2, macrophage apoptosis and NSCLC) is also included.

The measurement used to show the accuracy of methods in gene data should be able to weight top-ranked genes more than bottom ones since they are usually […]

[…] gives the best performance for nearly all the cutoffs from top-1 to top-1000 in the evaluation of SARS-CoV-2 datasets.
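To make the idea of comparing methods across cutoffs from top-1 to top-1000 concrete, the following is a minimal Python sketch of a top-k recall computation. It assumes that recall at cutoff k is the fraction of a reference gene set found among the first k positions of an aggregated ranking; this definition, the function name `top_k_recall`, and the toy gene labels are illustrative assumptions and are not taken from the study.

```python
def top_k_recall(aggregated_ranking, reference_genes, k):
    """Fraction of reference genes found in the first k positions.

    ASSUMPTION: this is one common definition of recall at a cutoff;
    the study's exact definition is not reproduced here.
    """
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference_genes must not be empty")
    top_k = set(aggregated_ranking[:k])
    return len(top_k & reference) / len(reference)


# Toy data with made-up gene labels (hypothetical, for illustration only).
ranking = ["G1", "G2", "G3", "G4", "G5", "G6"]
reference = {"G2", "G5", "G9"}

# Recall at several cutoffs, analogous to scanning top-1 to top-1000.
recall_curve = {k: top_k_recall(ranking, reference, k) for k in (1, 2, 5)}
```

Because recall can only stay the same or grow as k increases, plotting such a curve for each method shows which one recovers reference genes earliest in its ranking.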
Another one of the best-performing methods for the SARS-CoV-2 dataset is rMixGEO, which reaches the best level in the simulated data when heterogeneity is small and the mean noise level is large, for example D = 0.1 and M = 3, for which the specific values can be seen in Figure 1. It remains one of the best methods when M is small for low heterogeneity […]

The results for datasets with only ranked sources are shown in Figure 3. In this […]

[…] Table 1 showing the similar best performance for them among investigated methods. Among them, the accuracy of BiGbottom is also one of the highest when […]

The performance of the investigated ranking aggregation methods on the proposed simulated datasets and on 3 real datasets was compared. The results show that whether unranked lists are included in the input data, and the heterogeneity of quality among sources, can largely influence the performance of the investigated ranking aggregation methods.

The evaluation results in this study can provide some insights on data selection and method selection for ranking aggregation in genomic problems.

[…] Table 1. The virus data come from a SARS-CoV-2 related meta-analysis study for […]

In order to better explore the properties of real data, which usually includes a […]

[…] where $\mathcal{N}(a, b^2)$ denotes a Gaussian distribution with mean $a$ and variance $b^2$. The number of entities that will finally be included in list $L_i$ is denoted by $N_i \in [L, U]$ and generated as follows: […]

$\ln(C_i)$ is used instead of $C_i$ so that a perturbation produces larger absolute changes for long lists than for short ones, because a given difference in length is considered less significant at larger scales. For example, the difference between lengths 2 and 102 is more significant than the difference between lengths 19000 and 19100. After the generation, all the potential entities within list $L_i$ are ranked by score $Z_{ki}$.
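The log-scale argument and the ranking step above can be illustrated with a short Python sketch. The length comparison uses only the numbers quoted in the text. The scoring function is a hypothetical stand-in: it assumes that each score is a true value plus zero-mean Gaussian noise and that larger scores rank first, which may differ from the generative model used in the study (whose equations are not reproduced here). The names `log_gap` and `rank_by_noisy_score` are illustrative.

```python
import math
import random


def log_gap(short_len, long_len):
    """Difference between two list lengths on the natural-log scale."""
    return math.log(long_len) - math.log(short_len)


def rank_by_noisy_score(true_scores, noise_sd, rng):
    """Rank entities by Z = true score + Gaussian noise, largest Z first.

    ASSUMPTION: the additive noise form and the descending order are
    illustrative choices, not the study's exact model.
    """
    z = {entity: score + rng.gauss(0.0, noise_sd)
         for entity, score in true_scores.items()}
    return sorted(z, key=z.get, reverse=True)


# The same absolute difference of 100 entities at two different scales.
small_scale_gap = log_gap(2, 102)          # about 3.93
large_scale_gap = log_gap(19000, 19100)    # about 0.0052

# With zero noise the ranking follows the true scores exactly.
rng = random.Random(0)
noiseless = rank_by_noisy_score({"a": 3.0, "b": 1.0, "c": 2.0}, 0.0, rng)
```

On the log scale the gap between lengths 2 and 102 is several hundred times larger than the gap between 19000 and 19100, which matches the intuition stated in the text that the former difference is the more significant one.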
Then, $X$ entities will be removed randomly from list $L_i$ as controlled by a ratio γ, […]

In a real study, the length of the reported list is usually a result of two factors. The first one is that bottom-ranked genes are removed and not reported, whereas the second one is that some genes are not included in the study in […]

Table 2. Different types of simulated data and the corresponding real datasets that they emulate.

Type   Lists included                         Emulated real dataset
0r     11 ranked lists                        large dataset: SARS-CoV-2
0m     11 ranked lists + 21 unranked lists    large dataset: SARS-CoV-2
1r     5 ranked lists                         small datasets: NSCLC, macrophage apoptosis
1m     5 ranked lists + 5 unranked lists      small datasets: NSCLC, macrophage apoptosis

The different types are distinguished by the number of included lists (large "0" or small "1") and whether unranked lists are included ("m") or not ("r"). The parameters were set to emulate properties of the real data.

[…] Table 3, the properties of real data were explored to set the […]

[…] Table 4 except for BARD, which has available code implemented in […]

[…] Thurstone-based model [37], which assumes there is a parameter for each entity and each ranking source is ranked by these parameters subject to noise. Given […]

[…] probability of their stationary distributions [38].

[…] Scholarship.
A comparative study of rank aggregation methods for partial and top ranked lists in genomic applications
Supervised rank aggre[…]
[…] tuberculosis clinical status
NF-κB translocation prevents host cell death after low-dose challenge by Legionella pneumophila
Human gene expression profiles of susceptibility and resistance in tuberculosis
Illuminating host-mycobacterial interactions with genome-wide CRISPR knockout and CRISPRi screens
Non-small-cell lung cancer molecular signatures recapitulate lung developmental pathways
Hidden treasures in "ancient" microarrays: gene-expression portrays biology and potential resistance pathways of major lung cancer subtypes and normal tissue
RNA-seq analysis of lung adenocarcinomas reveals different gene expression profiles between smoking and nonsmoking patients
microRNAs with AAGUGC seed motif constitute an integral part of an oncogenic signaling network
A high-dimensional, deep-sequencing study of lung adenocarcinoma in female never-smokers
A powerful Bayesian meta-analysis method to integrate multiple gene set enrichment studies
A law of comparative judgment
Rank aggregation methods for the web
A gene-coexpression network for global discovery of conserved genetic modules