key: cord-0767676-wd87dhrd
authors: Misra, Biswapriya B.
title: New software tools, databases, and resources in metabolomics: updates from 2020
date: 2021-05-11
journal: Metabolomics
DOI: 10.1007/s11306-021-01796-1
sha: 1aecaf1df8a00b57f0b0c6ef0e67569454d1caba
doc_id: 767676
cord_uid: wd87dhrd

BACKGROUND: From precision medicine, space exploration, and drug discovery to the characterization of the dark chemical space of habitats and organisms, metabolomics takes centre stage in providing answers to diverse biological, biomedical, and environmental questions. Technological advances in mass spectrometry and spectroscopy platforms generate information-rich, complex big-data datasets, and data analytics tends to co-evolve to match the pace of analytical instrumentation. Software tools, resources, databases, and solutions help harness the information concealed in the generated data for eventual translational success. AIM OF THE REVIEW: In this review, ~ 85 metabolomics software resources, packages, tools, databases, and other utilities that appeared in 2020 are introduced to the research community. KEY SCIENTIFIC CONCEPTS OF REVIEW: Table 1 provides the computational dependencies and download links of the tools, and the resources are categorized based on their utility. In line with efforts from 2015 onwards, the review aims to keep the metabolomics research community updated on all the resources developed in 2020 in a single collated venue, so that they can be found in one place for further reference and use.

High-resolution mass spectrometry; HR MS/MS: High-resolution tandem mass spectrometry; Q-: Triple quadrupole; UPLC-TOF: Ultra-performance liquid chromatography-time-of-flight mass spectrometry; XCMS: Various forms (X) of chromatography mass spectrometry

The year 2020 saw an enormous rise in applications of ion mobility mass spectrometry (IMS) and data-independent acquisition (DIA) methods in both metabolomics and lipidomics. In terms of applications, mass spectrometry promises advanced care for cancer patients in clinical and intraoperative use (J. Zhang, Ge, et al., 2020; Zhang, Sans, et al., 2020). Other applications range from imaging mass spectrometry (MSI)-based natural product (NP) discovery (Spraker et al. 2020), nanoscale secondary ion mass spectrometry (nanoSIMS) for subcellular MS imaging and quantitative analysis in organelles (Thomen et al. 2020), and capturing urban sources of contamination with high-resolution mass spectrometry (HRMS) (Bowen et al., 2020) to the detection of COVID-19 disease signatures (Mahmud & Garrett, 2020). From an analytical method development standpoint, notable developments include a plasma pseudotargeted metabolomics method using ultra-high-performance liquid chromatography-mass spectrometry (UHPLC-MS) and the case for combined use of nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry approaches in metabolomics (Letertre et al. 2020). For volume-limited samples, subnanoliter metabolomics via LC-MS/MS was introduced using a pulsed MS ion generation method known as triboelectric nanogenerator inductive nanoelectrospray ionization (TENGi nanoESI) MS. Flow-injection Orbitrap mass spectrometry (FI-MS) enabled reproducible detection of ~ 9,000 and ~ 10,000 m/z features in metabolomics and lipidomics analyses of serum samples, respectively, with a sample scan time of ~ 15 s and a duty time of ~ 30 s, a ~ 50% increase over current spectral-stitching FI-MS methods (Sarvin et al. 2020).
A spatial metabolomics pipeline (metaFISH) that combines fluorescence in situ hybridization (FISH) microscopy and high-resolution atmospheric-pressure matrix-assisted laser desorption/ionization mass spectrometry to image host-microbe symbioses and their metabolic interactions was also reported (Geier et al. 2020). Another study compared full-scan, data-dependent acquisition (DDA), and DIA methods in HR LC-MS/MS-based metabolomics and found that spectral quality is better in DDA, with an average dot product score 83.1% higher than in DIA, whereas the number of MS2 spectra (spectral quantity) is larger in DIA (Guo & Huan, 2020a). Furthermore, DDA mode was shown to consistently generate fewer uniquely found significant features than full-scan and DIA modes (Guo & Huan, 2020b).

Raman spectroscopy, followed by stimulated Raman scattering (SRS) microscopy and Raman-guided subcellular pharmaco-metabolomics in metastatic melanoma cells, revealed intracellular lipid droplets and helped identify a previously unknown susceptibility of lipid mono-unsaturation within de-differentiated mesenchymal cells with innate resistance to BRAF inhibition. Application of 31P NMR was shown to hold potential for expanding the coverage of the metabolome by detecting phosphorus-containing metabolites (Bhinderwala et al. 2020).

The effectiveness of the flow injection analysis-continuous accumulation of selected ions Fourier transform ion cyclotron resonance mass spectrometry (FIA-CASI-FTMS) workflow, which uses isotopic fine structure (IFS) for molecular formula assignment, was demonstrated for metabolomics applications (Thompson et al. 2020). In a buffer modification workflow (BMW), the same sample is run by LC-MS once with a liquid chromatography solvent containing 14NH3-acetate buffer and once with the buffer modified to 15NH3-formate; the resulting characteristic mass and signal intensity changes of adduct peaks facilitate their annotation. Towards standardization of reference materials, quantitative measures of approximately 200 metabolites were obtained for each of three pooled reference materials (220 metabolites for Qstd3, 211 for CHEAR, and 204 for NIST1950), which supported harmonization of metabolomics data collected from 3677 human samples in 17 separate studies analyzed by two complementary HRMS methods (K. H. Liu, Mrzic, et al., 2020; Liu, Nellis, et al., 2020). Another review highlighted recent progress (since 2016) in chemical derivatization LC-MS for both targeted and untargeted metabolome analysis (Zhao & Li, 2020). Characterizing compounds by the number of labile hydrogen and oxygen atoms in the molecule, which can be measured using hydrogen/deuterium and 16O/18O-exchange approaches, reduces the search space by a factor of 10 and considerably increases the reliability of compound identification (Kostyukevich et al. 2020).
Monophasic extraction methods, which are quicker and simpler than biphasic methods and more amenable to integration into future automation, were also shown to be preferable for hydrophilic interaction chromatography (HILIC) ultrahigh-performance liquid chromatography-mass spectrometry (UHPLC-MS) and for nonpolar extracts analyzed by C18 reversed-phase UHPLC-MS in metabolomics of animal tissues and biofluids (Southam et al. 2020). In another innovative application, short columns and direct solvent switches allowed fast screening (3 min per polarity), in which a total of 50 commonly reported diagnostic or explorative biomarkers were validated with limits of quantification comparable to those of conventional LC-MS/MS (van der Laan et al. 2020).
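The buffer modification workflow described above relies on a predictable mass shift: an ammonium adduct formed from the 15N-labelled buffer is heavier than the same adduct formed from 14NH3 by the 15N-14N mass difference (about 0.99703 Da). The Python sketch below is not the authors' implementation. It pairs features across the two runs on that shift. The function name, the (feature_id, m/z, RT) input layout, the assumption of singly charged [M+NH4]+ ions carrying one nitrogen, and the tolerances are all illustrative assumptions.

```python
# Monoisotopic masses (Da) of the two nitrogen isotopes.
N14 = 14.0030740048
N15 = 15.0001088989
DELTA_15N = N15 - N14  # ~0.9970349 Da shift for one buffer-derived nitrogen


def flag_ammonium_adducts(run_14n, run_15n, rt_tol=0.1, ppm_tol=5.0):
    """Pair features whose m/z moves by one 15N-14N difference between runs.

    run_14n, run_15n: lists of (feature_id, mz, rt) from the 14NH3-acetate
    and the 15NH3-formate runs (assumed singly charged [M+NH4]+ ions).
    Returns a list of (id_in_14n_run, id_in_15n_run) candidate adduct pairs.
    """
    pairs = []
    for fid_a, mz_a, rt_a in run_14n:
        expected = mz_a + DELTA_15N  # where the 15N-labelled adduct should appear
        for fid_b, mz_b, rt_b in run_15n:
            same_peak = abs(rt_b - rt_a) <= rt_tol
            ppm_error = abs(mz_b - expected) / expected * 1e6
            if same_peak and ppm_error <= ppm_tol:
                pairs.append((fid_a, fid_b))
    return pairs


run_14n = [("f1", 200.0000, 5.00), ("f2", 300.0000, 6.00)]
run_15n = [("g1", 200.0000 + DELTA_15N, 5.02), ("g2", 300.0000, 6.00)]
print(flag_ammonium_adducts(run_14n, run_15n))  # [('f1', 'g1')]
```

A feature pair such as f2/g2, whose m/z does not move between runs, is consistent with an ion that carries no buffer-derived nitrogen, for example a protonated molecule.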

From the standpoint of data analysis, metabolomics as a field is starting to benefit from machine learning (ML) (Liebal et al. 2020) and deep learning (DL) (Pomyen et al. 2020; Sen et al. 2020) approaches to diverse challenges, from data preprocessing to biological interpretation. In the context of systems and personalized medicine, LION-ESS (Linear Interpolation to Obtain Network Estimates for Single Samples) and ssPCC (single sample network based on Pearson correlation) were evaluated and compared for building metabolite-metabolite association networks (Jahagirdar & Saccenti, 2020). For the annotation of low-resolution GC-MS data, a deep learning ranking model for small molecule identification outperformed other approaches. It reduced the fraction of wrong answers (at rank 1) by 9-23%, depending on the data set used (Matyushin et al. 2020). In the age of artificial intelligence, spatial metabolomics and imaging mass spectrometry (MSI) promise to revolutionize biology and healthcare (Alexandrov, 2020). Approaches such as an integrated strategy of fusing features and removing redundancy based on graph density (FRRGD) were also proposed, which greatly enhanced the detection coverage of low-abundance metabolites (Ju et al. 2020).
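LION-ESS estimates a network for each sample q by linear interpolation between the network built from all N samples, e(all), and the network built with sample q left out, e(all - q): e(q) = N(e(all) - e(all - q)) + e(all - q). The NumPy sketch below uses Pearson correlation as the aggregate network measure. That choice is an assumption, since LION-ESS accepts other measures, and this is not the code evaluated by Jahagirdar & Saccenti.

```python
import numpy as np


def lioness_pearson(X):
    """Single-sample networks via LIONESS with Pearson correlation.

    X: array of shape (n_samples, n_metabolites).
    Returns an array of shape (n_samples, n_metabolites, n_metabolites)
    holding one metabolite-metabolite network per sample.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    r_all = np.corrcoef(X, rowvar=False)  # aggregate network from all samples
    nets = np.empty((n, p, p))
    for q in range(n):
        # Network estimated without sample q.
        r_without_q = np.corrcoef(np.delete(X, q, axis=0), rowvar=False)
        # Linear interpolation: N * (full - leave-one-out) + leave-one-out.
        nets[q] = n * (r_all - r_without_q) + r_without_q
    return nets


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))  # 20 samples, 5 metabolites (simulated)
nets = lioness_pearson(X)
print(nets.shape)  # (20, 5, 5)
```

Unlike the aggregate correlation matrix, the resulting single-sample edge weights are not bounded to [-1, 1]. They are better read as sample-specific contributions to the aggregate network than as correlations.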

For surveys of tools, packages, resources, software, and databases for other mass spectrometry-derived omics, readers can consult other treatises on metaproteomics (Sajulga et al. 2020), data-independent acquisition mass spectrometry-based proteomics (F. Zhang, Ge, et al., 2020; Zhang, Sans, et al., 2020), and single cell and single cell-type metabolomics (B. B. Misra, 2020a), among others.

Diverse online resources are excellent sources of software tools, databases, and resources for metabolomics research. These include OMICtools (http://omictools.com/) (Henry et al. 2014), the Fiehn laboratory pages (http://fiehnlab.ucdavis.edu/ and http://metabolomics.ucdavis.edu/Downloads), and the International Metabolomics Society's resource pages. Software repositories are another source: the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/available_packages_by_name.html), Bioconductor (https://www.bioconductor.org/), the Python Package Index (PyPI) (https://www.pypi.org), GitLab (https://www.gitlab.com), and GitHub (https://www.github.com/). The Metabolomics Tools Wiki, which claimed to be an up-to-date resource for metabolomics tools, databases, and software, has not been updated since 2017 (Spicer et al. 2017). Whilst there is a plethora of programming languages, modern interpreted scripting languages such as R, Python, Raku, Ruby, and MATLAB are evidently popular in metabolomics.

Building on the review structure established previously (B. Misra & van der Hooft, 2015; O'Shea & Misra, 2020), this overview of major tools and resources in metabolomics is organized into the following sections: (1) Platform-specific tools, (2) Preprocessing and QC tools, (3) Annotation tools, (4) Multifunctional tools, (5) Tools for statistical analysis and visualization, (6) Databases, and (7) Other specialized tools. Table 1 provides a summary of all reviewed resources and their availability. Furthermore, Table 2 highlights unpublished tools found in the CRAN and PyPI software repositories that are deemed useful for the metabolomics research community but are not associated with a published scholarly article.

Metabolomics as a discipline depends on mass spectrometry and spectroscopy platforms to generate high-throughput, omics-scale data. These include, but are not limited to, liquid chromatography-mass spectrometry (LC-MS), gas chromatography-mass spectrometry (GC-MS), capillary electrophoresis-mass spectrometry (CE-MS), and spectroscopic methods such as 1H-NMR, 13C-NMR, Raman, and Fourier transform infrared (FTIR) spectroscopy. In this section, I discuss the tools that appeared in 2020 for the analysis of datasets specific to a metabolomics platform or technology, i.e., LC-MS, GC-MS, and NMR. The Automated spectraL Processing System for NMR (AlpsNMR) is an R-package that provides automated signal processing for untargeted NMR metabolomics datasets (Madrid-Gambin et al. 2020). It performs region exclusion, spectra loading, metadata handling, automated outlier detection, spectra alignment, peak-picking, integration, and normalization. The tool can load Bruker and JDX samples and preprocess them for downstream statistical analysis.

Signature Mapping (SigMa) was developed as a standalone tool with MATLAB dependencies for processing raw urine 1H-NMR spectra into a metabolite table (Khakimov et al. 2020). SigMa divides urine NMR spectra into Signature Signals (SS), Signals of Unknown spin Systems (SUS), and bins of complex unresolved regions (BINS). Combined with a SigMa chemical shift library and a new automatic peak-picking algorithm, this division allows simultaneous detection of urinary metabolites in large-scale NMR metabolomics studies. NMRfilter is a stand-alone interactive software for high-confidence NMR compound identification that runs NMR chemical shift predictions and matches them with the experimental data. It defines the identity of compounds using a list of matching rates and correlating parameters of accuracy, together with figures for visual validation (Kuhn et al. 2020).

Table 1 The entire list of reviewed tools is organized by important analytical steps in metabolomics data analysis and includes details regarding their platform dependency, implementation, e.g., programming language (R, Python, Java, C/C++, etc.) or web browser based, and their availability. Columns: Name of the Software Tool; Category. The tools generally follow their order of appearance in the manuscript text.

The networking analysis platform enables users to store, process, share, annotate, and compare both unit/low-resolution and GC-HRMS data, and to perform molecular networking on them. GNPS-MassIVE is a public data repository for untargeted MS2 data and EI-MS data, with sample information (metadata) and annotated MS2 spectra. MSHub performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples, followed by GNPS molecular networking analyses.
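MSHub's deconvolution is described as unsupervised non-negative matrix factorization. The sketch below is a generic illustration, not MSHub's code. It factorizes a non-negative matrix V (m/z bins × scans) into W (m/z bins × k fragmentation patterns) and H (k patterns × scans, i.e., elution profiles) using the classic Lee-Seung multiplicative updates for the Frobenius norm. The simulated spectra, the number of components k, and the iteration count are assumptions.

```python
import numpy as np


def nmf(V, k, n_iter=2000, eps=1e-10, seed=0):
    """Factorize non-negative V (m x n) as W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Two simulated co-eluting compounds: fragment patterns over 6 m/z bins
# and Gaussian elution profiles over 10 scans.
spectra = np.array([[1.0, 0.5, 0.0, 0.0, 0.2, 0.0],
                    [0.0, 0.0, 1.0, 0.7, 0.0, 0.3]]).T   # (6, 2)
scans = np.arange(10)
profiles = np.vstack([np.exp(-(scans - 3) ** 2 / 2.0),
                      np.exp(-(scans - 6) ** 2 / 2.0)])  # (2, 10)
V = spectra @ profiles                                   # (6, 10)

W, H = nmf(V, k=2)
print(round(relative_error(V, W, H), 3))
```

The columns of W recover the fragmentation patterns only up to scaling and ordering. NMF solutions are not unique, so component order and absolute scale should not be interpreted directly.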

The RGCxGC toolbox is an R-package for the analysis of two-dimensional gas chromatography-mass spectrometry (2D GC-MS) data (Quiroz-Moreno et al. 2020). It offers pre-processing algorithms for signal enhancement, such as baseline correction based on asymmetric least squares and smoothing based on the Whittaker smoother. It also provides peak alignment by 2D Correlation Optimized Warping and multiway principal component analysis.
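The two smoothing-based steps named above can be illustrated in one dimension. The Whittaker smoother solves (W + λDᵀD)z = Wy, where D is the second-difference operator and W holds the weights. Asymmetric least squares (AsLS) reuses it with weight p for points above the current baseline and 1 - p for points below, iterating so that peaks barely pull the baseline up. This NumPy sketch is a generic 1D implementation, not RGCxGC's 2D code. λ, p, the iteration count, and the dense solver (adequate only for short signals) are assumptions.

```python
import numpy as np


def whittaker_smooth(y, lam=10.0, weights=None):
    """Weighted Whittaker smoother with a second-order difference penalty."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=2, axis=0)      # (n - 2, n) second differences
    A = np.diag(w) + lam * (D.T @ D)
    return np.linalg.solve(A, w * y)


def asls_baseline(y, lam=1e6, p=0.001, n_iter=10):
    """Asymmetric least squares baseline estimate."""
    y = np.asarray(y, dtype=float)
    w = np.ones(y.size)
    for _ in range(n_iter):
        z = whittaker_smooth(y, lam, w)
        # Points above the current baseline (likely peaks) get a small weight.
        w = np.where(y > z, p, 1.0 - p)
    return z


# Simulated chromatogram: sloping baseline plus one Gaussian peak.
x = np.linspace(0, 100, 500)
true_baseline = 0.05 * x + 2.0
y = true_baseline + 10.0 * np.exp(-(x - 50.0) ** 2 / (2 * 2.0 ** 2))
corrected = y - asls_baseline(y)
```

With weights set to one, the same function acts as a plain Whittaker smoother. Larger λ gives a stiffer curve. For long signals a sparse solver would replace the dense `np.linalg.solve`.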

Untargeted metabolomics workflows, whether they use LC-MS/MS, GC-MS, or NMR, depend heavily on pre-processing of the acquired raw datasets prior to statistical analyses and interpretation. Preprocessing typically involves several tool functions. These include detecting masses (as m/z values) in mass spectra (i.e., feature detection), constructing and displaying extracted ion chromatograms, detecting chromatographic peaks, deconvolution, and peak alignment. Further steps cover data matrix curation (from batch and blank corrections to filtering and normalization) and quality assessment. Popular, decade-old preprocessing tools are available to the community in the form of xcms (Tautenhahn et al. 2008), MZmine 2 (MZmine Development Team 2015), and MS-DIAL (Tsugawa et al. 2015). Nevertheless, there is a consistent effort to improve these workflows, from reducing computational time and developing graphical user interfaces (GUIs) that make them user friendly to addressing the challenges of interpreting data from advanced platforms such as HRMS, IMS, and MSI. In fact, a recent comparison of four software packages (MZmine 2, enviMass, Compound Discoverer™, and XCMS Online) demonstrated low coherence between them. The overlap of features between all four programs was only about 10%, and for each software, 40 to 55% of features did not match any other program (Hohrenk et al. 2020). Moreover, quality control (QC) tools are important to account for systematic and random variations/errors introduced during experimental and analytical workflows. Batch effects pose many challenges, i.e., they introduce experimental artifacts that can interfere with the measurement of phenotype-related metabolome changes in metabolomics data (Han & Li, 2020). The data normalization strategies, tools, and software solutions available to circumvent some of these challenges have been reviewed (B. B. Misra, 2020b).
In this section, I cover the preprocessing and the QC tools that appeared in 2020.
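Overlap figures such as those of Hohrenk et al. (2020) depend on how features from different tools are matched. One simple scheme, assumed here, calls two features the same when their m/z values agree within a ppm tolerance and their RTs agree within a fixed window. The sketch below applies this scheme pairwise to two hypothetical feature lists. It is not the procedure used in that study, and the tolerances are placeholders.

```python
def match_features(features_a, features_b, ppm_tol=5.0, rt_tol=0.2):
    """Greedy one-to-one matching of (mz, rt) features between two tools.

    Returns a list of (index_in_a, index_in_b) pairs.
    """
    used_b = set()
    matches = []
    for i, (mz_a, rt_a) in enumerate(features_a):
        best = None
        for j, (mz_b, rt_b) in enumerate(features_b):
            if j in used_b:
                continue
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            d_rt = abs(rt_a - rt_b)
            if ppm <= ppm_tol and d_rt <= rt_tol:
                # Tolerance-scaled distance used to pick the closest candidate.
                score = ppm / ppm_tol + d_rt / rt_tol
                if best is None or score < best[0]:
                    best = (score, j)
        if best is not None:
            used_b.add(best[1])
            matches.append((i, best[1]))
    return matches


def overlap_summary(features_a, features_b, **tolerances):
    """Matched count, Jaccard overlap and per-tool unmatched fractions."""
    n_match = len(match_features(features_a, features_b, **tolerances))
    n_union = len(features_a) + len(features_b) - n_match
    return {
        "matched": n_match,
        "jaccard": n_match / n_union if n_union else 0.0,
        "unmatched_fraction_a": 1 - n_match / len(features_a) if features_a else 0.0,
        "unmatched_fraction_b": 1 - n_match / len(features_b) if features_b else 0.0,
    }


tool_a = [(100.0000, 1.00), (200.0000, 2.00), (300.0000, 3.00)]
tool_b = [(100.0003, 1.05), (250.0000, 2.00), (300.0000, 5.00)]
print(overlap_summary(tool_a, tool_b))
```

Here only the first feature matches (3 ppm, 0.05 min apart). The third pair shares an m/z value but falls outside the RT window. The reported overlap is therefore sensitive to the chosen tolerances as well as to the tools themselves.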

A visual post-processing tool implemented as an R-package removes redundant features from LC-MS/MS-based untargeted metabolomic data sets (Kouřil et al. 2020). It groups highly correlated features within a defined retention time (RT) window without requiring a specific m/z difference, which makes it a second-tier strategy for reducing multiplicities. The output is a graphical representation of the correlation network that conveys the composition of the clusters and can aid further parameter tuning.
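The grouping idea described above can be sketched as a graph problem. Features are nodes, and an edge joins two features that fall within the RT window and exceed a correlation threshold. Connected components are then the candidate multiplicity groups. This is a minimal illustration under assumed thresholds (0.1 min, r ≥ 0.9), using union-find for the components. It is not the package's own algorithm.

```python
import numpy as np


def group_correlated_features(intensities, rts, rt_window=0.1, r_min=0.9):
    """Cluster features that co-elute and are highly correlated across samples.

    intensities: array (n_samples, n_features); rts: array (n_features,).
    Returns a list of clusters, each a sorted list of feature indices.
    """
    intensities = np.asarray(intensities, dtype=float)
    n_feat = intensities.shape[1]
    corr = np.corrcoef(intensities, rowvar=False)
    parent = list(range(n_feat))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            if abs(rts[i] - rts[j]) <= rt_window and corr[i, j] >= r_min:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(n_feat):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


rng = np.random.default_rng(1)
signal = rng.lognormal(size=30)             # one compound across 30 samples
X = np.column_stack([
    signal,                                 # e.g. a protonated molecule
    2.0 * signal,                           # e.g. an adduct of the same compound
    rng.lognormal(size=30),                 # unrelated co-eluting feature
    3.0 * signal,                           # correlated but elutes much later
])
rts = np.array([5.00, 5.02, 5.01, 8.00])
print(group_correlated_features(X, rts))
```

Features 0 and 1 are grouped. Feature 2 co-elutes but is uncorrelated, and feature 3 is correlated but outside the RT window, so each stays alone. Both criteria are needed.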

Neighbor-wise compound-specific Graphical Time Warping (ncGTW) is an integrated, reference-free profile alignment method, implemented as an R-package and available as a plugin for xcms. It detects and fixes bad alignments (misaligned feature groups) in LC-MS data to enable accurate grouping and peak-filling.

TidyMS is a Python package for preprocessing untargeted LC-MS/MS-derived metabolomics data. It reads raw data from the .mzML file format, generates spectra and total ion chromatograms (TICs), and performs peak picking and feature detection. It also reads processed data from xcms and MZmine 2, among others. Further functionalities cover data matrix curation, normalization, imputation, scaling, quality metrics, QC-based batch correction, and interactive visualization of results.

AutoTuner, available as an R-package, is a parameter optimization algorithm that generates robust features from untargeted LC-MS/MS runs (McLean & Kujawinski, 2020). It obtains data-specific parameter estimates from raw data in a single step rather than over many iterations. As input, AutoTuner requires at least 3 samples of raw data converted from proprietary instrument formats (e.g., to .mzML, .mzXML, or .CDF).

Remove unwanted variation in a hierarchical structure (hRUV) is an R-package, also available as a Shiny app, that removes unwanted variation from large-scale LC-MS metabolomics studies by progressively merging the adjustments in neighboring batches (Kim et al., 2020). The package uses sample replicates to integrate data from several batches, removing intra-batch signal drift and inter-batch unwanted variation. It was reported to outperform existing tools while retaining biological variation. To assess the results, users can visualize three kinds of diagnostic plots: principal component analysis (PCA) plots, relative log expression (RLE) plots, and metabolite run plots.
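RLE plots, one of the hRUV diagnostics mentioned above, are built from a simple quantity: for each metabolite, the log intensity in each sample minus that metabolite's median across samples. After successful removal of unwanted variation, per-sample RLE distributions should be centred near zero. The NumPy sketch below computes the values such a boxplot would display. The log2 scale and the toy matrix are assumptions, and this is not hRUV's code.

```python
import numpy as np


def relative_log_expression(X):
    """RLE values: log2 intensity minus each metabolite's median across samples.

    X: positive intensities, shape (n_samples, n_metabolites).
    """
    log_x = np.log2(np.asarray(X, dtype=float))
    return log_x - np.median(log_x, axis=0, keepdims=True)


def rle_sample_medians(X):
    """Per-sample median RLE; values far from 0 flag drift or batch effects."""
    return np.median(relative_log_expression(X), axis=1)


X = np.array([[1.0, 2.0, 4.0, 8.0],
              [4.0, 8.0, 16.0, 32.0],   # sample 1 is uniformly 4x higher
              [1.0, 2.0, 4.0, 8.0]])
print(rle_sample_medians(X))  # [0. 2. 0.]
```

The uniformly inflated sample stands out with a median RLE of 2 (log2 of 4), which is the kind of shift an RLE plot makes visible before and after correction.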

MetumpX is an Ubuntu-based R-package that facilitates easy download and installation of 103 tools spread across the standard untargeted MS-based metabolomics pipeline (Wajid et al. 2020). By automatically installing software pipelines, the package can considerably shorten the learning curve of building software workstations.

MeTaQuaC is an R-package that implements QC concepts and methods for Biocrates kits and their application in targeted LC-MS metabolomics workflows (Kuhring et al. 2020). It creates a QC report containing visualizations and informative scores, and it provides summary statistics and unsupervised multivariate analysis methods, among others.

dbnorm is an R-package for the visualization and removal of technical heterogeneity from large-scale metabolomics datasets (Bararpour et al., 2020). It first allows inspection at both macroscopic (sample batch) and microscopic (metabolic feature) scales. dbnorm includes several statistical models already in use for metabolomics data normalization, such as the ComBat model (parametric and non-parametric) from the sva package and the ber function.

MetaClean, available as an R-package, uses 11 peak quality metrics and 8 diverse ML algorithms to build a classifier for automatic assessment of the integration quality of peaks from untargeted metabolomics datasets (Chetnik et al. 2020). The AdaBoost algorithm combined with the set of 11 peak quality metrics gave the best-performing classifier. Some peaks were retained after filtering at 30% relative standard deviation (RSD) across pooled QC samples. Applying this framework to those peaks further distinguished poorly integrated peaks that filtering alone had not removed.
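The 30% RSD pre-filter mentioned above is straightforward to compute. The minimal sketch below assumes a matrix of pooled-QC injections by features and uses the sample standard deviation (ddof = 1). Neither choice is taken from MetaClean's code.

```python
import numpy as np


def qc_rsd_filter(qc_intensities, max_rsd=30.0):
    """Keep features whose relative standard deviation in pooled QCs is <= max_rsd (%).

    qc_intensities: array (n_qc_injections, n_features) of positive intensities.
    Returns (keep_mask, rsd_percent).
    """
    qc = np.asarray(qc_intensities, dtype=float)
    mean = qc.mean(axis=0)
    sd = qc.std(axis=0, ddof=1)   # sample standard deviation across injections
    rsd = 100.0 * sd / mean
    return rsd <= max_rsd, rsd


qc = np.array([[100.0, 10.0],
               [110.0, 30.0],
               [90.0, 50.0]])
keep, rsd = qc_rsd_filter(qc)
print(keep)  # [ True False]
```

The first feature varies by 10% across QC injections and is kept. The second varies by about 67% and is removed.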

NeatMS is a Python package for untargeted LC-MS signal labelling and filtering that enables automated filtering of false positive MS1 peaks reported by routine LC-MS data processing pipelines. It relies on neural network-based classification and can process outputs from MZmine 2 and xcms analyses.

Metabolite annotation remains a critical step that defines the success or failure of untargeted metabolomics efforts. Newer technologies have given compound identification additional impetus, but they have also posed new challenges for tool development. These include collision cross section (CCS) data from ion mobility, high-resolution mass spectra from Orbitrap instruments, direct injection data, data-independent acquisition (DIA)/all ion fragmentation (AIF), imaging MS, and multi-dimensional chromatography. Regarding false discovery rates (FDRs), a low FDR yields fewer but reliable annotations, whereas a higher FDR reports more annotations, including poor-quality ones. Although metabolite annotation can benefit from RT as orthogonal information, efforts to combine RT predictions with MS/MS data are currently lacking (Witting & Böcker, 2020). Clearly, reference spectra and spectral DBs/libraries are not enough. Depending on the environmental/biological matrix in question, they annotate only roughly 5-30% of the total features captured in a given mass spectrometry-based metabolomics dataset. Experimentally obtained MS/MS and NMR data on pure standards are precious and aid the development of computational solutions for compound identification. However, they do not suffice in their current numbers, accessibility, and availability. Moreover, in 2020, the Metabolite Identification Task Group of the International Metabolomics Society assessed and proposed a set of revised reporting standards for metabolite annotation/identification. It requested community feedback on levels A-G, ranging from defining an enantiomer or a chiral metabolite (level A) to an unknown molecular formula with specific spectral features (level G). Once formalized, these standards should improve reporting in studies and the publication landscape in metabolomics research. Figs. 1, 2, 3 show the software interfaces and analysis outputs for some of the annotation tools discussed in the following sections.
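The trade-off between FDR and the number of annotations can be made concrete with a target-decoy estimate. In this common approach, spectra are also searched against deliberately false (decoy) references, and the FDR at a score threshold is approximated as decoy hits divided by target hits. The sketch below assumes that scheme and uses invented scores. It does not represent any specific tool discussed here.

```python
def estimated_fdr(target_scores, decoy_scores, threshold):
    """Decoy hits / target hits among matches scoring at or above threshold."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


def most_permissive_threshold(target_scores, decoy_scores, max_fdr=0.05):
    """Lowest score cut-off (hence most annotations) with estimated FDR <= max_fdr."""
    for t in sorted(set(target_scores)):
        if estimated_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


targets = [0.9, 0.8, 0.7, 0.6, 0.5]   # best-match scores against real references
decoys = [0.55, 0.3]                  # best-match scores against decoy references
for max_fdr in (0.05, 0.25):
    t = most_permissive_threshold(targets, decoys, max_fdr)
    n_kept = sum(s >= t for s in targets)
    print(max_fdr, t, n_kept)
```

This prints `0.05 0.6 4` and `0.25 0.5 5`: the stricter FDR keeps four annotations and the looser one keeps all five, mirroring the trade-off described in the text.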

MEtabolite SubStructure Auto-Recommender (MESSAR) is a web-based tool that provides an automated method for substructure recommendation guided by association rule mining (Y. Liu, Mrzic, et al., 2020; Liu, Nellis, et al., 2020). It captures potential relationships between spectral features and substructures learned from public spectral libraries and uses them to suggest substructures for any unknown mass spectrum. Although the interface does not currently perform batch processing, it provides an open-source approach to annotating substructures.

SMART 2.0 is an artificial intelligence (AI)-based ML tool for mixture analysis in NMR data analysis workflows that aids the subsequent accelerated discovery and characterization of new NPs. SMART 2.0 generates structure hypotheses from two-dimensional NMR data [1H-13C heteronuclear single quantum coherence (HSQC) spectra] and compares a query HSQC spectrum against a library of > 100,000 NPs. For a given compound of interest, it outputs simplified molecular-input line-entry system (SMILES) strings, structures, cosine similarities, and molecular weights.

MetFID is a tool that uses an artificial neural network (ANN) trained to predict molecular fingerprints from experimental MS/MS data (Fan et al. 2020). MetFID retrieves candidates from metabolite databases using the molecular formula or the m/z value of the analyte's precursor ion. The candidate whose fingerprint is most similar to the predicted fingerprint is used for metabolite annotation. However, no code or accessible tools/repositories are provided with the published scholarly article.
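The final step of a fingerprint-based approach such as MetFID is choosing the database candidate whose fingerprint is closest to the predicted one. It can be sketched as follows. Because no code accompanies the article, several choices here are assumptions: the Tanimoto similarity, the 0.5 cut-off used to binarize predicted bit probabilities, and the toy five-bit fingerprints.

```python
def binarize(probabilities, cutoff=0.5):
    """Turn predicted bit probabilities into the set of 'on' bit indices."""
    return {i for i, p in enumerate(probabilities) if p >= cutoff}


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0


def rank_candidates(predicted_probabilities, candidate_fingerprints, cutoff=0.5):
    """Sort candidate names by decreasing similarity to the predicted fingerprint."""
    predicted = binarize(predicted_probabilities, cutoff)
    return sorted(candidate_fingerprints,
                  key=lambda name: tanimoto(predicted, candidate_fingerprints[name]),
                  reverse=True)


predicted = [0.9, 0.1, 0.8, 0.6, 0.2]          # hypothetical network output, 5 bits
candidates = {"A": {0, 2, 3}, "B": {0, 1}, "C": {2, 3, 4}}
print(rank_candidates(predicted, candidates))  # ['A', 'C', 'B']
```

Candidate A shares every on-bit with the prediction (similarity 1.0), C shares two of four (0.5), and B one of four (0.25).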

CPVA is a web-based tool for the visualization and annotation of LC peaks in untargeted LC-MS/MS-generated metabolomics data (Luan et al. 2020). It annotates adducts, isotopes, and contaminants and allows visualization of peak morphology metrics. Further, the tool helps capture potential noise and contaminants in chromatographic peak lists generated from LC-MS/MS data. This reduces false positive peak calls and improves data quality and downstream data processing.

NRPro is a web-based application dedicated to the dereplication and characterization of peptidic natural products (PNPs) from LC-MS/MS datasets (Ricart et al. 2020). It performs automatic peak annotation through a statistically validated scoring system. In an example dereplication, NRPro identified 169 PNPs in a dataset of 352 spectra with an FDR of 3.55.

MetENP/MetENPWeb is available as an R-package in the Metabolomics Workbench repository and is also deployed as a web-based application (Choudhary et al. 2020). It extends metabolomics data enrichment analysis to any study or dataset submitted to the Metabolomics Workbench. The extensions include Kyoto Encyclopedia of Genes and Genomes (KEGG)-based species-specific pathway analysis, pathway enrichment scores, gene-enzyme data, and enzymatic activities of the significantly altered metabolites. After the analyses, various plots and visualizations, such as volcano plots and bar graphs, are available to the user.

CANOPUS, part of a suite of software (Dührkop et al. 2019), is a computational tool for systematic compound class annotation from fragmentation spectra. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. It explicitly targets compounds for which neither spectral nor structural reference data are available and also predicts compound classes lacking MS/MS training data. Recently, CANOPUS was made available for the analysis of MS/MS spectra from both positive and negative ionization mode datasets.

molDiscovery is a mass spectral database search method that improves both the efficiency and the accuracy of small molecule identification (Mohimani et al. 2020). It does so by (i) using an efficient algorithm to generate mass spectrometry fragmentations and (ii) learning a probabilistic model to match small molecules with their mass spectra. A search of over 8 million spectra from the GNPS molecular networking infrastructure showed that this probabilistic model correctly identifies nearly six times more unique compounds than previously reported methods.

Table 2 (partial)
Name | Description | Link
mosaic.find | ... Independent Criteria) provides a function (mosaic_find()) designed to find rhythmic and non-rhythmic trends in multi-omics time course data using model selection and joint modelling | https://cran.r-project.org/web/packages/mosaic.find/index.html
ActivePathways | Integrative pathway enrichment analysis of multivariate omics data. A framework for analysing multiple omics datasets in the context of molecular pathways, biological processes and other types of gene sets. The tool uses p-value merging to combine gene- or protein-level signals, followed by ranked hypergeometric tests to determine enriched pathways and processes | https://cran.r-project.org/web/packages/ActivePathways/index.html
wilson | Provides modules for creating web-based applications that use plot-based strategies to visualize and analyse multi-omics data | https://cran.r-project.org/web/packages/wilson/index.html
mixKernel | Omics data integration using kernel methods. The package aims at providing methods to combine kernels for unsupervised exploratory analysis that can help integration of heterogeneous types of data | https://cran.r-project.org/web/packages/mixKernel/index.html

MetIDfyR, developed as an R-package, supports feature annotation in untargeted LC-HRMS/MS datasets through in silico prediction of drug phase I/II biotransformations and mass spectrometric data mining (Delcourt et al. 2020). Because it can predict drug metabolism products from in vitro and in vivo studies, this tool holds potential for annotation workflows in drug discovery programs.
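At its simplest, biotransformation-based annotation searches for features whose m/z differs from the parent drug ion by the mass change of a metabolic reaction. The sketch below is not MetIDfyR's implementation. It uses a small, illustrative set of single-step phase I/II reactions with monoisotopic mass shifts. It also assumes that the parent and its metabolites share the same singly charged ion type, and the ppm tolerance is a placeholder.

```python
# Monoisotopic mass changes (Da) for a few common single-step reactions.
REACTIONS = {
    "hydroxylation (+O)": 15.9949146,
    "demethylation (-CH2)": -14.0156501,
    "acetylation (+C2H2O)": 42.0105647,
    "sulfation (+SO3)": 79.9568150,
    "glucuronidation (+C6H8O6)": 176.0320880,
}


def predict_metabolite_mz(parent_mz):
    """Expected m/z of each single-step product (same ion type as the parent)."""
    return {name: parent_mz + shift for name, shift in REACTIONS.items()}


def match_metabolites(parent_mz, observed_mz, ppm_tol=5.0):
    """Return (reaction, observed m/z) pairs within ppm_tol of a prediction."""
    hits = []
    for name, expected in predict_metabolite_mz(parent_mz).items():
        for mz in observed_mz:
            if abs(mz - expected) / expected * 1e6 <= ppm_tol:
                hits.append((name, mz))
    return hits


observed = [315.9949, 476.0321, 410.1234]
print(match_metabolites(300.0000, observed))
```

For a hypothetical parent ion at m/z 300.0000, the features at 315.9949 and 476.0321 are matched as hydroxylation and glucuronidation products, and 410.1234 remains unexplained. Real tools chain reactions and score the matches with MS/MS evidence.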

Qemistree is a cheminformatics tool, available as an advanced analysis workflow on the GNPS infrastructure, that represents mass spectrometry data in the context of sample metadata and chemical ontologies (Tripathi et al. 2020). This tree-guided data exploration tool allows metabolomics samples to be compared across different experimental conditions, such as chromatographic shifts. The Qemistree software pipeline is also freely available to the microbiome and metabolomics communities as a QIIME2 plugin.

Ion identity molecular networking (IIMN) is a workflow within the GNPS ecosystem that complements feature-based molecular networking (FBMN) by annotating and connecting related ion species in feature-based molecular networks (Schmid et al. 2020). MS1-based ion identity networks (IIN) are already well known. IIMN integrates IIN into MS2-based molecular networks in the GNPS environment, thus adding MS/MS information on top of the MS1 characteristics of ions.
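The core idea that ion identity networking adds is recognizing co-eluting features as different ion species of one neutral molecule. It can be sketched by testing whether two features imply the same neutral mass under different adduct hypotheses. This toy version is not the GNPS/IIMN implementation. It considers only four singly charged positive-mode adducts and uses assumed RT and ppm tolerances. Practical implementations typically also use peak-shape correlation.

```python
# Mass added to the neutral molecule M for a few positive-mode adducts (Da).
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def ion_identity_pairs(features, rt_tol=0.05, ppm_tol=5.0):
    """Find feature pairs consistent with one neutral mass.

    features: list of (feature_id, mz, rt); singly charged ions assumed.
    Returns tuples (id_1, adduct_1, id_2, adduct_2, neutral_mass).
    """
    hits = []
    for i, (id1, mz1, rt1) in enumerate(features):
        for id2, mz2, rt2 in features[i + 1:]:
            if abs(rt1 - rt2) > rt_tol:
                continue  # not co-eluting
            for ad1, shift1 in ADDUCTS.items():
                for ad2, shift2 in ADDUCTS.items():
                    if ad1 == ad2:
                        continue
                    m1, m2 = mz1 - shift1, mz2 - shift2  # implied neutral masses
                    if abs(m1 - m2) / m1 * 1e6 <= ppm_tol:
                        hits.append((id1, ad1, id2, ad2, round((m1 + m2) / 2, 4)))
    return hits


glucose = 180.063388  # monoisotopic neutral mass of C6H12O6
features = [("a", glucose + ADDUCTS["[M+H]+"], 2.00),
            ("b", glucose + ADDUCTS["[M+Na]+"], 2.01),
            ("c", 250.1000, 2.00)]
print(ion_identity_pairs(features))  # [('a', '[M+H]+', 'b', '[M+Na]+', 180.0634)]
```

Features a and b are linked as the protonated and sodiated ions of one molecule. Feature c co-elutes but matches no adduct hypothesis with either of them.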

A web-based analysis and visualization package developed in R focuses on interactive visualization of the FOBI structure (Castellano-Escuder et al. 2020). FOBI (Food-Biomarker Ontology) is a new ontology that describes foods and their associated metabolite entities. It is composed of two interconnected sub-ontologies. The first is the 'Food Ontology', consisting of raw foods and 'multi-component foods'. The second is the 'Biomarker Ontology', containing food intake biomarkers classified by their chemical classes. These two sub-ontologies are conceptually independent but interconnected by different properties. Functionalities of the tool include static and dynamic network visualization, downloadable tables, compound ID conversion, and classical and food enrichment analyses.

BioDendro is a Python package providing a feature analysis workflow for LC-MS/MS metabolomics data (Rawlinson et al. 2020). It enables users to flexibly cluster and interrogate thousands of MS/MS spectra and to quickly identify the core fragment patterns driving the groupings. This leads to identification of the core chemical backbones of a larger class, even when the individual metabolite of interest is not found in public databases.

AllCCS is a freely accessible database/CCS atlas for the annotation of both known and unknown structures. It covers vast chemical structures, with > 5000 experimental CCS records and ~ 12 million calculated CCS values for > 1.6 million small molecules. Median relative errors are 0.5-2% for a broad spectrum of small molecules. Further, the tool facilitates a metabolite annotation strategy using known or unknown chemical structures in IMS metabolomics workflows.

Binner, implemented as a standalone Java executable software package, eliminates degenerate feature signals present in untargeted ESI-LC-MS/MS metabolomic datasets (Kachman et al. 2019). The user provides an aligned feature table with a unique compound (feature) identifier, m/z, RT, and feature intensities. From it, Binner produces an annotation file specifying annotation, mass, mode, charge, and tier information for the final set of features.

MS-CleanR is an R package that provides functions for feature filtering and annotation of LC-MS data and requires the outputs of MS-DIAL (v4.00 or higher) and MS-FINDER (v3.30 or higher) (Fraisier-Vannier et al. 2020). Its input is an MS-DIAL peak list processed from DDA or DIA data acquired in positive ionization (PI) mode, negative ionization (NI) mode, or both. MS-CleanR applies generic filters encompassing blank injection signal subtraction, background ion drift removal, unusual mass defect filtering, an RSD threshold based on sample class, and relative mass defect (RMD) window filtering. All selected features are then exported to MS-FINDER for in silico annotation using the hydrogen rearrangement rules (HRR) scoring system. Applying MS-CleanR reduced the number of signals by nearly 80% while retaining 95% of unique metabolite features.
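Three of the filter types listed above (blank subtraction, QC RSD threshold, and an RMD window) can be sketched for a single feature as follows. The thresholds are hypothetical defaults, not MS-CleanR's, and the mass defect convention (m/z minus its integer part) is one of several in use.

```python
import math
from statistics import mean, stdev


def relative_mass_defect_ppm(mz: float) -> float:
    """RMD = mass defect / m/z * 1e6, with the mass defect taken here as
    m/z minus its integer part (an assumed convention)."""
    return (mz - math.floor(mz)) / mz * 1e6


def passes_filters(sample, qc, blank, mz, min_fold_over_blank=3.0,
                   max_qc_rsd_pct=30.0, rmd_window=(50.0, 3000.0)):
    """Keep a feature only if it clears the blank, QC-RSD, and RMD filters.
    All thresholds are illustrative, hypothetical defaults."""
    blank_level = mean(blank) if blank else 0.0
    if blank_level > 0 and mean(sample) / blank_level < min_fold_over_blank:
        return False  # signal not clearly above the blank injections
    qc_mean = mean(qc)
    if qc_mean <= 0 or stdev(qc) / qc_mean * 100 > max_qc_rsd_pct:
        return False  # too variable across pooled QC injections
    low, high = rmd_window
    return low <= relative_mass_defect_ppm(mz) <= high  # plausible mass defect
```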

, is an R package for predicting the RT of small molecules in high-performance liquid chromatography (HPLC)-MS data analysis workflows (Bonini et al. 2020). To help annotate unknowns and remove false positive annotations, it uses five different machine learning algorithms [i.e., random forest (RF), Bayesian regularized neural network, XGBoost, lightGBM, and Keras] to build a stable, accurate, and fast RT prediction model. It also includes useful biochemical (structural) databases such as BMDB, ChEBI, DrugBank, ECMDB, FooDB, HMDB, KNApSAcK, PlantCyc, SMPDB, T3DB, UNPD, YMDB, and STOFF.
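The way an RT model removes false positive annotations can be illustrated with a deliberately simple stand-in: an ordinary least-squares line relating RT to a single molecular descriptor (a hypothetical logP value), in place of the machine learning models the package actually uses. Candidates whose predicted RT falls outside a tolerance window are discarded.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def filter_by_rt(observed_rt, candidates, model, tolerance_min=1.0):
    """candidates: (name, descriptor) pairs. Keep those whose predicted RT
    lies within tolerance_min of the observed RT."""
    slope, intercept = model
    return [name for name, descriptor in candidates
            if abs(slope * descriptor + intercept - observed_rt) <= tolerance_min]
```

In practice the model would be trained on standards measured on the same column, which is why RT models are usually rebuilt per chromatographic method.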

QSRR Automator is a software package that automates the creation of RT prediction models; it has been tested with metabolomics and lipidomics data from multiple chromatography columns, drawn from published literature and the authors' in-house work (Naylor et al. 2020).

MFAssignR is an R package that performs noise estimation, 13C and 34S polyisotopic mass filtering, mass measurement recalibration, and molecular formula assignment for UHPLC-MS analysis of complex environmental mixtures (Schum et al. 2020). Its functions cover determination of noise and the S/N threshold, identification of isotopes, identification of potential recalibrant series, mass list recalibration, assignment of molecular formulas to the recalibrated mass list, and output plots to evaluate the quality of the assignments.
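Molecular formula assignment at its simplest is an enumeration of elemental compositions within a ppm window. The sketch below restricts itself to C, H, N, and O, applies a double bond equivalent (DBE) sanity check, and omits noise estimation, isotope filtering, and recalibration; the element limits and tolerance are assumptions, not MFAssignR defaults.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443, "O": 15.99491461957}


def _fmt(element: str, count: int) -> str:
    return "" if count == 0 else element if count == 1 else f"{element}{count}"


def assign_formulas(neutral_mass, ppm_tol=1.0, max_c=40, max_n=4, max_o=20):
    """Enumerate CcHhNnOo formulas whose monoisotopic mass is within ppm_tol."""
    tol = neutral_mass * ppm_tol * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                # H count that best fills the mass left over by this C/N/O combination
                rest = neutral_mass - c * MASS["C"] - n * MASS["N"] - o * MASS["O"]
                h = round(rest / MASS["H"])
                if h < 0 or (h + n) % 2:  # odd h + n gives a non-integer DBE
                    continue
                mass = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
                dbe = c - h / 2 + n / 2 + 1
                if abs(mass - neutral_mass) <= tol and dbe >= 0:
                    formula = "".join(_fmt(e, k) for e, k in (("C", c), ("H", h), ("N", n), ("O", o)))
                    error_ppm = (mass - neutral_mass) / neutral_mass * 1e6
                    hits.append((formula, round(error_ppm, 3)))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```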

Metabolite core structure-based Search (McSearch) is available both as an R package and as a web-based tool for automated metabolite annotation of LC-MS/MS data. It uses a Core Structure-based Search (CSS) algorithm, a hypothetical neutral loss (HNL) library, and a biotransformation database to annotate metabolites through structural analogs of the query compounds ). The tool offers a single search mode, which takes a .csv file as input (input_single_search.csv serves as a template), and a batch search mode, which accepts raw data in .mgf or .mzXML format.
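Neutral losses are what let structural analogs be matched even when their fragment masses shift. The sketch assumes, for illustration, that hypothetical neutral losses are all pairwise m/z differences among the precursor and fragment ions; McSearch's exact definition and scoring may differ, and the spectra are made-up values for a query and a hypothetical +CH2 analog.

```python
from itertools import combinations


def hnl_set(precursor_mz, fragments, decimals=2):
    """Hypothetical neutral losses, taken here as all pairwise m/z differences
    among the precursor and fragment ions, rounded to `decimals`."""
    ions = sorted(set(fragments) | {precursor_mz})
    return {round(high - low, decimals) for low, high in combinations(ions, 2)}


def overlap(a, b):
    """Jaccard overlap of two sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For the example pair in which every ion of the analog is shifted by the same mass, the fragment sets share nothing while the neutral-loss sets are identical, which is the property a core structure-based search exploits.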

ReDU is a GNPS-based system for capturing metadata of publicly deposited MS-based metabolomics data using validated controlled vocabularies. It captures knowledge by enabling reanalysis of public data and/or co-analysis of one's own data, to find chemicals and associated metadata for repository-scale molecular networking . Currently, 38,305 files in GNPS (19.6% of GNPS) are ReDU compatible, including data collected from natural and human-built environments, human and animal tissues, biofluids, and food from all over the world.

Fig. 1 Snapshots of a subset of tools and resources discussed in this review. a SMART 2.0 outputs for a demo metabolite swinholide A, b Outputs from MESSAR for corosolic acid, c Demo analysis on FOBI

, is a web-based MS search engine available within the GNPS infrastructure that enables searches of all small-molecule MS/MS data in public metabolomics repositories (Wang, Leber, et al., 2020). MASST comprises a web-based system for searching a single MS/MS spectrum against the public data repository of the GNPS/MassIVE knowledge base, together with the underlying analysis infrastructure. All public data submitted to or available in GNPS/MassIVE become MASST-searchable, and searches return results according to user-defined search parameters.
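MS/MS search engines of this kind rank matches by a spectral similarity score. The sketch below computes a cosine score with greedy one-to-one peak matching within an m/z tolerance; the 0.02 Da tolerance is an assumption, and MASST's own scoring and filtering parameters are not reproduced.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.02):
    """Cosine similarity of two centroided spectra, given as (m/z, intensity)
    lists, after greedy one-to-one matching of peaks within tol (Da)."""
    candidates = sorted(
        ((ia * ib, i, j)
         for i, (mza, ia) in enumerate(spec_a)
         for j, (mzb, ib) in enumerate(spec_b)
         if abs(mza - mzb) <= tol),
        reverse=True,
    )
    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in candidates:  # largest intensity products claim peaks first
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    norm_a = math.sqrt(sum(x * x for _, x in spec_a))
    norm_b = math.sqrt(sum(x * x for _, x in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Greedy matching prevents one peak from being counted against several peaks in the other spectrum, which would otherwise inflate the score.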

NPClassifier is a DL tool for automated structural classification of NPs (Wang, Leber, et al. 2020). It is currently available as a web-based tool for simple searches. The tool aims to accelerate NP discovery by facilitating large-scale genome and metabolome mining efforts and by linking NP structures to their bioactivity.

LipidLynxX, developed in Python and available both as a standalone tool and as a web interface, converts diverse lipid annotations to unified identifiers and supports cross-ID matching (Ni & Fedorova, 2020). It offers three main modules: a Converter, which converts different abbreviations to unified LipidLynxX IDs; an Equalizer, which cross-compares IDs reported at different levels on selected levels; and a Linker, which links abbreviations to available resources.

patRoon is an R package that aids non-target HRMS data analysis workflows (Helmus et al. 2021). By harmonizing various open-source software tools and well-tested algorithms, it offers functionality and strategies that simplify automated processing of complex (environmental) data with reduced computational times.
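The idea behind LipidLynxX's Converter and Equalizer modules, collapsing different lipid shorthands to one comparison level, can be illustrated as follows. The sketch parses a class prefix and acyl chains with regular expressions and reduces them to the species level (summed carbons:double bonds); it is a hypothetical simplification, not LipidLynxX's code, and it ignores ether linkages, oxidation suffixes, and other modifiers.

```python
import re

_HEADGROUP = re.compile(r"\s*([A-Za-z]+)")
_CHAIN = re.compile(r"(\d+):(\d+)")


def to_species_level(name: str) -> str:
    """Collapse a lipid shorthand to class + summed carbons:double bonds."""
    head = _HEADGROUP.match(name)
    if head is None:
        raise ValueError(f"no lipid class found in {name!r}")
    chains = _CHAIN.findall(name[head.end():])
    if not chains:
        raise ValueError(f"no acyl chains found in {name!r}")
    carbons = sum(int(c) for c, _ in chains)
    double_bonds = sum(int(db) for _, db in chains)
    return f"{head.group(1)} {carbons}:{double_bonds}"


def same_species(a: str, b: str) -> bool:
    """Compare two annotations reported at different structural levels."""
    return to_species_level(a) == to_species_level(b)
```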

Multifunctional tools, as defined in this review, allow a user to start with raw mass spectrometric or spectroscopic data and proceed through pre-processing, QC, statistical analyses, data visualization, and interpretation. In this section, I cover the software solutions of this kind that surfaced in 2020.

Fig. 3 Snapshots of a subset of tools and resources discussed in this review. Web interfaces and snapshot of outputs for demo data on a VIIME, b MetaboliteAutoPlotter, and c SUMMER

Skyline, originally developed for SWATH (and DIA) and targeted proteomics workflows, has now been extended to small-molecule analysis, including selected reaction monitoring (SRM), HRMS datasets, and calibrated quantification, while retaining the data visualization and interrogation features already available in Skyline, such as peak picking, chromatographic alignment, and transition selection (Adams et al. 2020).
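Calibrated quantification rests on fitting a calibration curve and back-calculating unknowns from it. A minimal weighted least-squares sketch follows; the 1/x weighting and the data are illustrative assumptions rather than Skyline settings.

```python
def weighted_linear_fit(conc, response, weights):
    """Weighted least squares for response = slope * conc + intercept."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, conc)) / sw
    my = sum(w * y for w, y in zip(weights, response)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, conc))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, conc, response))
    slope = sxy / sxx
    return slope, my - slope * mx


def back_calculate(response, slope, intercept):
    """Concentration of an unknown from its response (e.g., analyte/IS area ratio)."""
    return (response - intercept) / slope
```

Weighting by 1/x keeps the high-concentration standards from dominating the fit, which matters for accuracy at the low end of the curve.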

notame, available as an R package together with Wranglr, a Shiny web application for automating worklist files, is a multifunctional tool for untargeted LC-MS/MS metabolomics data analysis (Klåvus et al. 2020). It reads outputs from MS-DIAL and supports drift correction, flagging and removal of low-quality features, imputation of missing values, batch effect correction, novel clustering methods, statistical analyses, visualization for QC, and explorative analyses that aid in interpreting statistical tests.
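QC-based drift correction models each feature's intensity trend across the pooled QC injections and scales every injection back to a common level. The sketch below swaps in piecewise-linear interpolation between QCs for a smoother fit (for example, a cubic spline), so it illustrates the principle rather than notame's implementation.

```python
from bisect import bisect_left
from statistics import median


def qc_drift_correct(order, intensity, is_qc):
    """Scale each injection of one feature by (median QC level / QC trend there).

    order: injection order; intensity: feature intensity per injection;
    is_qc: True for pooled QC injections. The trend is piecewise-linear between
    neighbouring QCs and held constant beyond the first and last QC.
    """
    qc = sorted((o, v) for o, v, q in zip(order, intensity, is_qc) if q)
    qc_x = [o for o, _ in qc]
    qc_y = [v for _, v in qc]
    target = median(qc_y)

    def trend(x):
        if x <= qc_x[0]:
            return qc_y[0]
        if x >= qc_x[-1]:
            return qc_y[-1]
        i = bisect_left(qc_x, x)
        if qc_x[i] == x:
            return qc_y[i]
        x0, x1, y0, y1 = qc_x[i - 1], qc_x[i], qc_y[i - 1], qc_y[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return [v * target / trend(o) for o, v in zip(order, intensity)]
```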

Breath AnaLysis viSualizAtion Metabolite discovery (BALSAM) is an interactive web-based tool that integrates state-of-the-art preprocessing and analysis techniques for supervised feature extraction and visualization of multi-capillary column-ion mobility spectrometry (MCC-IMS) data in breath analysis workflows ). It also supports peak detection and peak alignment, as well as RT-based GC-MS and LC-MS data analysis.

MRMkit is a software solution for processing large-scale targeted LC-MS-based metabolomics data. It performs automated peak detection, peak integration, normalization, batch effect correction, quality metric calculation, and chromatogram visualization, and it removes redundant peaks from multimodal classes by RT selection (Teo et al. 2020).
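Automated peak integration in targeted data can be reduced to finding the apex of a chromatogram, setting integration bounds, and summing the area. The sketch below uses a relative-height threshold for the bounds and the trapezoidal rule for the area; the 5% threshold is an assumption, and baseline subtraction and overlapping peaks are not handled.

```python
def integrate_peak(rt, intensity, min_rel_height=0.05):
    """Find the apex, extend to the first points at or below
    min_rel_height * apex on each side, and integrate with the trapezoidal rule.

    Returns (apex_rt, area, (start_rt, end_rt)).
    """
    apex = max(range(len(intensity)), key=intensity.__getitem__)
    threshold = intensity[apex] * min_rel_height
    left = apex
    while left > 0 and intensity[left] > threshold:
        left -= 1
    right = apex
    while right < len(intensity) - 1 and intensity[right] > threshold:
        right += 1
    area = sum((rt[i + 1] - rt[i]) * (intensity[i] + intensity[i + 1]) / 2
               for i in range(left, right))
    return rt[apex], area, (rt[left], rt[right])
```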

MetaboShiny is an R/Shiny-based package featuring data analysis, database- and formula prediction-based annotation, and visualization of diverse MS datasets (Wolthuis et al. 2020). MetaboShiny gives a user a diverse set of customization options and global settings and supports adding databases, data normalization and filtering, statistical functions ranging from dimension reduction [from PCA and partial least-squares-discriminant analysis (PLS-DA) to t-distributed stochastic neighbor embedding (t-SNE)] to univariate analyses on sets of features, a range of visualizations from volcano plots to heatmaps and Venn diagrams, power calculations, and metabolite enrichment analyses.

SmartPeak is a programmable software application that offers novel algorithms for RT alignment, calibration curve fitting, and peak interrogation. By reducing operator bias, it facilitates reproducibility and supports high QC/quality assurance (QA) standards in automated processing of CE-, GC-, and LC-MS(/MS) data and HPLC data for targeted and semi-targeted metabolomics, lipidomics, and fluxomics experiments (Kutuzova et al. 2020).

MS-DIAL 4 is a standalone DIA software tool that provides a comprehensive lipidome atlas with RT, CCS, and MS/MS information, encapsulating the mass spectral fragmentation of lipids across 117 lipid subclasses, and it supports analysis of ion mobility MS/MS data ). Using lipidomics data from diverse samples, the study reported semi-quantification of 8,051 lipids with MS-DIAL 4 at an estimated FDR of 1-2%.

Integrated mass spectrometry-based untargeted metabolomics data mining (IP4M) is a multifunctional tool for untargeted MS-based metabolomics data processing and analysis, with 62 functions organized into 8 modules (Liang et al. 2020). The modules cover the majority of metabolomics data mining steps, including raw data preprocessing (alignment, peak deconvolution, peak picking, and isotope filtering), peak annotation, peak table preprocessing, basic statistical description, classification and biomarker detection, correlation analysis, cluster and subcluster analysis, regression analysis, receiver operating characteristic (ROC) analysis, pathway and enrichment analysis, and sample size and power analysis.
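ROC analysis, one of the modules listed above, summarizes how well a single metabolite separates two groups. The area under the ROC curve can be computed directly from its rank interpretation, as in this sketch (the scores stand for hypothetical metabolite levels in cases and controls):

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive scores higher than a randomly chosen
    negative, with ties counting one half. This equals the Mann-Whitney U
    statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is quadratic in sample size; for large cohorts a rank-sum formulation gives the same value faster.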

DropMS is an online tool with a user-friendly, browser-based interface that facilitates processing of high-resolution, high-precision mass spectrometry data of oil for petroleomics applications (Rosa et al. 2020). Mass spectra uploaded to the server are processed using various algorithms reported in the literature, such as S/N ratio filters, recalibration, and chemical formula assignment, and the results are visualized with graphs and diagrams widely used in mass spectrometry, such as Van Krevelen and Kendrick diagrams, among others.
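Kendrick and Van Krevelen diagrams rest on simple transformations. On the CH2 Kendrick scale, members of a homologous series differing only by CH2 units share the same Kendrick mass defect (KMD), and a Van Krevelen diagram plots H/C against O/C atomic ratios. The rounding convention for the nominal Kendrick mass below is one of several in use.

```python
from __future__ import annotations

CH2 = 14.01565  # monoisotopic mass of a CH2 unit (Da)


def kendrick_mass_defect(mz: float) -> float:
    """KMD on the CH2 scale, with nominal Kendrick mass = round(KM)."""
    kendrick_mass = mz * 14.0 / CH2
    return round(kendrick_mass) - kendrick_mass


def van_krevelen(c: int, h: int, o: int) -> tuple[float, float]:
    """(H/C, O/C) atomic ratios plotted in a Van Krevelen diagram."""
    return h / c, o / c
```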

In this section, I describe tools dedicated to statistical analysis and visualization of metabolomics data. EpiMetal is a web-based application for statistical analyses and visualization of large datasets, supporting epidemiological analyses and self-organizing maps (SOMs) for metabolomics (Ekholm et al. 2020). Pilot data with > 500 quantitative molecular measurements for each sample and two large-scale epidemiological cohorts (N > 10,000) are also available to users on the interface.

Metabolite AutoPlotter is an R package wrapped into a Shiny web application that can be run online in a web browser (Pietzke & Vazquez, 2020). It takes pre-processed metabolite-intensity tables as input, accommodates different experimental designs with respect to the number of metabolites, conditions, and replicates, and processes and plots different types of metabolite datasets. It converts and cleans up the data, allows data normalization, handles sample descriptions and metabolite names, and supports sorting of experimental conditions.

Metabolite-Investigator is a free and open web-based tool and stand-alone Shiny application that provides a scalable analysis workflow for quantitative metabolomics data from multiple studies. It performs data integration, cleaning, transformation, batch analysis, and multiple statistical analyses, including uni- and multivariable factor-metabolite associations, network analysis, and factor prioritization in one or more cohorts (Beuchel et al. 2020).

, available as a web server, provides a workflow for metabolomics research that offers state-of-the-art integration algorithms and visualizations (Choudhury et al. 2020). A user starts by uploading a spreadsheet of quantitative metabolomics data and runs a semi-automated process that flags low-variance and high-missingness data, allows arbitrary sample and metabolite exclusion, performs adjustable missing data imputation, reports on data pretreatment, runs PCA and block PCA and statistical analyses such as the Wilcoxon test and analysis of variance (ANOVA), and finally returns interactive tables, charts, heatmaps, and network diagrams for the given metabolomics data.
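The missingness check and imputation step can be sketched as follows. The 50% missingness cut-off, the removal of constant features, and half-minimum imputation are illustrative assumptions; the web server offers its own adjustable options.

```python
from __future__ import annotations


def filter_and_impute(table: dict[str, list[float | None]],
                      max_missing: float = 0.5) -> dict[str, list[float]]:
    """Drop features with too many missing values or no variance; impute the
    rest with half of the feature's minimum observed value."""
    cleaned = {}
    for feature, values in table.items():
        present = [v for v in values if v is not None]
        if not present or 1 - len(present) / len(values) > max_missing:
            continue  # high missingness
        if len(set(present)) == 1:
            continue  # constant feature carries no information
        fill = min(present) / 2
        cleaned[feature] = [fill if v is None else v for v in values]
    return cleaned
```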

, is an R/Bioconductor package that defines a suite of class-based templates allowing users to develop and implement standardized and readable statistical analysis workflows for metabolomics and other omics technologies (Lloyd et al. 2020). struct integrates with the STATistics Ontology (STATO) to ensure consistent reporting and to maximize semantic interoperability. A related package, structToolbox, includes an extensive set of commonly used data analysis methods built on the templates provided in struct. struct includes a suite of S4 class-based templates (i.e., model, sequence, iterator, chart, and metric classes) to facilitate the standardization of R-based workflows for statistics and ML. The toolbox includes pre-processing methods (e.g., signal drift and batch correction, normalisation, missing value imputation, and scaling), univariate methods (e.g., t-test, various forms of ANOVA, Kruskal-Wallis test, and more), multivariate statistical methods [e.g., PCA and partial least squares (PLS), including cross-validation and permutation testing], and machine learning methods (e.g., support vector machines).

lipidr is an R/Bioconductor package for data mining and analysis of lipidomics datasets that implements a lipidomics-focused analysis workflow for targeted and untargeted lipidomics (Mohamed et al. 2020). lipidr imports numerical matrices, Skyline exports, and Metabolomics Workbench files directly into R and allows thorough data inspection, normalization, and uni- and multivariate analyses, yielding interactive visualizations as well as a novel lipid set enrichment analysis.

NORmalization and EVAluation (NOREVA 2.0) is a web server (also available as a standalone R package) for normalization of metabolomics data; the latest version can handle time-course and multi-class metabolomics datasets ). NOREVA 2.0 integrates a total of 168 normalization methods and their combinations for removing unwanted variation, correcting QC-based signal drift, and evaluating performance on a given dataset, thereby pointing to the best normalization methods for that dataset.
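As an example of a single normalization method, probabilistic quotient normalization (PQN) divides each sample by the median ratio of its features to a reference profile. The sketch uses the per-feature median as the reference and assumes strictly positive values; it is one common variant, shown in isolation from NOREVA's evaluation framework.

```python
from statistics import median


def pqn(samples):
    """Probabilistic quotient normalization.

    samples: list of rows (one per sample) of strictly positive feature values.
    Reference = per-feature median across samples; each sample is divided by
    the median of its feature-wise quotients to that reference.
    """
    n_features = len(samples[0])
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        factor = median(x / r for x, r in zip(row, reference))
        normalized.append([x / factor for x in row])
    return normalized
```

Using the median quotient makes the dilution estimate robust to a minority of features that genuinely change between samples.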

%polynova_2way is a macro written for the Statistical Analysis System (SAS) software that helps identify differentially expressed metabolites in study designs with a two-way factorial treatment and hierarchical design structure (Manjarin et al. 2020). The macro calculates least squares means using a linear mixed model with fixed and random effects, runs a two-way ANOVA, corrects the P-values for the number of metabolites using the FDR or Bonferroni procedure, and calculates the P-value for the least squares mean differences for each metabolite.

rawR is an R package that provides operating system (OS)-independent access to all spectral data and chromatograms recorded in Thermo Fisher Scientific's vendor-specific .RAW files (Kockmann & Panse, 2020).
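The two multiple-testing corrections mentioned for %polynova_2way are easy to state. This Python sketch (not SAS code, and not the macro itself) implements the Benjamini-Hochberg FDR adjustment and the Bonferroni adjustment for a list of per-metabolite P-values.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted P-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Bonferroni-adjusted P-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```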

Metaboverse is an interactive standalone software tool for exploring multi-omic data and automatically extracting potential regulatory events, patterns, and trends within the context of a metabolic network and other global reaction networks (Berg et al. 2020). The tool supports analysis of Reactome knowledgebase-derived networks for 90+ model organisms, helps integrate multi-condition and time-course data, and facilitates exploration of super-pathway-specific reaction perturbation networks, among other features.

JavaScript mass spectrometry (JS-MS) 2.0 is a standalone visualization GUI software suite that provides a dependency-free, browser-based, one-click, cross-platform solution for creating MS1 ground truth sets of features (i.e., raw data points manually curated into features, whether extracted ion chromatograms or isotopic envelopes) (Henning & Smith, 2020). The tool enables loading, viewing, and navigating MS1 data in two and three dimensions, and adds tools for capturing, editing, saving, and viewing isotopic envelope and extracted ion chromatogram features. It interfaces via the Hypertext Transfer Protocol (HTTP) with the MsDataServer application programming interface (API) to access MS data stored in the MzTree format.
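An extracted ion chromatogram (XIC), one of the feature types JS-MS 2.0 curates, is the summed intensity within an m/z window at each retention time. A minimal sketch, assuming MS1 data as a flat list of (RT, m/z, intensity) points and a 10 ppm window:

```python
def extract_xic(points, target_mz, ppm=10.0):
    """points: iterable of (rt, mz, intensity) MS1 data points.
    Returns a list of (rt, summed intensity) sorted by retention time."""
    tol = target_mz * ppm * 1e-6
    xic = {}
    for rt, mz, intensity in points:
        if abs(mz - target_mz) <= tol:
            xic[rt] = xic.get(rt, 0.0) + intensity  # sum peaks in the window per scan
    return sorted(xic.items())
```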

In this section, I discuss the databases (both spectral and structural) that have appeared or updated in 2020.

COlleCtion of Open Natural prodUcTs (COCONUT) is an aggregated dataset of NPs compiled from different open resources, available through a web server with a web interface to browse, search, and quickly download NPs, including downloadable structural data (Sorokina & Steinbeck, 2020). The database contains structures and sparse annotations for over 400,000 non-redundant NPs.

METLIN MS2 is a well-annotated and structurally diverse spectral database of over 850,000 chemical standards with MS/MS data generated in both positive and negative ionization modes at multiple collision energies (CEs); collectively, it contains over 4,000,000 curated HR MS/MS spectra, covering almost 1% of PubChem's 93 million compounds (Xue et al. 2020).

is an open LC-MS/MS spectral library that currently contains over 1600 fragmentation spectra obtained from 435 authentic standards of endogenous metabolites and lipids (Phapale et al. 2021) . The EMBL-MCF spectral library is created and shared using an in-house developed web-application.

The Wake Forest CPM GC-MS spectral and RT libraries consist of HR EI-MS and HR chemical ionization (CI)-MS/MS spectra acquired from silylated chemical standards from the Mass Spectrometry Metabolite Library of Standards (MSMLS Kit™) (B. B. Misra & Olivier, 2020).

Chemical Shift Multiplet Database (CSMDB) is a database that uses J-RESolved (JRES) spectra from the Birmingham Metabolite Library (BML) and scores database hits by accounting for both matched and unmatched peaks between a query list and each hit (Charris-Molina et al. 2020). The query list is generated by projecting a 2D statistical correlation analysis of the JRES spectra, p-[JRES-Statistical TOtal Correlation SpectroscopY (STOCSY)], which makes it possible to compare the multiplets of matched peaks. CSMDB is complemented by "consecutive queries to assess biological correlation" (ConQuer ABC), which inspects peaks left unmatched from the query list and runs consecutive queries to assign all (or most) peaks in the original query list.

This section covers numerous tools that do not fall neatly into the six categories listed above and that were developed to address specialized applications in metabolomics data analysis. These include tools for isotopic data analysis in stable isotope labeling experiments and software for the analysis of lipidomics data, mass spectrometry imaging data, and multi-omics/integrated omics data.

, is a tool that uses MetaboliteDetector (https://md.tu-bs.de/) and non-targeted tracer fate detection (NTFD) libraries (http://ntfd.mit.edu/) and combines the strengths of targeted and non-targeted approaches to estimate metabolic flux changes in GC-MS datasets (Dudek et al. 2020). For stable isotope labeling data, MIAMI determines a mass isotopomer distribution (MID)-based similarity network, incorporates the data into metabolic reference networks, and identifies MID variations of all labeled metabolites across conditions, from which targets of metabolic changes are detected.
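The MID comparison at the heart of this approach can be sketched as normalizing isotopologue intensities into a distribution and measuring how far that distribution moves between conditions. Euclidean distance and the 0.1 threshold are assumptions, natural-isotope correction is omitted, and the metabolite data are made up; MIAMI's own similarity network is more elaborate.

```python
import math


def mid(intensities):
    """Mass isotopomer distribution: M+0, M+1, ... intensities scaled to sum to 1."""
    total = sum(intensities)
    return [x / total for x in intensities]


def mid_distance(a, b):
    """Euclidean distance between two MIDs, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def changed_metabolites(data, threshold=0.1):
    """data: metabolite -> (control intensities, treated intensities)."""
    return [name for name, (ctrl, treated) in data.items()
            if mid_distance(mid(ctrl), mid(treated)) > threshold]
```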

isoSCAN is an R package that automatically quantifies all isotopologues of intermediate metabolites of glycolysis, the tricarboxylic acid (TCA) cycle, amino acids, the pentose phosphate pathway, and the urea cycle from low-resolution (LR) MS and HRMS data (i.e., GC-chemical ionization-MS) in stable isotope labeling experiments (Capellades et al. 2020).

LiPydomics is a Python package that performs statistical and multivariate analyses ("stats" module), generates informative plots ("plotting" module), identifies lipid species at different confidence levels ("identification" module), and provides a text-based interface ("interactive" module) that aids further interpretation (Ross et al. 2020).

LipidCreator, available both as a Skyline plugin and as a standalone/command-line application, is a lipid building block-based workbench and knowledgebase for semi-automatic generation of targeted lipidomics MS assays and in silico spectral libraries (Peng et al. 2020). It supports diverse lipid categories, generates SRM/parallel reaction monitoring (PRM) assays for both labeled and unlabeled lipid species and their derived fragment ions, and supports in silico spectral library generation and CE optimization; the entire workflow can be integrated into Konstanz Information Miner (KNIME™) and Galaxy workflows as a native node.

Lipid Annotator is standalone software for lipidomic analysis of data collected by HR LC-MS/MS (Koelmel et al. 2020). The Lipid Annotator algorithm, intended for lipid annotation based on in silico libraries, consists of five general steps: feature finding, association of MS/MS scans with features, annotation of possible lipids for each feature, calculation of the percent abundance of each fatty acyl constituent under a single chromatographic peak in the case of mixed spectra, and filtration of the final annotated features. Lipid Annotator can be used on large datasets for rapid annotation, relative quantification, and statistics (using a downstream workflow with commercial tools such as MassHunter Profinder and MassHunter Mass Profiler Professional, both from Agilent Technologies).

Raman2imzML, available as an R package, converts Raman imaging data in text format exported from WiRe 5.2 (Renishaw) and FIVE 5.1 (WiTec) into the .imzML data format (Iakab et al. 2020). imzML, an extension of the mzML standard, is a common data format created and adopted by the mass spectrometry imaging community; this tool handles imaging data exclusively, enabling further exploratory imaging analysis.

Metabolomics datasets play an indispensable role in multi-omics data integration and analytics workflows, as metabolites are closest to the phenotype and help connect it with the genotype (Fiehn, 2002). Recent efforts in the multi-omics domain range from harmonization of quality metrics and power calculation in multi-omic studies (Tarazona et al. 2020) to standardized data sharing guidelines (Krassowski et al. 2020). A recent review introduced computational methods and resources for metabolomics and multi-omics integration (Eicher et al. 2020), and another focused on metabolomics-centric data integration for biomedical research (Wörheide et al. 2021). Integration of omics datasets, such as metabolomics and microbiome/metagenomics data, presents challenges of its own (B. B. Misra, 2020c); hence, more effective tools are needed in this area. In this section, I cover a couple of the multi-omics tools developed in 2020.

, is a Shiny-based tool that enables mechanistic interpretation of steady-state metabolomics data by integrating transcriptomics or proteomics data with metabolomics datasets: enzyme activities are estimated from transcriptomics or proteomics data by calculating changes in reaction rate potentials ). The tool offers modules for PCA, differential expression analysis, pathway analysis, and network analysis.

metPropagate is a network-based approach that uses untargeted metabolomics data from a single patient and a group of controls to prioritize candidate genes in patients with suspected inborn errors of metabolism (IEMs) (Graham Linck et al. 2020). The approach determines whether metabolomic evidence can be used to prioritize the causative gene from a patient's candidate gene list: each gene in the list is ranked using a per-gene metabolomic score, termed the "metPropagate score", which represents the likely metabolic relevance of that gene to the patient.
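Network propagation of this kind spreads evidence from seed nodes (here, metabolites flagged as abnormal in a patient) to nearby genes. The sketch uses random walk with restart, a common propagation scheme, on a toy network; metPropagate's actual network, seeding, and scoring are not reproduced, and the node names and restart probability are hypothetical.

```python
from __future__ import annotations


def random_walk_with_restart(adjacency: dict[str, set[str]], seeds: dict[str, float],
                             restart: float = 0.3, max_iter: int = 1000,
                             tol: float = 1e-12) -> dict[str, float]:
    """Propagate seed scores over an undirected network.

    adjacency: node -> neighbours (must be symmetric); seeds: node -> weight.
    Returns the stationary visiting probability of every node.
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}  # jump back to the seeds
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:  # isolated node keeps its own mass
                nxt[n] += (1 - restart) * p[n]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p
```

Candidate genes would then be ranked by their propagated score, so genes close to many perturbed metabolites rise to the top.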

In this section, I summarize the trends observed for the tools reported in 2020:

a. The majority of the software tools and packages focus on 'annotation': almost 35% of the 72 tools reported for the year deal with annotation of untargeted metabolomics data.
b. 82% of the tools reported address data analysis challenges in LC-MS/MS, mostly untargeted LC-HRMS/MS efforts.
c. The tools are mostly R packages (28 tools), Python packages (11 tools), Java applications (5 tools), or web servers/web-based tools (23 tools).
d. 48% of the reported tools are 'easy to use' (click-to-start, web-based, or plug-and-play tools) from the standpoint of biologists and chemists who are not computationally savvy.
e. 57% of the tools reported here have an associated GitHub repository.
f. A couple of tools are improved versions of earlier releases, suggesting that they are actively developed and maintained.
g. Many tools reported this year address specialized applications, ranging from data integration (i.e., metabolomics with proteomics/transcriptomics data) to epidemiological metabolomics data, lipidomics, and MSI data.

In summary, numerous tools were either developed from scratch or evolved from their previous versions in 2020 alone. Some tools and approaches found new applications, such as GNPS in GC-MS-based metabolomics , while others were released as beta/advanced versions, e.g., MS-DIAL for lipidomics workflows. Only the coming years will show which of these 2020 tools remain useful and applied, stay maintained and available, get improved, and are adopted by the metabolomics research community. Regardless, all these tools help in understanding metabolomics data from diverse standpoints and are welcome additions to the community going into the big data-driven era of precision medicine. In general, the trend is to develop fast, computationally less intensive, robust, open-source, user-friendly tools that adhere to the findable, accessible, interoperable, and reusable (FAIR) guidelines. Undoubtedly, the metabolomics research community needs more of these improved tools, and in the coming years tools, resources, and databases will keep appearing and improving.

with their codes, packages, tools, and resources that enable metabolomists, biologists, and analytical chemists to keep pace with the volume and complexity of the metabolomics data generated. I also apologize to all investigators whose tools and resources might have been inadvertently missed in this review. I would like to acknowledge the independent reviewers and the editor for their comments that helped improve this manuscript.

Funding None.

Conflict of interest None.

Ethical approval This article does not contain any studies with human and/or animal participants performed by the authors.

Research involving human and/or animal participants None.

Skyline for small molecules: A unifying software package for quantitative metabolomics

Auto-deconvolution and molecular networking of gas chromatography-mass spectrometry data

Spatial metabolomics and imaging mass spectrometry in the age of artificial intelligence

Reproducible molecular networking of untargeted mass spectrometry data using GNPS

Visualization and normalization of drift effect across batches in metabolome-wide association studies. bioRxiv, 914051

Gazing into the Metaboverse: Automated exploration and contextualization of metabolic data. bioRxiv, 171850

Metabolite-Investigator: An integrated user-friendly workflow for metabolomics multi-study analysis

Phosphorus NMR and its application to metabolomics

Retip: Retention time prediction for compound annotation in untargeted metabolomics

Developing unique nontarget high-resolution mass spectrometry signatures to track contaminant sources in urban waters

Exploring the use of gas chromatography coupled to chemical ionization mass spectrometry (GC-CI-MS) for stable isotope labeling in metabolomics

FOBI: An ontology to represent food intake data and associate it with metabolomic data. Database: The journal of biological databases and curation

Consecutive queries to assess biological correlation in NMR metabolomics: Performance of comprehensive search of multiplets over typical 1D 1H NMR database search

MetaClean: A machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data

MetENP/MetENPWeb: An R package and web application for metabolomics enrichment and pathway analysis in Metabolomics Workbench. bioRxiv

Viime: Visualization and Integration of Metabolomics Experiments

MetIDfyR: An open-source r package to decipher small-molecule drug metabolism through high-resolution mass spectrometry

Raman-guided subcellular pharmaco-metabolomics for metastatic melanoma cells

MIAMI--a tool for non-targeted detection of metabolic flux changes for mode of action identification

SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information

Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra

Metabolomics and multi-omics integration: A survey of computational methods and resources

EpiMetal: An open-source graphical web browser tool for easy statistical analyses in epidemiology and metabolomics

MetFID: artificial neural network-based compound fingerprint prediction for metabolite annotation

Metabolomics-The link between genotypes and phenotypes

MS-CleanR: A feature-filtering workflow for untargeted LC-MS based metabolomics

Spatial metabolomics of in situ host-microbe interactions at the micrometre scale

metPropagate: network-guided propagation of metabolomic information for prioritization of metabolic disease genes

Comparison of full-scan, data-dependent, and data-independent acquisition modes in liquid chromatography-mass spectrometry based untargeted metabolomics

Evaluation of significant features discovered from different data acquisition modes in mass spectrometry-based untargeted metabolomics

Evaluating and minimizing batch effects in metabolomics

patRoon: Open source software platform for environmental mass spectrometry based non-target screening

A web-based system for creating, viewing, and editing precursor mass spectrometry ground truth data

OMICtools: an informative directory for multi-omic data analysis

Comparison of software tools for liquid chromatography-high-resolution mass spectrometry data processing in nontarget screening of environmental samples

MolDiscovery: Learning Mass Spectrometry Fragmentation of Small Molecules

SUMMER, a shiny utility for metabolomics and multiomics exploratory research

Raman2imzML converts Raman imaging data into the standard mass spectrometry imaging format

Evaluation of single sample network inference methods for metabolomics-based systems medicine

ReDU: a framework to find and reanalyze public mass spectrometry data

A graph density-based strategy for features fusion from different peak extract software to achieve more metabolites in metabolic profiling from high-resolution mass spectrometry

Deep annotation of untargeted LC-MS metabolomics data with Binner

Signature mapping (SigMa): An efficient approach for processing complex human urine 1H NMR metabolomics data

"Notame": Workflow for non-targeted LC-MS metabolic profiling

rawR - Direct access to raw mass spectrometry data in R. bioRxiv

Lipid annotator: Towards accurate annotation in non-targeted liquid chromatography high-resolution tandem mass spectrometry (LC-HRMS/MS) lipidomics using a rapid and user-friendly software

16O/18O-exchange mass spectrometry boosting the reliability of compound identification

CROP: correlation-based reduction of feature multiplicities in untargeted metabolomic data

State of the field in multi-omics research: From computational needs to data mining and sharing

Applying NMR compound identification using NMRfilter to match predicted to experimental data

Concepts and software package for efficient quality control in targeted metabolomics studies: MeTa-QuaC

SmartPeak automates targeted and quantitative metabolomics data processing

Combined nuclear magnetic resonance spectroscopy and mass spectrometry approaches for metabolomics

Sub-nanoliter metabolomics via mass spectrometry to characterize volume-limited samples

IP4M: an integrated platform for mass spectrometry-based metabolomics data mining

Machine learning applications for mass spectrometry-based metabolomics

Reference standardization for quantification and harmonization of large-scale metabolomics

MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra

struct: An R/ Bioconductor-based framework for standardized metabolomics data analysis and beyond

Improved annotation of untargeted metabolomics data through buffer modifications that shift adduct mass and intensity

CPVA: A web-based metabolomic tool for chromatographic peak visualization and annotation

AlpsNMR: An R package for signal processing of fully untargeted NMR-based metabolomics

Mass spectrometry techniques in emerging pathogens studies: COVID-19 Perspectives

%polynova_2way: A SAS macro for implementation of mixed models for metabolomics data

Deep learning driven GC-MS library search and its application for metabolomics

AutoTuner: High fidelity and robust parameter selection for metabolomics data processing

Open-source software tools, databases, and resources for single-cell and single-cell-type metabolomics

Data normalization strategies in metabolomics: Current challenges, approaches, and tools

The connection and disconnection between microbiome and metabolome: A critical appraisal in clinical research

High resolution GC-orbitrap-MS metabolomics using both electron ionization and chemical ionization for analysis of human plasma

Updates in metabolomics tools and resources

lipidr: A software tool for data mining and analysis of lipidomics datasets

MZmine 2 manual

QSRR automator: A tool for automating retention time prediction in lipidomics and metabolomics

LipidLynxX: a data transfer hub to support integration of large scale lipidomics datasets

Software tools, databases and resources in metabolomics: Updates

LipidCreator workbench to probe the lipidomic landscape

Public LC-orbitrap tandem mass spectral library for metabolite identification

Metabolite AutoPlotter - an application to process and visualise metabolite data in the web browser

Deep metabolome: Applications of deep learning in metabolomics

RGCxGC toolbox: An R-package for data processing in comprehensive two-dimensional gas chromatography-mass spectrometry

Hierarchical clustering of MS/MS spectra from the firefly metabolome identifies new lucibufagin compounds

Automatic annotation and dereplication of tandem mass spectra of peptidic natural products

A python-based pipeline for preprocessing LC-MS data for untargeted metabolomics workflows

Ion Identity Molecular Networking in the GNPS Environment. bioRxiv, 088948

DropMS: Petroleomics data treatment based in web server for high-resolution mass spectrometry

LiPydomics: A python package for comprehensive prediction of lipid collision cross sections and retention times and analysis of ion mobility-mass spectrometry-based lipidomics data

Survey of metaproteomics software tools for functional microbiome analysis

Fast and sensitive flow-injection mass spectrometry metabolomics by analyzing sample-specific ion distributions

MFAssignR: Molecular formula assignment software for ultrahigh resolution mass spectrometry analysis of environmental complex mixtures

Review on natural products databases: where to find data in 2020

Characterization of monophasic solvent-based tissue extractions for the detection of polar metabolites and lipids applying ultrahigh-performance liquid chromatography-mass spectrometry clinical metabolic phenotyping assays

Navigating freely-available software tools for metabolomics analysis

Imaging mass spectrometry for natural products discovery: A review of ionization methods

hRUV: Hierarchical approach to removal of unwanted variation for large-scale metabolomics data

Harmonization of quality metrics and power calculation in multi-omic studies

Highly sensitive feature detection for high resolution LC/MS

MRMkit: Automated data processing for large-scale targeted metabolomics analysis

Subcellular mass spectrometry imaging and absolute quantitative analysis across organelles

An enhanced isotopic fine structure method for exact mass analysis in discovery metabolomics: FIA-CASI-FTMS

Chemically informed analyses of metabolomics mass spectrometry data with Qemistree

MS-DIAL: Data-independent MS/MS deconvolution for comprehensive metabolome analysis

High-throughput fractionation coupled to mass spectrometry for improved quantitation in metabolomics

MetumpX - a metabolomics support package for untargeted mass spectrometry

Mass spectrometry searches using MASST

NPClassifier: A deep neural network-based structural classification tool for natural products

BALSAM - An interactive online platform for breath analysis visualization and classification

Current status of retention time prediction in metabolite identification

MetaboShiny: Interactive analysis and metabolite annotation of mass spectrometry-based metabolomics data

Multi-omics integration in biomedical research - A metabolomics-centric review

Targeted realignment of LC-MS profiles by neighbor-wise compound-specific graphical time warping with misalignment detection

Retrieving and utilizing hypothetical neutral losses from tandem mass spectra for spectral similarity analysis and unknown metabolite annotation

METLIN MS2 molecular standards database: a broad chemical and biological resource

NOREVA: Enhanced normalization and evaluation of time-course and multi-class metabolomic data

Data-independent acquisition mass spectrometry-based proteomics and software tools: A glimpse in 2020

Mass spectrometry technologies to advance care for cancer patients in clinical and intraoperative use

Chemical derivatization in LC-MS-based metabolomics study

Development of a plasma pseudotargeted metabolomics method based on ultra-high-performance liquid chromatography-mass spectrometry

Ion mobility collision cross-section atlas for known and unknown metabolite annotation in untargeted metabolomics

Acknowledgements

I acknowledge the efforts of the informatics and computational resource developers who help drive the field forward.