Motivation

Liquid chromatography coupled with high-resolution mass spectrometry is widely used in composition profiling in untargeted metabolomics research. While retaining complete sample information, mass spectrometry (MS) data naturally have the characteristics of high dimensionality, high complexity, and huge data volume. In mainstream quantification methods, none of the existing methods can perform direct 3D analysis on lossless profile MS signals. All software simplify calculations by dimensionality reduction or lossy grid transformation, ignoring the full 3D signal distribution of MS data and resulting in inaccurate feature detection and quantification.

Results

On the basis that the neural network is effective for high-dimensional data analysis and can discover implicit features from large amounts of complex data, in this work, we propose 3D-MSNet, a novel deep learning-based model for untargeted feature extraction. 3D-MSNet performs direct feature detection on 3D MS point clouds as an instance segmentation task. After training on a self-annotated 3D feature dataset, we compared our model with nine popular software (MS-DIAL, MZmine 2, XCMS Online, MarkerView, Compound Discoverer, MaxQuant, Dinosaur, DeepIso, PointIso) on two metabolomics and one proteomics public benchmark datasets. Our 3D-MSNet model outperformed other software with significant improvement in feature detection and quantification accuracy on all evaluation datasets. Furthermore, 3D-MSNet has high feature extraction robustness and can be widely applied to profile MS data acquired with various high-resolution mass spectrometers with various resolutions.

Availability and implementation

3D-MSNet is an open-source model and is freely available at https://github.com/CSi-Studio/3D-MSNet under a permissive license. Benchmark datasets, training dataset, evaluation methods, and results are available at https://doi.org/10.5281/zenodo.6582912.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.