Motivation

Clustering is an unsupervised method for identifying structure in unlabelled data. In the context of cytometry, it is typically used to categorize cells into subpopulations of similar phenotypes. However, clustering is greatly dependent on hyperparameters and the data to which it is applied as each algorithm makes different assumptions and generates a different ‘view’ of the dataset. As such, the choice of clustering algorithm can significantly influence results, and there is often not one preferred method but different insights to be obtained from different methods. To overcome these limitations, consensus approaches are needed that directly address the effect of competing algorithms. To the best of our knowledge, consensus clustering algorithms designed specifically for the analysis of cytometry data are lacking.

Results

We present a novel ensemble clustering methodology based on geometric median clustering with weighted voting (GeoWaVe). Compared to graph ensemble clustering methods that have gained popularity in single-cell RNA sequencing analysis, GeoWaVe performed favourably on different sets of high-dimensional mass and flow cytometry data. Our findings provide proof of concept for the power of consensus methods to make the analysis, visualization and interpretation of cytometry data more robust and reproducible. The wide availability of ensemble clustering methods is likely to have a profound impact on our understanding of cellular responses, clinical conditions and therapeutic and diagnostic options.

Availability and implementation

GeoWaVe is available as part of the CytoCluster package https://github.com/burtonrj/CytoCluster and published on the Python Package Index https://pypi.org/project/cytocluster. Benchmarking data described are available from https://doi.org/10.5281/zenodo.7134723.

Supplementary information

Supplementary data are available at Bioinformatics online.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.