Motivation

Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required. Current benchmarks of single-cell methods are mostly implemented with ad-hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking.

Results

The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way to allow pipelines of methods to be evaluated in an effective manner. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in tabular form which is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks.

Availability and implementation

Available from https://bioconductor.org/packages/CellBench.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.