Motivation

Detecting differentially expressed (DE) genes in RNA-Seq data has been an active research area in recent years, and many methods have been proposed. DE genes show different average expression levels in different sample groups and can therefore be important biological markers. Although these methods are generally very successful, they need to be further tailored and improved for cancer data, in which expression is often much more diverse among samples in the cancer group than among samples in the control group.

Results

We propose a statistical method, DiPhiSeq, that detects not only genes with different average expression levels, but also genes whose expression differs in diversity between groups. These ‘differentially dispersed’ genes can be important clinical markers. The method applies a redescending penalty to the quasi-likelihood function, which makes it robust against outliers and other noise. Simulations and real data analysis demonstrate that DiPhiSeq outperforms existing methods in the presence of outliers and identifies unique sets of genes.
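
To illustrate why a redescending function confers robustness, the sketch below uses Tukey's biweight, a standard redescending function whose weight falls to zero beyond a cutoff, to estimate a location by iteratively reweighted averaging. It is an illustration only, not the DiPhiSeq algorithm: the abstract does not state which redescending function is used or how it enters the quasi-likelihood. The function names, the toy counts, and the tuning constant c = 4.685 (a conventional choice for the biweight) are assumptions made for this example.

```python
from statistics import mean, median


def tukey_biweight_weight(r, c=4.685):
    """Weight w(r) = psi(r) / r for Tukey's biweight.

    The weight decreases smoothly from 1 at r = 0 and is exactly 0
    once |r| reaches the cutoff c, so gross outliers are ignored.
    """
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def robust_location(values, c=4.685, n_iter=50, tol=1e-9):
    """Hypothetical helper: iteratively reweighted mean with biweight weights."""
    center = median(values)
    # Robust scale from the median absolute deviation (MAD);
    # the floor avoids division by zero when most values coincide.
    mad = median(abs(v - center) for v in values)
    scale = max(1.4826 * mad, 1e-12)
    for _ in range(n_iter):
        weights = [tukey_biweight_weight((v - center) / scale, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break  # every point was rejected; keep the current estimate
        new_center = sum(w * v for w, v in zip(weights, values)) / total
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center


# Toy expression values for one gene: five typical samples and one outlier.
counts = [98.0, 101.0, 100.0, 102.0, 99.0, 1000.0]
plain = mean(counts)               # pulled far upward by the outlier
robust = robust_location(counts)   # stays close to the bulk of the data
print(plain, robust)
```

In this toy example the ordinary mean is 250, while the reweighted estimate stays close to 100 because the outlying sample receives zero weight.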

Availability and implementation

DiPhiSeq is publicly available as an R package on CRAN: https://cran.r-project.org/package=DiPhiSeq.

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)