Motivation

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) can detect read-enriched DNA loci for both point-source factors (e.g. transcription factor binding) and broad-source factors (e.g. various histone modifications). Although numerous quality metrics for ChIP-seq data have been developed, the resulting ‘peaks’ remain difficult to assess with respect to signal-to-noise ratio (S/N) and the percentage of false positives.

Results

We developed a quality-assessment tool for ChIP-seq data, strand-shift profile (SSP), which quantifies S/N and peak reliability without peak calling. We validated SSP in depth using ≥1000 publicly available ChIP-seq datasets along with virtual data, demonstrating that SSP provides a quantitative score that is sensitive to differences in S/N for both point- and broad-source factors and that can be standardized across diverse cell types and read depths. SSP also provides an effective criterion for judging whether a sample requires a specific normalization or should be rejected, which currently available quality metrics cannot estimate. Finally, we show that ‘hidden-duplicate reads’ cause aberrantly high S/Ns, and that SSP provides an additional metric that helps avoid them; this metric can also contribute to estimating the peak mode (point- or broad-source) of a sample.
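The abstract does not spell out how the profile is computed, so the following is only a generic illustration of the strand-shift idea, not SSP's actual algorithm or scoring. It assumes reads are reduced to their 5′ end positions on each strand and, for every candidate shift, counts how many forward-strand positions have a reverse-strand partner at that offset. The function name, the toy coordinates and the choice of a simple overlap count are all assumptions made for this sketch.

```python
def strand_shift_profile(forward_starts, reverse_starts, max_shift):
    """Overlap count between strands for each shift d in 0..max_shift.

    Hypothetical helper for illustration only: for every shift d it counts
    the forward-strand positions p for which a reverse-strand read 5' end
    lies at p + d. This is a plain overlap count, not SSP's own score.
    """
    fwd = set(forward_starts)
    rev = set(reverse_starts)
    return [sum(1 for p in fwd if p + d in rev) for d in range(max_shift + 1)]


# Toy data (invented): three read pairs whose strands are offset by 5 bp,
# plus one unrelated read on each strand acting as background.
forward = [100, 250, 400, 777]
reverse = [105, 255, 405, 903]

profile = strand_shift_profile(forward, reverse, max_shift=10)

# The shift with the largest overlap is where the two strands line up best.
best_shift = max(range(len(profile)), key=profile.__getitem__)
print(profile, best_shift)
```

In this toy case the profile is zero everywhere except at the 5 bp offset shared by the three paired reads, which is the kind of enrichment-versus-background contrast that a strand-shift approach can exploit without calling peaks first.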

Availability and implementation

SSP is open-source software written in C++ and can be downloaded at https://github.com/rnakato/SSP.

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).