Motivation

Single-cell DNA methylation sequencing can assay DNA methylation at single-cell resolution. However, incomplete coverage compromises related downstream analyses, outlining the importance of imputation techniques. With a rising number of cell samples in recent large datasets, scalable and efficient imputation models are critical to addressing the sparsity for genome-wide analyses.

Results

We proposed a novel graph-based deep learning approach to impute methylation matrices based on locus-aware neighboring subgraphs with locus-aware encoding orienting on one cell type. Merely using the CpGs methylation matrix, the obtained GraphCpG outperforms previous methods on datasets containing more than hundreds of cells and achieves competitive performance on smaller datasets, with subgraphs of predicted sites visualized by retrievable bipartite graphs. Besides better imputation performance with increasing cell number, it significantly reduces computation time and demonstrates improvement in downstream analysis.

Availability and implementation

The source code is freely available at https://github.com/yuzhong-deng/graphcpg.git.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.