Abstract
Single-cell RNA sequencing (scRNA-seq) data, annotated by cell type, is useful in a variety of downstream biological applications, such as profiling gene expression at the single-cell level. However, manually assigning these annotations with known marker genes is both time-consuming and subjective.
We present a Graph Convolutional Network (GCN)-based approach to automate the annotation process. Our process builds upon existing labeling approaches, using state-of-the-art tools to find cells with highly confident label assignments through consensus and spreading these confident labels with a semi-supervised GCN. Using simulated data and two scRNA-seq datasets from different tissues, we show that our method improves accuracy over a simple consensus algorithm and the average of the underlying tools. We also compare our method to a nonparametric neighbor majority approach, showing comparable results. We then demonstrate that our GCN method allows for feature interpretation, identifying important genes for cell type classification. We present our completed pipeline, written in PyTorch, as an end-to-end tool for automating and interpreting the classification of scRNA-seq data.
Our code for conducting the experiments in this paper and using our model is available at https://github.com/lewinsohndp/scSHARP.