Motivation

Factor analysis is a widely used tool for unsupervised dimensionality reduction of high-throughput datasets in molecular biology, with recently proposed extensions designed specifically for spatial transcriptomics data. However, these methods expect matrices (typically count matrices) as input and are therefore not directly applicable to single-molecule resolution data, which take the form of gene-annotated coordinate lists and provide insight into subcellular spatial expression patterns. To address this, we propose FISHFactor, a probabilistic factor model that combines the benefits of spatial, non-negative factor analysis with a Poisson point process likelihood to explicitly model single-molecule resolution data. In addition, FISHFactor shares information across a potentially large number of cells through a common weight matrix, allowing consistent interpretation of factors across cells and yielding improved latent variable estimates.
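
As a rough illustration of what a Poisson point process likelihood over gene-annotated coordinates looks like, the sketch below evaluates the standard inhomogeneous Poisson process log-likelihood for one cell, with each gene's intensity written as a non-negative weighted sum of spatial factors. It is a minimal sketch under stated assumptions, not the FISHFactor implementation: the function name, the `factor_fn` callable, the quadrature grid and the toy data are all hypothetical, and the intensity integral is approximated by a simple Riemann sum.

```python
import numpy as np

def point_process_log_likelihood(coords, genes, weights, factor_fn, grid, cell_area):
    """Log-likelihood of gene-labelled points under an inhomogeneous Poisson process.

    coords:    (N, 2) molecule positions
    genes:     (N,) integer gene index of each molecule
    weights:   (G, K) non-negative gene-by-factor weights (shared across cells)
    factor_fn: hypothetical callable mapping (M, 2) positions to (M, K)
               non-negative factor values for this cell
    grid:      (Q, 2) quadrature points covering the cell
    cell_area: total area represented by the quadrature points
    """
    # Intensity of each observed molecule's gene at its own location:
    # lambda_g(x) = sum_k weights[g, k] * z_k(x)
    z_obs = factor_fn(coords)                          # (N, K)
    lam_obs = np.sum(weights[genes] * z_obs, axis=1)   # (N,)

    # Riemann-sum approximation of the integrated intensity, summed over genes.
    z_grid = factor_fn(grid)                           # (Q, K)
    lam_grid = z_grid @ weights.T                      # (Q, G)
    integral = lam_grid.sum() * (cell_area / len(grid))

    # Sum of log-intensities at the points minus the integrated intensity.
    return np.sum(np.log(lam_obs)) - integral

# Toy usage: two genes, one spatially constant factor on the unit square.
coords = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.3]])
genes = np.array([0, 1, 1])
weights = np.array([[2.0], [3.0]])
xs = (np.arange(10) + 0.5) / 10
grid = np.array([(x, y) for x in xs for y in xs])
constant_factor = lambda pos: np.ones((len(pos), 1))

ll = point_process_log_likelihood(coords, genes, weights, constant_factor, grid, 1.0)
```

Because the likelihood is evaluated directly at the molecule coordinates, no spatial binning step is needed; the weight matrix is the only quantity in this sketch that would be shared between cells.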

Results

We compare FISHFactor to existing methods, which rely on aggregating information through spatial binning and cannot combine information from multiple cells, and show that our method yields more accurate results on simulated data. We further show that it is scalable and can be readily applied to large datasets. Finally, we demonstrate on a real dataset that FISHFactor identifies major subcellular expression patterns and spatial gene clusters in a data-driven manner.

Availability and implementation

The model implementation, data simulation and experiment scripts are available at https://www.github.com/bioFAM/FISHFactor.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.