Motivation

Given multi-platform genome data and prior knowledge of functional gene sets, how can we extract interpretable latent relationships between patients and genes? More specifically, how can we devise a tensor factorization method that produces an interpretable gene factor matrix guided by functional gene set information while maintaining decomposition quality and speed?

Results

We propose GIFT, a Guided and Interpretable Factorization for Tensors. GIFT provides interpretable factor matrices by encoding prior knowledge as a regularization term in its objective function. We apply GIFT to the PanCan12 dataset (TCGA multi-platform genome data) and compare its performance with that of P-Tucker, our baseline method without the prior-knowledge constraint, and Silenced-TF, our naive interpretable method. Results show that GIFT produces interpretable factorizations with high scalability and accuracy. Furthermore, we demonstrate how the results of GIFT can be used to reveal significant relations among cancers, gene sets and genes, and we validate the findings against evidence from the literature.
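To make the idea of encoding prior knowledge as a regularization term concrete, the following is a minimal illustrative sketch and not the GIFT implementation. It assumes a three-mode Tucker model, a binary gene-by-gene-set membership matrix, and a squared penalty on gene factor entries that fall outside a gene's known sets. The function names, the mask form and the single weight `lam` are all hypothetical choices made for illustration; the exact objective is defined in the paper and the released code.

```python
import numpy as np


def tucker_reconstruct(core, factors):
    """Rebuild a 3-mode tensor from a core tensor and three factor matrices."""
    # core: (J1, J2, J3); factors[n]: (I_n, J_n)
    return np.einsum("abc,ia,jb,kc->ijk", core, *factors)


def guided_loss(tensor, observed, core, factors, gene_mode, membership, lam):
    """Squared error on observed entries plus a mask-guided penalty (assumed form).

    membership[g, s] is 1 if gene g belongs to gene set s, else 0, so each
    column of the gene factor matrix is tied to one gene set.
    """
    residual = observed * (tensor - tucker_reconstruct(core, factors))
    fit = np.sum(residual ** 2)

    penalty = 0.0
    for mode, factor in enumerate(factors):
        if mode == gene_mode:
            # Hypothetical guided term: penalize only the weights a gene
            # places on gene sets it is NOT a member of.
            penalty += np.sum(((1.0 - membership) * factor) ** 2)
        else:
            # Plain L2 regularization on the remaining factor matrices.
            penalty += np.sum(factor ** 2)
    return fit + lam * penalty
```

Under these assumptions, a gene factor matrix whose nonzero entries lie entirely inside the known gene sets incurs no guided penalty, which is what would make each column readable as a gene set.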

Availability and implementation

The code and datasets used in the paper are available at https://github.com/leesael/GIFT.

Supplementary information

Supplementary data are available at Bioinformatics online.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]