Abstract
Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations.
We present isoform interpretation, a method that uses expectation–maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function.
Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.