Motivation

Integrating trans-omics data is a powerful approach to investigating the complex mechanisms of cancer occurrence, progression and treatment. However, the integration process suffers from the ‘block missing’ phenomenon, in which some individuals lack data for one or more omics types.

Results

We propose a k-nearest neighbor (kNN) weighted imputation method for trans-omics block missing data (TOBMIkNN). It handles individuals whose gene expression data are absent from RNA-seq datasets by using external information obtained from DNA methylation probe datasets. Using multi-hot deck imputation, mean imputation and missing-case deletion as reference methods, we assessed relative error, absolute error, change in inter-omics correlation structure and variable selection.

The proposed method, TOBMIkNN, reliably imputed RNA-seq data by borrowing information from DNA methylation data, and outperformed the three reference methods in imputation error and stability of correlation structure. Our study indicates that TOBMIkNN is a suitable method for imputing trans-omics block missing data.
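
The general idea of kNN weighted imputation across omics types can be illustrated with a minimal sketch. It is not the TOBMIkNN implementation: the function name `knn_weighted_impute`, the Euclidean distance, the inverse-distance weights and the default `k` are all assumptions made for illustration, and the published method may differ on each of these points. Individuals with a missing RNA-seq block are marked with `None`.

```python
import math


def knn_weighted_impute(methylation, expression, k=2):
    """Fill missing expression rows (None) from the k nearest donors.

    Neighbors are chosen in methylation space, so every individual needs a
    methylation profile. Euclidean distance and inverse-distance weighting
    are illustrative assumptions, not the published TOBMIkNN choices.
    """
    # Donors are individuals with an observed expression profile.
    donors = [i for i, row in enumerate(expression) if row is not None]
    if not donors:
        raise ValueError("at least one individual must have expression data")

    imputed = [list(row) if row is not None else None for row in expression]
    for i, row in enumerate(expression):
        if row is not None:
            continue  # observed rows are left untouched

        # Rank donors by distance to individual i in methylation space.
        ranked = sorted(
            donors, key=lambda j: math.dist(methylation[i], methylation[j])
        )
        nearest = ranked[:k]

        # Closer donors get larger weights; the small constant avoids
        # division by zero when two methylation profiles are identical.
        weights = [
            1.0 / (math.dist(methylation[i], methylation[j]) + 1e-12)
            for j in nearest
        ]
        total = sum(weights)

        # Weighted average of the donors' expression values, gene by gene.
        n_genes = len(expression[nearest[0]])
        imputed[i] = [
            sum(w * expression[j][g] for w, j in zip(weights, nearest)) / total
            for g in range(n_genes)
        ]
    return imputed


# Toy data: four individuals, two methylation probes, two genes.
# The last individual has no RNA-seq block.
toy_methylation = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [1.0, 0.0]]
toy_expression = [[1.0, 2.0], [3.0, 4.0], [50.0, 60.0], None]
toy_result = knn_weighted_impute(toy_methylation, toy_expression, k=2)
```

In this toy input the last individual is equally distant from the first two donors in methylation space, so its imputed profile is the plain average of their expression values; the third individual is too far away to be selected when `k=2`.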

Availability and implementation

TOBMIkNN is freely available at https://github.com/XuesiDong/TOBMI.

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)