Predicting the survival time of a cancer patient based on his/her genome-wide gene expression remains a challenging problem. For certain types of cancer, the effects of gene expression on survival are both weak and abundant, so identifying non-zero effects with reasonable accuracy is difficult. As an alternative to methods that use variable selection, we propose a Gaussian process accelerated failure time model to predict survival time using genome-wide or pathway-wide gene expression data. Using a Monte Carlo expectation–maximization algorithm, we jointly impute censored log-survival time and estimate model parameters. We demonstrate the performance of our method and its advantage over existing methods in both simulations and real data analysis. The real data that we analyze were collected from 513 patients with kidney renal clear cell carcinoma and include survival time, demographic/clinical variables, and expression of more than 20 000 genes. In addition to the right-censored survival time, our method can also accommodate left-censored or interval-censored outcomes; and it provides a natural way to combine multiple types of high-dimensional -omics data. An R package implementing our method is available in the Supplementary material available at Biostatistics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (