DNA methylation is a key epigenetic factor regulating gene expression. While promoter methylation has been well studied, recent publications have revealed that functionally important methylation also occurs in intergenic and distal regions, and that its effects vary across genes and tissue types. Given the growing importance of inter-platform integrative genomic analyses, there is an urgent need for methods that discover and characterize gene-level relationships between methylation and expression.


We introduce a novel sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs). We coined this term to denote, for each gene and tissue type, a sparse set of CpG loci that best explains gene expression, together with weights indicating the direction and strength of each association. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate that our strategy explains more expression variability than commonly used gene-level methylation summaries. The methyl-eQTLs identified by our approach can be used to construct gene-level methylation summaries that are maximally correlated with gene expression for use in integrative models, and to produce a tissue-specific summary of which genes appear to be strongly regulated by methylation. Our results provide an important resource to the biomedical community for integrative genomics analyses involving DNA methylation.
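To make the idea of a "sparse set of CpG loci with weights" concrete, the following is a minimal sketch for a single gene. It is not the authors' sequential procedure (their R code is in the Supplementary Material); it swaps in a plain single-stage lasso fitted by coordinate descent. The helper names `lasso_cd` and `methyl_summary`, the penalty `lam=0.05`, and the simulated beta values are all hypothetical. The nonzero weights play the role of the methyl-eQTL set, with their signs giving the direction of association. The weighted sum of standardized CpGs is then compared against a naive mean-methylation summary.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in coordinate-descent lasso."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1,
    assuming X has standardized columns and y is centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                      # current residual (a new array)
    col_sq = (X ** 2).sum(axis=0) / n  # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # remove CpG j's contribution
            rho = X[:, j] @ r / n      # correlation of CpG j with the residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]        # add back the updated contribution
    return b


def methyl_summary(M, expr, lam):
    """Hypothetical helper: select sparse CpG weights for one gene and return
    (weights, summary), where summary is the weighted sum of standardized CpGs."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0)
    y = expr - expr.mean()
    w = lasso_cd(Z, y, lam)
    return w, Z @ w


# Simulated data (hypothetical): 100 samples x 20 CpG beta values for one gene;
# expression is driven negatively by CpG 3 and positively by CpG 7.
rng = np.random.default_rng(0)
n, p = 100, 20
M = rng.uniform(0.0, 1.0, size=(n, p))
expr = -2.0 * M[:, 3] + 1.5 * M[:, 7] + rng.normal(0.0, 0.1, n)

w, s = methyl_summary(M, expr, lam=0.05)
selected = np.flatnonzero(w)                   # indices of the selected CpGs
r_lasso = np.corrcoef(s, expr)[0, 1]           # sparse weighted summary
r_mean = np.corrcoef(M.mean(axis=1), expr)[0, 1]  # naive mean-methylation summary

print("selected CpGs:", selected)
print("signs of weights:", np.sign(w[selected]))
print(f"corr(weighted summary, expr) = {r_lasso:.3f}")
print(f"corr(mean methylation, expr) = {r_mean:.3f}")
```

Averaging over all CpGs lets loci with opposite effects cancel, which is one reason a signed, sparse weighting can track expression much more closely. In practice the penalty would be chosen by cross-validation rather than fixed by hand.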

Availability and implementation

We provide an R Shiny app that interactively presents methyl-eQTL results for colorectal, breast and pancreatic cancer. The source R code for this work is provided in the Supplementary Material.

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model.