Motivation

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins from a gene coexpression network that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information.

Results

We introduce the concept of signed distance correlation as a measure of dependency between two variables, and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods, such as Pearson correlation and mutual information. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information than networks obtained from Pearson correlation or mutual information.
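As an illustration only, the dependency measure can be sketched as follows. The sketch computes the standard sample distance correlation from double-centred pairwise distance matrices. It then *assumes* that the sign is taken from the Pearson correlation of the same pair. That is an assumption made for this example and may not match the exact definition used in the sdcorGCN code. All function names are hypothetical, and pure Python is used so that the block is self-contained.

```python
import math
from typing import List, Sequence


def _double_centered_distances(x: Sequence[float]) -> List[List[float]]:
    """Matrix of |x_i - x_j| with row, column and grand means removed."""
    n = len(x)
    d = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]  # symmetric, so column means are equal
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation (V-statistic form), in [0, 1]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    # Squared distance covariance and squared distance variances
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x2 = sum(v * v for row in a for v in row) / n**2
    dvar_y2 = sum(v * v for row in b for v in row) / n**2
    if dvar_x2 == 0 or dvar_y2 == 0:
        return 0.0  # a constant profile carries no dependence information
    # max() guards against tiny negative values from floating-point error
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x2 * dvar_y2))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def signed_distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """ASSUMPTION: magnitude from distance correlation, sign from Pearson."""
    return math.copysign(distance_correlation(x, y), pearson(x, y))


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(signed_distance_correlation(xs, [2 * v + 1 for v in xs]))
    print(signed_distance_correlation(xs, [-v for v in xs]))
    print(pearson(xs, [v * v for v in xs]), distance_correlation(xs, [v * v for v in xs]))
```

For x = [-2, -1, 0, 1, 2] and y = x², the Pearson correlation is exactly 0 while the distance correlation is positive, which illustrates the kind of nonlinear dependence that distance correlation detects and Pearson correlation misses. This naive version costs O(n²) time and memory per gene pair, so applying it to every pair of genes would require a more efficient implementation.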

Availability and implementation

Code is available online (https://github.com/javier-pardodiaz/sdcorGCN).

Supplementary information

Supplementary data are available at Bioinformatics online.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.