Microbes are essential components in the ecosystem and participate in most biological procedures in environments. The high-throughput sequencing technologies help researchers directly quantify the abundance of microbes in a natural environment. Microbiome studies explore the construction, stability, and function of microbial communities with the aid of sequencing technology. However, sequencing technologies only provide relative abundances of microbes, and this kind of data is called compositional data in statistics. The constraint of the constant-sum requires flexible statistical methods for analyzing microbiome data. Current statistical analysis of compositional data mainly focuses on one compositional vector such as bacterial communities. The fungi are also an important component in microbial communities and are always measured by sequencing internal transcribed spacer instead of 16S rRNA genes for bacteria. The different sequencing methods between fungi and bacteria bring two compositional vectors in microbiome studies.


We propose a novel statistical method, called gmcoda, based on an additive logistic normal distribution for estimating the partial correlation matrix for cross-domain interactions. A majorization–minimization algorithm is proposed to solve the optimization problem involved in gmcoda. Through simulation studies, gmcoda is demonstrated to work well in estimating partial correlations between two compositional vectors. Gmcoda is also applied to infer cross-domain interactions in a real microbiome dataset and finds potential interactions between bacteria and fungi.

Availability and implementation

Gmcoda is open source and freely available from under GNU LGPL v3.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.