Comparisons of microbiome communities across populations are often based on pairwise distance measures (beta-diversity). Standard analyses (principal coordinate plots, permutation tests, kernel methods) require access to primary data if another investigator wants to add or compare independent data. We propose using standard reference measurements to simplify microbiome beta-diversity analyses, to make them more transparent, and to facilitate independent validation and comparisons across studies.


Using stool and nasal reference sets from the Human Microbiome Project (HMP), we computed, for each new sample, the mean distance (Bray-Curtis or Pearson correlation dissimilarity) to each reference set. Thus, each new sample has two mean distances that can be plotted and analyzed with classical statistical methods. To test the approach, we studied independent HMP subjects who were not part of the reference sets. Simple Hotelling tests demonstrated statistically significant differences in mean distances to reference sets between all pairs of body sites (stool, skin, nasal, saliva and vagina) at the phylum, class, order, family and genus levels. Using the distance to a single reference set was usually sufficient to detect these differences, but using both reference sets always worked well. The use of reference sets simplifies standard analyses of beta-diversity and facilitates independent validation and the combining of such data, because others can compute distances to the same reference sets. Moreover, standard statistical methods such as survival analysis and logistic regression can be applied to vectors of mean distances to reference sets, thereby greatly expanding the potential uses of beta-diversity information. More work is needed to identify the best reference sets for particular applications.
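As an illustration of the mean-distance computation described above, the following is a minimal Python sketch, not the authors' implementation. It assumes each sample is a vector of non-negative relative abundances over a shared set of taxa, and it uses Bray-Curtis dissimilarity only. The function names (`bray_curtis`, `mean_distance_to_reference`, `reference_coordinates`) and the toy abundance values are hypothetical and are not HMP data.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    total = np.sum(u + v)
    if total == 0:
        return 0.0  # convention chosen here for two empty samples
    return float(np.sum(np.abs(u - v)) / total)

def mean_distance_to_reference(sample, reference):
    """Mean dissimilarity from one sample to every sample in a reference set."""
    return float(np.mean([bray_curtis(sample, ref) for ref in reference]))

def reference_coordinates(samples, ref_a, ref_b):
    """Map each sample to a pair (mean distance to ref_a, mean distance to ref_b)."""
    return np.array([
        [mean_distance_to_reference(s, ref_a), mean_distance_to_reference(s, ref_b)]
        for s in samples
    ])

# Hypothetical toy data: rows are samples, columns are three taxa.
stool_ref = np.array([[0.6, 0.3, 0.1],
                      [0.5, 0.4, 0.1]])
nasal_ref = np.array([[0.1, 0.2, 0.7],
                      [0.2, 0.1, 0.7]])
new_samples = np.array([[0.55, 0.35, 0.10],   # resembles the stool reference
                        [0.15, 0.15, 0.70]])  # resembles the nasal reference

coords = reference_coordinates(new_samples, stool_ref, nasal_ref)
print(coords)
```

Each row of `coords` is a two-dimensional summary of one sample. Because it is an ordinary numeric vector, it can be plotted directly or passed to classical procedures such as a Hotelling test or logistic regression, and another investigator holding the same reference sets can compute comparable coordinates for independent samples.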

Availability and implementation

Supplementary information

Supplementary data are available at Bioinformatics online.

This work is written by US Government employees and is in the public domain in the US.