Summary

Heterogeneous knowledge graphs (KGs) have enabled the modeling of complex systems, from genetic interaction graphs and protein-protein interaction networks to networks representing drugs, diseases, proteins, and side effects. Analytical methods for KGs rely on quantifying similarities between entities, such as nodes, in the graph. However, such methods must consider the diversity of node and edge types contained within the KG via, for example, defined sequences of entity types known as meta-paths. We present metapaths, the first R software package to implement meta-paths and perform meta-path-based similarity search in heterogeneous KGs. The metapaths package offers various built-in similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. Indeed, evaluation of these methods on an open-source biomedical KG recovered meaningful drug and disease-associated relationships, including those in Alzheimer’s disease. The metapaths framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across KG learning.

Availability and implementation

The metapaths R package is available via GitHub at https://github.com/ayushnoori/metapaths and is released under MPL 2.0 (Zenodo DOI: 10.5281/zenodo.7047209). Package documentation and usage examples are available at https://www.ayushnoori.com/metapaths.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.