Logic regression, an extension of generalized linear models with Boolean combinations of binary variables as predictors, is a useful tool in exploring interactions among single-nucleotide polymorphisms (SNPs) in genome-wide association studies. However, since the search space defined by all possible combinations of SNPs, their complements, and logical operators in Boolean expressions can be exceedingly large in such studies, objective function optimization is slow and likely to be trapped in many local solutions, resulting in model over-fitting. We introduce a new search algorithm, parallel repulsive logic regression (PRLR), to efficiently estimate parameters of a logic regression to find a best model within the large space of SNP interactions by incorporating: (i) relevant biological adjacency matrix between SNPs to define similarity of estimation paths or trees, which are derived from physical SNP positions on chromosomes and/or memberships in biological gene pathways; and (ii) two repulsive forces to counter the similarity between and within estimation paths considered in parallel, which are introduced as penalty terms in the objective function. We compare our method’s performance for identifying biologically-meaningful SNP interactions through simulations and with real genetic-epidemiological data. PRLR’s detection-accuracy measures outperform existing approaches, especially in terms of positive predictive value and sensitivity for detecting SNP–SNP interactions.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)