Valid estimation of a causal effect using instrumental variables requires that all of the instruments are independent of the outcome conditional on the risk factor of interest and any confounders. In Mendelian randomization studies with large numbers of genetic variants used as instruments, it is unlikely that this condition will be met. Any given genetic variant could be associated with a large number of traits, all of which represent potential pathways to the outcome which bypass the risk factor of interest. Such pleiotropy can be accounted for using standard multivariable Mendelian randomization with all possible pleiotropic traits included as covariates. However, the estimator obtained in this way will be inefficient if some of the covariates do not truly sit on pleiotropic pathways to the outcome. We present a method that uses regularization to identify which out of a set of potential covariates need to be accounted for in a Mendelian randomization analysis in order to produce an efficient and robust estimator of a causal effect. The method can be used in the case where individual-level data are not available and the analysis must rely on summary-level data only. It can be used where there are any number of potential pleiotropic covariates up to the number of genetic variants less one. We show the results of simulation studies that demonstrate the performance of the proposed regularization method in realistic settings. We also illustrate the method in an applied example which looks at the causal effect of urate plasma concentration on coronary heart disease.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.