Abstract
The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging.
We present SLEMM (short for Stochastic-Lanczos-Expedited Mixed Models), a new software tool, to address the computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a framework of mixed models. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses on seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had overall the best predictive ability among a variety of genomic prediction methods including GCTA’s empirical BLUP, BayesR, KAML, and LDAK’s BOLT and BayesR models. We also compared the methods using nine dairy traits of ∼300k genotyped cows. All had overall similar prediction accuracies, except that KAML failed to process the data. Additional simulation analyses on up to 3 million individuals and 1 million SNPs showed that SLEMM was advantageous over counterparts as for computational performance. Overall, SLEMM can do million-scale genomic predictions with an accuracy comparable to BayesR.
The software is available at https://github.com/jiang18/slemm.