Motivation

The quality of a fragment library determines the efficiency of fragment assembly, an approach used in most de novo protein-structure prediction algorithms. Conventional fragment libraries are constructed mainly on the basis of amino-acid identities, sometimes supplemented by predicted information such as dihedral angles and secondary structures. However, identifying near-native fragment structures with low sequence homology remains challenging.

Results

We introduce LRFragLib, a novel fragment-library-construction algorithm that uses a multi-stage, flexible selection protocol to improve the detection of near-native, low-homology fragments of 7–10 residues. Using logistic regression scoring models, LRFragLib outperforms existing techniques in sampling near-native structures on recent CASP protein sets, achieving significantly higher precision with comparable coverage. Its computational efficiency is comparable to that of the fastest existing techniques, while its memory usage is substantially lower.

Availability and Implementation

The source code is available for download at http://166.111.152.91/Downloads.html

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)