Abstract
The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10–20 amino acid residues to capture some ‘short to intermediate’ non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window.
We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| >19) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone , ψ, θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted Cα atom-based θ and τ have similar structures to their corresponding native structures (6Å RMSD or less), which is 3% better than models built by and ψ angles. We expect the method to be useful for assisting protein structure and function prediction.
The method is available as a SPIDER3 server and standalone package at http://sparks-lab.org.
Supplementary data are available at Bioinformatics online.