Abstract
Low-rank matrix completion has been demonstrated to be powerful in predicting antigenic distances among influenza viruses and vaccines from partially revealed hemagglutination inhibition table. Meanwhile, influenza hemagglutinin (HA) protein sequences are also effective in inferring antigenic distances. Thus, it is natural to integrate HA protein sequence information into low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development.
We have proposed a novel algorithm called biological matrix completion with side information (BMCSI), which first measures HA protein sequence similarities among influenza viruses (especially on epitopes) and then integrates the similarity information into a low-rank matrix completion model to predict influenza antigenicity. This algorithm exploits both the correlations among viruses and vaccines in serological tests and the power of HA sequence in predicting influenza antigenicity. We applied this model into H3N2 seasonal influenza virus data. Comparing to previous methods, we significantly reduced the prediction root-mean-square error in a 10-fold cross validation analysis. Based on the cartographies constructed from imputed data, we showed that the antigenic evolution of H3N2 seasonal influenza is generally S-shaped while the genetic evolution is half-circle shaped. We also showed that the Spearman correlation between genetic and antigenic distances (among antigenic clusters) is 0.83, demonstrating a globally high correspondence and some local discrepancies between influenza genetic and antigenic evolution. Finally, we showed that 4.4%±1.2% genetic variance (corresponding to 3.11 ± 1.08 antigenic distances) caused an antigenic drift event for H3N2 influenza viruses historically.
The software and data for this study are available at http://bi.sky.zstu.edu.cn/BMCSI/.
Supplementary data are available at Bioinformatics online.