Abstract
Interpretable deep learning (DL) models that can provide biological insights, in addition to accurate predictions, are of great interest to the biomedical community. Recently, interpretable DL models that incorporate signaling pathways have been proposed for drug response prediction (DRP). While these models improve interpretability, it is unclear whether this comes at the cost of less accurate DRPs, or whether a prediction improvement can also be obtained.
We comprehensively and systematically evaluated four state-of-the-art interpretable DL models using three pathway collections, assessing their ability to make accurate predictions on unseen samples from the same dataset as well as their generalizability to an independent dataset. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse than models that incorporate this information implicitly. However, in most evaluation setups the best performance was achieved using a black-box multilayer perceptron, and the performance of a random forest baseline was comparable to that of the interpretable models. Replacing the signaling pathways with randomly generated pathways yielded comparable performance for the majority of the models. Finally, the performance of all models deteriorated when they were applied to an independent dataset. These results highlight the importance of systematically evaluating newly proposed models against carefully selected baselines. We provide different evaluation setups and baseline models that can be used to achieve this goal.
The implemented models and datasets are available at https://doi.org/10.5281/zenodo.7787178 and https://doi.org/10.5281/zenodo.7101665, respectively.