The identification and discovery of phenotypes from high content screening images is a challenging task. Earlier works use image analysis pipelines to extract biological features, supervised training methods or generate features with neural networks pretrained on non-cellular images. We introduce a novel unsupervised deep learning algorithm to cluster cellular images with similar Mode-of-Action (MOA) together using only the images’ pixel intensity values as input. It corrects for batch effect during training. Importantly, our method does not require the extraction of cell candidates and works from the entire images directly.


The method achieves competitive results on the labeled subset of the BBBC021 dataset with an accuracy of 97.09% for correctly classifying the MOA by nearest neighbors matching. Importantly, we can train our approach on unannotated datasets. Therefore, our method can discover novel MOAs and annotate unlabeled compounds. The ability to train end-to-end on the full resolution images makes our method easy to apply and allows it to further distinguish treatments by their effect on proliferation.

Availability and implementation

Our code is available at

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (