Translation is a key biological process controlled in eukaryotes by the initiation AUG codon. Variations affecting this codon may have pathological consequences by disturbing the correct initiation of translation. Unfortunately, there is no systematic study describing these variations in the human genome. Moreover, we aimed to develop new tools for in silico prediction of the pathogenicity of gene variations affecting AUG codons, because to date, these gene defects have been wrongly classified as missense.


Whole-exome analysis revealed the mean of 12 gene variations per person affecting initiation codons, mostly with high (>0.01) minor allele frequency (MAF). Moreover, analysis of Ensembl data (December 2017) revealed 11 261 genetic variations affecting the initiation AUG codon of 7205 genes. Most of these variations (99.5%) have low or unknown MAF, probably reflecting deleterious consequences. Only 62 variations had high MAF. Genetic variations with high MAF had closer alternative AUG downstream codons than did those with low MAF. Besides, the high-MAF group better maintained both the signal peptide and reading frame. These differentiating elements could help to determine the pathogenicity of this kind of variation.

Availability and implementation

Data and scripts in Perl and R are freely available at

Supplementary information

Supplementary data are available at Bioinformatics online.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (