Abstract
The number of significantly associated regions reported in genome-wide association studies (GWAS) for polygenic traits typically increases with sample size. A traditional tool for quality control and identification of significant regions has been visual inspection of how significant and correlated genetic variants cluster within a region. However, when hundreds of regions must be inspected, this subjective method can misattribute significance to some loci and overlook others that are genuinely significant.
The GWAS quality score (GQS) identifies suspicious regions and prevents erroneous interpretations with an objective, quantitative and automated method. The GQS assesses all measured single nucleotide polymorphisms (SNPs) that are linked by inheritance to each other [linkage disequilibrium (LD)] and compares the significance of trait association of each SNP to its LD value for the reported index SNP. A GQS value of 1.0 ascribes a high level of confidence to the entire region and its underlying gene(s), while GQS values <1.0 indicate the need to closely inspect the outliers. We applied the GQS to published and non-published genome-wide summary statistics and report suspicious regions requiring secondary inspection while supporting the majority of reported regions from large-scale published meta-analyses.
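The comparison of each SNP's association strength with its LD to the index SNP can be illustrated with a toy calculation. The sketch below is not the published GQS formula, which this abstract does not specify. It assumes, purely for illustration, that under a single underlying signal a SNP's absolute z-score is roughly sqrt(r²) times that of the index SNP, flags SNPs that deviate by more than a tolerance, and reports the fraction of consistent SNPs. The names `Snp`, `toy_region_score` and `tol`, the SNP identifiers and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Snp:
    snp_id: str
    z: float   # trait-association z-score of this SNP
    r2: float  # LD (r^2) between this SNP and the index SNP, in [0, 1]


def toy_region_score(index_z, snps, tol=3.0):
    """Return (score, outlier_ids) for one region.

    Illustrative assumption, not the authors' method: a SNP is
    'consistent' when |z| lies within `tol` of sqrt(r2) * |index_z|.
    The score is the fraction of consistent SNPs, so 1.0 means that
    no SNP in the region was flagged.
    """
    if not snps:
        raise ValueError("region contains no SNPs")
    outliers = []
    for snp in snps:
        expected = sqrt(snp.r2) * abs(index_z)
        if abs(abs(snp.z) - expected) > tol:
            outliers.append(snp.snp_id)
    score = 1.0 - len(outliers) / len(snps)
    return score, outliers


# Made-up region: three SNPs track the index SNP, while one is highly
# significant despite almost no LD with it.
region = [
    Snp("rs_a", z=9.5, r2=0.95),
    Snp("rs_b", z=6.8, r2=0.50),
    Snp("rs_c", z=1.0, r2=0.02),
    Snp("rs_d", z=8.9, r2=0.01),
]
score, outliers = toy_region_score(index_z=10.0, snps=region)
print(score, outliers)
```

In this made-up region the toy score falls below 1.0 because `rs_d` is strongly associated while being nearly independent of the index SNP, which is the kind of outlier that would call for closer inspection.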
The GQS code and scripts can be cloned from GitHub (https://github.com/Xswapnil/GQS/). An analyst can use whole-genome summary statistics to estimate the GQS for each defined region. We also provide an online tool (http://35.227.18.38/) that gives access to the GQS. Together, the quantitative quality measure provided by the GQS and its visualization offer an objective means of increasing confidence in each genomic hit.
Supplementary data are available at Bioinformatics online.