By Lerato E. Magosi, Scott Hazelhurst and the Wits Bioinformatics team
witsGWAS is a simple human GWAS analysis workflow, built at the Sydney Brenner Institute, for data quality control (QC) and basic association testing. Instead of requiring individual commands to be entered at the Unix prompt, it organizes GWAS tasks sequentially (facilitated via Ruffus) for submission to a distributed PBS Torque cluster (managed via Rubra). witsGWAS also uses flag files to monitor the progress of jobs submitted to the cluster on the user's behalf, waiting for one job to finish before sending the next.
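The flag-file behaviour described above can be illustrated with a minimal sketch. This is not the witsGWAS, Ruffus, or Rubra implementation: the `run_stages` helper, the `.done` naming, and the stage names are all hypothetical, and plain Python callables stand in for cluster jobs.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# A stage is a (name, task) pair; both are illustrative, not witsGWAS names.
Stage = Tuple[str, Callable[[], None]]


def run_stages(stages: List[Stage], flag_dir: Path) -> List[str]:
    """Run stages in order and return the names of those executed now."""
    flag_dir.mkdir(parents=True, exist_ok=True)
    executed: List[str] = []
    for name, task in stages:
        flag = flag_dir / f"{name}.done"  # hypothetical flag-file name
        if flag.exists():
            continue  # finished in an earlier run, so skip it
        task()  # blocks until the stage returns; an exception stops the loop
        flag.touch()  # written only after the stage succeeded
        executed.append(name)
    return executed


def demo() -> Tuple[List[str], List[str]]:
    """Run a toy three-stage pipeline twice in a temporary directory."""
    stages: List[Stage] = [
        ("sample_qc", lambda: None),
        ("snp_qc", lambda: None),
        ("association", lambda: None),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        first = run_stages(stages, Path(tmp))
        second = run_stages(stages, Path(tmp))
    return first, second
```

On the first call every stage runs and leaves a flag file behind; on the second call the flags are found and nothing is re-run, which is the property that makes an interrupted pipeline resumable.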
Features
QC of Affymetrix array data (SNP6 raw .CEL files)
- genotype calling
- converting birdseed calls to plink format
Sample and SNP QC of PLINK Binaries
Sample QC tasks checking:
- discordant sex information
- missingness
- heterozygosity scores
- relatedness
SNP QC tasks checking:
- minor allele frequencies
- SNP missingness
- differential missingness
- deviations from Hardy-Weinberg equilibrium
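As a rough illustration of what some of the SNP QC metrics listed above measure, the sketch below computes the missingness rate, minor allele frequency, and a Hardy-Weinberg chi-square statistic from genotype counts for a single SNP. The function name and return keys are hypothetical, the chi-square form is a textbook approximation (tools such as PLINK typically rely on an exact test instead), and differential missingness between cases and controls is not covered.

```python
from typing import Dict


def snp_qc_metrics(n_aa: int, n_ab: int, n_bb: int, n_missing: int) -> Dict[str, float]:
    """Summarise one SNP from its genotype counts (illustrative only)."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        raise ValueError("no called genotypes for this SNP")
    total = called + n_missing

    # Fraction of samples with no genotype call at this SNP.
    missing_rate = n_missing / total

    # Allele frequencies from the called genotypes.
    p = (2 * n_aa + n_ab) / (2 * called)
    q = 1.0 - p
    maf = min(p, q)

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (p * p * called, 2 * p * q * called, q * q * called)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square (1 degree of freedom), skipping empty classes.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    return {"missing_rate": missing_rate, "maf": maf, "hwe_chi2": chi2}
```

A SNP would then be kept or dropped by comparing each value with a threshold chosen in the workflow configuration; no thresholds are suggested here.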
Association testing
- Basic PLINK association tests, producing Manhattan and QQ plots
- Cochran-Mantel-Haenszel (CMH) association test: association analysis accounting for clusters
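To show what a cluster-aware association test computes, here is a minimal sketch of the Cochran-Mantel-Haenszel statistic over a set of 2x2 tables, one per cluster. It is a stand-alone illustration and not how witsGWAS or PLINK implement the test: the function name and table layout are assumptions, and no continuity correction is applied.

```python
from typing import Iterable, Tuple

# One 2x2 table per cluster, laid out as (a, b, c, d):
#   a = case allele-1 count,    b = case allele-2 count,
#   c = control allele-1 count, d = control allele-2 count.
Table = Tuple[int, int, int, int]


def cmh_statistic(tables: Iterable[Table]) -> float:
    """Return the CMH chi-square statistic (1 df, no continuity correction)."""
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a cluster this small carries no information
        # Expected value and variance of `a` given the table margins.
        expected_a = (a + b) * (a + c) / n
        variance_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected_a
        var_sum += variance_a
    if var_sum == 0:
        raise ValueError("no informative clusters")
    return diff_sum ** 2 / var_sum
```

Because deviations are summed across clusters before squaring, a consistent effect in several clusters accumulates, while differences in allele frequency between clusters do not by themselves inflate the statistic.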
Getting started
- Read up about its Usage and Configuration on the wiki
- Need a refresher? See the cheat sheet
- Got some ideas? Feel free to fork and contribute your modifications
- Have questions? See the FAQ
References
Anderson, C. et al. (2010): Data quality control in genetic case-control association studies. Nature Protocols 5, 1564-1573.
Sloggett, Clare; Wakefield, Matthew; Philip, Gayle; Pope, Bernard (2014): Rubra - flexible distributed pipelines. figshare. http://dx.doi.org/10.6084/m9.figshare.895626
License
The code is available under the MIT license.