By Lerato E. Magosi, Scott Hazelhurst and the Wits Bioinformatics team

witsGWAS is a simple human GWAS analysis workflow, built at the Sydney Brenner Institute, for data quality control (QC) and basic association testing. Instead of requiring the user to enter individual commands at the Unix prompt, it organizes GWAS tasks into a sequence (using Ruffus) and submits them to a distributed PBS Torque cluster (managed via Rubra). witsGWAS also uses flag files to monitor the progress of the jobs it submits to the cluster on the user's behalf, waiting for one job to finish before sending the next.
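The flag-file idea can be illustrated with a minimal sketch. This is not the witsGWAS, Ruffus or Rubra implementation: the function names `run_stage` and `run_pipeline`, the `.Success` flag suffix and the use of local subprocesses instead of PBS job submission are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_stage(name, command, flag_dir):
    """Run `command` unless a success flag for stage `name` already exists.

    Returns True if the command was executed, False if it was skipped.
    """
    flag = Path(flag_dir) / f"{name}.Success"  # hypothetical flag file name
    if flag.exists():
        return False  # stage finished in an earlier run, do not repeat it
    # Blocks until the command finishes; raises CalledProcessError on failure,
    # so the flag is only written after a successful run.
    subprocess.run(command, check=True)
    flag.touch()
    return True


def run_pipeline(stages, flag_dir):
    """Run (name, command) pairs strictly in order; return the names executed."""
    executed = []
    for name, command in stages:
        if run_stage(name, command, flag_dir):
            executed.append(name)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as flags:
        # Placeholder commands; a real stage would call a QC or association tool.
        demo = [
            ("sample_qc", [sys.executable, "-c", "pass"]),
            ("snp_qc", [sys.executable, "-c", "pass"]),
        ]
        print(run_pipeline(demo, flags))  # first run executes both stages
        print(run_pipeline(demo, flags))  # second run skips both stages
```

Because each stage only starts after the previous one has returned, and a flag is only written on success, re-running the pipeline after a failure resumes at the first stage without a flag.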

Features

QC of Affymetrix array data (SNP6 raw .CEL files)

  1. genotype calling
  2. conversion of Birdseed calls to PLINK format

Sample and SNP QC of PLINK binaries

Sample QC tasks check:

  1. discordant sex information
  2. sample missingness
  3. heterozygosity scores
  4. relatedness
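As an illustration of one sample QC check, the sketch below flags samples with outlying heterozygosity rates. It assumes the per-sample counts come from the O(HOM) and N(NM) columns of a PLINK .het file, and it uses a cut-off of 3 standard deviations from the mean as suggested in Anderson et al. (2010). The function name `heterozygosity_outliers` and the threshold default are illustrative assumptions, not the thresholds witsGWAS itself applies.

```python
import statistics


def heterozygosity_outliers(samples, n_sd=3.0):
    """Return IDs whose heterozygosity rate lies more than n_sd SDs from the mean.

    `samples` is a list of (sample_id, observed_hom, non_missing) tuples,
    mirroring the O(HOM) and N(NM) columns of a PLINK .het file.
    """
    # Heterozygosity rate = (non-missing genotypes - homozygous genotypes)
    # divided by non-missing genotypes.
    rates = {
        sample_id: (non_missing - observed_hom) / non_missing
        for sample_id, observed_hom, non_missing in samples
    }
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, needs >= 2 samples
    return [
        sample_id
        for sample_id, rate in rates.items()
        if abs(rate - mean) > n_sd * sd
    ]
```

The cut-off is a parameter because the appropriate threshold depends on the study; samples flagged here are candidates for inspection rather than automatic removal.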

SNP QC tasks check:

  1. minor allele frequencies
  2. SNP missingness
  3. differential missingness
  4. deviations from Hardy-Weinberg equilibrium
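Two of the SNP QC checks above can be sketched from the genotype counts at a single SNP. The example computes the minor allele frequency and a chi-square goodness-of-fit test (1 degree of freedom) for Hardy-Weinberg equilibrium. This is a simplification chosen for brevity: it is not the exact test, and the function name `snp_qc_stats` is hypothetical.

```python
import math


def snp_qc_stats(n_aa, n_ab, n_bb):
    """Return (maf, hwe_chi2, hwe_p) from genotype counts at one SNP."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    maf = min(p, q)
    observed = (n_aa, n_ab, n_bb)
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return maf, chi2, p_value
```

For example, `snp_qc_stats(25, 50, 25)` describes a SNP in perfect equilibrium (chi-square of 0), whereas `snp_qc_stats(90, 0, 10)` has no heterozygotes at all and yields a very large statistic.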

Association testing

  1. Basic PLINK association tests, producing Manhattan and QQ plots
  2. Cochran-Mantel-Haenszel (CMH) association test, which accounts for clusters
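To show what "accounting for clusters" means in the CMH test, here is a minimal sketch of the CMH chi-square statistic for a set of 2x2 tables, one per cluster. It assumes each table holds counts such as case/control by allele, and it omits the continuity correction; whether a given tool applies one is not something this sketch settles. The function name `cmh_statistic` is hypothetical.

```python
def cmh_statistic(tables):
    """CMH chi-square (1 df, no continuity correction) for 2x2xK tables.

    Each table is ((a, b), (c, d)), for example rows = case/control and
    columns = allele 1/allele 2 counts within one cluster.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        # Expected value and variance of cell `a` given the table margins.
        expected_a = (a + b) * (a + c) / n
        variance_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected_a
        var_sum += variance_a
    if var_sum == 0:
        raise ValueError("no informative strata")
    return diff_sum ** 2 / var_sum
```

Deviations are summed across clusters before squaring, so an association that points in the same direction in every cluster accumulates evidence, while differences in allele frequency between clusters do not by themselves inflate the statistic.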

Getting started

References

Anderson, C. et al. (2010): Data quality control in genetic case-control association studies. Nature Protocols 5, 1564-1573.

Sloggett, Clare; Wakefield, Matthew; Philip, Gayle; Pope, Bernard (2014): Rubra - flexible distributed pipelines. figshare. http://dx.doi.org/10.6084/m9.figshare.895626

License

The code is available under the MIT license.