GWAS Tutorial

This quickstart tutorial shows how to perform genome-wide association studies using Glow.

Glow implements a distributed version of the Regenie method. Regenie is designed for data with extreme case/control imbalance, rare variants, and/or diverse populations, which makes it well suited to population-scale biobank exome or genome sequencing data.


Other bioinformatics libraries for GWAS can be distributed using the Glow Pipe Transformer.

You can view HTML versions of the notebooks and download them from the bottom of this page.

The notebooks are written in Python, with some visualization in R.


We recommend running the Data Simulation notebooks first to prepare data for this tutorial before trying with your own data.


Please sort phenotypes and covariates by sample ID in the same order as genotypes.
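As a minimal sketch of the ordering requirement above, the following plain-Python example reorders phenotype and covariate records to follow the sample order of the genotype data. The sample IDs, field names, and the helper `align_to_genotypes` are hypothetical and only serve as an illustration; they are not part of Glow.

```python
# Hypothetical sample order as it appears in the genotype data.
genotype_sample_ids = ["s3", "s1", "s2"]

# Hypothetical phenotype and covariate records keyed by sample ID,
# stored in an order that differs from the genotype data.
phenotypes = {"s1": {"trait": 1.2}, "s2": {"trait": 0.4}, "s3": {"trait": 2.1}}
covariates = {"s2": {"age": 50}, "s3": {"age": 61}, "s1": {"age": 43}}


def align_to_genotypes(table, sample_ids):
    """Return (sample_id, row) pairs from `table` in the order of `sample_ids`.

    Raises ValueError if any genotyped sample has no row, so that a
    silent misalignment cannot slip through.
    """
    missing = [s for s in sample_ids if s not in table]
    if missing:
        raise ValueError(f"samples missing from table: {missing}")
    return [(s, table[s]) for s in sample_ids]


phenotypes_aligned = align_to_genotypes(phenotypes, genotype_sample_ids)
covariates_aligned = align_to_genotypes(covariates, genotype_sample_ids)

# Values now follow the genotype sample order: s3, s1, s2.
aligned_traits = [row["trait"] for _, row in phenotypes_aligned]
aligned_ages = [row["age"] for _, row in covariates_aligned]
```

If the phenotypes and covariates are held in pandas DataFrames indexed by sample ID, the same reordering can be written as `df.loc[genotype_sample_ids]`.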

1. Quality Control

The first notebook in this series prepares data by performing standard quality control procedures on simulated genotype data.
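The text does not list the specific filters, so the sketch below assumes two commonly used variant-level checks, call rate and minor allele frequency (MAF), purely as an illustration. The thresholds, variant names, and the helper `variant_qc` are hypothetical and may differ from what the notebook actually does.

```python
def variant_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Compute call rate and MAF for one variant and apply assumed thresholds.

    `genotypes` holds alternate-allele counts (0, 1, or 2) per sample,
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return {"call_rate": 0.0, "maf": 0.0, "passes": False}
    call_rate = len(called) / len(genotypes)
    # Each called diploid sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    passes = call_rate >= min_call_rate and maf >= min_maf
    return {"call_rate": call_rate, "maf": maf, "passes": passes}


# Hypothetical toy data: one common variant, one monomorphic variant,
# and one variant with many missing calls.
variants = {
    "var_a": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],
    "var_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "var_c": [1, None, None, 0, 2, None, 1, 0, 1, 0],
}

kept = [name for name, calls in variants.items() if variant_qc(calls)["passes"]]
```

In this toy data only the first variant survives: the second has no minor allele and the third falls below the assumed call-rate threshold.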

2. Glow Whole Genome Regression (GloWGR)

GloWGR implements a distributed version of the Regenie method. Please review the Regenie paper in Nature Genetics and the Regenie GitHub repo before implementing this method on real data.

3. Regression

The GloWGR notebook calculates offsets, which the genetic association tests below use to control for population structure and relatedness.
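To illustrate how a per-sample offset can enter a linear association test, the sketch below subtracts the offset from the phenotype and then fits a single-variant least-squares slope. This is a simplified, assumed formulation with no covariates, written in plain Python; it is not Glow's implementation, and the function name and toy data are hypothetical.

```python
def slope(genotypes, phenotypes, offsets=None):
    """Least-squares slope of phenotype on genotype for one variant.

    If `offsets` is given, it is subtracted from the phenotype first, so
    the fit only has to explain what the offset has not accounted for.
    """
    if offsets is None:
        offsets = [0.0] * len(phenotypes)
    adjusted = [y - o for y, o in zip(phenotypes, offsets)]
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(adjusted) / n
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, adjusted))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    return sxy / sxx


# Hypothetical toy data: the phenotype is a true variant effect of 0.3
# per allele plus a per-sample offset standing in for background structure.
genotypes = [0, 1, 2, 0, 1, 2]
offsets = [0.5, -0.5, 1.0, 0.0, 0.25, -0.25]
phenotypes = [0.3 * g + o for g, o in zip(genotypes, offsets)]

beta_with_offset = slope(genotypes, phenotypes, offsets)
beta_without_offset = slope(genotypes, phenotypes)
```

With the offset removed, the estimate recovers the simulated effect; without it, the estimate absorbs part of the background signal and drifts away from 0.3.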

Quality control

Quantitative Glow whole genome regression

Linear regression

Binary Glow whole genome regression

Logistic regression