Data Simulation

These data simulation notebooks generate phenotypes, covariates, and genotypes at a user-defined scale. The resulting datasets can be used for integration testing and scale testing.

Simulate Covariates & Phenotypes

This data simulation notebook uses Pandas to simulate quantitative and binary phenotypes and covariates. Ensure that n_samples is set to the same value as in the genotype simulation notebook.
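
As a rough illustration of this kind of simulation (not the notebook's actual code), the sketch below builds covariate and phenotype tables with Pandas and NumPy. The helper simulate_table, the sample_id index, the column prefixes, the choice of standard normal and fair-coin draws, and the small n_samples value are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def simulate_table(n_samples, n_cols, binary=False, seed=0, prefix="col"):
    """Return a DataFrame of simulated values indexed by a hypothetical sample ID."""
    rng = np.random.default_rng(seed)
    if binary:
        # Binary traits: independent 0/1 draws with equal probability (assumption).
        values = rng.integers(0, 2, size=(n_samples, n_cols))
    else:
        # Quantitative traits and covariates: standard normal draws (assumption).
        values = rng.normal(size=(n_samples, n_cols))
    index = pd.Index([f"sample_{i}" for i in range(n_samples)], name="sample_id")
    columns = [f"{prefix}_{j}" for j in range(n_cols)]
    return pd.DataFrame(values, index=index, columns=columns)


# n_samples must match the value used when simulating genotypes.
n_samples = 100
covariates = simulate_table(n_samples, 3, seed=1, prefix="cov")
quantitative_phenotypes = simulate_table(n_samples, 2, seed=2, prefix="qt")
binary_phenotypes = simulate_table(n_samples, 2, binary=True, seed=3, prefix="bt")
```

Because all three tables share the same index, they can later be joined to genotypes on the sample ID, which is why n_samples has to agree across notebooks.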

Simulate Genotypes

This data simulation notebook downloads chromosomes 21 and 22 from the 1000 Genomes Project and returns a Delta Lake table with simulated genotypes for n_samples and n_variants, preserving Hardy-Weinberg equilibrium and the allele frequency of each variant.
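
To make the Hardy-Weinberg constraint concrete, here is a minimal NumPy sketch, not the notebook's implementation. Under Hardy-Weinberg equilibrium, a variant with alternate allele frequency p has genotype probabilities (1-p)^2, 2p(1-p), and p^2, which is the same as drawing the alternate allele count from Binomial(2, p). The function name simulate_genotypes and the allele frequencies below are hypothetical. The notebook takes its frequencies from the 1000 Genomes data and writes a Delta Lake table, and neither step is shown here.

```python
import numpy as np


def simulate_genotypes(allele_freqs, n_samples, seed=0):
    """Draw alternate allele counts (0, 1, or 2) per variant and sample under HWE."""
    rng = np.random.default_rng(seed)
    af = np.asarray(allele_freqs, dtype=float)
    # Binomial(2, p) yields P(0)=(1-p)^2, P(1)=2p(1-p), P(2)=p^2 for each variant.
    # af[:, None] broadcasts each variant's frequency across all samples.
    return rng.binomial(2, af[:, None], size=(af.size, n_samples))


# Hypothetical allele frequencies standing in for values from real data.
allele_freqs = [0.05, 0.3, 0.5]
n_samples = 10_000
genotypes = simulate_genotypes(allele_freqs, n_samples, seed=42)

# Observed alternate allele frequency per variant; should be close to the input.
observed_af = genotypes.mean(axis=1) / 2
```

Comparing observed_af with allele_freqs is a quick sanity check that the simulated genotypes keep each variant's allele frequency.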