Variant Data Manipulation

Glow provides functions to extract, transform, and load (ETL) genomic variant data into Spark DataFrames. Once loaded, the data can be manipulated, filtered, quality-controlled, and converted between file formats.

  • Data Simulation
    • Simulate Covariates & Phenotypes
    • Simulate Genotypes
  • Read and Write VCF, Plink, and BGEN with Spark
    • VCF
    • BGEN
    • PLINK
    • Manually defining read schema
  • Read Genome Annotations (GFF3) as a Spark DataFrame
    • Schema
  • Create a Genomics Delta Lake
    • Explode pVCF variant dataframe and write to Delta Lake
    • Create database for variants and annotations
    • Query variant database
  • Variant Quality Control
    • Notebook
  • Sample Quality Control
    • Computing user-defined sample QC metrics
  • Liftover
    • Create a liftOver cluster
    • Coordinate liftOver
    • Variant liftOver
  • Variant Normalization
    • normalize_variants Transformer
    • Usage
    • Options
    • normalize_variant Function
  • Split Multiallelic Variants
    • Options
    • Usage
  • Merging Variant Datasets
    • Aggregating INFO fields
    • Joint genotyping
  • Utility Functions
    • Struct transformations
    • Spark ML transformations
    • Variant data transformations
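
As a rough illustration of the ETL entry point described at the top of this page, the sketch below wraps the Spark reader for the three file formats listed under "Read and Write VCF, Plink, and BGEN with Spark". It assumes a SparkSession that already has the Glow package available. The helper name `load_variants` and the `READ_FORMATS` set are hypothetical and not part of Glow; only the data source names (`vcf`, `bgen`, `plink`) and the standard `spark.read.format(...).option(...).load(...)` chain are taken as given.

```python
# Minimal sketch, not Glow's own API: `load_variants` and `READ_FORMATS`
# are hypothetical names introduced for illustration.
# Assumes `spark` is a SparkSession with the Glow package on its classpath.

READ_FORMATS = {"vcf", "bgen", "plink"}  # data source names used for reading


def load_variants(spark, path, fmt="vcf", **options):
    """Load a variant file into a Spark DataFrame through a Glow data source."""
    if fmt not in READ_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    reader = spark.read.format(fmt)
    # Forward any reader options (for example includeSampleIds for VCF).
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(path)
```

A call such as `load_variants(spark, "/path/to/cohort.vcf", includeSampleIds=True)` would return a DataFrame with one row per variant; the path here is a placeholder. Rejecting unknown formats early keeps a typo from surfacing later as a less readable Spark error.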

© Copyright 2019, Glow Authors.