Hail Interoperation
Glow can convert a Hail MatrixTable to a Spark DataFrame similar to one created with the native Glow datasources.
Create a Hail cluster
The Hail interoperation functions require Hail to be installed on the cluster. On a Databricks cluster, install Hail by setting an environment variable. For other setups, see the Hail installation documentation.
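Before calling the interoperation functions, it can help to confirm that Hail is importable. The following is a minimal sketch using only the Python standard library. The helper names `module_available` and `require_hail` are hypothetical and not part of Glow or Hail, and the check covers only the Python environment it runs in (typically the driver), not every node in the cluster.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def require_hail() -> None:
    """Raise a clear error if Hail is missing from this Python environment.

    Hypothetical helper for illustration; not part of Glow or Hail.
    """
    if not module_available("hail"):
        raise RuntimeError(
            "Hail is not installed in this environment; "
            "install it before using the glow.hail functions."
        )
```

Calling `require_hail()` at the top of a notebook gives an early, readable failure instead of an import error later in the workflow.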
Convert to a Glow DataFrame
Convert from a Hail MatrixTable to a Glow-compatible DataFrame with the function from_matrix_table.
from glow.hail import functions
df = functions.from_matrix_table(mt, include_sample_ids=True)
By default, the genotypes contain sample IDs. To remove the sample IDs, set the parameter include_sample_ids=False.
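To compare the two settings side by side, the sketch below converts the same MatrixTable twice. It assumes `mt` is an existing Hail MatrixTable and that Glow and Hail are installed. The wrapper name `convert_with_and_without_sample_ids` is hypothetical; only `from_matrix_table` and `include_sample_ids` come from the text above. The import sits inside the function so the definition itself does not require Glow to be present.

```python
def convert_with_and_without_sample_ids(mt):
    """Convert one Hail MatrixTable to two Glow-compatible DataFrames.

    Hypothetical wrapper for illustration. Returns a pair:
    (DataFrame whose genotypes keep sample IDs, DataFrame without them).
    """
    # Imported lazily so this sketch can be defined without Glow installed.
    from glow.hail import functions

    # Default behavior: genotypes contain sample IDs.
    with_ids = functions.from_matrix_table(mt, include_sample_ids=True)

    # Sample IDs removed from the genotypes.
    without_ids = functions.from_matrix_table(mt, include_sample_ids=False)

    return with_ids, without_ids
```

Comparing the output of `printSchema()` on the two results is one way to see exactly which genotype field is dropped.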
Schema mapping
The Glow DataFrame variant fields are derived from the Hail MatrixTable row fields.
Required | Glow DataFrame variant field | Hail MatrixTable row field |
---|---|---|
Yes | | |
Yes | | |
Yes | | |
Yes | | |
No | | |
No | | |
No | | |
No | | |
No | | |
The Glow DataFrame genotype sample IDs are derived from the Hail MatrixTable column fields.
All of the other Glow DataFrame genotype fields are derived from the Hail MatrixTable entry fields.
Glow DataFrame genotype field | Hail MatrixTable entry field |
---|---|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
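The overall shape of the mapping can be sketched with plain Python data. This is a conceptual toy, not Glow's implementation: row fields describe a variant, column fields describe a sample, and entry fields hold the per-variant, per-sample values that become genotype fields. The function name and the field names `variant`, `id`, and `value` are hypothetical placeholders, and `sampleId` is assumed here as the name of the genotype field that carries the sample ID.

```python
def flatten_to_variant_records(row_fields, col_fields, entries, include_sample_ids=True):
    """Toy model: one record per variant, with one genotype dict per sample.

    row_fields: list of dicts, one per variant (placeholder field names).
    col_fields: list of dicts, one per sample, each with a placeholder "id" key.
    entries:    entries[i][j] is the dict of entry fields for variant i, sample j.
    """
    records = []
    for variant, entry_row in zip(row_fields, entries):
        genotypes = []
        for sample, entry in zip(col_fields, entry_row):
            genotype = dict(entry)  # entry fields become genotype fields
            if include_sample_ids:
                # The sample ID comes from the column fields, not the entry.
                genotype = {"sampleId": sample["id"], **genotype}
            genotypes.append(genotype)
        # Row fields become the variant-level fields of the record.
        records.append({**variant, "genotypes": genotypes})
    return records


# Usage with placeholder data: two variants, two samples.
rows = [{"variant": "v1"}, {"variant": "v2"}]
cols = [{"id": "s1"}, {"id": "s2"}]
cells = [[{"value": 0}, {"value": 1}], [{"value": 2}, {"value": 3}]]
records = flatten_to_variant_records(rows, cols, cells)
```

The result has one record per variant, each holding a `genotypes` list with one element per sample, which mirrors the split between variant fields and genotype fields described above.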