Hail Interoperation

Glow can convert a Hail MatrixTable to a Spark DataFrame similar to one created with the native Glow datasources.

Create a Hail cluster

To use the Hail interoperation functions, Hail must be installed on the cluster. On a Databricks cluster, install Hail by setting an environment variable. For other setups, see the Hail installation documentation.

Convert to a Glow DataFrame

Convert from a Hail MatrixTable to a Glow-compatible DataFrame with the function from_matrix_table.

from glow.hail import functions
# mt is an existing Hail MatrixTable
df = functions.from_matrix_table(mt, include_sample_ids=True)

By default, the genotypes contain sample IDs. To remove the sample IDs, set the parameter include_sample_ids=False.
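
The two settings of include_sample_ids can be compared side by side. The following is a minimal sketch, assuming Glow and Hail are installed on the cluster; the helper name to_glow_dataframes is hypothetical and only wraps the from_matrix_table call shown above.

```python
def to_glow_dataframes(mt):
    """Convert one Hail MatrixTable twice: with and without sample IDs.

    to_glow_dataframes is a hypothetical helper, not part of Glow.
    """
    # Imported lazily so that defining the helper does not require Glow.
    from glow.hail import functions

    # Default behavior: genotypes carry sample IDs.
    with_ids = functions.from_matrix_table(mt)
    # Explicitly drop the sample IDs from the genotypes.
    without_ids = functions.from_matrix_table(mt, include_sample_ids=False)
    return with_ids, without_ids
```

Comparing the schemas of the two returned DataFrames (for example with printSchema) is one way to see which genotype field the flag controls.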

Schema mapping

The Glow DataFrame variant fields are derived from the Hail MatrixTable row fields.

Required  Glow DataFrame variant field  Hail MatrixTable row field
--------  ----------------------------  ------------------------------------------------
Yes       contigName                    locus.contig
Yes       start                         locus.position - 1
Yes       end                           info.END or locus.position - 1 + len(alleles[0])
Yes       referenceAllele               alleles[0]
No        alternateAlleles              alleles[1:]
No        names                         [rsid, varid]
No        qual                          qual
No        filters                       filters
No        INFO_<ANY_FIELD>              info.<ANY_FIELD>
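
The coordinate arithmetic in the table can be illustrated in plain Python. This is a sketch of the mapping only, not Glow's implementation: the function name glow_variant_fields is hypothetical, the Hail row is modeled as simple Python values, and the "or" in the end rule is assumed to mean "use info.END when present, otherwise compute the end from the reference allele length".

```python
def glow_variant_fields(locus_contig, locus_position, alleles, info=None):
    """Map Hail-style row values to Glow-style variant fields (illustrative)."""
    info = info or {}
    # Hail locus positions are 1-based; the Glow start is 0-based.
    start = locus_position - 1
    # Assumption: fall back to start + len(reference allele) when END is absent.
    if info.get("END") is not None:
        end = info["END"]
    else:
        end = start + len(alleles[0])
    row = {
        "contigName": locus_contig,
        "start": start,
        "end": end,
        "referenceAllele": alleles[0],
        "alternateAlleles": alleles[1:],
    }
    # Every info field becomes a top-level INFO_<name> field.
    for key, value in info.items():
        row[f"INFO_{key}"] = value
    return row


row = glow_variant_fields("chr1", 100, ["AT", "A"])
print(row["start"], row["end"])  # 99 101
```

For a two-base reference allele at Hail position 100, the sketch yields start 99 and end 101, which is the half-open, 0-based interval covering the reference bases.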

The Glow DataFrame genotype sample IDs are derived from the Hail MatrixTable column fields.

All of the other Glow DataFrame genotype fields are derived from the Hail MatrixTable entry fields.

Glow DataFrame genotype field  Hail MatrixTable entry field
-----------------------------  ----------------------------
phased                         GT.phased
calls                          GT.alleles
depth                          DP
filters                        FT
genotypeLikelihoods            GL
phredLikelihoods               PL
posteriorProbabilities         GP
conditionalQuality             GQ
haplotypeQualities             HQ
expectedAlleleCounts           EC
mappingQuality                 MQ
alleleDepths                   AD
<ANY_FIELD>                    <ANY_FIELD>
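
The entry-field mapping amounts to a rename table plus a pass-through rule for unrecognized fields. The sketch below models one Hail entry as a Python dict to show the renaming; it is not Glow's implementation. The names ENTRY_TO_GENOTYPE and glow_genotype are hypothetical, and the genotype field name sampleId is an assumption about the Glow genotype schema.

```python
# Rename table taken from the entry-field mapping above.
ENTRY_TO_GENOTYPE = {
    "DP": "depth",
    "FT": "filters",
    "GL": "genotypeLikelihoods",
    "PL": "phredLikelihoods",
    "GP": "posteriorProbabilities",
    "GQ": "conditionalQuality",
    "HQ": "haplotypeQualities",
    "EC": "expectedAlleleCounts",
    "MQ": "mappingQuality",
    "AD": "alleleDepths",
}


def glow_genotype(entry, sample_id=None):
    """Map one Hail-style entry (a dict) to a Glow-style genotype dict."""
    genotype = {}
    if sample_id is not None:
        # Assumption: the sample ID is stored under the name "sampleId".
        genotype["sampleId"] = sample_id
    for name, value in entry.items():
        if name == "GT":
            # The call is split into its phasing flag and allele indices.
            genotype["phased"] = value["phased"]
            genotype["calls"] = value["alleles"]
        else:
            # Known fields are renamed; any other field keeps its name.
            genotype[ENTRY_TO_GENOTYPE.get(name, name)] = value
    return genotype
```

Calling glow_genotype with sample_id left as None mirrors include_sample_ids=False, in that the resulting genotype has no sample ID field.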