Glow Top-Level Functions

glow.register(session, new_session=True)

Register SQL extensions and Py4J converters for a Spark session.

Parameters
  • session (SparkSession) – Spark session

  • new_session (bool) – If True, create a new Spark session using session.newSession() before registering extensions. This may be necessary if you’re using functions that register new analysis rules. The new session has isolated UDFs, configurations, and temporary tables, but shares the existing SparkContext and cached data.

Example

>>> import glow
>>> spark = glow.register(spark)
Return type

SparkSession

glow.transform(operation, df, arg_map=None, **kwargs)

Apply a named transformation to a DataFrame of genomic data. All parameters apart from the input data and its schema are provided through the case-insensitive options map.

There are no bounds on what a transformer may do. For instance, it’s legal for a transformer to materialize the input DataFrame.

Parameters
  • operation (str) – Name of the operation to perform

  • df (DataFrame) – The input DataFrame

  • arg_map (Optional[Dict[str, Any]]) – A map of string argument names to argument values of any type

  • kwargs (Any) – Named arguments. If arg_map is not specified, transformer arguments are pulled from these keyword arguments instead.

Example

>>> df = spark.read.format('vcf').load('test-data/1kg_sample.vcf')
>>> piped_df = glow.transform('pipe', df, cmd=["cat"], input_formatter='vcf', output_formatter='vcf', in_vcf_header='infer')
Return type

DataFrame

Returns

The transformed DataFrame
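
The interaction between arg_map and kwargs can be sketched in plain Python. This is a hypothetical illustration (build_options is not part of Glow's API), assuming case-insensitivity is achieved by normalizing option keys: an explicit arg_map takes precedence, and keyword arguments are used only when no arg_map is given.

```python
from typing import Any, Dict, Optional

def build_options(arg_map: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    # Prefer the explicit arg_map; fall back to keyword arguments.
    source = arg_map if arg_map is not None else kwargs
    # Lowercase keys so option lookup ignores case.
    return {key.lower(): value for key, value in source.items()}

# Keyword arguments are used when no arg_map is given; keys are normalized.
opts = build_options(cmd=["cat"], Input_Formatter='vcf')

# An explicit arg_map wins over any keyword arguments.
opts2 = build_options({'OUTPUT_FORMATTER': 'vcf'}, cmd=["grep"])
```

Under this reading, glow.transform('pipe', df, arg_map={...}) and glow.transform('pipe', df, cmd=[...], ...) are equivalent ways of supplying the same options map.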