Glow Top-Level Functions

glow.register(session, new_session=True)

Register SQL extensions and Py4J converters for a Spark session.

Parameters
  • session (SparkSession) – Spark session

  • new_session (bool) – If True, create a new Spark session using session.newSession() before registering extensions. This may be necessary if you’re using functions that register new analysis rules. The new session has isolated UDFs, configurations, and temporary tables, but shares the existing SparkContext and cached data.

Example

>>> import glow
>>> spark = glow.register(spark)
Return type

SparkSession

glow.transform(operation, df, arg_map=None, **kwargs)

Apply a named transformation to a DataFrame of genomic data. All parameters apart from the input data and its schema are provided through the case-insensitive options map.

There are no bounds on what a transformer may do. For instance, it’s legal for a transformer to materialize the input DataFrame.

Parameters
  • operation (str) – Name of the operation to perform

  • df (DataFrame) – The input DataFrame

  • arg_map (Optional[Dict[str, Any]]) – A map of string argument names to argument values of any type

  • kwargs (Any) – Named arguments. If arg_map is not specified, transformer arguments are pulled from these keyword arguments instead.

Example

>>> df = spark.read.format('vcf').load('test-data/1kg_sample.vcf')
>>> piped_df = glow.transform('pipe', df, cmd=["cat"], input_formatter='vcf', output_formatter='vcf', in_vcf_header='infer')
Return type

DataFrame

Returns

The transformed DataFrame
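
The interaction between arg_map and kwargs can be sketched in plain Python. This is a hypothetical illustration (build_options is not part of Glow's API), assuming case-insensitivity is achieved by normalizing option keys: an explicit arg_map takes precedence, and keyword arguments are used only when no arg_map is given.

```python
from typing import Any, Dict, Optional

def build_options(arg_map: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Dict[str, Any]:
    # Prefer the explicit arg_map; fall back to keyword arguments.
    source = arg_map if arg_map is not None else kwargs
    # Lowercase keys so option lookup ignores case.
    return {key.lower(): value for key, value in source.items()}

# Keyword arguments are used when no arg_map is given; keys are normalized.
opts = build_options(cmd=["cat"], Input_Formatter='vcf')

# An explicit arg_map wins over any keyword arguments.
opts2 = build_options({'OUTPUT_FORMATTER': 'vcf'}, cmd=["grep"])
```

Under this reading, glow.transform('pipe', df, arg_map={...}) and glow.transform('pipe', df, cmd=[...], ...) are equivalent ways of supplying the same options map.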