Glow Top-Level Functions
- glow.register(session, new_session=True)[source]
Register SQL extensions and py4j converters for a Spark session.
- Parameters
session (SparkSession) – Spark session
new_session (bool) – If True, create a new Spark session using session.newSession() before registering extensions. This may be necessary if you’re using functions that register new analysis rules. The new session has isolated UDFs, configurations, and temporary tables, but shares the existing SparkContext and cached data.
Example
>>> import glow
>>> spark = glow.register(spark)
- Return type
SparkSession
- glow.transform(operation, df, arg_map=None, **kwargs)[source]
Apply a named transformation to a DataFrame of genomic data. All parameters apart from the input data and its schema are provided through the case-insensitive options map.
There are no bounds on what a transformer may do. For instance, it’s legal for a transformer to materialize the input DataFrame.
- Parameters
Example
>>> df = spark.read.format('vcf').load('test-data/1kg_sample.vcf')
>>> piped_df = glow.transform('pipe', df, cmd=["cat"], input_formatter='vcf', output_formatter='vcf',
...                           in_vcf_header='infer')
- Return type
DataFrame
- Returns
The transformed DataFrame
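The description of glow.transform says options are passed through a case-insensitive map. The sketch below illustrates what that lookup behaviour means in plain Python. It is not Glow's implementation: normalize_options and get_option are hypothetical helpers introduced only for this illustration, and rejecting duplicate spellings is an assumption rather than documented behaviour.

```python
from typing import Any, Dict


def normalize_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of options keyed by lower-cased names (illustrative only)."""
    normalized: Dict[str, Any] = {}
    for key, value in options.items():
        lowered = key.lower()
        if lowered in normalized:
            # Assumption: two spellings of one option are ambiguous, so fail loudly.
            raise ValueError(f"duplicate option (case-insensitive): {key!r}")
        normalized[lowered] = value
    return normalized


def get_option(options: Dict[str, Any], name: str, default: Any = None) -> Any:
    """Look up an option by name, ignoring case; return default if absent."""
    return normalize_options(options).get(name.lower(), default)


# Option names mirror the keyword arguments in the 'pipe' example above,
# deliberately written with mixed case.
opts = {"cmd": ["cat"], "Input_Formatter": "vcf", "OUTPUT_FORMATTER": "vcf"}
formatter = get_option(opts, "input_formatter")
```

Under this reading, input_formatter, Input_Formatter, and INPUT_FORMATTER all name the same option, so the spelling used at the call site does not matter.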