Glow Top-Level Functions

glow.glow.register(session: pyspark.sql.session.SparkSession)[source]

Register SQL extensions and py4j converters for a Spark session.

Parameters
  • session – The Spark session

Example

>>> import glow
>>> glow.register(spark)
glow.glow.transform(operation: str, df: pyspark.sql.dataframe.DataFrame, arg_map: Dict[str, Any] = None, **kwargs: Any) → pyspark.sql.dataframe.DataFrame[source]

Apply a named transformation to a DataFrame of genomic data. All parameters apart from the input data and its schema are provided through the case-insensitive options map.

There are no bounds on what a transformer may do. For instance, it’s legal for a transformer to materialize the input DataFrame.

Parameters
  • operation – Name of the operation to perform

  • df – The input DataFrame

  • arg_map – A map of string argument names to argument values

  • kwargs – Keyword arguments. If arg_map is not provided, transformer arguments are pulled from these keyword arguments instead.

Example

>>> df = spark.read.format('vcf').load('test-data/1kg_sample.vcf')
>>> piped_df = glow.transform('pipe', df, cmd=["cat"], input_formatter='vcf', output_formatter='vcf', in_vcf_header='infer')
Returns

The transformed DataFrame
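
Since arguments may be passed either as keyword arguments or through arg_map, the pipe example above can equivalently be written with an explicit dictionary. This is a sketch assuming the same spark session and df from the preceding example:

>>> options = {'cmd': ["cat"],
...            'input_formatter': 'vcf',
...            'output_formatter': 'vcf',
...            'in_vcf_header': 'infer'}
>>> piped_df = glow.transform('pipe', df, arg_map=options)

An explicit arg_map is convenient when options are assembled programmatically, and it sidesteps argument names that are not valid Python identifiers.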