Troubleshooting
===============

- Job is slow or OOMs (throws an ``OutOfMemoryError``) while using an aggregate like ``collect_list`` or
  ``sample_call_summary_stats``

  * Try disabling ObjectHashAggregate by setting ``spark.sql.execution.useObjectHashAggregateExec`` to
    ``false``.

- Job is slow or OOMs while writing to a partitioned table

  * This error can occur when reading from highly compressed files. Try decreasing
    ``spark.files.maxPartitionBytes`` to a smaller value like ``33554432`` (32 MB).

- My VCF looks weird after merging VCFs and saving with ``bigvcf``

  * When saving to a VCF, the samples in the genotypes array must be in the same order for each row. This
    ordering is not guaranteed when using ``collect_list`` to join multiple VCFs. Try sorting the array
    using ``sort_array``.

- Glow's behavior changed after a release

  * See the Glow release notes. If the Glow release involved a Spark version change, see the Spark
    migration guide.

- ``com.databricks.sql.io.FileReadException: Error while reading file``

  * Registering Glow to access its transform functions also overrides the Spark context. This can interfere
    with the checkpointing functionality in Delta Lake in a Databricks environment. To resolve this, reset
    the runtime configurations with ``spark.sql("RESET")`` after running Glow transform functions and
    before checkpointing to Delta Lake, then try again.
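
The two configuration changes suggested above could be collected in one place, as in the minimal sketch below. The helper name ``with_troubleshooting_settings`` and the ``TROUBLESHOOTING_SETTINGS`` dictionary are hypothetical and not part of Glow or Spark. The sketch assumes you build the session yourself and only relies on the builder's ``config(key, value)`` method; on a managed cluster where the session already exists, the same keys would more likely go into the cluster's Spark configuration.

```python
# Settings quoted from the troubleshooting items above; values are strings,
# as Spark configuration values usually are.
TROUBLESHOOTING_SETTINGS = {
    # Disable ObjectHashAggregate for slow or OOM-ing aggregations.
    "spark.sql.execution.useObjectHashAggregateExec": "false",
    # Smaller input partitions (32 MB) for highly compressed files.
    "spark.files.maxPartitionBytes": str(32 * 1024 * 1024),
}


def with_troubleshooting_settings(builder, settings=None):
    """Hypothetical helper: apply each setting via builder.config(key, value).

    `builder` is assumed to behave like pyspark's SparkSession.builder,
    whose config() call returns the builder so calls can be chained.
    """
    chosen = TROUBLESHOOTING_SETTINGS if settings is None else settings
    for key, value in chosen.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this might be used as ``with_troubleshooting_settings(SparkSession.builder).getOrCreate()``. Apply only the setting that matches the symptom you are seeing rather than both by default.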