Troubleshooting
Job is slow or OOMs (throws an OutOfMemoryError) while using an aggregate like collect_list or sample_call_summary_stats
Try disabling the ObjectHashAggregate by setting spark.sql.execution.useObjectHashAggregateExec to false.
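As a minimal sketch of the setting above, assuming `spark` is an already active PySpark SparkSession, the option can be changed at runtime through the session's configuration. The helper name `disable_object_hash_aggregate` is hypothetical and only used for illustration.

```python
def disable_object_hash_aggregate(spark):
    """Turn off ObjectHashAggregate for the given session.

    `spark` is assumed to be an active pyspark.sql.SparkSession;
    this helper is illustrative and not part of Glow.
    """
    # Set the SQL option named in the troubleshooting entry to "false".
    spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "false")
```

Call it once before running the aggregation, for example `disable_object_hash_aggregate(spark)`.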
Job is slow or OOMs while writing to partitioned table
This error can occur when reading from highly compressed files. Try decreasing spark.files.maxPartitionBytes to a smaller value like 33554432 (32MB).
My VCF looks weird after merging VCFs and saving with bigvcf
When saving to a VCF, the samples in the genotypes array must be in the same order for each row. This ordering is not guaranteed when using collect_list to join multiple VCFs. Try sorting the array using sort_array.
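The sorting step might look like the sketch below. It assumes `df` is a Spark DataFrame whose genotypes column is an array of structs with the sample ID as the first field, since sort_array orders structs by their fields in declaration order. The helper name `sort_genotypes` and the default column name are assumptions made for illustration.

```python
def sort_genotypes(df, genotypes_col="genotypes"):
    """Return `df` with the genotypes array of every row sorted.

    `df` is assumed to be a Spark DataFrame. Sorting is by sample ID
    only if that is the first field of each genotype struct.
    """
    # Rebuild the projection in the original column order, wrapping
    # only the genotypes column in sort_array. Backticks guard
    # column names that contain unusual characters.
    exprs = [
        f"sort_array(`{c}`) AS `{c}`" if c == genotypes_col else f"`{c}`"
        for c in df.columns
    ]
    return df.selectExpr(*exprs)
```

Apply it to the merged DataFrame before writing, so that every row lists its samples in the same order.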
Glow’s behavior changed after a release
See the Glow release notes. If the Glow release involved a Spark version change, see the Spark migration guide.
com.databricks.sql.io.FileReadException: Error while reading file
When Glow is registered to access transform functions, it also overrides the Spark Context. This can interfere with the checkpointing functionality in Delta Lake in a Databricks environment. To resolve this, reset the runtime configurations via spark.sql("RESET") after running Glow transform functions and before checkpointing to Delta Lake, then try again.
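The ordering described above can be sketched as follows, assuming `spark` is the active SparkSession and `checkpoint_fn` is a hypothetical zero-argument callable that performs whatever Delta Lake checkpointing or write step was failing. Neither name comes from Glow or Delta Lake.

```python
def reset_then_checkpoint(spark, checkpoint_fn):
    """Reset runtime configurations, then run the checkpointing step.

    `spark` is assumed to be an active pyspark.sql.SparkSession and
    `checkpoint_fn` a user-supplied callable (hypothetical).
    """
    # Restore runtime configurations after the Glow transform
    # functions have run and before touching Delta Lake.
    spark.sql("RESET")
    # Run the step that previously raised FileReadException.
    return checkpoint_fn()
```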