airflow.providers.databricks.utils.openlineage¶
Attributes¶
Functions¶
|
Emit OpenLineage events for executed Databricks queries. |
Module Contents¶
- airflow.providers.databricks.utils.openlineage.emit_openlineage_events_for_databricks_queries(task_instance, hook=None, query_ids=None, query_source_namespace=None, query_for_extra_metadata=False, additional_run_facets=None, additional_job_facets=None)[source]¶
Emit OpenLineage events for executed Databricks queries.
Metadata retrieval from Databricks is attempted only if get_extra_metadata is True and hook is provided. If metadata is available, execution details such as start time, end time, execution status, error messages, and SQL text are included in the events. If no metadata is found, the function defaults to using the Airflow task instance’s state and the current timestamp.
Note that both START and COMPLETE event for each query will be emitted at the same time. If we are able to query Databricks for query execution metadata, event times will correspond to actual query execution times.
- Args:
task_instance: The Airflow task instance that run these queries. hook: A supported Databricks hook instance used to retrieve query metadata if available. If omitted, query_ids and query_source_namespace must be provided explicitly and query_for_extra_metadata must be False. query_ids: A list of Databricks query IDs to emit events for, can only be None if hook is provided and hook.query_ids are present (DatabricksHook does not store query_ids). query_source_namespace: The namespace to be included in ExternalQueryRunFacet, can be None only if hook is provided. query_for_extra_metadata: Whether to query Databricks for additional metadata about queries. Must be False if hook is not provided. additional_run_facets: Additional run facets to include in OpenLineage events. additional_job_facets: Additional job facets to include in OpenLineage events.