
airflow.providers.snowflake.utils.openlineage

Attributes

log

Functions

fix_account_name(name)

Fix account name to have the following format: <account_id>.<region>.<cloud>.

fix_snowflake_sqlalchemy_uri(uri)

Fix snowflake sqlalchemy connection URI to OpenLineage structure.

emit_openlineage_events_for_snowflake_queries(...[, ...])

Emit OpenLineage events for executed Snowflake queries.

Module Contents

airflow.providers.snowflake.utils.openlineage.log[source]
airflow.providers.snowflake.utils.openlineage.fix_account_name(name)[source]

Fix account name to have the following format: <account_id>.<region>.<cloud>.

airflow.providers.snowflake.utils.openlineage.fix_snowflake_sqlalchemy_uri(uri)[source]

Fix snowflake sqlalchemy connection URI to OpenLineage structure.

A Snowflake sqlalchemy connection URI has the following structure:
'snowflake://<user_login_name>:<password>@<account_identifier>/<database_name>/<schema_name>?warehouse=<warehouse_name>&role=<role_name>'

We want the account identifier normalized. It can take two forms:

- newer, in the form of <organization_id>-<account_id>. In this case we want to do nothing.
- older, composed of <account_locator>.<region>.<cloud>, where region and cloud can be optional in some cases. If <cloud> is omitted, it's AWS. If both region and cloud are omitted, it's AWS us-west-1.

Current doc on Snowflake account identifiers: https://docs.snowflake.com/en/user-guide/admin-account-identifier

airflow.providers.snowflake.utils.openlineage.emit_openlineage_events_for_snowflake_queries(task_instance, hook=None, query_ids=None, query_source_namespace=None, query_for_extra_metadata=False, additional_run_facets=None, additional_job_facets=None)[source]

Emit OpenLineage events for executed Snowflake queries.

Metadata retrieval from Snowflake is attempted only if query_for_extra_metadata is True and hook is provided. If metadata is available, execution details such as start time, end time, execution status, error messages, and SQL text are included in the events. If no metadata is found, the function defaults to using the Airflow task instance's state and the current timestamp.

Note that both the START and COMPLETE events for each query are emitted at the same time. If Snowflake can be queried for query execution metadata, the event times will correspond to the actual query execution times.

Args:

task_instance: The Airflow task instance that ran these queries.
hook: A supported Snowflake hook instance used to retrieve query metadata, if available. If omitted, query_ids and query_source_namespace must be provided explicitly, and query_for_extra_metadata must be False.
query_ids: A list of Snowflake query IDs to emit events for; may be None only if hook is provided and hook.query_ids are present.
query_source_namespace: The namespace to be included in ExternalQueryRunFacet; may be None only if hook is provided.
query_for_extra_metadata: Whether to query Snowflake for additional metadata about the queries. Must be False if hook is not provided.
additional_run_facets: Additional run facets to include in the OpenLineage events.
additional_job_facets: Additional job facets to include in the OpenLineage events.
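The argument constraints above (what must be supplied when hook is omitted) can be sketched as a small validation routine. This is a hypothetical illustration of the documented contract, not the provider's actual code:

```python
def validate_emit_args(hook=None, query_ids=None, query_source_namespace=None,
                       query_for_extra_metadata=False):
    """Sketch of the documented argument contract when ``hook`` is omitted.

    Hypothetical helper; the real function in
    airflow.providers.snowflake.utils.openlineage also builds and emits
    the OpenLineage events.
    """
    if hook is None:
        if not query_ids:
            # Without a hook there is no hook.query_ids fallback.
            raise ValueError("If 'hook' is not provided, 'query_ids' must be set.")
        if query_source_namespace is None:
            # The ExternalQueryRunFacet namespace cannot be derived without a hook.
            raise ValueError(
                "If 'hook' is not provided, 'query_source_namespace' must be set."
            )
        if query_for_extra_metadata:
            # Extra metadata can only be fetched through a hook.
            raise ValueError(
                "If 'hook' is not provided, 'query_for_extra_metadata' must be False."
            )
```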
