Supported classes¶
Below is a list of Operators and Hooks that support OpenLineage extraction, along with specific DB types that are compatible with the supported SQL operators.
Important
While we strive to keep the list of supported classes current, the list is generated automatically and may not always capture everything accurately. Detecting hook-level lineage is especially challenging, so double-check the information provided below.
What does “supported operator” mean?¶
All Airflow operators automatically emit OpenLineage events (unless explicitly disabled or skipped during scheduling, like EmptyOperator), regardless of whether they appear on the “supported” list. Every OpenLineage event contains basic information such as:
Task and DAG run metadata (execution time, state, tags, parameters, owners, description, etc.)
Job relationships (the DAG job that the task belongs to, upstream/downstream relationships between tasks in a DAG, etc.)
Error message (in case of task failure)
Airflow and OpenLineage provider versions
“Supported” operators provide additional metadata that enhances the lineage information:
Input and output datasets (sometimes with Column Level Lineage)
Operator-specific details that may include SQL query text and query IDs, source code, job IDs from external systems (e.g., Snowflake or BigQuery job ID), data quality metrics and other information.
For example, a supported SQL operator will include the executed SQL query, query ID, and input/output table information in its OpenLineage events. An unsupported operator will still appear in the lineage graph, but without these details.
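The difference can be pictured with a small, purely illustrative sketch. It builds plain dictionaries rather than calling any Airflow or OpenLineage API. The facet keys are loosely modeled on the OpenLineage event format, and the namespaces, job names, and query ID are made-up assumptions.

```python
from __future__ import annotations

from datetime import datetime, timezone
from uuid import uuid4


def base_event(job_name: str) -> dict:
    """Build the minimal event shape any task could emit (illustrative only)."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid4()), "facets": {}},
        # "example_airflow" is a made-up namespace for this sketch.
        "job": {"namespace": "example_airflow", "name": job_name, "facets": {}},
        "inputs": [],
        "outputs": [],
    }


def enrich_for_sql(
    event: dict, sql: str, query_id: str, inputs: list[str], outputs: list[str]
) -> dict:
    """Add the extra details a supported SQL operator might report."""
    event["job"]["facets"]["sql"] = {"query": sql}
    event["run"]["facets"]["externalQuery"] = {
        "externalQueryId": query_id,
        "source": "example_db",  # made-up source name
    }
    event["inputs"] = [{"namespace": "example_db", "name": t} for t in inputs]
    event["outputs"] = [{"namespace": "example_db", "name": t} for t in outputs]
    return event


# An unsupported operator: present in the graph, but no datasets or SQL details.
plain = base_event("my_dag.python_task")

# A supported SQL operator: same base event plus query text, query ID, and tables.
rich = enrich_for_sql(
    base_event("my_dag.sql_task"),
    sql="INSERT INTO b SELECT * FROM a",
    query_id="q-123",
    inputs=["a"],
    outputs=["b"],
)
```

Both events describe a task run, but only the second one lets a lineage backend connect table `a` to table `b`.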
Tip
You can easily implement OpenLineage support for any operator. See Implementing OpenLineage in Operators.
Hook Level Lineage¶
Some operators (like PythonOperator) function as a “black box”
capable of running arbitrary code, which usually prevents the extraction of input/output datasets. To address this,
Airflow tracks hook-level lineage: when a supported hook method is invoked (even from within a Python callable),
the OpenLineage integration can automatically capture lineage from that execution. For example, reading a file
through a storage hook can report the file as an input dataset, while writing to an object store can report an
output dataset.
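The idea can be sketched without Airflow at all. In the toy example below, `LineageCollector` and `FakeStorageHook` are hypothetical stand-ins, not Airflow classes, and the URIs are invented. The point is only that the hook methods, not the user's callable, are what report the datasets.

```python
from dataclasses import dataclass, field


@dataclass
class LineageCollector:
    """Hypothetical stand-in for a per-task lineage collector."""

    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


class FakeStorageHook:
    """Hypothetical storage hook backed by an in-memory dict."""

    def __init__(self, collector: LineageCollector):
        self.collector = collector
        self._objects = {}

    def seed(self, uri: str, data: str) -> None:
        """Pre-populate storage without recording any lineage."""
        self._objects[uri] = data

    def read(self, uri: str) -> str:
        self.collector.inputs.append(uri)  # reading reports an input dataset
        return self._objects[uri]

    def write(self, uri: str, data: str) -> None:
        self._objects[uri] = data
        self.collector.outputs.append(uri)  # writing reports an output dataset


def python_callable(hook: FakeStorageHook) -> None:
    """Arbitrary user code, as it might run inside a PythonOperator."""
    raw = hook.read("s3://bucket/raw.csv")
    hook.write("s3://bucket/clean.csv", raw.strip().lower())


collector = LineageCollector()
hook = FakeStorageHook(collector)
hook.seed("s3://bucket/raw.csv", " A,B \n")
python_callable(hook)
```

After the callable runs, `collector.inputs` and `collector.outputs` hold the URIs that were read and written, even though the callable itself never mentions lineage.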
For hooks that execute SQL (mostly subclasses of DbApiHook),
the integration can go further. Besides recording which assets were read or written (by using SQL parsing),
it may also extract the executed SQL text and external query/job IDs. For each query, a separate pair of child OpenLineage
events is emitted.
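As a rough sketch of what "a separate pair of child events per query" could look like, the function below produces one START and one COMPLETE dictionary per statement, each pointing back to the task's run. The function name and the exact facet layout are assumptions made for illustration; the real events carry more fields.

```python
from uuid import uuid4


def child_events_for_queries(parent_run_id: str, statements: list) -> list:
    """Return a START/COMPLETE pair of simplified events for each SQL statement."""
    events = []
    for sql in statements:
        run_id = str(uuid4())  # each query gets its own child run
        for event_type in ("START", "COMPLETE"):
            events.append(
                {
                    "eventType": event_type,
                    "run": {
                        "runId": run_id,
                        # Simplified link back to the task-level run.
                        "facets": {"parent": {"run": {"runId": parent_run_id}}},
                    },
                    "job": {"facets": {"sql": {"query": sql}}},
                }
            )
    return events


events = child_events_for_queries("parent-1", ["SELECT 1", "SELECT 2"])
```

Two statements therefore yield four events: two child runs, each with its own START and COMPLETE.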
Important
The level of detail captured varies between hooks and methods. Some may only report dataset information, while others expose SQL text, query IDs and more. Review the hook implementation to confirm what lineage data is available.
Spark operators¶
SQL operators¶
Operators inheriting from BaseSQLOperator may be supported
out of the box. These operators can use SQL parsing and may query the database for lineage extraction.
To extract unique data from each database type, a dedicated Hook implementing OpenLineage methods is required.
Not all subclasses of BaseSQLOperator are automatically supported:
only those that also use a supported Hook and follow a similar attribute naming convention (e.g., storing the query under self.sql).
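The conditions above can be restated as a small checklist. The sketch below is not how the provider decides anything. `BaseSQLOperatorStub` and both operator classes are hypothetical stand-ins, and the hook names are simply copied from the list of supported databases on this page.

```python
SUPPORTED_HOOK_NAMES = {
    "BigQueryHook", "DatabricksSqlHook", "MsSqlHook", "MySqlHook",
    "OracleHook", "PgVectorHook", "PostgresHook", "RedshiftSQLHook",
    "SnowflakeHook", "SnowflakeSqlApiHook", "SpannerHook", "TrinoHook",
}


class BaseSQLOperatorStub:
    """Hypothetical stand-in for BaseSQLOperator; not the real Airflow class."""

    hook_name = ""  # name of the hook class this operator would use

    def __init__(self, sql=None):
        if sql is not None:
            self.sql = sql  # conventional attribute holding the query


class PostgresLikeOperator(BaseSQLOperatorStub):
    hook_name = "PostgresHook"


class CustomQueryOperator(BaseSQLOperatorStub):
    hook_name = "PostgresHook"

    def __init__(self, statement):
        super().__init__()
        self.statement = statement  # query stored under a non-standard name


def meets_documented_conditions(op) -> bool:
    """Check the three conditions: base class, supported hook, query in `sql`."""
    return (
        isinstance(op, BaseSQLOperatorStub)
        and op.hook_name in SUPPORTED_HOOK_NAMES
        and hasattr(op, "sql")
    )


ok = meets_documented_conditions(PostgresLikeOperator("SELECT 1"))
not_ok = meets_documented_conditions(CustomQueryOperator("SELECT 1"))
```

Here `CustomQueryOperator` uses a supported hook but keeps its query under `statement`, so it fails the naming-convention condition. For a real operator, reading its implementation remains the reliable check.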
Important
The level of OpenLineage extraction may vary between SQL operators. Most will provide the executed SQL text, while others may also expose additional metadata such as query IDs or other query-related information. Due to the automatic generation of this documentation, some operators listed as supported SQL operators may not contain full lineage information. Please review the implementation of your operator and its corresponding hook to confirm the level of OpenLineage support.
Currently, the following databases (hooks) are supported together with SQL operators:
BigQuery (via BigQueryHook)
Databricks (via DatabricksSqlHook)
MsSql (via MsSqlHook)
MySql (via MySqlHook)
Oracle (via OracleHook)
PgVector (via PgVectorHook)
Postgres (via PostgresHook)
Redshift (via RedshiftSQLHook)
Snowflake (via SnowflakeHook)
SnowflakeApi (via SnowflakeSqlApiHook)
Spanner (via SpannerHook)
Trino (via TrinoHook)
Providers¶
The operators and hooks listed below from each provider are natively equipped with OpenLineage support.
apache-airflow-providers-amazon (9.22.0)¶
Operators¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-apache-drill (3.3.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), run(), test_connection()
apache-airflow-providers-apache-druid (4.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), run(), test_connection()
apache-airflow-providers-apache-hive (9.3.0)¶
Hooks*¶
-
get_df_by_chunks(), get_first(), get_pandas_df_by_chunks(), insert_rows(), run(), test_connection()
apache-airflow-providers-apache-impala (1.9.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-apache-pinot (4.10.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_pandas_df(), get_pandas_df_by_chunks(), run(), test_connection()
apache-airflow-providers-common-io (1.7.1)¶
Operators¶
apache-airflow-providers-common-sql (1.32.0)¶
SQL operators*¶
Hooks*¶
apache-airflow-providers-databricks (7.10.0)¶
Operators¶
SQL operators*¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), test_connection()
apache-airflow-providers-dbt-cloud (4.6.5)¶
Operators¶
apache-airflow-providers-elasticsearch (6.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-exasol (4.10.0)¶
SQL operators*¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_pandas_df_by_chunks(), insert_rows(), test_connection()
apache-airflow-providers-ftp (3.14.1)¶
Operators¶
apache-airflow-providers-google (20.0.0)¶
Operators¶
SQL operators*¶
Hooks*¶
-
get_df_by_chunks(), get_first(), get_pandas_df_by_chunks(), run(), test_connection()
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-jdbc (5.4.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-microsoft-mssql (4.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-mysql (6.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-odbc (4.12.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-openlineage (2.11.0)¶
Operators¶
apache-airflow-providers-oracle (4.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_pandas_df(), get_pandas_df_by_chunks(), run(), test_connection()
apache-airflow-providers-pgvector (1.7.0)¶
SQL operators*¶
Hooks*¶
-
bulk_dump(), bulk_load(), copy_expert(), get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-postgres (6.6.0)¶
Hooks*¶
-
get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), run(), test_connection()
apache-airflow-providers-presto (5.11.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_pandas_df_by_chunks(), run(), test_connection()
apache-airflow-providers-sftp (5.7.0)¶
Operators¶
apache-airflow-providers-snowflake (6.10.0)¶
Operators¶
SQL operators*¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), test_connection()
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-sqlite (4.3.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-teradata (3.5.0)¶
SQL operators*¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()
apache-airflow-providers-trino (6.5.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_pandas_df_by_chunks(), test_connection()
apache-airflow-providers-vertica (4.3.0)¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), test_connection()
apache-airflow-providers-ydb (2.5.0)¶
SQL operators*¶
Hooks*¶
-
get_df(), get_df_by_chunks(), get_first(), get_pandas_df(), get_pandas_df_by_chunks(), get_records(), insert_rows(), run(), test_connection()