Configuration Reference

This page lists all Airflow configuration options for the apache-airflow-providers-openlineage provider. They can be set in the airflow.cfg file or through environment variables.
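Each option on this page lists its environment variable, and the names follow the pattern AIRFLOW__{SECTION}__{KEY}. As a minimal sketch, the helper below (airflow_env_var is a hypothetical name used for illustration only, not part of Airflow) derives the variable name from a section and key and sets one option for the current process.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name for a config option.

    Hypothetical helper for illustration; it only applies the
    AIRFLOW__{SECTION}__{KEY} naming pattern.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Derive the variable name for [openlineage] config_path.
config_path_var = airflow_env_var("openlineage", "config_path")
# config_path_var == "AIRFLOW__OPENLINEAGE__CONFIG_PATH"

# Set the namespace option for this process and its children.
os.environ[airflow_env_var("openlineage", "namespace")] = "my_airflow_instance_1"
```

A value set this way is visible only to the current process and the processes it starts. For a persistent setting, export the variable in the environment that launches Airflow or set the option in airflow.cfg.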

Note

Provider packages have carried their own configuration since Airflow 2.7.0. Before that, these options were described and configured in the Airflow core package, so if you are using a version below 2.7.0, see the Airflow documentation for the configuration options that were available in Airflow core.

Note

For more information see Setting Configuration Options.

Sections:

[openlineage]

This section applies settings for the OpenLineage integration. More about configuration and its precedence can be found at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup

config_path

Specify the path to the YAML configuration file. This ensures backwards compatibility with passing config through the openlineage.yml file.

Type

string

Default

''

Environment Variable

AIRFLOW__OPENLINEAGE__CONFIG_PATH

Example

full/path/to/openlineage.yml

custom_run_facets

New in version 1.10.0.

Register custom run facet functions by passing a string of semicolon-separated full import paths.

Type

string

Default

''

Environment Variable

AIRFLOW__OPENLINEAGE__CUSTOM_RUN_FACETS

Example

full.path.to.custom_facet_function;full.path.to.another_custom_facet_function

dag_state_change_process_pool_size

New in version 1.8.0.

Number of processes to utilize for processing DAG state changes in an asynchronous manner within the scheduler process.

Type

integer

Default

1

Environment Variable

AIRFLOW__OPENLINEAGE__DAG_STATE_CHANGE_PROCESS_POOL_SIZE

debug_mode

New in version 1.11.0.

If true, OpenLineage events will include information useful for debugging, which may contain large fields such as the list of all installed packages and their versions.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__DEBUG_MODE

disable_source_code

Disable the inclusion of source code in OpenLineage events by setting this to true. By default, several Operators (e.g. Python, Bash) will include their source code in the events unless disabled.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__DISABLE_SOURCE_CODE

disabled

Disable sending events without uninstalling the OpenLineage Provider by setting this to true.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__DISABLED

disabled_for_operators

New in version 1.1.0.

Exclude some Operators from emitting OpenLineage events by passing a string of semicolon-separated full import paths of the Operators to disable.

Type

string

Default

''

Environment Variable

AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS

Example

airflow.providers.standard.operators.bash.BashOperator; airflow.providers.standard.operators.python.PythonOperator
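Several options on this page (this one, custom_run_facets, and extractors) take a semicolon-separated string of import paths. The sketch below shows one way to build and read back such a value. The helper names are hypothetical, and how the provider itself treats surrounding whitespace is an assumption here, so the sketch strips it on both sides to keep the value unambiguous.

```python
def join_import_paths(paths: list[str]) -> str:
    """Join import paths into a single semicolon-separated string."""
    return ";".join(path.strip() for path in paths)


def split_import_paths(value: str) -> list[str]:
    """Split a semicolon-separated string, dropping blanks and whitespace."""
    return [part.strip() for part in value.split(";") if part.strip()]


operators_to_disable = [
    "airflow.providers.standard.operators.bash.BashOperator",
    "airflow.providers.standard.operators.python.PythonOperator",
]

# Value suitable for AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS.
value = join_import_paths(operators_to_disable)

# Reading the value back yields the original list, even if spaces
# were left around the semicolons.
parsed = split_import_paths(value.replace(";", "; "))
```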

execution_timeout

New in version 1.9.0.

Maximum amount of time (in seconds) that OpenLineage can spend executing metadata extraction.

Type

integer

Default

10

Environment Variable

AIRFLOW__OPENLINEAGE__EXECUTION_TIMEOUT

extractors

Register custom OpenLineage Extractors by passing a string of semicolon-separated full import paths.

Type

string

Default

None

Environment Variable

AIRFLOW__OPENLINEAGE__EXTRACTORS

Example

full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass

include_full_task_info

New in version 1.10.0.

If true, OpenLineage events will include full task info, which may contain large fields.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__INCLUDE_FULL_TASK_INFO

namespace

Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers, the events coming from them are logically separated.

Type

string

Default

None

Environment Variable

AIRFLOW__OPENLINEAGE__NAMESPACE

Example

my_airflow_instance_1

selective_enable

New in version 1.7.0.

If this setting is enabled, the OpenLineage integration won’t collect or emit metadata unless you explicitly enable it per DAG or Task using the enable_lineage method.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__SELECTIVE_ENABLE

spark_inject_parent_job_info

New in version 1.15.0.

Automatically inject OpenLineage’s parent job (namespace, job name, run id) information into Spark application properties for supported Operators.

Type

boolean

Default

False

Environment Variable

AIRFLOW__OPENLINEAGE__SPARK_INJECT_PARENT_JOB_INFO

transport

Pass the OpenLineage Client transport configuration as a JSON string. It should contain the transport type and any additional options (these differ for each transport type). For more details see: https://openlineage.io/docs/client/python/#built-in-transport-types

Currently supported types are:

  • HTTP

  • Kafka

  • Console

  • File

Type

string

Default

''

Environment Variable

AIRFLOW__OPENLINEAGE__TRANSPORT

Example

{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}
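Because the value must be valid JSON, it can be safer to generate it than to write it by hand. The following sketch serializes the example configuration above with the standard json module and sets it through the environment variable. The URL and endpoint are the placeholder values from the example, not a recommendation.

```python
import json
import os

# Transport options taken from the example above; adjust for your backend.
transport = {
    "type": "http",
    "url": "http://localhost:5000",
    "endpoint": "api/v1/lineage",
}

# json.dumps guarantees correct quoting and escaping of the JSON string.
transport_json = json.dumps(transport)
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = transport_json

# Sanity check: the stored string parses back to the same configuration.
round_trip = json.loads(os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"])
```

When setting the option in airflow.cfg instead, paste the generated JSON string as the value of transport in the [openlineage] section.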
