Configuration Reference

This page lists all configuration options for the apache-airflow-providers-openlineage provider. Each option can be set in the airflow.cfg file or through an environment variable.

Note

Provider packages began carrying their own configuration in Airflow 2.7.0. Before that, these options were described and configured in the Airflow core package, so if you are using Airflow below 2.7.0, consult the Airflow core documentation for the list of available configuration options.

Note

For more information see Setting Configuration Options.
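Every option on this page lists an environment variable whose name follows the AIRFLOW__{SECTION}__{KEY} pattern. As a small illustration, the helper below (a hypothetical function, not part of Airflow) derives that name from a section and option name:

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for a config option.

    Hypothetical helper for illustration; it only mirrors the
    AIRFLOW__{SECTION}__{KEY} naming pattern used on this page.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Name of the variable that overrides [openlineage] config_path
config_path_var = airflow_env_var("openlineage", "config_path")
print(config_path_var)
```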

[openlineage]

This section applies settings for OpenLineage integration. More about configuration and its precedence can be found at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup
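To make the shape of this section concrete, here is a minimal sketch that parses an airflow.cfg-style fragment with Python's standard configparser. The fragment is an assumed example built from the sample values on this page, and the parsing only illustrates the option types; it is not how Airflow itself loads its configuration.

```python
import configparser

# Assumed sample fragment; values are taken from the examples on this page.
SAMPLE_CFG = """
[openlineage]
namespace = my_airflow_instance_1
disabled = False
execution_timeout = 10
"""

# interpolation=None keeps any '%' characters in values literal
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CFG)
section = parser["openlineage"]

namespace = section.get("namespace")                     # string option
disabled = section.getboolean("disabled")                # boolean option
execution_timeout = section.getint("execution_timeout")  # integer option (seconds)

print(namespace, disabled, execution_timeout)
```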

config_path

Specify the path to the YAML configuration file. This ensures backwards compatibility with passing config through the openlineage.yml file.

Type:

string

Default:

''

Environment Variable:

AIRFLOW__OPENLINEAGE__CONFIG_PATH

Example:

full/path/to/openlineage.yml

custom_run_facets

Added in version 1.10.0.

Register custom run facet functions by passing a string of semicolon-separated full import paths.

Type:

string

Default:

''

Environment Variable:

AIRFLOW__OPENLINEAGE__CUSTOM_RUN_FACETS

Example:

full.path.to.custom_facet_function;full.path.to.another_custom_facet_function
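A "full import path" is the dotted module path followed by the attribute name. The sketch below shows one way such a string could be resolved to a callable using only the standard library. It is an illustration rather than the provider's own loader, and the stdlib paths used here (os.path.join, json.dumps) are stand-ins for real facet functions.

```python
import json
import os.path
from importlib import import_module


def resolve_import_path(path: str):
    """Resolve 'package.module.attribute' to the named attribute."""
    module_name, _, attribute = path.rpartition(".")
    return getattr(import_module(module_name), attribute)


# Stand-in value in the same semicolon-separated format as the option
value = "os.path.join;json.dumps"
resolved = [resolve_import_path(p) for p in value.split(";")]
print([f.__name__ for f in resolved])
```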

dag_state_change_process_pool_size

Added in version 1.8.0.

Number of processes to utilize for processing DAG state changes in an asynchronous manner within the scheduler process.

Type:

integer

Default:

1

Environment Variable:

AIRFLOW__OPENLINEAGE__DAG_STATE_CHANGE_PROCESS_POOL_SIZE

debug_mode

Added in version 1.11.0.

If true, OpenLineage events will include information useful for debugging. This may add large fields, e.g. all installed packages and their versions.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__DEBUG_MODE

disable_source_code

Disable the inclusion of source code in OpenLineage events by setting this to true. By default, several Operators (e.g. Python, Bash) will include their source code in the events unless disabled.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__DISABLE_SOURCE_CODE

disabled

Disable sending events without uninstalling the OpenLineage Provider by setting this to true.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__DISABLED

disabled_for_operators

Added in version 1.1.0.

Exclude some Operators from emitting OpenLineage events by passing a string of semicolon-separated full import paths of the Operators to disable.

Type:

string

Default:

''

Environment Variable:

AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS

Example:

airflow.providers.standard.operators.bash.BashOperator; airflow.providers.standard.operators.python.PythonOperator
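The example value above contains a space after the semicolon. A minimal sketch of splitting such a value into individual import paths follows; the function name is hypothetical, and trimming whitespace and skipping empty entries are assumptions of this sketch, not a statement about the provider's parsing.

```python
def split_import_paths(value: str) -> list[str]:
    """Split a semicolon-separated string into trimmed, non-empty paths."""
    return [part.strip() for part in value.split(";") if part.strip()]


raw = (
    "airflow.providers.standard.operators.bash.BashOperator; "
    "airflow.providers.standard.operators.python.PythonOperator"
)
operators = split_import_paths(raw)
for path in operators:
    print(path)
```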

execution_timeout

Added in version 1.9.0.

Maximum amount of time (in seconds) that OpenLineage can spend executing metadata extraction.

Type:

integer

Default:

10

Environment Variable:

AIRFLOW__OPENLINEAGE__EXECUTION_TIMEOUT

extractors

Register custom OpenLineage Extractors by passing a string of semicolon-separated full import paths.

Type:

string

Default:

None

Environment Variable:

AIRFLOW__OPENLINEAGE__EXTRACTORS

Example:

full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass

include_full_task_info

Added in version 1.10.0.

If true, OpenLineage events will include full task info, which may contain large fields.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__INCLUDE_FULL_TASK_INFO

namespace

Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers, the events coming from them are logically separated.

Type:

string

Default:

None

Environment Variable:

AIRFLOW__OPENLINEAGE__NAMESPACE

Example:

my_airflow_instance_1

selective_enable

Added in version 1.7.0.

If this setting is enabled, the OpenLineage integration won’t collect or emit metadata unless you explicitly enable it per DAG or Task using the enable_lineage method.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__SELECTIVE_ENABLE

spark_inject_parent_job_info

Added in version 2.0.0.

Automatically inject OpenLineage’s parent job (namespace, job name, run id) information into Spark application properties for supported Operators.

Type:

boolean

Default:

False

Environment Variable:

AIRFLOW__OPENLINEAGE__SPARK_INJECT_PARENT_JOB_INFO
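As a rough picture of what "parent job information" looks like on the Spark side, the sketch below builds a dictionary of Spark properties from the three values named above. The property names follow the OpenLineage Spark integration's parent-job settings as an assumption; the helper function and the sample values are hypothetical, and the exact properties the provider injects may differ.

```python
def parent_job_spark_properties(namespace: str, job_name: str, run_id: str) -> dict[str, str]:
    """Map parent job details to Spark properties (assumed property names)."""
    return {
        "spark.openlineage.parentJobNamespace": namespace,
        "spark.openlineage.parentJobName": job_name,
        "spark.openlineage.parentRunId": run_id,
    }


# Hypothetical values for illustration only
props = parent_job_spark_properties(
    namespace="my_airflow_instance_1",
    job_name="example_dag.example_task",
    run_id="00000000-0000-0000-0000-000000000000",
)
for key, value in props.items():
    print(f"{key}={value}")
```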

transport

Pass the OpenLineage Client transport configuration as a JSON string. It should contain the type of the transport and any additional options, which differ for each transport type. For more details see: https://openlineage.io/docs/client/python/#built-in-transport-types

Currently supported types are:

  • HTTP

  • Kafka

  • Console

  • File

Type:

string

Default:

''

Environment Variable:

AIRFLOW__OPENLINEAGE__TRANSPORT

Example:

{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}
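Because the value must be a valid JSON string, it can be safer to build it from a dictionary than to write it by hand. The sketch below serializes the example configuration above and places it in the environment variable for this option. Setting os.environ only affects the current process and its children, so in practice the variable would normally be set in the deployment environment before Airflow starts.

```python
import json
import os

# Same transport configuration as the example above
transport = {
    "type": "http",
    "url": "http://localhost:5000",
    "endpoint": "api/v1/lineage",
}

# json.dumps guarantees correct quoting and escaping
transport_json = json.dumps(transport)
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = transport_json

print(transport_json)
```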
