Configuration Reference¶
This page lists all Airflow configuration options for the apache-airflow-providers-openlineage provider that can be set in the airflow.cfg file or through environment variables.
Note
Provider packages started carrying their own configuration as of Airflow 2.7.0. Before that, these options were described and configured in the Airflow core package, so if you are using Airflow below 2.7.0, consult the Airflow documentation for the list of configuration options that were available in Airflow core.
Note
For more information see Setting Configuration Options.
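The environment variable names listed for each option on this page follow Airflow's AIRFLOW__{SECTION}__{KEY} naming pattern. A minimal sketch of that mapping follows; the helper name airflow_env_var is hypothetical and is not part of Airflow:

```python
def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name for an Airflow config option.

    Hypothetical helper for illustration: upper-case the section and key
    and join them with double underscores after the AIRFLOW prefix.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    # Print the variable names for a few options from the [openlineage] section.
    for key in ("config_path", "namespace", "transport"):
        print(airflow_env_var("openlineage", key))
```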
Sections:
[openlineage]¶
This section applies settings for the OpenLineage integration. More about the configuration and its precedence can be found at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup
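To illustrate how options from this section look in airflow.cfg, the sketch below parses a small [openlineage] fragment with Python's standard configparser. The keys and values are taken from the defaults and examples on this page. Airflow uses its own configuration parser, so the plain ConfigParser here is only a stand-in:

```python
import configparser

# Hypothetical airflow.cfg fragment; keys and values mirror this page.
AIRFLOW_CFG = """
[openlineage]
namespace = my_airflow_instance_1
disabled = False
execution_timeout = 10
"""

parser = configparser.ConfigParser()
parser.read_string(AIRFLOW_CFG)
section = parser["openlineage"]

# Read each option with the type documented for it (string, boolean, integer).
namespace = section.get("namespace")
disabled = section.getboolean("disabled")
execution_timeout = section.getint("execution_timeout")

print(namespace, disabled, execution_timeout)
```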
config_path¶
Specify the path to the YAML configuration file. This ensures backwards compatibility with passing config through the openlineage.yml file.
- Type
string
- Default
''
- Environment Variable
AIRFLOW__OPENLINEAGE__CONFIG_PATH
- Example
full/path/to/openlineage.yml
custom_run_facets¶
New in version 1.10.0.
Register custom run facet functions by passing a string of semicolon-separated full import paths.
- Type
string
- Default
''
- Environment Variable
AIRFLOW__OPENLINEAGE__CUSTOM_RUN_FACETS
- Example
full.path.to.custom_facet_function;full.path.to.another_custom_facet_function
dag_state_change_process_pool_size¶
New in version 1.8.0.
Number of processes to utilize for processing DAG state changes in an asynchronous manner within the scheduler process.
- Type
integer
- Default
1
- Environment Variable
AIRFLOW__OPENLINEAGE__DAG_STATE_CHANGE_PROCESS_POOL_SIZE
debug_mode¶
New in version 1.11.0.
If true, OpenLineage events will include information useful for debugging, potentially including large fields such as all installed packages and their versions.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__DEBUG_MODE
disable_source_code¶
Disable the inclusion of source code in OpenLineage events by setting this to true. By default, several Operators (e.g. Python, Bash) will include their source code in the events unless disabled.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__DISABLE_SOURCE_CODE
disabled¶
Disable sending events without uninstalling the OpenLineage Provider by setting this to true.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__DISABLED
disabled_for_operators¶
New in version 1.1.0.
Exclude some Operators from emitting OpenLineage events by passing a string of semicolon-separated full import paths of the Operators to disable.
- Type
string
- Default
''
- Environment Variable
AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS
- Example
airflow.providers.standard.operators.bash.BashOperator; airflow.providers.standard.operators.python.PythonOperator
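Several options in this section take a string of semicolon-separated import paths. The sketch below shows one way such a value could be split into individual paths. Stripping the whitespace around each entry is an assumption made here to handle the space in the example above, and the helper split_import_paths is hypothetical, not the provider's own code:

```python
def split_import_paths(value: str) -> list[str]:
    """Split a semicolon-separated option value into individual import paths.

    Assumption for illustration: surrounding whitespace is ignored and
    empty entries (for example from a trailing semicolon) are dropped.
    """
    return [path.strip() for path in value.split(";") if path.strip()]


raw_value = (
    "airflow.providers.standard.operators.bash.BashOperator; "
    "airflow.providers.standard.operators.python.PythonOperator"
)
paths = split_import_paths(raw_value)

for path in paths:
    print(path)
```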
execution_timeout¶
New in version 1.9.0.
Maximum amount of time (in seconds) that OpenLineage can spend executing metadata extraction.
- Type
integer
- Default
10
- Environment Variable
AIRFLOW__OPENLINEAGE__EXECUTION_TIMEOUT
extractors¶
Register custom OpenLineage Extractors by passing a string of semicolon-separated full import paths.
- Type
string
- Default
None
- Environment Variable
AIRFLOW__OPENLINEAGE__EXTRACTORS
- Example
full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
include_full_task_info¶
New in version 1.10.0.
If true, OpenLineage events will include full task info, potentially containing large fields.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__INCLUDE_FULL_TASK_INFO
namespace¶
Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers, the events coming from them are logically separated.
- Type
string
- Default
None
- Environment Variable
AIRFLOW__OPENLINEAGE__NAMESPACE
- Example
my_airflow_instance_1
selective_enable¶
New in version 1.7.0.
If this setting is enabled, the OpenLineage integration won’t collect and emit metadata unless you explicitly enable it per DAG or Task using the enable_lineage method.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__SELECTIVE_ENABLE
spark_inject_parent_job_info¶
New in version 1.15.0.
Automatically inject OpenLineage’s parent job information (namespace, job name, run id) into Spark application properties for supported Operators.
- Type
boolean
- Default
False
- Environment Variable
AIRFLOW__OPENLINEAGE__SPARK_INJECT_PARENT_JOB_INFO
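As a rough illustration of what the injected information might look like, the sketch below builds a dictionary of Spark properties from the three parent job fields. The property names are an assumption based on the OpenLineage Spark integration's parent job settings, and the helper and the sample values are hypothetical; check the provider and Spark integration documentation for the exact keys used by your versions:

```python
def parent_job_spark_properties(namespace: str, job_name: str, run_id: str) -> dict[str, str]:
    """Build Spark properties carrying parent job information.

    The property names below are an assumption for illustration only.
    """
    return {
        "spark.openlineage.parentJobNamespace": namespace,
        "spark.openlineage.parentJobName": job_name,
        "spark.openlineage.parentRunId": run_id,
    }


# Hypothetical values standing in for a real Airflow task run.
properties = parent_job_spark_properties(
    namespace="my_airflow_instance_1",
    job_name="example_dag.example_spark_task",
    run_id="00000000-0000-0000-0000-000000000000",
)

for name, value in properties.items():
    print(f"{name}={value}")
```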
transport¶
Pass the OpenLineage Client transport configuration as a JSON string. It should contain the type of the transport and additional options (different for each transport type). For more details see: https://openlineage.io/docs/client/python/#built-in-transport-types
Currently supported types are:
HTTP
Kafka
Console
File
- Type
string
- Default
''
- Environment Variable
AIRFLOW__OPENLINEAGE__TRANSPORT
- Example
{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}
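Because the transport option is a JSON string, it can be convenient to build it from a dictionary rather than writing it by hand. The following is a minimal sketch using the HTTP example above; the helper transport_config_value is hypothetical, and its check for a "type" key only reflects the requirement stated in the description, not the client's full validation:

```python
import json
import os


def transport_config_value(config: dict) -> str:
    """Serialize a transport configuration dictionary to a JSON string.

    Hypothetical helper: it only checks that the required "type" key
    is present before serializing.
    """
    if "type" not in config:
        raise ValueError("transport configuration must include a 'type' key")
    return json.dumps(config)


http_transport = {
    "type": "http",
    "url": "http://localhost:5000",
    "endpoint": "api/v1/lineage",
}

# Set the option through its environment variable for the current process.
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = transport_config_value(http_transport)
print(os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"])
```

Setting the variable from Python only affects the current process and its children; in a deployment the same JSON string would normally be placed in airflow.cfg or exported in the environment of the Airflow components.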