Configuration Reference¶
This page contains the list of all available Airflow configurations for the apache-airflow-providers-openlineage provider that can be set in the airflow.cfg file or using environment variables.
Note
Airflow uses the configuration embedded in provider packages as of version 2.7.0. Before that, the configuration was described and configured in the Airflow core package, so if you are using an Airflow version below 2.7.0, see the Airflow documentation for the list of configuration options that were available in Airflow core.
Note
For more information see Setting Configuration Options.
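Every environment variable listed on this page follows the same naming pattern: AIRFLOW__, the upper-cased section name, a double underscore, and the upper-cased option name. The helper below is a minimal sketch of that pattern; the function name env_var_name is hypothetical and is not part of Airflow.

```python
def env_var_name(section: str, key: str) -> str:
    """Build the environment variable name for a config option.

    Hypothetical helper: it only mirrors the naming pattern visible in
    the variable names listed on this page.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Derive the variable names for a few options in the [openlineage] section.
names = [
    env_var_name("openlineage", key)
    for key in ("config_path", "disabled", "transport")
]
print(names)
```

The derived names match the "Environment Variable" entries given for each option below.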
[openlineage]¶
This section contains settings for the OpenLineage integration. More about the configuration and its precedence can be found at https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup
config_path¶
Specify the path to the YAML configuration file. This ensures backwards compatibility with passing config through the openlineage.yml file.
- Type:
string
- Default:
''
- Environment Variable:
AIRFLOW__OPENLINEAGE__CONFIG_PATH
- Example:
full/path/to/openlineage.yml
custom_run_facets¶
Added in version 1.10.0.
Register custom run facet functions by passing a string of semicolon-separated full import paths.
- Type:
string
- Default:
''
- Environment Variable:
AIRFLOW__OPENLINEAGE__CUSTOM_RUN_FACETS
- Example:
full.path.to.custom_facet_function;full.path.to.another_custom_facet_function
dag_state_change_process_pool_size¶
Added in version 1.8.0.
Number of processes to utilize for processing DAG state changes in an asynchronous manner within the scheduler process.
- Type:
integer
- Default:
1
- Environment Variable:
AIRFLOW__OPENLINEAGE__DAG_STATE_CHANGE_PROCESS_POOL_SIZE
debug_mode¶
Added in version 1.11.0.
If true, OpenLineage events will include information useful for debugging, potentially containing large fields (e.g. all installed packages and their versions).
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__DEBUG_MODE
disable_source_code¶
Disable the inclusion of source code in OpenLineage events by setting this to true. By default, several Operators (e.g. Python, Bash) will include their source code in the events unless disabled.
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__DISABLE_SOURCE_CODE
disabled¶
Disable sending events without uninstalling the OpenLineage Provider by setting this to true.
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__DISABLED
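Boolean options such as disabled arrive as strings when they are set through environment variables. The sketch below shows one way such a string could be interpreted. The accepted spellings ("t", "true", "1" and "f", "false", "0", case-insensitive) are an assumption about Airflow's boolean parsing that this page does not state, and parse_bool is a hypothetical helper, not an Airflow function.

```python
# Assumed spellings for boolean values; not confirmed by this page.
TRUE_VALUES = {"t", "true", "1"}
FALSE_VALUES = {"f", "false", "0"}


def parse_bool(value: str) -> bool:
    """Interpret a config string as a boolean, rejecting unknown spellings."""
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# Example: the value one might export as AIRFLOW__OPENLINEAGE__DISABLED.
events_disabled = parse_bool("True")
print(events_disabled)
```

Rejecting unrecognized values, rather than silently treating them as false, makes a typo in the setting visible early.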
disabled_for_operators¶
Added in version 1.1.0.
Exclude some Operators from emitting OpenLineage events by passing a string of semicolon-separated full import paths of the Operators to disable.
- Type:
string
- Default:
''
- Environment Variable:
AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS
- Example:
airflow.providers.standard.operators.bash.BashOperator; airflow.providers.standard.operators.python.PythonOperator
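Several options on this page (custom_run_facets, disabled_for_operators, extractors) take a semicolon-separated string of import paths. The following sketch illustrates how such a value breaks down into individual paths. It is not the provider's own parsing code: split_import_paths is a hypothetical helper, and trimming the whitespace around each entry is an assumption suggested by the space after the semicolon in the example above.

```python
def split_import_paths(value: str) -> list[str]:
    """Split a semicolon-separated option value into import paths.

    Hypothetical helper: surrounding whitespace is trimmed and empty
    entries are dropped (both are assumptions, see the text above).
    """
    return [part.strip() for part in value.split(";") if part.strip()]


raw_value = (
    "airflow.providers.standard.operators.bash.BashOperator; "
    "airflow.providers.standard.operators.python.PythonOperator"
)
disabled_operators = split_import_paths(raw_value)
for path in disabled_operators:
    print(path)
```

If whitespace handling matters in a given deployment, writing the value without spaces around the semicolons avoids relying on that assumption.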
execution_timeout¶
Added in version 1.9.0.
Maximum amount of time (in seconds) that OpenLineage can spend executing metadata extraction.
- Type:
integer
- Default:
10
- Environment Variable:
AIRFLOW__OPENLINEAGE__EXECUTION_TIMEOUT
extractors¶
Register custom OpenLineage Extractors by passing a string of semicolon-separated full import paths.
- Type:
string
- Default:
None
- Environment Variable:
AIRFLOW__OPENLINEAGE__EXTRACTORS
- Example:
full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
include_full_task_info¶
Added in version 1.10.0.
If true, OpenLineage events will include full task info - potentially containing large fields.
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__INCLUDE_FULL_TASK_INFO
namespace¶
Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers, the events coming from them are logically separated.
- Type:
string
- Default:
None
- Environment Variable:
AIRFLOW__OPENLINEAGE__NAMESPACE
- Example:
my_airflow_instance_1
selective_enable¶
Added in version 1.7.0.
If this setting is enabled, the OpenLineage integration won’t collect and emit metadata unless you explicitly enable it per DAG or Task using the enable_lineage method.
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__SELECTIVE_ENABLE
spark_inject_parent_job_info¶
Added in version 2.0.0.
Automatically inject OpenLineage’s parent job (namespace, job name, run id) information into Spark application properties for supported Operators.
- Type:
boolean
- Default:
False
- Environment Variable:
AIRFLOW__OPENLINEAGE__SPARK_INJECT_PARENT_JOB_INFO
transport¶
Pass the OpenLineage Client transport configuration as a JSON string. It should contain the type of the transport and additional options (different for each transport type). For more details see: https://openlineage.io/docs/client/python/#built-in-transport-types
Currently supported types are:
HTTP
Kafka
Console
File
- Type:
string
- Default:
''
- Environment Variable:
AIRFLOW__OPENLINEAGE__TRANSPORT
- Example:
{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}
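Because the transport value must be valid JSON, it can be safer to build it from a dictionary than to write the string by hand. The sketch below serializes the example configuration above and places it in the documented environment variable. Setting os.environ only affects the current Python process and its children, so in a real deployment the variable would have to be exported in the environment of the Airflow components themselves; that deployment detail is an assumption and is not covered on this page.

```python
import json
import os

# Transport options taken from the example above.
transport = {
    "type": "http",
    "url": "http://localhost:5000",
    "endpoint": "api/v1/lineage",
}

# Serialize to the JSON string the option expects.
transport_json = json.dumps(transport)
os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = transport_json

# Round-trip check: the stored string decodes back to the same options.
decoded = json.loads(os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"])
print(decoded["type"], decoded["url"])
```

Generating the string with json.dumps avoids common mistakes such as single quotes or trailing commas, which would make the value invalid JSON.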