airflow.sdk API Reference¶
This page documents the full public API exposed in Airflow 3.0+ via the Task SDK python module.
If something is not on this page, it is best to assume that it is not part of the public API, and use of it is entirely at your own risk – we won’t go out of our way to break such usage, but we make no promises either.
Defining DAGs¶
- class airflow.sdk.DAG¶
A dag (directed acyclic graph) is a collection of tasks with directional dependencies.
A dag also has a schedule, a start date and an end date (optional). For each schedule (say daily or hourly), the DAG needs to run each individual task as its dependencies are met. Certain tasks have the property of depending on their own past, meaning that they can’t run until their previous schedule (and upstream tasks) are completed.
DAGs essentially act as namespaces for tasks. A task_id can only be added once to a DAG.
Note that if you plan to use time zones all the dates provided should be pendulum dates. See Time zone aware dags.
Added in version 2.4: The schedule argument to specify either time-based scheduling logic (timetable), or dataset-driven triggers.
Changed in version 3.0: The default value of schedule has been changed to None (no schedule). The previous default was timedelta(days=1).
- Parameters:
dag_id – The id of the DAG; must consist exclusively of alphanumeric characters, dashes, dots and underscores (all ASCII)
description – The description for the DAG to e.g. be shown on the webserver
schedule – If provided, this defines the rules according to which DAG runs are scheduled. Possible values include a cron expression string, timedelta object, Timetable, or list of Asset objects. See also Customizing DAG Scheduling with Timetables.
start_date – The timestamp from which the scheduler will attempt to backfill. If this is not provided, backfilling must be done manually with an explicit time range.
end_date – A date beyond which your DAG won’t run, leave to None for open-ended scheduling.
template_searchpath – This list of folders (non-relative) defines where jinja will look for your templates. Order matters. Note that jinja/airflow includes the path of your DAG file by default
template_undefined – Template undefined type.
user_defined_macros – a dictionary of macros that will be exposed in your jinja templates. For example, passing dict(foo='bar') to this argument allows you to use {{ foo }} in all jinja templates related to this DAG. Note that you can pass any type of object here.
user_defined_filters – a dictionary of filters that will be exposed in your jinja templates. For example, passing dict(hello=lambda name: 'Hello %s' % name) to this argument allows you to use {{ 'world' | hello }} in all jinja templates related to this DAG.
default_args – A dictionary of default parameters to be used as constructor keyword parameters when initialising operators. Note that operators have the same hook, and precede those defined here, meaning that if your dict contains ‘depends_on_past’: True here and ‘depends_on_past’: False in the operator’s call default_args, the actual value will be False.
params – a dictionary of DAG level parameters that are made accessible in templates, namespaced under params. These params can be overridden at the task level.
max_active_tasks – the number of task instances allowed to run concurrently
max_active_runs – maximum number of active DAG runs, beyond this number of DAG runs in a running state, the scheduler won’t create new active DAG runs
max_consecutive_failed_dag_runs – (experimental) maximum number of consecutive failed DAG runs, beyond this the scheduler will disable the DAG
dagrun_timeout – Specify the duration a DagRun should be allowed to run before it times out or fails. Task instances that are running when a DagRun is timed out will be marked as skipped.
sla_miss_callback – DEPRECATED - The SLA feature is removed in Airflow 3.0, to be replaced with a new implementation in 3.1
catchup – Perform scheduler catchup (or only run latest)? Defaults to False
on_failure_callback – A function or list of functions to be called when a DagRun of this dag fails. A context dictionary is passed as a single parameter to this function.
on_success_callback – Much like the on_failure_callback except that it is executed when the dag succeeds.
access_control – Specify optional DAG-level actions, e.g., “{‘role1’: {‘can_read’}, ‘role2’: {‘can_read’, ‘can_edit’, ‘can_delete’}}” or it can specify the resource name if there is a DAGs Run resource, e.g., “{‘role1’: {‘DAG Runs’: {‘can_create’}}, ‘role2’: {‘DAGs’: {‘can_read’, ‘can_edit’, ‘can_delete’}}”
is_paused_upon_creation – Specifies if the dag is paused when created for the first time. If the dag exists already, this flag will be ignored. If this optional parameter is not specified, the global config setting will be used.
jinja_environment_kwargs – additional configuration options to be passed to the Jinja Environment for template rendering.
Example: to prevent Jinja from removing a trailing newline from template strings:
DAG(
    dag_id="my-dag",
    jinja_environment_kwargs={
        "keep_trailing_newline": True,
        # some other jinja2 Environment options here
    },
)
render_template_as_native_obj – If True, uses a Jinja NativeEnvironment to render templates as native Python types. If False, a Jinja Environment is used to render templates as string values.
tags – List of tags to help filtering DAGs in the UI.
owner_links – Dict of owners and their links, that will be clickable on the DAGs view UI. Can be used as an HTTP link (for example the link to your Slack channel), or a mailto link. e.g: {"dag_owner": "https://airflow.apache.org/"}
auto_register – Automatically register this DAG when it is used in a with block.
fail_fast – Fails currently running tasks when a task in the DAG fails. Warning: A fail-fast dag can only have tasks with the default trigger rule (“all_success”). An exception will be thrown if any task in a fail-fast dag has a non-default trigger rule.
dag_display_name – The display name of the DAG which appears on the UI.
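For illustration, a minimal sketch of instantiating a DAG as a context manager with some of the parameters above; the ids, dates and the EmptyOperator import path (standard provider) are assumptions, not prescriptions:
import datetime

import pendulum

from airflow.sdk import DAG
from airflow.providers.standard.operators.empty import EmptyOperator  # assumed provider import path

with DAG(
    dag_id="example_dag",
    description="Minimal example DAG",
    schedule="@daily",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    catchup=False,
    dagrun_timeout=datetime.timedelta(hours=2),
    tags=["example"],
) as dag:
    EmptyOperator(task_id="noop")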
Decorators¶
- airflow.sdk.dag(dag_id='', *, description=None, schedule=None, start_date=None, end_date=None, template_searchpath=None, template_undefined=jinja2.StrictUndefined, user_defined_macros=None, user_defined_filters=None, default_args=None, max_active_tasks=..., max_active_runs=..., max_consecutive_failed_dag_runs=..., dagrun_timeout=None, catchup=..., on_success_callback=None, on_failure_callback=None, deadline=None, doc_md=None, params=None, access_control=None, is_paused_upon_creation=None, jinja_environment_kwargs=None, render_template_as_native_obj=False, tags=None, owner_links=None, auto_register=True, fail_fast=False, dag_display_name=None, disable_bundle_versioning=False)¶
- Parameters:
dag_id (str)
description (str | None)
schedule (ScheduleArg)
start_date (datetime.datetime | None)
end_date (datetime.datetime | None)
template_searchpath (str | collections.abc.Iterable[str] | None)
template_undefined (type[jinja2.StrictUndefined])
user_defined_macros (dict | None)
user_defined_filters (dict | None)
default_args (dict[str, Any] | None)
max_active_tasks (int)
max_active_runs (int)
max_consecutive_failed_dag_runs (int)
dagrun_timeout (datetime.timedelta | None)
catchup (bool)
on_success_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])
on_failure_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])
deadline (airflow.sdk.definitions.deadline.DeadlineAlert | None)
doc_md (str | None)
params (airflow.sdk.definitions.param.ParamsDict | dict[str, Any] | None)
access_control (dict[str, dict[str, collections.abc.Collection[str]]] | dict[str, collections.abc.Collection[str]] | None)
is_paused_upon_creation (bool | None)
jinja_environment_kwargs (dict | None)
render_template_as_native_obj (bool)
tags (collections.abc.Collection[str] | None)
owner_links (dict[str, str] | None)
auto_register (bool)
fail_fast (bool)
dag_display_name (str | None)
disable_bundle_versioning (bool)
- Return type:
collections.abc.Callable[[collections.abc.Callable], collections.abc.Callable[Ellipsis, DAG]]
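As a sketch (names are illustrative), the decorator form mirrors the DAG constructor arguments; calling the decorated function registers the DAG:
import pendulum

from airflow.sdk import dag, task


@dag(
    dag_id="decorated_pipeline",
    schedule=None,
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    catchup=False,
)
def decorated_pipeline():
    @task
    def extract() -> dict:
        return {"records": 3}

    @task
    def load(payload: dict) -> None:
        print(payload["records"])

    load(extract())


decorated_pipeline()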
Task Decorators:
@task.run_if(condition, skip_message=None)
Run the task only if the given condition is met; otherwise the task is skipped. The condition is a callable that receives the task execution context and returns either a boolean or a tuple (bool, message).
@task.skip_if(condition, skip_message=None)
Skip the task if the given condition is met, raising a skip exception with an optional message.
Provider-specific task decorators are available under @task.<provider>, e.g. @task.python, @task.docker, etc., dynamically loaded from registered providers.
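A hedged sketch of @task.run_if stacked on @task.python; the condition callable and the run_type check are illustrative uses of the execution context described above:
from airflow.sdk import task


@task.run_if(
    lambda context: context["dag_run"].run_type == "manual",
    skip_message="only runs for manually triggered DAG runs",
)
@task.python
def manual_only_report():
    print("running for a manual trigger")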
- airflow.sdk.task_group(group_id: str | None = None, prefix_group_id: bool = True, parent_group: airflow.sdk.definitions.taskgroup.TaskGroup | None = None, dag: airflow.sdk.definitions.dag.DAG | None = None, default_args: dict[str, Any] | None = None, tooltip: str = '', ui_color: str = 'CornflowerBlue', ui_fgcolor: str = '#000', add_suffix_on_collision: bool = False, group_display_name: str = '') collections.abc.Callable[[collections.abc.Callable[FParams, FReturn]], _TaskGroupFactory[FParams, FReturn]] ¶
Python TaskGroup decorator.
This wraps a function into an Airflow TaskGroup. When used as the @task_group() form, all arguments are forwarded to the underlying TaskGroup class. Can be used to parameterize TaskGroup.
- Parameters:
python_callable – Function to decorate.
tg_kwargs – Keyword arguments for the TaskGroup object.
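A minimal sketch of the decorator; the group and task names are illustrative, and the decorated function is called inside a DAG to instantiate the group:
from airflow.sdk import task, task_group


@task_group(group_id="transform", tooltip="transformation steps")
def transform(values: list[int]):
    @task
    def double(xs: list[int]) -> list[int]:
        return [x * 2 for x in xs]

    @task
    def total(xs: list[int]) -> int:
        return sum(xs)

    return total(double(values))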
- airflow.sdk.task(*args, **kwargs)¶
Implementation to provide the @task syntax.
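A minimal sketch of the plain @task form; the decorated function’s return value is pushed to XCom and can be passed to downstream tasks as in the @dag example above:
from airflow.sdk import task


@task(retries=2)
def fetch_data() -> dict:
    return {"rows": 42}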
- airflow.sdk.setup(func)¶
Decorate a function to mark it as a setup task.
A setup task runs before all other tasks in its DAG or TaskGroup context and can perform initialization or resource preparation.
Example:
@setup
def initialize_context(...):
    ...
- Parameters:
func (Callable)
- Return type:
Callable
- airflow.sdk.teardown(_func=None, *, on_failure_fail_dagrun=False)¶
Decorate a function to mark it as a teardown task.
A teardown task runs after all main tasks in its DAG or TaskGroup context. If on_failure_fail_dagrun=True, a failure in teardown will mark the DAG run as failed.
Example:
@teardown(on_failure_fail_dagrun=True)
def cleanup(...):
    ...
- Parameters:
on_failure_fail_dagrun (bool)
- Return type:
Callable
- airflow.sdk.asset(*, schedule, is_paused_upon_creation=None, dag_id=None, dag_display_name=None, description=None, params=None, on_success_callback=None, on_failure_callback=None, access_control=None, owner_links=NOTHING, tags=NOTHING, name=None, uri=None, group='asset', extra=NOTHING, watchers=NOTHING)¶
Create an asset by decorating a materialization function.
- Parameters:
schedule (ScheduleArg)
is_paused_upon_creation (bool | None)
dag_id (str | None)
dag_display_name (str | None)
description (str | None)
params (ParamsDict | None)
on_success_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])
on_failure_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])
access_control (dict[str, dict[str, Collection[str]]] | None)
owner_links (dict[str, str])
tags (Collection[str])
name (str | None)
uri (str | ObjectStoragePath | None)
group (str)
extra (dict[str, Any])
watchers (list[BaseTrigger])
- Return type:
None
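A hedged sketch of the decorator form; the uri and schedule are illustrative assumptions, and the decorated function body is where the asset would be materialized:
from airflow.sdk import asset


@asset(schedule="@daily", uri="s3://example-bucket/reports/daily.csv", group="reports")
def daily_report():
    # produce and persist the data backing this asset here
    ...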
Bases¶
- class airflow.sdk.BaseOperator(*, task_id, owner=DEFAULT_OWNER, email=None, email_on_retry=True, email_on_failure=True, retries=DEFAULT_RETRIES, retry_delay=DEFAULT_RETRY_DELAY, retry_exponential_backoff=False, max_retry_delay=None, start_date=None, end_date=None, depends_on_past=False, ignore_first_depends_on_past=DEFAULT_IGNORE_FIRST_DEPENDS_ON_PAST, wait_for_past_depends_before_skipping=DEFAULT_WAIT_FOR_PAST_DEPENDS_BEFORE_SKIPPING, wait_for_downstream=False, dag=None, params=None, default_args=None, priority_weight=DEFAULT_PRIORITY_WEIGHT, weight_rule=DEFAULT_WEIGHT_RULE, queue=DEFAULT_QUEUE, pool=None, pool_slots=DEFAULT_POOL_SLOTS, sla=None, execution_timeout=DEFAULT_TASK_EXECUTION_TIMEOUT, on_execute_callback=None, on_failure_callback=None, on_success_callback=None, on_retry_callback=None, on_skipped_callback=None, pre_execute=None, post_execute=None, trigger_rule=DEFAULT_TRIGGER_RULE, resources=None, run_as_user=None, map_index_template=None, max_active_tis_per_dag=None, max_active_tis_per_dagrun=None, executor=None, executor_config=None, do_xcom_push=True, multiple_outputs=False, inlets=None, outlets=None, task_group=None, doc=None, doc_md=None, doc_json=None, doc_yaml=None, doc_rst=None, task_display_name=None, logger_name=None, allow_nested_operators=True, **kwargs)¶
Abstract base class for all operators.
Since operators create objects that become nodes in the DAG, BaseOperator contains many recursive methods for DAG crawling behavior. To derive from this class, you are expected to override the constructor and the ‘execute’ method.
Operators derived from this class should perform or trigger certain tasks synchronously (wait for completion). Examples of operators could be an operator that runs a Pig job (PigOperator), a sensor operator that waits for a partition to land in Hive (HiveSensorOperator), or one that moves data from Hive to MySQL (Hive2MySqlOperator). Instances of these operators (tasks) target specific operations, running specific scripts, functions or data transfers.
This class is abstract and shouldn’t be instantiated. Instantiating a class derived from this one results in the creation of a task object, which ultimately becomes a node in DAG objects. Task dependencies should be set by using the set_upstream and/or set_downstream methods.
- Parameters:
task_id (str) – a unique, meaningful id for the task
owner (str) – the owner of the task. Using a meaningful description (e.g. user/person/team/role name) to clarify ownership is recommended.
email (str | collections.abc.Sequence[str] | None) – the ‘to’ email address(es) used in email alerts. This can be a single email or multiple ones. Multiple addresses can be specified as a comma or semicolon separated string or by passing a list of strings. (deprecated)
email_on_retry (bool) – Indicates whether email alerts should be sent when a task is retried (deprecated)
email_on_failure (bool) – Indicates whether email alerts should be sent when a task failed (deprecated)
retries (int | None) – the number of retries that should be performed before failing the task
retry_delay (datetime.timedelta | float) – delay between retries, can be set as timedelta or float seconds, which will be converted into timedelta; the default is timedelta(seconds=300).
retry_exponential_backoff (bool) – allow progressively longer waits between retries by using exponential backoff algorithm on retry delay (delay will be converted into seconds)
max_retry_delay (datetime.timedelta | float | None) – maximum delay interval between retries, can be set as timedelta or float seconds, which will be converted into timedelta.
start_date (datetime.datetime | None) – The start_date for the task, determines the logical_date for the first task instance. The best practice is to have the start_date rounded to your DAG’s schedule_interval. Daily jobs have their start_date some day at 00:00:00, hourly jobs have their start_date at 00:00 of a specific hour. Note that Airflow simply looks at the latest logical_date and adds the schedule_interval to determine the next logical_date. It is also very important to note that different tasks’ dependencies need to line up in time. If task A depends on task B and their start_date are offset in a way that their logical_date don’t line up, A’s dependencies will never be met. If you are looking to delay a task, for example running a daily task at 2AM, look into the TimeSensor and TimeDeltaSensor. We advise against using dynamic start_date and recommend using fixed ones. Read the FAQ entry about start_date for more information.
end_date (datetime.datetime | None) – if specified, the scheduler won’t go beyond this date
depends_on_past (bool) – when set to true, task instances will run sequentially and only if the previous instance has succeeded or has been skipped. The task instance for the start_date is allowed to run.
wait_for_past_depends_before_skipping (bool) – when set to true, if the task instance should be marked as skipped, and depends_on_past is true, the ti will stay in None state waiting for the task of the previous run
wait_for_downstream (bool) – when set to true, an instance of task X will wait for tasks immediately downstream of the previous instance of task X to finish successfully or be skipped before it runs. This is useful if the different instances of a task X alter the same asset, and this asset is used by tasks downstream of task X. Note that depends_on_past is forced to True wherever wait_for_downstream is used. Also note that only tasks immediately downstream of the previous task instance are waited for; the statuses of any tasks further downstream are ignored.
dag (airflow.sdk.definitions.dag.DAG | None) – a reference to the dag the task is attached to (if any)
priority_weight (int) – priority weight of this task against other task. This allows the executor to trigger higher priority tasks before others when things get backed up. Set priority_weight as a higher number for more important tasks. As not all database engines support 64-bit integers, values are capped with 32-bit. Valid range is from -2,147,483,648 to 2,147,483,647.
weight_rule (str | airflow.task.priority_strategy.PriorityWeightStrategy) – weighting method used for the effective total priority weight of the task. Options are: { downstream | upstream | absolute }, default is downstream. When set to downstream, the effective weight of the task is the aggregate sum of all downstream descendants. As a result, upstream tasks will have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and desire to have all upstream tasks complete for all runs before each dag can continue processing downstream tasks. When set to upstream, the effective weight is the aggregate sum of all upstream ancestors. This is the opposite, where downstream tasks have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and prefer to have each dag complete before starting upstream tasks of other dags. When set to absolute, the effective weight is the exact priority_weight specified without additional weighting. You may want to do this when you know exactly what priority weight each task should have. Additionally, when set to absolute, there is a bonus effect of significantly speeding up the task creation process for very large DAGs. Options can be set as string or using the constants defined in the static class airflow.utils.WeightRule. Irrespective of the weight rule, resulting priority values are capped with 32-bit. This is an experimental feature. Since 2.9.0, Airflow allows defining a custom priority weight strategy by creating a subclass of airflow.task.priority_strategy.PriorityWeightStrategy and registering it in a plugin, then providing the class path or the class instance via the weight_rule parameter. The custom priority weight strategy will be used to calculate the effective total priority weight of the task instance.
queue (str) – which queue to target when running this job. Not all executors implement queue management, the CeleryExecutor does support targeting specific queues.
pool (str | None) – the slot pool this task should run in, slot pools are a way to limit concurrency for certain tasks
pool_slots (int) – the number of pool slots this task should use (>= 1) Values less than 1 are not allowed.
sla (datetime.timedelta | None) – DEPRECATED - The SLA feature is removed in Airflow 3.0, to be replaced with a new implementation in Airflow >=3.1.
execution_timeout (datetime.timedelta | None) – max time allowed for the execution of this task instance, if it goes beyond it will raise and fail.
on_failure_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | list[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – a function or list of functions to be called when a task instance of this task fails. a context dictionary is passed as a single parameter to this function. Context contains references to related objects to the task instance and is documented under the macros section of the API.
on_execute_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed right before the task is executed.
on_retry_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | list[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when retries occur.
on_success_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when the task succeeds.
on_skipped_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when a skip occurs; this callback will be called only if AirflowSkipException gets raised. Explicitly, it is NOT called if a task is not started to be executed because of a preceding branching decision in the DAG or a trigger rule which causes execution to skip so that the task execution is never scheduled.
pre_execute (TaskPreExecuteHook | None) –
a function to be called immediately before task execution, receiving a context dictionary; raising an exception will prevent the task from being executed.
This is an experimental feature.
post_execute (TaskPostExecuteHook | None) –
a function to be called immediately after task execution, receiving a context dictionary and task result; raising an exception will prevent the task from succeeding.
This is an experimental feature.
trigger_rule (str) – defines the rule by which dependencies are applied for the task to get triggered. Options are: { all_success | all_failed | all_done | all_skipped | one_success | one_done | one_failed | none_failed | none_failed_min_one_success | none_skipped | always }, default is all_success. Options can be set as string or using the constants defined in the static class airflow.utils.TriggerRule.
resources (dict[str, Any] | None) – A map of resource parameter names (the argument names of the Resources constructor) to their values.
run_as_user (str | None) – unix username to impersonate while running the task
max_active_tis_per_dag (int | None) – When set, a task will be able to limit the concurrent runs across logical_dates.
max_active_tis_per_dagrun (int | None) – When set, a task will be able to limit the concurrent task instances per DAG run.
executor (str | None) – Which executor to target when running this task. NOT YET SUPPORTED
executor_config (dict | None) –
Additional task-level configuration parameters that are interpreted by a specific executor. Parameters are namespaced by the name of executor.
Example: to run this task in a specific docker container through the KubernetesExecutor
MyOperator(..., executor_config={"KubernetesExecutor": {"image": "myCustomDockerImage"}})
do_xcom_push (bool) – if True, an XCom is pushed containing the Operator’s result
multiple_outputs (bool) – if True and do_xcom_push is True, pushes multiple XComs, one for each key in the returned dictionary result. If False and do_xcom_push is True, pushes a single XCom.
task_group (airflow.sdk.definitions.taskgroup.TaskGroup | None) – The TaskGroup to which the task should belong. This is typically provided when not using a TaskGroup as a context manager.
doc (str | None) – Add documentation or notes to your Task objects that is visible in Task Instance details View in the Webserver
doc_md (str | None) – Add documentation (in Markdown format) or notes to your Task objects that is visible in Task Instance details View in the Webserver
doc_rst (str | None) – Add documentation (in RST format) or notes to your Task objects that is visible in Task Instance details View in the Webserver
doc_json (str | None) – Add documentation (in JSON format) or notes to your Task objects that is visible in Task Instance details View in the Webserver
doc_yaml (str | None) – Add documentation (in YAML format) or notes to your Task objects that is visible in Task Instance details View in the Webserver
task_display_name (str | None) – The display name of the task which appears on the UI.
logger_name (str | None) – Name of the logger used by the Operator to emit logs. If set to None (default), the logger name will fall back to airflow.task.operators.{class.__module__}.{class.__name__} (e.g. SimpleHttpOperator will have airflow.task.operators.airflow.providers.http.operators.http.SimpleHttpOperator as logger).
allow_nested_operators (bool) –
if True, when an operator is executed within another one a warning message will be logged. If False, then an exception will be raised if the operator is badly used (e.g. nested within another one). In future releases of Airflow this parameter will be removed and an exception will always be thrown when operators are nested within each other (default is True).
Example of a bad operator mixin usage:
@task(provide_context=True)
def say_hello_world(**context):
    hello_world_task = BashOperator(
        task_id="hello_world_task",
        bash_command="python -c \"print('Hello, world!')\"",
        dag=dag,
    )
    hello_world_task.execute(context)
ignore_first_depends_on_past (bool)
params (collections.abc.MutableMapping[str, Any] | None)
default_args (dict | None)
map_index_template (str | None)
inlets (Any | None)
outlets (Any | None)
kwargs (Any)
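A minimal sketch of deriving an operator, overriding the constructor and execute() as described above; the operator and field names are illustrative:
from airflow.sdk import BaseOperator


class HelloOperator(BaseOperator):
    template_fields = ("name",)  # values passed via `name` are rendered with Jinja

    def __init__(self, *, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        message = f"Hello, {self.name}!"
        self.log.info(message)
        return message  # pushed to XCom because do_xcom_push defaults to True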
- class airflow.sdk.BaseSensorOperator(*, poke_interval=60, timeout=conf.getfloat('sensors', 'default_timeout'), soft_fail=False, mode='poke', exponential_backoff=False, max_wait=None, silent_fail=False, never_fail=False, **kwargs)¶
Sensor operators are derived from this class and inherit these attributes.
Sensor operators keep executing at a time interval and succeed when a criterion is met and fail if and when they time out.
- Parameters:
soft_fail (bool) – Set to true to mark the task as SKIPPED on failure. Mutually exclusive with never_fail.
poke_interval (datetime.timedelta | float) – Time that the job should wait in between each try. Can be timedelta or float seconds.
timeout (datetime.timedelta | float) – Time elapsed before the task times out and fails. Can be timedelta or float seconds. This should not be confused with execution_timeout of the BaseOperator class. timeout measures the time elapsed between the first poke and the current time (taking into account any reschedule delay between each poke), while execution_timeout checks the running time of the task (leaving out any reschedule delay). In case the mode is poke (see below), both of them are equivalent (as the sensor is never rescheduled), which is not the case in reschedule mode.
mode (str) – How the sensor operates. Options are: { poke | reschedule }, default is poke. When set to poke, the sensor takes up a worker slot for its whole execution time and sleeps between pokes. Use this mode if the expected runtime of the sensor is short or if a short poke interval is required. Note that the sensor will hold onto a worker slot and a pool slot for the duration of the sensor’s runtime in this mode. When set to reschedule, the sensor task frees the worker slot when the criteria is not yet met and it’s rescheduled at a later time. Use this mode if the time before the criteria is met is expected to be quite long. The poke interval should be more than one minute to prevent too much load on the scheduler.
exponential_backoff (bool) – allow progressively longer waits between pokes by using exponential backoff algorithm
max_wait (datetime.timedelta | float | None) – maximum wait interval between pokes, can be timedelta or float seconds
silent_fail (bool) – If true, and poke method raises an exception different from AirflowSensorTimeout, AirflowTaskTimeout, AirflowSkipException and AirflowFailException, the sensor will log the error and continue its execution. Otherwise, the sensor task fails, and it can be retried based on the provided retries parameter.
never_fail (bool) – If true, and poke method raises an exception, sensor will be skipped. Mutually exclusive with soft_fail.
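A minimal sketch of a custom sensor; poke() returns True once the criterion is met, and the file path check is an illustrative stand-in for a real condition:
import os

from airflow.sdk import BaseSensorOperator


class FileExistsSensor(BaseSensorOperator):
    def __init__(self, *, filepath: str, **kwargs):
        super().__init__(**kwargs)
        self.filepath = filepath

    def poke(self, context) -> bool:
        self.log.info("Checking for %s", self.filepath)
        return os.path.exists(self.filepath)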
- class airflow.sdk.BaseNotifier¶
BaseNotifier class for sending notifications.
- class airflow.sdk.BaseOperatorLink¶
Abstract base class that defines how we get an operator link.
- class airflow.sdk.XComArg¶
Reference to an XCom value pushed from another operator.
The implementation supports:
xcomarg >> op
xcomarg << op
op >> xcomarg  # By BaseOperator code
op << xcomarg  # By BaseOperator code
Example: The moment you get a result from any operator (decorated or regular) you can
any_op = AnyOperator()
xcomarg = XComArg(any_op)
# or equivalently
xcomarg = any_op.output
my_op = MyOperator()
my_op >> xcomarg
This object can be used in legacy Operators via Jinja.
Example: You can make this result to be part of any generated string:
any_op = AnyOperator()
xcomarg = any_op.output
op1 = MyOperator(my_text_message=f"the value is {xcomarg}")
op2 = MyOperator(my_text_message=f"the value is {xcomarg['topic']}")
- Parameters:
operator – Operator instance to which the XComArg references.
key – Key used to pull the XCom value. Defaults to XCOM_RETURN_KEY, i.e. the referenced operator’s return value.
- class airflow.sdk.PokeReturnValue(is_done, xcom_value=None)¶
Optional return value for poke methods.
Sensors can optionally return an instance of the PokeReturnValue class in the poke method. If an XCom value is supplied when the sensor is done, then the XCom value will be pushed through the operator return value.
- Parameters:
is_done (bool) – Set to true to indicate the sensor can stop poking.
xcom_value (Any | None) – An optional XCom value to be returned by the operator.
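A hedged sketch of a poke() method returning PokeReturnValue; the response dict is an illustrative stand-in for a real external call:
from airflow.sdk import BaseSensorOperator, PokeReturnValue


class ApiReadySensor(BaseSensorOperator):
    def poke(self, context) -> PokeReturnValue:
        response = {"ready": True, "payload": {"job_id": 123}}  # illustrative stand-in
        return PokeReturnValue(is_done=response["ready"], xcom_value=response["payload"])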
- class airflow.sdk.BaseHook(logger_name=None)¶
Abstract base class for hooks.
Hooks are meant as an interface to interact with external systems. MySqlHook, HiveHook, PigHook return object that can handle the connection and interaction to specific instances of these systems, and expose consistent methods to interact with them.
- Parameters:
logger_name (str | None) – Name of the logger used by the Hook to emit logs. If set to None (default), the logger name will fall back to airflow.task.hooks.{class.__module__}.{class.__name__} (e.g. DbApiHook will have airflow.task.hooks.airflow.providers.common.sql.hooks.sql.DbApiHook as logger).
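A minimal sketch of resolving a connection through the hook base class; "my_conn_id" is an assumed connection id that must exist in your deployment:
from airflow.sdk import BaseHook

conn = BaseHook.get_connection("my_conn_id")
print(conn.host, conn.port, conn.extra_dejson)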
Connections & Variables¶
- class airflow.sdk.Connection¶
A connection to an external data source.
- Parameters:
conn_id – The connection ID.
conn_type – The connection type.
description – The connection description.
host – The host.
login – The login.
password – The password.
schema – The schema.
port – The port number.
extra – Extra metadata. Non-standard data such as private/SSH keys can be saved here. JSON encoded object.
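A hedged sketch of fetching a Connection at runtime inside a task; "postgres_default" is an assumed connection id:
from airflow.sdk import Connection, task


@task
def show_connection() -> None:
    conn = Connection.get("postgres_default")
    print(conn.conn_type, conn.host, conn.get_uri())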
- class airflow.sdk.Variable¶
A generic way to store and retrieve arbitrary content or settings as a simple key/value store.
- Parameters:
key – The variable key.
value – The variable value.
description – The variable description.
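A minimal sketch of setting and reading a Variable; the key and value are illustrative, and deserialize_json mirrors the serialize_json flag used when setting:
from airflow.sdk import Variable

Variable.set("feature_flags", {"use_new_loader": True}, serialize_json=True)
flags = Variable.get("feature_flags", deserialize_json=True)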
Tasks & Operators¶
- class airflow.sdk.TaskGroup¶
A collection of tasks.
When set_downstream() or set_upstream() are called on the TaskGroup, it is applied across all tasks within the group if necessary.
- Parameters:
group_id – a unique, meaningful id for the TaskGroup. group_id must not conflict with group_id of TaskGroup or task_id of tasks in the DAG. Root TaskGroup has group_id set to None.
prefix_group_id – If set to True, child task_id and group_id will be prefixed with this TaskGroup’s group_id. If set to False, child task_id and group_id are not prefixed. Default is True.
parent_group – The parent TaskGroup of this TaskGroup. parent_group is set to None for the root TaskGroup.
dag – The DAG that this TaskGroup belongs to.
default_args – A dictionary of default parameters to be used as constructor keyword parameters when initialising operators, will override default_args defined in the DAG level. Note that operators have the same hook, and precede those defined here, meaning that if your dict contains ‘depends_on_past’: True here and ‘depends_on_past’: False in the operator’s call default_args, the actual value will be False.
tooltip – The tooltip of the TaskGroup node when displayed in the UI
ui_color – The fill color of the TaskGroup node when displayed in the UI
ui_fgcolor – The label color of the TaskGroup node when displayed in the UI
add_suffix_on_collision – If this task group name already exists, automatically add __1 etc suffixes
group_display_name – If set, this will be the display name for the TaskGroup node in the UI.
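A minimal sketch of TaskGroup used as a context manager inside a DAG; child task ids get the "processing." prefix because prefix_group_id defaults to True, and the names are illustrative:
from airflow.sdk import DAG, TaskGroup, task

with DAG(dag_id="grouped_dag") as dag:

    @task
    def start():
        ...

    with TaskGroup(group_id="processing", tooltip="main processing steps") as processing:

        @task
        def step_one():
            ...

        step_one()

    start() >> processing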
- airflow.sdk.get_current_context()¶
Retrieve the execution context dictionary without altering user method’s signature.
This is the simplest method of retrieving the execution context dictionary.
Old style:
def my_task(**context):
    ti = context["ti"]
New style:
from airflow.sdk import get_current_context


def my_task():
    context = get_current_context()
    ti = context["ti"]
The current context will only have a value if this method is called after an operator has started executing.
- Return type:
- airflow.sdk.get_parsing_context()¶
Return the current (DAG) parsing context info.
- Return type:
AirflowParsingContext
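A hedged sketch of the common use for the parsing context: skipping expensive dynamic DAG generation when only one specific DAG is being parsed. The dag_id attribute is assumed to be part of AirflowParsingContext, and the loop body is illustrative:
from airflow.sdk import get_parsing_context

current_dag_id = get_parsing_context().dag_id

for source in ["a", "b", "c"]:
    dag_id = f"generated_{source}"
    if current_dag_id is not None and current_dag_id != dag_id:
        continue  # skip generation for DAGs other than the one being parsed
    # ... build the DAG for `source` here ...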
- class airflow.sdk.Param(default=NOTSET, description=None, **kwargs)¶
Class to hold the default value of a Param and rule set to do the validations.
Without the rule set it always validates and returns the default value.
- Parameters:
default (Any) – The value this Param object holds
description (str | None) – Optional help text for the Param
schema – The validation schema of the Param, if not given then all kwargs except default & description will form the schema
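A hedged sketch of a Param carrying a JSON-schema-style rule set, wired into DAG params and available in templates under params.batch_size; the names and bounds are illustrative:
from airflow.sdk import DAG, Param

with DAG(
    dag_id="param_example",
    params={"batch_size": Param(100, type="integer", minimum=1, description="rows per batch")},
) as dag:
    ...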
Setting Dependencies¶
- airflow.sdk.chain(*tasks)¶
Given a number of tasks, builds a dependency chain.
This function accepts values of BaseOperator (aka tasks), EdgeModifiers (aka Labels), XComArg, TaskGroups, or lists containing any mix of these types (or a mix in the same list). If you want to chain between two lists you must ensure they have the same length.
Using classic operators/sensors:
chain(t1, [t2, t3], [t4, t5], t6)
is equivalent to:
  / -> t2 -> t4 \
t1               -> t6
  \ -> t3 -> t5 /

t1.set_downstream(t2)
t1.set_downstream(t3)
t2.set_downstream(t4)
t3.set_downstream(t5)
t4.set_downstream(t6)
t5.set_downstream(t6)
Using task-decorated functions aka XComArgs:
chain(x1(), [x2(), x3()], [x4(), x5()], x6())
is equivalent to:
  / -> x2 -> x4 \
x1               -> x6
  \ -> x3 -> x5 /

x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x2)
x1.set_downstream(x3)
x2.set_downstream(x4)
x3.set_downstream(x5)
x4.set_downstream(x6)
x5.set_downstream(x6)
Using TaskGroups:
chain(t1, task_group1, task_group2, t2)

t1.set_downstream(task_group1)
task_group1.set_downstream(task_group2)
task_group2.set_downstream(t2)
It is also possible to mix between classic operator/sensor, EdgeModifiers, XComArg, and TaskGroups:
chain(t1, [Label("branch one"), Label("branch two")], [x1(), x2()], task_group1, x3())
is equivalent to:
  / "branch one" -> x1 \
t1                      -> task_group1 -> x3
  \ "branch two" -> x2 /
x1 = x1()
x2 = x2()
x3 = x3()
label1 = Label("branch one")
label2 = Label("branch two")
t1.set_downstream(label1)
label1.set_downstream(x1)
t1.set_downstream(label2)
label2.set_downstream(x2)
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)

# or

x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1, edge_modifier=Label("branch one"))
t1.set_downstream(x2, edge_modifier=Label("branch two"))
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)
- Parameters:
tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – Individual and/or list of tasks, EdgeModifiers, XComArgs, or TaskGroups to set dependencies
- Return type:
None
- airflow.sdk.chain_linear(*elements)¶
Simplify task dependency definition.
E.g.: suppose you want precedence like so:
     ╭─op2─╮ ╭─op4─╮
op1─┤      ├─├─op5─┤─op7
     ╰─op3─╯ ╰─op6─╯
Then you can accomplish like so:
chain_linear(op1, [op2, op3], [op4, op5, op6], op7)
- Parameters:
elements (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – a list of operators / lists of operators
- airflow.sdk.cross_downstream(from_tasks, to_tasks)¶
Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.
Using classic operators/sensors:
cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
is equivalent to:
t1 ---> t4
   \ /
t2 -X -> t5
   / \
t3 ---> t6

t1.set_downstream(t4)
t1.set_downstream(t5)
t1.set_downstream(t6)
t2.set_downstream(t4)
t2.set_downstream(t5)
t2.set_downstream(t6)
t3.set_downstream(t4)
t3.set_downstream(t5)
t3.set_downstream(t6)
Using task-decorated functions aka XComArgs:
cross_downstream(from_tasks=[x1(), x2(), x3()], to_tasks=[x4(), x5(), x6()])
is equivalent to:
x1 ---> x4
   \ /
x2 -X -> x5
   / \
x3 ---> x6

x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x4)
x1.set_downstream(x5)
x1.set_downstream(x6)
x2.set_downstream(x4)
x2.set_downstream(x5)
x2.set_downstream(x6)
x3.set_downstream(x4)
x3.set_downstream(x5)
x3.set_downstream(x6)
It is also possible to mix between classic operator/sensor and XComArg tasks:
cross_downstream(from_tasks=[t1, x2(), t3], to_tasks=[x1(), t2, x3()])
is equivalent to:
t1 ---> x1
   \ /
x2 -X -> t2
   / \
t3 ---> x3

x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1)
t1.set_downstream(t2)
t1.set_downstream(x3)
x2.set_downstream(x1)
x2.set_downstream(t2)
x2.set_downstream(x3)
t3.set_downstream(x1)
t3.set_downstream(t2)
t3.set_downstream(x3)
- Parameters:
from_tasks (collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to start from.
to_tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to set as downstream dependencies.
- airflow.sdk.literal(value)¶
Wrap a value to ensure it is rendered as-is without applying Jinja templating to its contents.
Designed for use in an operator’s template field.
- Parameters:
value (Any) – The value to be rendered without templating
- Return type:
airflow.sdk.definitions._internal.templater.LiteralValue
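A hedged sketch of wrapping a templated field value with literal(); the BashOperator import path (standard provider) and task id are assumptions:
from airflow.providers.standard.operators.bash import BashOperator  # assumed import path
from airflow.sdk import literal

keep_braces = BashOperator(
    task_id="print_braces",
    bash_command=literal("echo '{{ this is not rendered by Jinja }}'"),
)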
Edges & Labels¶
- class airflow.sdk.EdgeModifier(label=None)¶
Class that represents edge information to be added between two tasks/operators.
Has shorthand factory functions, like Label(“hooray”).
- Current implementation supports
t1 >> Label("Success route") >> t2
t2 << Label("Success route") << t2
Note that due to the potential for use in either direction, this waits to make the actual connection between both sides until both are declared, and will do so progressively if multiple ups/downs are added.
This and EdgeInfo are related - an EdgeModifier is the Python object you use to add information to (potentially multiple) edges, and EdgeInfo is the representation of the information for one specific edge.
- Parameters:
label (str | None)
- class airflow.sdk.Label(label)¶
Create an EdgeModifier that sets a human-readable label on the edge.
- Parameters:
label (str)
Assets¶
- class airflow.sdk.Asset(name: str, uri: str | airflow.sdk.io.path.ObjectStoragePath, *, group: str = ..., extra: dict | None = None, watchers: list[AssetWatcher | airflow.serialization.serialized_objects.SerializedAssetWatcher] = ...)¶
A representation of data asset dependencies between workflows.
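A hedged sketch of producing and consuming an Asset; the name, uri and dag ids are illustrative. The producer declares the asset in a task’s outlets, and the consumer uses it as its schedule:
import pendulum

from airflow.sdk import DAG, Asset, task

report = Asset(name="daily_report", uri="s3://example-bucket/reports/daily.csv")

with DAG(
    dag_id="produce_report",
    schedule="@daily",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
) as producer:

    @task(outlets=[report])
    def publish():
        ...

    publish()

with DAG(dag_id="consume_report", schedule=[report]) as consumer:

    @task
    def process():
        ...

    process()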
- class airflow.sdk.AssetAlias¶
A representation of an asset alias, which is used to create assets at runtime.
- class airflow.sdk.AssetAll(*objects)¶
Use to combine asset schedule references in an “and” relationship.
- Parameters:
objects (BaseAsset)
- class airflow.sdk.AssetAny(*objects)¶
Use to combine asset schedule references in an “or” relationship.
- Parameters:
objects (BaseAsset)
- class airflow.sdk.AssetWatcher(name, trigger)¶
A representation of an asset watcher. The name uniquely identifies the watch.
- Parameters:
name (str)
trigger (airflow.triggers.base.BaseEventTrigger | dict)
- class airflow.sdk.Metadata¶
Metadata to attach to an AssetEvent.
I/O Helpers¶
- class airflow.sdk.ObjectStoragePath(*args, protocol=None, **storage_options)¶
A path-like object for object storage.
- Parameters:
protocol (str | None)
storage_options (Any)
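A minimal sketch of ObjectStoragePath; the bucket, key, and the conn_id storage option are illustrative assumptions about your environment:
from airflow.sdk import ObjectStoragePath

base = ObjectStoragePath("s3://example-bucket/data/", conn_id="aws_default")
target = base / "sample.txt"

with target.open("w") as f:
    f.write("hello")

print(target.size(), target.checksum())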
Execution Time Components¶
Context
- class airflow.sdk.Context¶
Jinja2 template context for task rendering.
Everything else¶
- class airflow.sdk.BaseHook(logger_name=None)
Abstract base class for hooks.
Hooks are meant as an interface to interact with external systems. MySqlHook, HiveHook, PigHook return object that can handle the connection and interaction to specific instances of these systems, and expose consistent methods to interact with them.
- Parameters:
logger_name (str | None) – Name of the logger used by the Hook to emit logs. If set to None (default), the logger name will fall back to airflow.task.hooks.{class.__module__}.{class.__name__} (e.g. DbApiHook will have airflow.task.hooks.airflow.providers.common.sql.hooks.sql.DbApiHook as logger).
- abstract get_conn()
Return connection for the hook.
- Return type:
Any
- classmethod get_connection(conn_id)
Get connection, given connection id.
- Parameters:
conn_id (str) – connection id
- Returns:
connection
- Return type:
airflow.sdk.definitions.connection.Connection
- classmethod get_connection_form_widgets()
- Return type:
dict[str, Any]
- classmethod get_hook(conn_id, hook_params=None)
Return default hook for this connection id.
- Parameters:
conn_id (str) – connection id
hook_params (dict | None) – hook parameters
- Returns:
default hook for this connection
- classmethod get_ui_field_behaviour()
- Return type:
dict[str, Any]
- class airflow.sdk.BaseNotifier
BaseNotifier class for sending notifications.
- abstract notify(context)
Send a notification.
- Parameters:
context (airflow.sdk.definitions.context.Context) – The airflow context
- Return type:
None
- render_template_fields(context, jinja_env=None)
Template all attributes listed in self.template_fields.
This mutates the attributes in-place and is irreversible.
- Parameters:
context (airflow.sdk.definitions.context.Context) – Context dict with values to apply on content.
jinja_env (jinja2.Environment | None) – Jinja environment to use for rendering.
- Return type:
None
- template_ext: collections.abc.Sequence[str] = ()
- template_fields: collections.abc.Sequence[str] = ()
- class airflow.sdk.BaseOperatorLink
Abstract base class that defines how we get an operator link.
- abstract get_link(operator, *, ti_key)
Link to external system.
- Parameters:
operator (airflow.models.baseoperator.BaseOperator) – The Airflow operator object this link is associated to.
ti_key (airflow.models.taskinstancekey.TaskInstanceKey) – TaskInstance ID to return link for.
- Returns:
link to external system
- Return type:
str
- abstract property name: str
Name of the link. This will be the button name on the task UI.
- Return type:
str
- operators: ClassVar[list[type[airflow.models.baseoperator.BaseOperator]]] = []
This property will be used by Airflow Plugins to find the Operators to which you want to assign this Operator Link
- Returns:
List of Operator classes used by task for which you want to create extra link
- property xcom_key: str
XCom key under which the whole “link” for this operator link is stored.
On retrieving with this key, the entire link is returned.
Defaults to _link_<class name> if not provided.
- Return type:
str
- class airflow.sdk.BaseSensorOperator(*, poke_interval=60, timeout=conf.getfloat('sensors', 'default_timeout'), soft_fail=False, mode='poke', exponential_backoff=False, max_wait=None, silent_fail=False, never_fail=False, **kwargs)
Sensor operators are derived from this class and inherit these attributes.
Sensor operators keep executing at a time interval and succeed when a criterion is met and fail if and when they time out.
- Parameters:
soft_fail (bool) – Set to true to mark the task as SKIPPED on failure. Mutually exclusive with never_fail.
poke_interval (datetime.timedelta | float) – Time that the job should wait in between each try. Can be timedelta or float seconds.
timeout (datetime.timedelta | float) – Time elapsed before the task times out and fails. Can be timedelta or float seconds. This should not be confused with execution_timeout of the BaseOperator class. timeout measures the time elapsed between the first poke and the current time (taking into account any reschedule delay between each poke), while execution_timeout checks the running time of the task (leaving out any reschedule delay). In case the mode is poke (see below), both of them are equivalent (as the sensor is never rescheduled), which is not the case in reschedule mode.
mode (str) – How the sensor operates. Options are: { poke | reschedule }, default is poke. When set to poke, the sensor takes up a worker slot for its whole execution time and sleeps between pokes. Use this mode if the expected runtime of the sensor is short or if a short poke interval is required. Note that the sensor will hold onto a worker slot and a pool slot for the duration of the sensor’s runtime in this mode. When set to reschedule, the sensor task frees the worker slot when the criteria is not yet met and it’s rescheduled at a later time. Use this mode if the time before the criteria is met is expected to be quite long. The poke interval should be more than one minute to prevent too much load on the scheduler.
exponential_backoff (bool) – allow progressively longer waits between pokes by using exponential backoff algorithm
max_wait (datetime.timedelta | float | None) – maximum wait interval between pokes, can be timedelta or float seconds
silent_fail (bool) – If true, and poke method raises an exception different from AirflowSensorTimeout, AirflowTaskTimeout, AirflowSkipException and AirflowFailException, the sensor will log the error and continue its execution. Otherwise, the sensor task fails, and it can be retried based on the provided retries parameter.
never_fail (bool) – If true, and poke method raises an exception, sensor will be skipped. Mutually exclusive with soft_fail.
- execute(context)
Derive when creating an operator.
The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
- Parameters:
context (airflow.sdk.definitions.context.Context)
- Return type:
Any
- exponential_backoff = False
- classmethod get_serialized_fields()
Stringified DAGs and operators contain exactly these fields.
- max_wait = None
- mode = 'poke'
- never_fail = False
- poke(context)
Override when deriving this class.
- Parameters:
context (airflow.sdk.definitions.context.Context)
- Return type:
bool | PokeReturnValue
- poke_interval
- property reschedule
Whether the sensor is in reschedule mode.
- resume_execution(next_method, next_kwargs, context)
Entrypoint method called by the Task Runner (instead of execute) when this task is resumed.
- Parameters:
next_method (str)
next_kwargs (dict[str, Any] | None)
context (airflow.sdk.definitions.context.Context)
- silent_fail = False
- soft_fail = False
- timeout: int | float
- ui_color: str = '#e6f1f2'
- valid_modes: collections.abc.Iterable[str] = ['poke', 'reschedule']
- class airflow.sdk.Connection
A connection to an external data source.
- Parameters:
conn_id – The connection ID.
conn_type – The connection type.
description – The connection description.
host – The host.
login – The login.
password – The password.
schema – The schema.
port – The port number.
extra – Extra metadata. Non-standard data such as private/SSH keys can be saved here. JSON encoded object.
- EXTRA_KEY = '__extra__'
- conn_id: str
- conn_type: str
- description: str | None = None
- extra: str | None = None
- property extra_dejson: dict
Deserialize the JSON-encoded extra property into a dict.
- Return type:
dict
- classmethod get(conn_id)
- Parameters:
conn_id (str)
- Return type:
Any
- get_hook(*, hook_params=None)
Return hook based on conn_type.
- get_uri()
Generate and return connection in URI format.
- Return type:
str
- host: str | None = None
- login: str | None = None
- password: str | None = None
- port: int | None = None
- schema: str | None = None
- class airflow.sdk.Context
Jinja2 template context for task rendering.
- conn: Any
- dag_run: airflow.sdk.types.DagRunProtocol
- data_interval_end: pendulum.DateTime | None
- data_interval_start: pendulum.DateTime | None
- ds: str
- ds_nodash: str
- exception: None | str | BaseException
- expanded_ti_count: int | None
- inlet_events: airflow.sdk.execution_time.context.InletEventsAccessors
- inlets: list
- logical_date: pendulum.DateTime
- macros: Any
- map_index_template: str | None
- outlet_events: airflow.sdk.types.OutletEventAccessorsProtocol
- outlets: list
- params: dict[str, Any]
- prev_data_interval_end_success: pendulum.DateTime | None
- prev_data_interval_start_success: pendulum.DateTime | None
- prev_end_date_success: pendulum.DateTime | None
- prev_start_date_success: pendulum.DateTime | None
- reason: str | None
- run_id: str
- start_date: pendulum.DateTime
- task: airflow.sdk.bases.operator.BaseOperator | airflow.models.operator.Operator
- task_instance: airflow.sdk.types.RuntimeTaskInstanceProtocol
- task_instance_key_str: str
- task_reschedule_count: int
- templates_dict: dict[str, Any] | None
- test_mode: bool
- ti: airflow.sdk.types.RuntimeTaskInstanceProtocol
- triggering_asset_events: Any
- try_number: int | None
- ts: str
- ts_nodash: str
- ts_nodash_with_tz: str
- var: Any
- class airflow.sdk.EdgeModifier(label=None)
Class that represents edge information to be added between two tasks/operators.
Has shorthand factory functions, like Label(“hooray”).
- Current implementation supports
t1 >> Label("Success route") >> t2
t2 << Label("Success route") << t2
Note that due to the potential for use in either direction, this waits to make the actual connection between both sides until both are declared, and will do so progressively if multiple ups/downs are added.
This and EdgeInfo are related - an EdgeModifier is the Python object you use to add information to (potentially multiple) edges, and EdgeInfo is the representation of the information for one specific edge.
- Parameters:
label (str | None)
- add_edge_info(dag, upstream_id, downstream_id)
Add or update task info on the DAG for this specific pair of tasks.
Called either from our relationship trigger methods above, or directly by set_upstream/set_downstream in operators.
- Parameters:
dag (airflow.sdk.definitions.dag.DAG)
upstream_id (str)
downstream_id (str)
- label = None
- property leaves
List of leaf nodes – ones with only upstream dependencies.
a.k.a. the “end” of this sub-graph
- property roots
List of root nodes – ones with no upstream dependencies.
a.k.a. the “start” of this sub-graph
- set_downstream(other, edge_modifier=None)
Set the given task/list onto the downstream attribute, then attempt to resolve the relationship.
Providing this also provides >> via DependencyMixin.
- Parameters:
other (airflow.sdk.definitions._internal.mixins.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.mixins.DependencyMixin])
edge_modifier (EdgeModifier | None)
- set_upstream(other, edge_modifier=None)
Set the given task/list onto the upstream attribute, then attempt to resolve the relationship.
Providing this also provides << via DependencyMixin.
- Parameters:
other (airflow.sdk.definitions._internal.mixins.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.mixins.DependencyMixin])
edge_modifier (EdgeModifier | None)
- update_relative(other, upstream=True, edge_modifier=None)
Update relative if we’re not the “main” side of a relationship; still run the same logic.
- Parameters:
other (airflow.sdk.definitions._internal.mixins.DependencyMixin)
upstream (bool)
edge_modifier (EdgeModifier | None)
- Return type:
None
- airflow.sdk.Label(label)
Create an EdgeModifier that sets a human-readable label on the edge.
- Parameters:
label (str)
- class airflow.sdk.Metadata
Metadata to attach to an AssetEvent.
- alias: airflow.sdk.definitions.asset.AssetAlias | None = None
- extra: dict[str, Any]
- class airflow.sdk.ObjectStoragePath(*args, protocol=None, **storage_options)
A path-like object for object storage.
- Parameters:
protocol (str | None)
storage_options (Any)
- __version__: ClassVar[int] = 1
- property bucket: str
- Return type:
str
- checksum()
Return the checksum of the file at this path.
- Return type:
int
- property container: str
- Return type:
str
- copy(dst, recursive=False, **kwargs)
Copy file(s) from this path to another location.
For remote to remote copies, the key used for the destination will be the same as the source. So that s3://src_bucket/foo/bar will be copied to gcs://dst_bucket/foo/bar and not gcs://dst_bucket/bar.
- Parameters:
dst (str | ObjectStoragePath) – Destination path
recursive (bool) – If True, copy directories recursively.
- Return type:
None
kwargs: Additional keyword arguments to be passed to the underlying implementation.
- classmethod cwd()
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- classmethod deserialize(data, version)
- Parameters:
data (dict)
version (int)
- Return type:
- classmethod home()
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).
- property key: str
- Return type:
str
- move(path, recursive=False, **kwargs)
Move file(s) from this path to another location.
- Parameters:
path (str | ObjectStoragePath) – Destination path
recursive (bool) – bool If True, move directories recursively.
- Return type:
None
kwargs: Additional keyword arguments to be passed to the underlying implementation.
- property namespace: str
- Return type:
str
- open(mode='r', **kwargs)
Open the file pointed to by this path.
- read_block(offset, length, delimiter=None)
Read a block of bytes.
Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.
If offset+length is beyond the eof, reads to eof.
- Parameters:
offset (int) – int Byte offset to start read
length (int) – int Number of bytes to read. If None, read to the end.
delimiter – bytes (optional) Ensure reading starts and stops at delimiter bytestring
Examples¶
# Read the first 13 bytes (no delimiter)
>>> read_block(0, 13)
b'Alice, 100\nBo'
# Read first 13 bytes, but force newline boundaries
>>> read_block(0, 13, delimiter=b"\n")
b'Alice, 100\nBob, 200\n'
# Read until EOF, but only stop at newline
>>> read_block(0, None, delimiter=b"\n")
b'Alice, 100\nBob, 200\nCharlie, 300'
See Also¶
fsspec.utils.read_block()
- replace(target)
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- Return type:
- root_marker: ClassVar[str] = '/'
- samefile(other_path)
Return whether other_path refers to the same file as this one.
- Parameters:
other_path (Any)
- Return type:
bool
- samestore(other)
- Parameters:
other (Any)
- Return type:
bool
- sep: ClassVar[str] = '/'
- serialize()
- Return type:
dict[str, Any]
- sign(expiration=100, **kwargs)
Create a signed URL representing the given path.
Some implementations allow temporary URLs to be generated, as a way of delegating credentials.
- Parameters:
path – str The path on the filesystem
expiration (int) – int Number of seconds to enable the URL for (if supported)
- Returns URL:
str The signed URL
- Raises:
NotImplementedError – if the method is not implemented for a store
- size()
Size in bytes of the file at this path.
- Return type:
int
- stat()
Call stat and return the result.
- Return type:
airflow.sdk.io.stat.stat_result
- ukey()
Hash of file properties, to tell if it has changed.
- Return type:
str
- class airflow.sdk.Param(default=NOTSET, description=None, **kwargs)
Class to hold the default value of a Param and a rule set to perform validations.
Without the rule set it always validates and returns the default value.
- Parameters:
default (Any) – The value this Param object holds
description (str | None) – Optional help text for the Param
schema – The validation schema of the Param. If not given, all kwargs except default and description form the schema.
- CLASS_IDENTIFIER = '__class'
- __version__: ClassVar[int] = 1
- description = None
- static deserialize(data, version)
- Parameters:
data (dict[str, Any])
version (int)
- Return type:
Param
- dump()
Dump the Param as a dictionary.
- Return type:
dict
- property has_value: bool
- Return type:
bool
- resolve(value=NOTSET, suppress_exception=False)
Run the validations and return the Param's final value.
May raise ValueError on failed validations, or TypeError if no value is passed and no value already exists. We first check that the value is JSON-serializable; if not, a warning is emitted. In a future release we will require the value to be JSON-serializable.
- Parameters:
value (Any) – The value to be updated for the Param
suppress_exception (bool) – Whether to suppress the exception when validation fails. If True and validation fails, None is returned.
- Return type:
Any
- schema
- serialize()
- Return type:
dict
- value
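A minimal usage sketch for Param (the dag_id, parameter name and schema values are illustrative); all keyword arguments other than default and description become the JSON-schema rule set:
from airflow.sdk import DAG, Param

with DAG(
    dag_id="param_example",
    params={
        "retries": Param(3, description="How many retries", type="integer", minimum=0),
    },
):
    ...

p = Param(3, type="integer", minimum=0)
p.resolve()         # returns the default, 3
p.resolve(value=5)  # validates 5 against the schema and returns it
p.resolve(value=-1, suppress_exception=True)  # fails validation, returns None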
- class airflow.sdk.PokeReturnValue(is_done, xcom_value=None)
Optional return value for poke methods.
Sensors can optionally return an instance of the PokeReturnValue class in the poke method. If an XCom value is supplied when the sensor is done, then the XCom value will be pushed through the operator return value.
- Parameters:
is_done (bool) – Set to true to indicate the sensor can stop poking.
xcom_value (Any | None) – An optional XCom value to be returned by the operator.
- is_done
- xcom_value = None
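A minimal sketch of a sensor built with the task.sensor decorator that returns a PokeReturnValue (the check_service() helper is hypothetical):
from airflow.sdk import task, PokeReturnValue

@task.sensor(poke_interval=60, timeout=3600)
def wait_for_upstream() -> PokeReturnValue:
    ready, payload = check_service()  # hypothetical readiness check
    # When is_done is True the sensor stops poking and payload is pushed as the XCom value.
    return PokeReturnValue(is_done=ready, xcom_value=payload)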
- class airflow.sdk.Variable
A generic way to store and retrieve arbitrary content or settings as a simple key/value store.
- Parameters:
key – The variable key.
value – The variable value.
description – The variable description.
- classmethod delete(key)
- Parameters:
key (str)
- Return type:
None
- description: str | None = None
- classmethod get(key, default=NOTSET, deserialize_json=False)
- Parameters:
key (str)
default (Any)
deserialize_json (bool)
- key: str
- classmethod set(key, value, description=None, serialize_json=False)
- Parameters:
key (str)
value (Any)
description (str | None)
serialize_json (bool)
- Return type:
None
- value: Any | None = None
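A minimal usage sketch (the variable keys are illustrative):
from airflow.sdk import Variable

Variable.set("my_config", {"threshold": 10}, serialize_json=True)
conf = Variable.get("my_config", deserialize_json=True)   # {'threshold': 10}
missing = Variable.get("not_there", default=None)         # falls back to None
Variable.delete("my_config")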
- airflow.sdk.__version__: str
- airflow.sdk.chain(*tasks)
Given a number of tasks, builds a dependency chain.
This function accepts values of BaseOperator (aka tasks), EdgeModifiers (aka Labels), XComArg, TaskGroups, or lists containing any mix of these types (or a mix in the same list). If you want to chain between two lists you must ensure they have the same length.
Using classic operators/sensors:
chain(t1, [t2, t3], [t4, t5], t6)
is equivalent to:
  / -> t2 -> t4 \
t1               -> t6
  \ -> t3 -> t5 /

t1.set_downstream(t2)
t1.set_downstream(t3)
t2.set_downstream(t4)
t3.set_downstream(t5)
t4.set_downstream(t6)
t5.set_downstream(t6)
Using task-decorated functions aka XComArgs:
chain(x1(), [x2(), x3()], [x4(), x5()], x6())
is equivalent to:
  / -> x2 -> x4 \
x1               -> x6
  \ -> x3 -> x5 /

x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x2)
x1.set_downstream(x3)
x2.set_downstream(x4)
x3.set_downstream(x5)
x4.set_downstream(x6)
x5.set_downstream(x6)
Using TaskGroups:
chain(t1, task_group1, task_group2, t2)

t1.set_downstream(task_group1)
task_group1.set_downstream(task_group2)
task_group2.set_downstream(t2)
It is also possible to mix between classic operator/sensor, EdgeModifiers, XComArg, and TaskGroups:
chain(t1, [Label("branch one"), Label("branch two")], [x1(), x2()], task_group1, x3())
is equivalent to:
/ "branch one" -> x1 \ t1 -> task_group1 -> x3 \ "branch two" -> x2 /
x1 = x1() x2 = x2() x3 = x3() label1 = Label("branch one") label2 = Label("branch two") t1.set_downstream(label1) label1.set_downstream(x1) t2.set_downstream(label2) label2.set_downstream(x2) x1.set_downstream(task_group1) x2.set_downstream(task_group1) task_group1.set_downstream(x3) # or x1 = x1() x2 = x2() x3 = x3() t1.set_downstream(x1, edge_modifier=Label("branch one")) t1.set_downstream(x2, edge_modifier=Label("branch two")) x1.set_downstream(task_group1) x2.set_downstream(task_group1) task_group1.set_downstream(x3)
- Parameters:
tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – Individual and/or list of tasks, EdgeModifiers, XComArgs, or TaskGroups to set dependencies
- Return type:
None
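A minimal runnable sketch of chain with task-decorated functions (the dag_id and task names are illustrative):
from airflow.sdk import DAG, chain, task

with DAG(dag_id="chain_example"):

    @task
    def extract():
        return "raw"

    @task
    def clean():
        return "clean"

    @task
    def validate():
        return "ok"

    @task
    def load():
        pass

    # extract runs first, then clean and validate in parallel, then load.
    chain(extract(), [clean(), validate()], load())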
- airflow.sdk.chain_linear(*elements)
Simplify task dependency definition.
E.g.: suppose you want precedence like so:
    ╭─op2─╮ ╭─op4─╮
op1─┤     ├─├─op5─┤─op7
    ╰-op3─╯ ╰-op6─╯
Then you can accomplish like so:
chain_linear(op1, [op2, op3], [op4, op5, op6], op7)
- Parameters:
elements (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – a list of operators / lists of operators
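A minimal sketch mirroring the diagram above (the dag_id and task names are illustrative; override() gives each task instance a distinct task_id):
from airflow.sdk import DAG, chain_linear, task

with DAG(dag_id="chain_linear_example"):

    @task
    def op(n):
        return n

    op1, op2, op3, op4, op5, op6, op7 = (
        op.override(task_id=f"op{i}")(i) for i in range(1, 8)
    )

    # op1 fans out to op2/op3, each layer fans out to the next, converging on op7.
    chain_linear(op1, [op2, op3], [op4, op5, op6], op7)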
- airflow.sdk.cross_downstream(from_tasks, to_tasks)
Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.
Using classic operators/sensors:
cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])
is equivalent to:
t1 ---> t4
   \ /
t2 -X -> t5
   / \
t3 ---> t6

t1.set_downstream(t4)
t1.set_downstream(t5)
t1.set_downstream(t6)
t2.set_downstream(t4)
t2.set_downstream(t5)
t2.set_downstream(t6)
t3.set_downstream(t4)
t3.set_downstream(t5)
t3.set_downstream(t6)
Using task-decorated functions aka XComArgs:
cross_downstream(from_tasks=[x1(), x2(), x3()], to_tasks=[x4(), x5(), x6()])
is equivalent to:
x1 ---> x4
   \ /
x2 -X -> x5
   / \
x3 ---> x6

x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x4)
x1.set_downstream(x5)
x1.set_downstream(x6)
x2.set_downstream(x4)
x2.set_downstream(x5)
x2.set_downstream(x6)
x3.set_downstream(x4)
x3.set_downstream(x5)
x3.set_downstream(x6)
It is also possible to mix between classic operator/sensor and XComArg tasks:
cross_downstream(from_tasks=[t1, x2(), t3], to_tasks=[x1(), t2, x3()])
is equivalent to:
t1 ---> x1
   \ /
x2 -X -> t2
   / \
t3 ---> x3

x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1)
t1.set_downstream(t2)
t1.set_downstream(x3)
x2.set_downstream(x1)
x2.set_downstream(t2)
x2.set_downstream(x3)
t3.set_downstream(x1)
t3.set_downstream(t2)
t3.set_downstream(x3)
- Parameters:
from_tasks (collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to start from.
to_tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to set as downstream dependencies.
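A minimal sketch (the dag_id and task names are illustrative); every task in from_tasks becomes upstream of every task in to_tasks:
from airflow.sdk import DAG, cross_downstream, task

with DAG(dag_id="cross_downstream_example"):

    @task
    def emit(name):
        return name

    extracts = [emit.override(task_id=f"extract_{i}")(i) for i in range(3)]
    loads = [emit.override(task_id=f"load_{i}")(i) for i in range(3)]

    # Each extract_* task is set upstream of every load_* task (9 edges).
    cross_downstream(from_tasks=extracts, to_tasks=loads)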
- airflow.sdk.literal(value)
Wrap a value to ensure it is rendered as-is without applying Jinja templating to its contents.
Designed for use in an operator’s template field.
- Parameters:
value (Any) – The value to be rendered without templating
- Return type:
airflow.sdk.definitions._internal.templater.LiteralValue
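A minimal sketch (assumes the standard provider's BashOperator; the dag_id and command are illustrative). Without literal(), Jinja would resolve the {{ ds }} reference before the command runs:
from airflow.providers.standard.operators.bash import BashOperator
from airflow.sdk import DAG, literal

with DAG(dag_id="literal_example"):
    print_raw = BashOperator(
        task_id="print_raw",
        # The braces are passed through to bash verbatim instead of being templated.
        bash_command=literal("echo '{{ ds }} is printed verbatim'"),
    )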
- airflow.sdk.setup: collections.abc.Callable
- Parameters:
func (Callable)
- Return type:
Callable
- airflow.sdk.task: TaskDecoratorCollection
- airflow.sdk.task_group(group_id: str | None = None, prefix_group_id: bool = True, parent_group: airflow.sdk.definitions.taskgroup.TaskGroup | None = None, dag: airflow.sdk.definitions.dag.DAG | None = None, default_args: dict[str, Any] | None = None, tooltip: str = '', ui_color: str = 'CornflowerBlue', ui_fgcolor: str = '#000', add_suffix_on_collision: bool = False, group_display_name: str = '') → collections.abc.Callable[[collections.abc.Callable[FParams, FReturn]], _TaskGroupFactory[FParams, FReturn]]
Python TaskGroup decorator.
This wraps a function into an Airflow TaskGroup. When used as the @task_group() form, all arguments are forwarded to the underlying TaskGroup class. Can be used to parameterize TaskGroup.
- Parameters:
python_callable – Function to decorate.
tg_kwargs – Keyword arguments for the TaskGroup object.
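A minimal sketch of the decorator form (the dag_id, group and task names are illustrative):
from airflow.sdk import DAG, task, task_group

with DAG(dag_id="task_group_example"):

    @task
    def extract():
        return [1, 2, 3]

    @task
    def summarize(values):
        return sum(values)

    @task_group(group_id="transform", tooltip="Cleans and aggregates")
    def transform(values):
        return summarize(values)

    transform(extract())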
- airflow.sdk.teardown: collections.abc.Callable
- Parameters:
on_failure_fail_dagrun (bool)
- Return type:
Callable
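A minimal sketch showing setup and teardown tasks together (the dag_id, resource names and tasks are illustrative):
from airflow.sdk import DAG, setup, task, teardown

with DAG(dag_id="setup_teardown_example"):

    @setup
    def create_cluster():
        return "cluster-123"

    @task
    def run_job(cluster_id):
        print(f"running on {cluster_id}")

    @teardown(on_failure_fail_dagrun=False)
    def delete_cluster(cluster_id):
        print(f"deleting {cluster_id}")

    cid = create_cluster()
    run_job(cid) >> delete_cluster(cid)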