airflow.sdk API Reference

This page documents the full public API exposed in Airflow 3.0+ via the Task SDK Python module.

If something is not on this page, it is best to assume that it is not part of the public API, and any use of it is entirely at your own risk: we won’t go out of our way to break such usage, but we make no promises either.

Defining DAGs

class airflow.sdk.DAG

A dag (directed acyclic graph) is a collection of tasks with directional dependencies.

A dag also has a schedule, a start date and an end date (optional). For each schedule (say daily or hourly), the DAG needs to run each individual task as its dependencies are met. Certain tasks have the property of depending on their own past, meaning that they can’t run until their previous schedule (and upstream tasks) are completed.

DAGs essentially act as namespaces for tasks. A task_id can only be added once to a DAG.

Note that if you plan to use time zones all the dates provided should be pendulum dates. See Time zone aware dags.

Added in version 2.4: The schedule argument to specify either time-based scheduling logic (timetable), or dataset-driven triggers.

Changed in version 3.0: The default value of schedule has been changed to None (no schedule). The previous default was timedelta(days=1).

Parameters:
  • dag_id – The id of the DAG; must consist exclusively of alphanumeric characters, dashes, dots and underscores (all ASCII)

  • description – The description for the DAG, e.g. to be shown on the webserver

  • schedule – If provided, this defines the rules according to which DAG runs are scheduled. Possible values include a cron expression string, timedelta object, Timetable, or list of Asset objects. See also Customizing DAG Scheduling with Timetables.

  • start_date – The timestamp from which the scheduler will attempt to backfill. If this is not provided, backfilling must be done manually with an explicit time range.

  • end_date – A date beyond which your DAG won’t run, leave to None for open-ended scheduling.

  • template_searchpath – This list of folders (non-relative) defines where jinja will look for your templates. Order matters. Note that jinja/airflow includes the path of your DAG file by default

  • template_undefined – Template undefined type.

  • user_defined_macros – a dictionary of macros that will be exposed in your jinja templates. For example, passing dict(foo='bar') to this argument allows you to {{ foo }} in all jinja templates related to this DAG. Note that you can pass any type of object here.

  • user_defined_filters – a dictionary of filters that will be exposed in your jinja templates. For example, passing dict(hello=lambda name: 'Hello %s' % name) to this argument allows you to {{ 'world' | hello }} in all jinja templates related to this DAG.

  • default_args – A dictionary of default parameters to be used as constructor keyword parameters when initialising operators. Note that operators have the same hook, and precede those defined here, meaning that if your dict contains ‘depends_on_past’: True here and ‘depends_on_past’: False in the operator’s call default_args, the actual value will be False.

  • params – a dictionary of DAG level parameters that are made accessible in templates, namespaced under params. These params can be overridden at the task level.

  • max_active_tasks – the number of task instances allowed to run concurrently

  • max_active_runs – maximum number of active DAG runs; beyond this number of DAG runs in a running state, the scheduler won’t create new active DAG runs

  • max_consecutive_failed_dag_runs – (experimental) maximum number of consecutive failed DAG runs, beyond this the scheduler will disable the DAG

  • dagrun_timeout – Specify the duration a DagRun should be allowed to run before it times out or fails. Task instances that are running when a DagRun is timed out will be marked as skipped.

  • sla_miss_callback – DEPRECATED - The SLA feature is removed in Airflow 3.0, to be replaced with a new implementation in 3.1

  • catchup – Perform scheduler catchup (or only run latest)? Defaults to False

  • on_failure_callback – A function or list of functions to be called when a DagRun of this dag fails. A context dictionary is passed as a single parameter to this function.

  • on_success_callback – Much like the on_failure_callback except that it is executed when the dag succeeds.

  • access_control – Specify optional DAG-level actions, e.g., {'role1': {'can_read'}, 'role2': {'can_read', 'can_edit', 'can_delete'}}, or it can specify the resource name if there is a DAGs Run resource, e.g., {'role1': {'DAG Runs': {'can_create'}}, 'role2': {'DAGs': {'can_read', 'can_edit', 'can_delete'}}}

  • is_paused_upon_creation – Specifies if the dag is paused when created for the first time. If the dag exists already, this flag will be ignored. If this optional parameter is not specified, the global config setting will be used.

  • jinja_environment_kwargs

    additional configuration options to be passed to Jinja Environment for template rendering

    Example: to avoid Jinja from removing a trailing newline from template strings

    DAG(
        dag_id="my-dag",
        jinja_environment_kwargs={
            "keep_trailing_newline": True,
            # some other jinja2 Environment options here
        },
    )
    

    See: Jinja Environment documentation

  • render_template_as_native_obj – If True, uses a Jinja NativeEnvironment to render templates as native Python types. If False, a Jinja Environment is used to render templates as string values.

  • tags – List of tags to help filtering DAGs in the UI.

  • owner_links – Dict of owners and their links, that will be clickable on the DAGs view UI. Can be used as an HTTP link (for example the link to your Slack channel), or a mailto link. e.g: {"dag_owner": "https://airflow.apache.org/"}

  • auto_register – Automatically register this DAG when it is used in a with block

  • fail_fast – Fails currently running tasks when a task in the DAG fails. Warning: a fail-stop DAG can only have tasks with the default trigger rule ("all_success"). An exception will be thrown if any task in a fail-stop DAG has a non-default trigger rule.

  • dag_display_name – The display name of the DAG which appears on the UI.
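
Example: a minimal, illustrative DAG using a few of the arguments above (the dag_id, schedule and dates are placeholders, not values mandated by this reference):

from datetime import datetime, timedelta

from airflow.sdk import DAG

with DAG(
    dag_id="example_dag",
    description="Illustrative DAG showing common constructor arguments",
    schedule="@daily",  # could also be a timedelta, a Timetable, or a list of Assets
    start_date=datetime(2025, 1, 1),
    catchup=False,
    dagrun_timeout=timedelta(hours=2),
    tags=["example"],
):
    ...  # tasks defined here are registered on this DAG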

Decorators

airflow.sdk.dag(dag_id='', *, description=None, schedule=None, start_date=None, end_date=None, template_searchpath=None, template_undefined=jinja2.StrictUndefined, user_defined_macros=None, user_defined_filters=None, default_args=None, max_active_tasks=..., max_active_runs=..., max_consecutive_failed_dag_runs=..., dagrun_timeout=None, catchup=..., on_success_callback=None, on_failure_callback=None, deadline=None, doc_md=None, params=None, access_control=None, is_paused_upon_creation=None, jinja_environment_kwargs=None, render_template_as_native_obj=False, tags=None, owner_links=None, auto_register=True, fail_fast=False, dag_display_name=None, disable_bundle_versioning=False)
Parameters:
  • dag_id (str)

  • description (str | None)

  • schedule (ScheduleArg)

  • start_date (datetime.datetime | None)

  • end_date (datetime.datetime | None)

  • template_searchpath (str | collections.abc.Iterable[str] | None)

  • template_undefined (type[jinja2.StrictUndefined])

  • user_defined_macros (dict | None)

  • user_defined_filters (dict | None)

  • default_args (dict[str, Any] | None)

  • max_active_tasks (int)

  • max_active_runs (int)

  • max_consecutive_failed_dag_runs (int)

  • dagrun_timeout (datetime.timedelta | None)

  • catchup (bool)

  • on_success_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])

  • on_failure_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])

  • deadline (airflow.sdk.definitions.deadline.DeadlineAlert | None)

  • doc_md (str | None)

  • params (airflow.sdk.definitions.param.ParamsDict | dict[str, Any] | None)

  • access_control (dict[str, dict[str, collections.abc.Collection[str]]] | dict[str, collections.abc.Collection[str]] | None)

  • is_paused_upon_creation (bool | None)

  • jinja_environment_kwargs (dict | None)

  • render_template_as_native_obj (bool)

  • tags (collections.abc.Collection[str] | None)

  • owner_links (dict[str, str] | None)

  • auto_register (bool)

  • fail_fast (bool)

  • dag_display_name (str | None)

  • disable_bundle_versioning (bool)

Return type:

collections.abc.Callable[[collections.abc.Callable], collections.abc.Callable[Ellipsis, DAG]]
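
Example: a hedged sketch of the @dag decorator form; the decorated function body defines tasks, and the final call instantiates the DAG (names are illustrative):

from datetime import datetime

from airflow.sdk import dag, task


@dag(schedule=None, start_date=datetime(2025, 1, 1), catchup=False, tags=["example"])
def my_pipeline():
    @task
    def hello():
        print("hello")

    hello()


my_pipeline()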

Task Decorators:

  • @task.run_if(condition, skip_message=None) Run the task only if the given condition is met; otherwise the task is skipped. The condition is a callable that receives the task execution context and returns either a boolean or a tuple (bool, message).

  • @task.skip_if(condition, skip_message=None) Skip the task if the given condition is met, raising a skip exception with an optional message.

  • Provider-specific task decorators under @task.<provider>, e.g. @task.python, @task.docker, etc., dynamically loaded from registered providers.
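
Example: a hedged sketch of @task.run_if and @task.skip_if stacked on top of a @task-decorated function; the specific conditions shown (reading params and logical_date from the context) are illustrative assumptions:

from airflow.sdk import task


@task.run_if(lambda context: context["params"].get("force", False), skip_message="set params.force=True to run")
@task
def optional_step():
    ...


@task.skip_if(lambda context: context["logical_date"].weekday() >= 5, skip_message="skipped on weekends")
@task
def weekday_only():
    ...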

airflow.sdk.task_group(group_id: str | None = None, prefix_group_id: bool = True, parent_group: airflow.sdk.definitions.taskgroup.TaskGroup | None = None, dag: airflow.sdk.definitions.dag.DAG | None = None, default_args: dict[str, Any] | None = None, tooltip: str = '', ui_color: str = 'CornflowerBlue', ui_fgcolor: str = '#000', add_suffix_on_collision: bool = False, group_display_name: str = '') collections.abc.Callable[[collections.abc.Callable[FParams, FReturn]], _TaskGroupFactory[FParams, FReturn]]

Python TaskGroup decorator.

This wraps a function into an Airflow TaskGroup. When used as the @task_group() form, all arguments are forwarded to the underlying TaskGroup class. Can be used to parameterize TaskGroup.

Parameters:
  • python_callable – Function to decorate.

  • tg_kwargs – Keyword arguments for the TaskGroup object.
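
Example: a minimal sketch of the @task_group decorator; the group name and tasks are illustrative:

from airflow.sdk import task, task_group


@task_group(group_id="etl", tooltip="extract and load steps")
def etl(source: str):
    @task
    def extract(src: str) -> dict:
        return {"rows": 10, "source": src}

    @task
    def load(payload: dict) -> None:
        print(payload)

    load(extract(source))

# Inside a DAG body, calling etl("s3://bucket/data") expands the group.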

airflow.sdk.task(*args, **kwargs)

Implementation to provide the @task syntax.
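
Example: a hedged sketch of the @task decorator forwarding a couple of the BaseOperator keyword arguments it accepts (retries, multiple_outputs); task names are illustrative:

from airflow.sdk import task


@task(retries=2, multiple_outputs=True)
def produce() -> dict:
    return {"a": 1, "b": 2}


@task
def consume(a: int) -> None:
    print(a)

# Inside a DAG body: consume(produce()["a"])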

airflow.sdk.setup(func)

Decorate a function to mark it as a setup task.

A setup task runs before all other tasks in its DAG or TaskGroup context and can perform initialization or resource preparation.

Example:

@setup
def initialize_context(...):
    ...
Parameters:

func (Callable)

Return type:

Callable

airflow.sdk.teardown(_func=None, *, on_failure_fail_dagrun=False)

Decorate a function to mark it as a teardown task.

A teardown task runs after all main tasks in its DAG or TaskGroup context. If on_failure_fail_dagrun=True, a failure in teardown will mark the DAG run as failed.

Example:

@teardown(on_failure_fail_dagrun=True)
def cleanup(...):
    ...
Parameters:

on_failure_fail_dagrun (bool)

Return type:

Callable
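
Example: a hedged sketch wiring a setup and a teardown task around a regular task; the names and the "cluster" resource being managed are illustrative:

from airflow.sdk import setup, task, teardown


@setup
def create_cluster() -> str:
    return "cluster-123"


@task
def run_query(cluster_id: str) -> None:
    print(f"querying {cluster_id}")


@teardown(on_failure_fail_dagrun=True)
def delete_cluster(cluster_id: str) -> None:
    print(f"deleting {cluster_id}")

# Inside a DAG body:
# cluster = create_cluster()
# run_query(cluster) >> delete_cluster(cluster)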

airflow.sdk.asset(*, schedule, is_paused_upon_creation=None, dag_id=None, dag_display_name=None, description=None, params=None, on_success_callback=None, on_failure_callback=None, access_control=None, owner_links=NOTHING, tags=NOTHING, name=None, uri=None, group='asset', extra=NOTHING, watchers=NOTHING)

Create an asset by decorating a materialization function.

Parameters:
  • schedule (ScheduleArg)

  • is_paused_upon_creation (bool | None)

  • dag_id (str | None)

  • dag_display_name (str | None)

  • description (str | None)

  • params (ParamsDict | None)

  • on_success_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])

  • on_failure_callback (None | DagStateChangeCallback | list[DagStateChangeCallback])

  • access_control (dict[str, dict[str, Collection[str]]] | None)

  • owner_links (dict[str, str])

  • tags (Collection[str])

  • name (str | None)

  • uri (str | ObjectStoragePath | None)

  • group (str)

  • extra (dict[str, Any])

  • watchers (list[BaseTrigger])

Return type:

None
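
Example: a hedged sketch of the @asset decorator; the asset name, URI and schedule are illustrative, and the decorated function body is what materializes the asset:

from airflow.sdk import asset


@asset(schedule="@daily", uri="s3://my-bucket/raw/events.parquet")
def events():
    ...  # produce or refresh the data backing this asset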

Bases

class airflow.sdk.BaseOperator(*, task_id, owner=DEFAULT_OWNER, email=None, email_on_retry=True, email_on_failure=True, retries=DEFAULT_RETRIES, retry_delay=DEFAULT_RETRY_DELAY, retry_exponential_backoff=False, max_retry_delay=None, start_date=None, end_date=None, depends_on_past=False, ignore_first_depends_on_past=DEFAULT_IGNORE_FIRST_DEPENDS_ON_PAST, wait_for_past_depends_before_skipping=DEFAULT_WAIT_FOR_PAST_DEPENDS_BEFORE_SKIPPING, wait_for_downstream=False, dag=None, params=None, default_args=None, priority_weight=DEFAULT_PRIORITY_WEIGHT, weight_rule=DEFAULT_WEIGHT_RULE, queue=DEFAULT_QUEUE, pool=None, pool_slots=DEFAULT_POOL_SLOTS, sla=None, execution_timeout=DEFAULT_TASK_EXECUTION_TIMEOUT, on_execute_callback=None, on_failure_callback=None, on_success_callback=None, on_retry_callback=None, on_skipped_callback=None, pre_execute=None, post_execute=None, trigger_rule=DEFAULT_TRIGGER_RULE, resources=None, run_as_user=None, map_index_template=None, max_active_tis_per_dag=None, max_active_tis_per_dagrun=None, executor=None, executor_config=None, do_xcom_push=True, multiple_outputs=False, inlets=None, outlets=None, task_group=None, doc=None, doc_md=None, doc_json=None, doc_yaml=None, doc_rst=None, task_display_name=None, logger_name=None, allow_nested_operators=True, **kwargs)

Abstract base class for all operators.

Since operators create objects that become nodes in the DAG, BaseOperator contains many recursive methods for DAG crawling behavior. To derive from this class, you are expected to override the constructor and the ‘execute’ method.

Operators derived from this class should perform or trigger certain tasks synchronously (wait for completion). Examples of operators could be an operator that runs a Pig job (PigOperator), a sensor operator that waits for a partition to land in Hive (HiveSensorOperator), or one that moves data from Hive to MySQL (Hive2MySqlOperator). Instances of these operators (tasks) target specific operations, running specific scripts, functions or data transfers.

This class is abstract and shouldn’t be instantiated. Instantiating a class derived from this one results in the creation of a task object, which ultimately becomes a node in DAG objects. Task dependencies should be set by using the set_upstream and/or set_downstream methods.

Parameters:
  • task_id (str) – a unique, meaningful id for the task

  • owner (str) – the owner of the task. Using a meaningful description (e.g. user/person/team/role name) to clarify ownership is recommended.

  • email (str | collections.abc.Sequence[str] | None) – the ‘to’ email address(es) used in email alerts. This can be a single email or multiple ones. Multiple addresses can be specified as a comma or semicolon separated string or by passing a list of strings. (deprecated)

  • email_on_retry (bool) – Indicates whether email alerts should be sent when a task is retried (deprecated)

  • email_on_failure (bool) – Indicates whether email alerts should be sent when a task failed (deprecated)

  • retries (int | None) – the number of retries that should be performed before failing the task

  • retry_delay (datetime.timedelta | float) – delay between retries, can be set as timedelta or float seconds, which will be converted into timedelta, the default is timedelta(seconds=300).

  • retry_exponential_backoff (bool) – allow progressively longer waits between retries by using exponential backoff algorithm on retry delay (delay will be converted into seconds)

  • max_retry_delay (datetime.timedelta | float | None) – maximum delay interval between retries, can be set as timedelta or float seconds, which will be converted into timedelta.

  • start_date (datetime.datetime | None) – The start_date for the task, determines the logical_date for the first task instance. The best practice is to have the start_date rounded to your DAG’s schedule_interval. Daily jobs have their start_date some day at 00:00:00, hourly jobs have their start_date at 00:00 of a specific hour. Note that Airflow simply looks at the latest logical_date and adds the schedule_interval to determine the next logical_date. It is also very important to note that different tasks’ dependencies need to line up in time. If task A depends on task B and their start_date are offset in a way that their logical_date don’t line up, A’s dependencies will never be met. If you are looking to delay a task, for example running a daily task at 2AM, look into the TimeSensor and TimeDeltaSensor. We advise against using dynamic start_date and recommend using fixed ones. Read the FAQ entry about start_date for more information.

  • end_date (datetime.datetime | None) – if specified, the scheduler won’t go beyond this date

  • depends_on_past (bool) – when set to true, task instances will run sequentially and only if the previous instance has succeeded or has been skipped. The task instance for the start_date is allowed to run.

  • wait_for_past_depends_before_skipping (bool) – when set to true, if the task instance should be marked as skipped and depends_on_past is true, the task instance will stay in the None state, waiting for the task of the previous run

  • wait_for_downstream (bool) – when set to true, an instance of task X will wait for tasks immediately downstream of the previous instance of task X to finish successfully or be skipped before it runs. This is useful if the different instances of a task X alter the same asset, and this asset is used by tasks downstream of task X. Note that depends_on_past is forced to True wherever wait_for_downstream is used. Also note that only tasks immediately downstream of the previous task instance are waited for; the statuses of any tasks further downstream are ignored.

  • dag (airflow.sdk.definitions.dag.DAG | None) – a reference to the dag the task is attached to (if any)

  • priority_weight (int) – priority weight of this task against other tasks. This allows the executor to trigger higher priority tasks before others when things get backed up. Set priority_weight as a higher number for more important tasks. As not all database engines support 64-bit integers, values are capped at 32-bit. Valid range is from -2,147,483,648 to 2,147,483,647.

  • weight_rule (str | airflow.task.priority_strategy.PriorityWeightStrategy) – weighting method used for the effective total priority weight of the task. Options are: { downstream | upstream | absolute }; default is downstream. When set to downstream the effective weight of the task is the aggregate sum of all downstream descendants. As a result, upstream tasks will have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and desire to have all upstream tasks complete for all runs before each dag can continue processing downstream tasks. When set to upstream the effective weight is the aggregate sum of all upstream ancestors. This is the opposite, where downstream tasks have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and prefer to have each dag complete before starting upstream tasks of other dags. When set to absolute, the effective weight is the exact priority_weight specified without additional weighting. You may want to do this when you know exactly what priority weight each task should have. Additionally, when set to absolute, there is a bonus effect of significantly speeding up the task creation process for very large DAGs. Options can be set as string or using the constants defined in the static class airflow.utils.WeightRule. Irrespective of the weight rule, resulting priority values are capped at 32-bit. This is an experimental feature. Since 2.9.0, Airflow allows you to define a custom priority weight strategy by creating a subclass of airflow.task.priority_strategy.PriorityWeightStrategy and registering it in a plugin, then providing the class path or the class instance via the weight_rule parameter. The custom priority weight strategy will be used to calculate the effective total priority weight of the task instance.

  • queue (str) – which queue to target when running this job. Not all executors implement queue management; the CeleryExecutor, for example, does support targeting specific queues.

  • pool (str | None) – the slot pool this task should run in, slot pools are a way to limit concurrency for certain tasks

  • pool_slots (int) – the number of pool slots this task should use (>= 1). Values less than 1 are not allowed.

  • sla (datetime.timedelta | None) – DEPRECATED - The SLA feature is removed in Airflow 3.0, to be replaced with a new implementation in Airflow >=3.1.

  • execution_timeout (datetime.timedelta | None) – max time allowed for the execution of this task instance; if execution goes beyond this, the task will raise an exception and fail.

  • on_failure_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | list[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – a function or list of functions to be called when a task instance of this task fails. A context dictionary is passed as a single parameter to this function. The context contains references to objects related to the task instance and is documented under the macros section of the API.

  • on_execute_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed right before the task is executed.

  • on_retry_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | list[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when retries occur.

  • on_success_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when the task succeeds.

  • on_skipped_callback (None | airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback | collections.abc.Collection[airflow.sdk.definitions._internal.abstractoperator.TaskStateChangeCallback]) – much like the on_failure_callback except that it is executed when a skip occurs; this callback will be called only if AirflowSkipException is raised. It is explicitly NOT called if the task never starts executing, for example because of a preceding branching decision in the DAG or a trigger rule that causes the task to be skipped without ever being scheduled.

  • pre_execute (TaskPreExecuteHook | None) –

    a function to be called immediately before task execution, receiving a context dictionary; raising an exception will prevent the task from being executed.

    This is an experimental feature.

  • post_execute (TaskPostExecuteHook | None) –

    a function to be called immediately after task execution, receiving a context dictionary and task result; raising an exception will prevent the task from succeeding.

    This is an experimental feature.

  • trigger_rule (str) – defines the rule by which dependencies are applied for the task to get triggered. Options are: { all_success | all_failed | all_done | all_skipped | one_success | one_done | one_failed | none_failed | none_failed_min_one_success | none_skipped | always} default is all_success. Options can be set as string or using the constants defined in the static class airflow.utils.TriggerRule

  • resources (dict[str, Any] | None) – A map of resource parameter names (the argument names of the Resources constructor) to their values.

  • run_as_user (str | None) – unix username to impersonate while running the task

  • max_active_tis_per_dag (int | None) – When set, a task will be able to limit the concurrent runs across logical_dates.

  • max_active_tis_per_dagrun (int | None) – When set, a task will be able to limit the concurrent task instances per DAG run.

  • executor (str | None) – Which executor to target when running this task. NOT YET SUPPORTED

  • executor_config (dict | None) –

    Additional task-level configuration parameters that are interpreted by a specific executor. Parameters are namespaced by the name of executor.

    Example: to run this task in a specific docker container through the KubernetesExecutor

    MyOperator(..., executor_config={"KubernetesExecutor": {"image": "myCustomDockerImage"}})
    

  • do_xcom_push (bool) – if True, an XCom is pushed containing the Operator’s result

  • multiple_outputs (bool) – if True and do_xcom_push is True, pushes multiple XComs, one for each key in the returned dictionary result. If False and do_xcom_push is True, pushes a single XCom.

  • task_group (airflow.sdk.definitions.taskgroup.TaskGroup | None) – The TaskGroup to which the task should belong. This is typically provided when not using a TaskGroup as a context manager.

  • doc (str | None) – Add documentation or notes to your Task objects that is visible in Task Instance details View in the Webserver

  • doc_md (str | None) – Add documentation (in Markdown format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

  • doc_rst (str | None) – Add documentation (in RST format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

  • doc_json (str | None) – Add documentation (in JSON format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

  • doc_yaml (str | None) – Add documentation (in YAML format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

  • task_display_name (str | None) – The display name of the task which appears on the UI.

  • logger_name (str | None) – Name of the logger used by the Operator to emit logs. If set to None (default), the logger name will fall back to airflow.task.operators.{class.__module__}.{class.__name__} (e.g. SimpleHttpOperator will have airflow.task.operators.airflow.providers.http.operators.http.SimpleHttpOperator as logger).

  • allow_nested_operators (bool) –

    if True, when an operator is executed within another one a warning message will be logged. If False, then an exception will be raised if the operator is badly used (e.g. nested within another one). In future releases of Airflow this parameter will be removed and an exception will always be thrown when operators are nested within each other (default is True).

    Example: example of a bad operator mixin usage:

    @task(provide_context=True)
    def say_hello_world(**context):
        hello_world_task = BashOperator(
            task_id="hello_world_task",
            bash_command="python -c \"print('Hello, world!')\"",
            dag=dag,
        )
        hello_world_task.execute(context)
    

  • ignore_first_depends_on_past (bool)

  • params (collections.abc.MutableMapping[str, Any] | None)

  • default_args (dict | None)

  • map_index_template (str | None)

  • inlets (Any | None)

  • outlets (Any | None)

  • kwargs (Any)
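
Example: a minimal sketch of deriving a custom operator; only the constructor and execute() are overridden, and template_fields marks attributes rendered with Jinja before execution (the operator name is illustrative):

from airflow.sdk import BaseOperator


class HelloOperator(BaseOperator):
    template_fields = ("name",)  # "name" is rendered with Jinja before execute() runs

    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name

    def execute(self, context):
        message = f"Hello {self.name}!"
        self.log.info(message)
        return message  # pushed to XCom because do_xcom_push defaults to True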

class airflow.sdk.BaseSensorOperator(*, poke_interval=60, timeout=conf.getfloat('sensors', 'default_timeout'), soft_fail=False, mode='poke', exponential_backoff=False, max_wait=None, silent_fail=False, never_fail=False, **kwargs)

Sensor operators are derived from this class and inherit these attributes.

Sensor operators keep executing at a time interval and succeed when a criterion is met and fail if and when they time out.

Parameters:
  • soft_fail (bool) – Set to true to mark the task as SKIPPED on failure. Mutually exclusive with never_fail.

  • poke_interval (datetime.timedelta | float) – Time that the job should wait in between each try. Can be timedelta or float seconds.

  • timeout (datetime.timedelta | float) – Time elapsed before the task times out and fails. Can be timedelta or float seconds. This should not be confused with execution_timeout of the BaseOperator class. timeout measures the time elapsed between the first poke and the current time (taking into account any reschedule delay between each poke), while execution_timeout checks the running time of the task (leaving out any reschedule delay). In case that the mode is poke (see below), both of them are equivalent (as the sensor is never rescheduled), which is not the case in reschedule mode.

  • mode (str) – How the sensor operates. Options are: { poke | reschedule }, default is poke. When set to poke the sensor is taking up a worker slot for its whole execution time and sleeps between pokes. Use this mode if the expected runtime of the sensor is short or if a short poke interval is required. Note that the sensor will hold onto a worker slot and a pool slot for the duration of the sensor’s runtime in this mode. When set to reschedule the sensor task frees the worker slot when the criteria is not yet met and it’s rescheduled at a later time. Use this mode if the time before the criteria is met is expected to be quite long. The poke interval should be more than one minute to prevent too much load on the scheduler.

  • exponential_backoff (bool) – allow progressive longer waits between pokes by using exponential backoff algorithm

  • max_wait (datetime.timedelta | float | None) – maximum wait interval between pokes, can be timedelta or float seconds

  • silent_fail (bool) – If true, and poke method raises an exception different from AirflowSensorTimeout, AirflowTaskTimeout, AirflowSkipException and AirflowFailException, the sensor will log the error and continue its execution. Otherwise, the sensor task fails, and it can be retried based on the provided retries parameter.

  • never_fail (bool) – If true, and poke method raises an exception, sensor will be skipped. Mutually exclusive with soft_fail.
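
Example: a hedged sketch of a custom sensor deriving from BaseSensorOperator; poke() returns a PokeReturnValue (documented below) so the observed count is also pushed as an XCom when the sensor finishes. The _fetch_count helper is hypothetical:

from airflow.sdk import BaseSensorOperator, PokeReturnValue


class RecordCountSensor(BaseSensorOperator):
    def __init__(self, threshold: int = 100, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def poke(self, context) -> PokeReturnValue:
        count = self._fetch_count()  # hypothetical helper querying the external system
        return PokeReturnValue(is_done=count >= self.threshold, xcom_value=count)

    def _fetch_count(self) -> int:
        return 100  # placeholder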

class airflow.sdk.BaseNotifier

BaseNotifier class for sending notifications.

class airflow.sdk.XComArg

Reference to an XCom value pushed from another operator.

The implementation supports:

xcomarg >> op
xcomarg << op
op >> xcomarg  # By BaseOperator code
op << xcomarg  # By BaseOperator code

Example: The moment you get a result from any operator (decorated or regular) you can create an XComArg from it:

any_op = AnyOperator()
xcomarg = XComArg(any_op)
# or equivalently
xcomarg = any_op.output
my_op = MyOperator()
my_op >> xcomarg

This object can be used in legacy Operators via Jinja.

Example: You can make this result part of any generated string:

any_op = AnyOperator()
xcomarg = any_op.output
op1 = MyOperator(my_text_message=f"the value is {xcomarg}")
op2 = MyOperator(my_text_message=f"the value is {xcomarg['topic']}")
Parameters:
  • operator – Operator instance to which the XComArg references.

  • key – Key used to pull the XCom value. Defaults to XCOM_RETURN_KEY, i.e. the referenced operator’s return value.

class airflow.sdk.PokeReturnValue(is_done, xcom_value=None)

Optional return value for poke methods.

Sensors can optionally return an instance of the PokeReturnValue class from the poke method. If an XCom value is supplied when the sensor is done, then the XCom value will be pushed through the operator return value.

Parameters:
  • is_done (bool) – Set to true to indicate the sensor can stop poking.

  • xcom_value (Any | None) – An optional XCom value to be returned by the operator.

class airflow.sdk.BaseHook(logger_name=None)

Abstract base class for hooks.

Hooks are meant as an interface to interact with external systems. MySqlHook, HiveHook, PigHook return objects that can handle the connection and interaction with specific instances of these systems, and expose consistent methods to interact with them.

Parameters:

logger_name (str | None) – Name of the logger used by the Hook to emit logs. If set to None (default), the logger name will fall back to airflow.task.hooks.{class.__module__}.{class.__name__} (e.g. DbApiHook will have airflow.task.hooks.airflow.providers.common.sql.hooks.sql.DbApiHook as logger).
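
Example: a minimal sketch of a custom hook; get_connection() is the classmethod documented later on this page, and the returned dict is a stand-in for whatever client object the external system actually needs (the connection id is illustrative):

from airflow.sdk import BaseHook


class MyServiceHook(BaseHook):
    def __init__(self, my_conn_id: str = "my_service_default", **kwargs):
        super().__init__(**kwargs)
        self.my_conn_id = my_conn_id

    def get_conn(self):
        conn = self.get_connection(self.my_conn_id)
        return {"host": conn.host, "port": conn.port, "token": conn.password}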

Connections & Variables

class airflow.sdk.Connection

A connection to an external data source.

Parameters:
  • conn_id – The connection ID.

  • conn_type – The connection type.

  • description – The connection description.

  • host – The host.

  • login – The login.

  • password – The password.

  • schema – The schema.

  • port – The port number.

  • extra – Extra metadata. Non-standard data such as private/SSH keys can be saved here. JSON encoded object.

class airflow.sdk.Variable

A generic way to store and retrieve arbitrary content or settings as a simple key/value store.

Parameters:
  • key – The variable key.

  • value – The variable value.

  • description – The variable description.
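
Example: a hedged sketch of reading a connection and a variable from task code. Connection.get and extra_dejson are documented later on this page; Variable.get is assumed to mirror the classic Airflow API and may differ in your SDK version, and the ids are illustrative:

from airflow.sdk import Connection, Variable

conn = Connection.get("my_postgres")
print(conn.host, conn.port, conn.extra_dejson.get("sslmode"))

api_key = Variable.get("my_api_key")  # assumed classmethod; see note above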

Tasks & Operators

class airflow.sdk.TaskGroup

A collection of tasks.

When set_downstream() or set_upstream() are called on the TaskGroup, it is applied across all tasks within the group if necessary.

Parameters:
  • group_id – a unique, meaningful id for the TaskGroup. group_id must not conflict with the group_id of another TaskGroup or the task_id of any task in the DAG. The root TaskGroup has group_id set to None.

  • prefix_group_id – If set to True, child task_id and group_id will be prefixed with this TaskGroup’s group_id. If set to False, child task_id and group_id are not prefixed. Default is True.

  • parent_group – The parent TaskGroup of this TaskGroup. parent_group is set to None for the root TaskGroup.

  • dag – The DAG that this TaskGroup belongs to.

  • default_args – A dictionary of default parameters to be used as constructor keyword parameters when initialising operators, will override default_args defined in the DAG level. Note that operators have the same hook, and precede those defined here, meaning that if your dict contains ‘depends_on_past’: True here and ‘depends_on_past’: False in the operator’s call default_args, the actual value will be False.

  • tooltip – The tooltip of the TaskGroup node when displayed in the UI

  • ui_color – The fill color of the TaskGroup node when displayed in the UI

  • ui_fgcolor – The label color of the TaskGroup node when displayed in the UI

  • add_suffix_on_collision – If this task group name already exists, automatically add __1 etc suffixes

  • group_display_name – If set, this will be the display name for the TaskGroup node in the UI.
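
Example: a minimal sketch of TaskGroup used as a context manager inside a DAG body; tasks defined in the block get the "processing." prefix on their task_id (names are illustrative):

from airflow.sdk import DAG, TaskGroup, task

with DAG(dag_id="grouped", schedule=None):
    with TaskGroup(group_id="processing", tooltip="data processing steps") as processing:

        @task
        def transform():
            ...

        transform()

    # upstream_task >> processing >> downstream_task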

airflow.sdk.get_current_context()

Retrieve the execution context dictionary without altering the user method’s signature.

This is the simplest method of retrieving the execution context dictionary.

Old style:

def my_task(**context):
    ti = context["ti"]

New style:

from airflow.sdk import get_current_context


def my_task():
    context = get_current_context()
    ti = context["ti"]

The current context will only have a value if this method is called after an operator has started executing.

Return type:

Context

airflow.sdk.get_parsing_context()

Return the current (DAG) parsing context info.

Return type:

AirflowParsingContext
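
Example: a hedged sketch using the parsing context to skip generating DAGs that are not currently being parsed; the dag_id attribute on the returned object is assumed from the standard AirflowParsingContext fields, and the customer names are illustrative:

from airflow.sdk import DAG, get_parsing_context

current_dag_id = get_parsing_context().dag_id

for customer in ("acme", "globex"):
    dag_id = f"process_{customer}"
    if current_dag_id is not None and current_dag_id != dag_id:
        continue  # only current_dag_id is being parsed right now; skip generating the others
    with DAG(dag_id=dag_id, schedule=None):
        ...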

class airflow.sdk.Param(default=NOTSET, description=None, **kwargs)

Class to hold the default value of a Param and rule set to do the validations.

Without the rule set it always validates and returns the default value.

Parameters:
  • default (Any) – The value this Param object holds

  • description (str | None) – Optional help text for the Param

  • schema – The validation schema of the Param, if not given then all kwargs except default & description will form the schema
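
Example: a minimal sketch declaring a validated DAG-level param; keyword arguments other than default and description (here type and minimum) become the validation schema:

from airflow.sdk import DAG, Param

with DAG(
    dag_id="param_example",
    schedule=None,
    params={"threshold": Param(10, description="Alert threshold", type="integer", minimum=0)},
):
    ...  # templates can reference {{ params.threshold }}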

Setting Dependencies

airflow.sdk.chain(*tasks)

Given a number of tasks, builds a dependency chain.

This function accepts values of BaseOperator (aka tasks), EdgeModifiers (aka Labels), XComArg, TaskGroups, or lists containing any mix of these types (or a mix in the same list). If you want to chain between two lists you must ensure they have the same length.

Using classic operators/sensors:

chain(t1, [t2, t3], [t4, t5], t6)

is equivalent to:

  / -> t2 -> t4 \
t1               -> t6
  \ -> t3 -> t5 /
t1.set_downstream(t2)
t1.set_downstream(t3)
t2.set_downstream(t4)
t3.set_downstream(t5)
t4.set_downstream(t6)
t5.set_downstream(t6)

Using task-decorated functions aka XComArgs:

chain(x1(), [x2(), x3()], [x4(), x5()], x6())

is equivalent to:

  / -> x2 -> x4 \
x1               -> x6
  \ -> x3 -> x5 /
x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x2)
x1.set_downstream(x3)
x2.set_downstream(x4)
x3.set_downstream(x5)
x4.set_downstream(x6)
x5.set_downstream(x6)

Using TaskGroups:

chain(t1, task_group1, task_group2, t2)

t1.set_downstream(task_group1)
task_group1.set_downstream(task_group2)
task_group2.set_downstream(t2)

It is also possible to mix between classic operator/sensor, EdgeModifiers, XComArg, and TaskGroups:

chain(t1, [Label("branch one"), Label("branch two")], [x1(), x2()], task_group1, x3())

is equivalent to:

  / "branch one" -> x1 \
t1                      -> task_group1 -> x3
  \ "branch two" -> x2 /
x1 = x1()
x2 = x2()
x3 = x3()
label1 = Label("branch one")
label2 = Label("branch two")
t1.set_downstream(label1)
label1.set_downstream(x1)
t1.set_downstream(label2)
label2.set_downstream(x2)
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)

# or

x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1, edge_modifier=Label("branch one"))
t1.set_downstream(x2, edge_modifier=Label("branch two"))
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)
Parameters:

tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – Individual and/or list of tasks, EdgeModifiers, XComArgs, or TaskGroups to set dependencies

Return type:

None

airflow.sdk.chain_linear(*elements)

Simplify task dependency definition.

E.g.: suppose you want precedence like so:

    ╭─op2─╮ ╭─op4─╮
op1─┤     ├─┤op5  ├─op7
    ╰-op3─╯ ╰-op6─╯

Then you can accomplish like so:

chain_linear(op1, [op2, op3], [op4, op5, op6], op7)
Parameters:

elements (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – a list of operators / lists of operators

airflow.sdk.cross_downstream(from_tasks, to_tasks)

Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.

Using classic operators/sensors:

cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])

is equivalent to:

t1 ---> t4
   \ /
t2 -X -> t5
   / \
t3 ---> t6
t1.set_downstream(t4)
t1.set_downstream(t5)
t1.set_downstream(t6)
t2.set_downstream(t4)
t2.set_downstream(t5)
t2.set_downstream(t6)
t3.set_downstream(t4)
t3.set_downstream(t5)
t3.set_downstream(t6)

Using task-decorated functions aka XComArgs:

cross_downstream(from_tasks=[x1(), x2(), x3()], to_tasks=[x4(), x5(), x6()])

is equivalent to:

x1 ---> x4
   \ /
x2 -X -> x5
   / \
x3 ---> x6
x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x4)
x1.set_downstream(x5)
x1.set_downstream(x6)
x2.set_downstream(x4)
x2.set_downstream(x5)
x2.set_downstream(x6)
x3.set_downstream(x4)
x3.set_downstream(x5)
x3.set_downstream(x6)

It is also possible to mix between classic operator/sensor and XComArg tasks:

cross_downstream(from_tasks=[t1, x2(), t3], to_tasks=[x1(), t2, x3()])

is equivalent to:

t1 ---> x1
   \ /
x2 -X -> t2
   / \
t3 ---> x3
x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1)
t1.set_downstream(t2)
t1.set_downstream(x3)
x2.set_downstream(x1)
x2.set_downstream(t2)
x2.set_downstream(x3)
t3.set_downstream(x1)
t3.set_downstream(t2)
t3.set_downstream(x3)
Parameters:
  • from_tasks (collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to start from.

  • to_tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to set as downstream dependencies.

airflow.sdk.literal(value)

Wrap a value to ensure it is rendered as-is without applying Jinja templating to its contents.

Designed for use in an operator’s template field.

Parameters:

value (Any) – The value to be rendered without templating

Return type:

airflow.sdk.definitions._internal.templater.LiteralValue
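
Example: a hedged sketch of literal() protecting a templated field from Jinja rendering; the BashOperator import path is an assumption about the standard provider and is not part of this page:

from airflow.sdk import literal
from airflow.providers.standard.operators.bash import BashOperator  # assumed import path

echo = BashOperator(
    task_id="echo_raw",
    # Without literal() the braces would be rendered by Jinja; here they are passed through as-is.
    bash_command=literal("echo '{{ not_a_template }}'"),
)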

Edges & Labels

class airflow.sdk.EdgeModifier(label=None)

Class that represents edge information to be added between two tasks/operators.

Has shorthand factory functions, like Label(“hooray”).

Current implementation supports

t1 >> Label("Success route") >> t2
t2 << Label("Success route") << t1

Note that due to the potential for use in either direction, this waits to make the actual connection between both sides until both are declared, and will do so progressively if multiple ups/downs are added.

This and EdgeInfo are related - an EdgeModifier is the Python object you use to add information to (potentially multiple) edges, and EdgeInfo is the representation of the information for one specific edge.

Parameters:

label (str | None)

class airflow.sdk.Label(label)

Create an EdgeModifier that sets a human-readable label on the edge.

Parameters:

label (str)
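
Example: a minimal sketch attaching labels to edges; the EmptyOperator import path is an assumption about the standard provider, and the task names are illustrative:

from airflow.sdk import DAG, Label
from airflow.providers.standard.operators.empty import EmptyOperator  # assumed import path

with DAG(dag_id="labeled_edges", schedule=None):
    branch_a = EmptyOperator(task_id="branch_a")
    branch_b = EmptyOperator(task_id="branch_b")
    join = EmptyOperator(task_id="join")

    branch_a >> Label("happy path") >> join
    branch_b >> Label("error path") >> join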

Assets

class airflow.sdk.Asset(name: str, uri: str | airflow.sdk.io.path.ObjectStoragePath, *, group: str = ..., extra: dict | None = None, watchers: list[AssetWatcher | airflow.serialization.serialized_objects.SerializedAssetWatcher] = ...)

A representation of data asset dependencies between workflows.

class airflow.sdk.AssetAlias

A representation of an asset alias, which is used to create assets at runtime.

class airflow.sdk.AssetAll(*objects)

Use to combine asset schedule references in an "and" relationship.

Parameters:

objects (BaseAsset)

class airflow.sdk.AssetAny(*objects)

Use to combine asset schedule references in an "or" relationship.

Parameters:

objects (BaseAsset)

class airflow.sdk.AssetWatcher(name, trigger)

A representation of an asset watcher. The name uniquely identifies the watcher.

Parameters:
  • name – The name of the watcher; uniquely identifies it.

  • trigger – The event trigger the watcher uses to detect changes to the asset.

class airflow.sdk.Metadata

Metadata to attach to an AssetEvent.
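
Example: a hedged sketch of asset-driven scheduling; the producing task declares an Asset as an outlet, and a second DAG uses AssetAll so it runs only after both assets have received new events (names, URIs and schedules are illustrative):

from datetime import datetime

from airflow.sdk import DAG, Asset, AssetAll, task

raw = Asset("raw_events", uri="s3://my-bucket/raw/events.parquet")
profiles = Asset("user_profiles", uri="s3://my-bucket/raw/profiles.parquet")

with DAG(dag_id="produce_raw", schedule="@hourly", start_date=datetime(2025, 1, 1)):

    @task(outlets=[raw])
    def write_events():
        ...

    write_events()

with DAG(dag_id="combine_assets", schedule=AssetAll(raw, profiles)):

    @task
    def combine():
        ...

    combine()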

I/O Helpers

class airflow.sdk.ObjectStoragePath(*args, protocol=None, **storage_options)

A path-like object for object storage.

Parameters:
  • protocol (str | None)

  • storage_options (Any)
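
Example: a hedged sketch of basic ObjectStoragePath usage; the "aws_default@" connection prefix and the bucket name are illustrative assumptions:

from airflow.sdk import ObjectStoragePath

base = ObjectStoragePath("s3://aws_default@my-bucket/reports/")
path = base / "2025-01-01.csv"

with path.open("w") as f:
    f.write("a,b\n1,2\n")

print(path.size(), path.checksum())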

Execution Time Components

Context

class airflow.sdk.Context

Jinja2 template context for task rendering.

Everything else

class airflow.sdk.BaseHook(logger_name=None)

Abstract base class for hooks.

Hooks are meant as an interface to interact with external systems. MySqlHook, HiveHook, PigHook return objects that can handle the connection and interaction with specific instances of these systems, and expose consistent methods to interact with them.

Parameters:

logger_name (str | None) – Name of the logger used by the Hook to emit logs. If set to None (default), the logger name will fall back to airflow.task.hooks.{class.__module__}.{class.__name__} (e.g. DbApiHook will have airflow.task.hooks.airflow.providers.common.sql.hooks.sql.DbApiHook as logger).

abstract get_conn()

Return connection for the hook.

Return type:

Any

classmethod get_connection(conn_id)

Get connection, given connection id.

Parameters:

conn_id (str) – connection id

Returns:

connection

Return type:

airflow.sdk.definitions.connection.Connection

classmethod get_connection_form_widgets()
Return type:

dict[str, Any]

classmethod get_hook(conn_id, hook_params=None)

Return default hook for this connection id.

Parameters:
  • conn_id (str) – connection id

  • hook_params (dict | None) – hook parameters

Returns:

default hook for this connection

classmethod get_ui_field_behaviour()
Return type:

dict[str, Any]

class airflow.sdk.BaseNotifier

BaseNotifier class for sending notifications.

abstract notify(context)

Send a notification.

Parameters:

context (airflow.sdk.definitions.context.Context) – The airflow context

Return type:

None

render_template_fields(context, jinja_env=None)

Template all attributes listed in self.template_fields.

This mutates the attributes in-place and is irreversible.

Parameters:
  • context (airflow.sdk.definitions.context.Context) – Context dict with values to apply on content.

  • jinja_env (jinja2.Environment | None) – Jinja environment to use for rendering.

Return type:

None

template_ext: collections.abc.Sequence[str] = ()
template_fields: collections.abc.Sequence[str] = ()
class airflow.sdk.BaseOperatorLink

Abstract base class that defines how we get an operator link.

abstract get_link(operator, *, ti_key)

Link to external system.

Parameters:
  • operator – The operator for which the link is being generated.

  • ti_key – The TaskInstance key identifying the task instance to return the link for.

Returns:

link to external system

Return type:

str

abstract property name: str

Name of the link. This will be the button name on the task UI.

Return type:

str

operators: ClassVar[list[type[airflow.models.baseoperator.BaseOperator]]] = []

This property will be used by Airflow Plugins to find the Operators to which you want to assign this Operator Link

Returns:

List of Operator classes used by task for which you want to create extra link

property xcom_key: str

XCom key with which the whole “link” for this operator link is stored.

On retrieving with this key, the entire link is returned.

Defaults to _link_<class name> if not provided.

Return type:

str
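
Example: a minimal sketch of a custom operator link; the dashboard URL is illustrative, and ti_key is assumed to expose run_id as in the core TaskInstanceKey:

from airflow.sdk import BaseOperatorLink


class DashboardLink(BaseOperatorLink):
    """Add a "Dashboard" button to the task UI."""

    name = "Dashboard"

    def get_link(self, operator, *, ti_key):
        return f"https://dashboard.example.com/runs/{ti_key.run_id}"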

class airflow.sdk.BaseSensorOperator(*, poke_interval=60, timeout=conf.getfloat('sensors', 'default_timeout'), soft_fail=False, mode='poke', exponential_backoff=False, max_wait=None, silent_fail=False, never_fail=False, **kwargs)

Sensor operators are derived from this class and inherit these attributes.

Sensor operators keep executing at a time interval and succeed when a criterion is met and fail if and when they time out.

Parameters:
  • soft_fail (bool) – Set to true to mark the task as SKIPPED on failure. Mutually exclusive with never_fail.

  • poke_interval (datetime.timedelta | float) – Time that the job should wait in between each try. Can be timedelta or float seconds.

  • timeout (datetime.timedelta | float) – Time elapsed before the task times out and fails. Can be timedelta or float seconds. This should not be confused with execution_timeout of the BaseOperator class. timeout measures the time elapsed between the first poke and the current time (taking into account any reschedule delay between each poke), while execution_timeout checks the running time of the task (leaving out any reschedule delay). In case that the mode is poke (see below), both of them are equivalent (as the sensor is never rescheduled), which is not the case in reschedule mode.

  • mode (str) – How the sensor operates. Options are: { poke | reschedule }, default is poke. When set to poke the sensor is taking up a worker slot for its whole execution time and sleeps between pokes. Use this mode if the expected runtime of the sensor is short or if a short poke interval is required. Note that the sensor will hold onto a worker slot and a pool slot for the duration of the sensor’s runtime in this mode. When set to reschedule the sensor task frees the worker slot when the criteria is not yet met and it’s rescheduled at a later time. Use this mode if the time before the criteria is met is expected to be quite long. The poke interval should be more than one minute to prevent too much load on the scheduler.

  • exponential_backoff (bool) – allow progressive longer waits between pokes by using exponential backoff algorithm

  • max_wait (datetime.timedelta | float | None) – maximum wait interval between pokes, can be timedelta or float seconds

  • silent_fail (bool) – If true, and poke method raises an exception different from AirflowSensorTimeout, AirflowTaskTimeout, AirflowSkipException and AirflowFailException, the sensor will log the error and continue its execution. Otherwise, the sensor task fails, and it can be retried based on the provided retries parameter.

  • never_fail (bool) – If true, and poke method raises an exception, sensor will be skipped. Mutually exclusive with soft_fail.

execute(context)

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

Parameters:

context (airflow.sdk.definitions.context.Context)

Return type:

Any

exponential_backoff = False
classmethod get_serialized_fields()

Stringified DAGs and operators contain exactly these fields.

max_wait = None
mode = 'poke'
never_fail = False
poke(context)

Override when deriving this class.

Parameters:

context (airflow.sdk.definitions.context.Context)

Return type:

bool | PokeReturnValue

poke_interval
property reschedule

Whether the sensor runs in reschedule mode.

resume_execution(next_method, next_kwargs, context)

Entrypoint method called by the Task Runner (instead of execute) when this task is resumed.

Parameters:
  • next_method (str)

  • next_kwargs (dict[str, Any] | None)

  • context (airflow.sdk.definitions.context.Context)

silent_fail = False
soft_fail = False
timeout: int | float
ui_color: str = '#e6f1f2'
valid_modes: collections.abc.Iterable[str] = ['poke', 'reschedule']
class airflow.sdk.Connection

A connection to an external data source.

Parameters:
  • conn_id – The connection ID.

  • conn_type – The connection type.

  • description – The connection description.

  • host – The host.

  • login – The login.

  • password – The password.

  • schema – The schema.

  • port – The port number.

  • extra – Extra metadata. Non-standard data such as private/SSH keys can be saved here. JSON encoded object.

EXTRA_KEY = '__extra__'
conn_id: str
conn_type: str
description: str | None = None
extra: str | None = None
property extra_dejson: dict

Deserialize the extra field (a JSON-encoded string) into a dictionary.

Return type:

dict

classmethod get(conn_id)
Parameters:

conn_id (str)

Return type:

Any

get_hook(*, hook_params=None)

Return hook based on conn_type.

get_uri()

Generate and return connection in URI format.

Return type:

str

host: str | None = None
login: str | None = None
password: str | None = None
port: int | None = None
schema: str | None = None
class airflow.sdk.Context

Jinja2 template context for task rendering.

conn: Any
dag_run: airflow.sdk.types.DagRunProtocol
data_interval_end: pendulum.DateTime | None
data_interval_start: pendulum.DateTime | None
ds: str
ds_nodash: str
exception: None | str | BaseException
expanded_ti_count: int | None
inlet_events: airflow.sdk.execution_time.context.InletEventsAccessors
inlets: list
logical_date: pendulum.DateTime
macros: Any
map_index_template: str | None
outlet_events: airflow.sdk.types.OutletEventAccessorsProtocol
outlets: list
params: dict[str, Any]
prev_data_interval_end_success: pendulum.DateTime | None
prev_data_interval_start_success: pendulum.DateTime | None
prev_end_date_success: pendulum.DateTime | None
prev_start_date_success: pendulum.DateTime | None
reason: str | None
run_id: str
start_date: pendulum.DateTime
task: airflow.sdk.bases.operator.BaseOperator | airflow.models.operator.Operator
task_instance: airflow.sdk.types.RuntimeTaskInstanceProtocol
task_instance_key_str: str
task_reschedule_count: int
templates_dict: dict[str, Any] | None
test_mode: bool
ti: airflow.sdk.types.RuntimeTaskInstanceProtocol
triggering_asset_events: Any
try_number: int | None
ts: str
ts_nodash: str
ts_nodash_with_tz: str
var: Any
class airflow.sdk.EdgeModifier(label=None)

Class that represents edge information to be added between two tasks/operators.

Has shorthand factory functions, like Label(“hooray”).

Current implementation supports

t1 >> Label("Success route") >> t2
t2 << Label("Success route") << t1

Note that due to the potential for use in either direction, this waits to make the actual connection between both sides until both are declared, and will do so progressively if multiple ups/downs are added.

This and EdgeInfo are related - an EdgeModifier is the Python object you use to add information to (potentially multiple) edges, and EdgeInfo is the representation of the information for one specific edge.

Parameters:

label (str | None)

add_edge_info(dag, upstream_id, downstream_id)

Add or update task info on the DAG for this specific pair of tasks.

Called either from our relationship trigger methods above, or directly by set_upstream/set_downstream in operators.

Parameters:
  • dag (airflow.sdk.definitions.dag.DAG)

  • upstream_id (str)

  • downstream_id (str)

label = None
property leaves

List of leaf nodes – ones with only upstream dependencies.

a.k.a. the “end” of this sub-graph

property roots

List of root nodes – ones with no upstream dependencies.

a.k.a. the “start” of this sub-graph

set_downstream(other, edge_modifier=None)

Set the given task/list onto the downstream attribute, then attempt to resolve the relationship.

Providing this also provides >> via DependencyMixin.

Parameters:
  • other (airflow.sdk.definitions._internal.mixins.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.mixins.DependencyMixin])

  • edge_modifier (EdgeModifier | None)

set_upstream(other, edge_modifier=None)

Set the given task/list onto the upstream attribute, then attempt to resolve the relationship.

Providing this also provides << via DependencyMixin.

Parameters:
  • other (airflow.sdk.definitions._internal.mixins.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.mixins.DependencyMixin])

  • edge_modifier (EdgeModifier | None)

update_relative(other, upstream=True, edge_modifier=None)

Update relative if we’re not the “main” side of a relationship; still run the same logic.

Parameters:
  • other (airflow.sdk.definitions._internal.mixins.DependencyMixin)

  • upstream (bool)

  • edge_modifier (EdgeModifier | None)

Return type:

None

airflow.sdk.Label(label)

Create an EdgeModifier that sets a human-readable label on the edge.

Parameters:

label (str)

class airflow.sdk.Metadata

Metadata to attach to an AssetEvent.

alias: airflow.sdk.definitions.asset.AssetAlias | None = None
extra: dict[str, Any]
class airflow.sdk.ObjectStoragePath(*args, protocol=None, **storage_options)

A path-like object for object storage.

Parameters:
  • protocol (str | None)

  • storage_options (Any)

__version__: ClassVar[int] = 1
property bucket: str
Return type:

str

checksum()

Return the checksum of the file at this path.

Return type:

int

property container: str
Return type:

str

copy(dst, recursive=False, **kwargs)

Copy file(s) from this path to another location.

For remote to remote copies, the key used for the destination will be the same as the source. So that s3://src_bucket/foo/bar will be copied to gcs://dst_bucket/foo/bar and not gcs://dst_bucket/bar.

Parameters:
  • dst (str | ObjectStoragePath) – Destination path

  • recursive (bool) – If True, copy directories recursively.

  • kwargs – Additional keyword arguments to be passed to the underlying implementation.

Return type:

None

classmethod cwd()

Return a new path pointing to the current working directory (as returned by os.getcwd()).

classmethod deserialize(data, version)
Parameters:
  • data (dict)

  • version (int)

Return type:

ObjectStoragePath

classmethod home()

Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).

property key: str
Return type:

str

move(path, recursive=False, **kwargs)

Move file(s) from this path to another location.

Parameters:
  • path (str | ObjectStoragePath) – Destination path

  • recursive (bool) – If True, move directories recursively.

  • kwargs – Additional keyword arguments to be passed to the underlying implementation.

Return type:

None

property namespace: str
Return type:

str

open(mode='r', **kwargs)

Open the file pointed to by this path.

read_block(offset, length, delimiter=None)

Read a block of bytes.

Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.

If offset+length is beyond the eof, reads to eof.

Parameters:
  • offset (int) – Byte offset to start reading from

  • length (int) – Number of bytes to read. If None, read to the end.

  • delimiter (bytes | None) – Optional. Ensure reading starts and stops at delimiter boundaries.

Examples

# Read the first 13 bytes (no delimiter)
>>> read_block(0, 13)
b'Alice, 100\nBo'

# Read first 13 bytes, but force newline boundaries
>>> read_block(0, 13, delimiter=b"\n")
b'Alice, 100\nBob, 200\n'

# Read until EOF, but only stop at newline
>>> read_block(0, None, delimiter=b"\n")
b'Alice, 100\nBob, 200\nCharlie, 300'

See Also

fsspec.utils.read_block()

replace(target)

Rename this path to the target path, overwriting if that path exists.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.

Return type:

ObjectStoragePath

root_marker: ClassVar[str] = '/'
samefile(other_path)

Return whether other_path refers to the same file as this path.

Parameters:

other_path (Any)

Return type:

bool

samestore(other)
Parameters:

other (Any)

Return type:

bool

sep: ClassVar[str] = '/'
serialize()
Return type:

dict[str, Any]

sign(expiration=100, **kwargs)

Create a signed URL representing the given path.

Some implementations allow temporary URLs to be generated, as a way of delegating credentials.

Parameters:
  • expiration (int) – Number of seconds to enable the URL for (if supported)

  • kwargs – Additional arguments passed to the underlying implementation

Returns:

str – The signed URL

Raises:

NotImplementedError – if the method is not implemented for a store
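
For example (assuming a store that supports signed URLs):

from airflow.sdk import ObjectStoragePath

obj = ObjectStoragePath("s3://my-bucket/report.csv")  # hypothetical object
url = obj.sign(expiration=3600)  # URL valid for one hour, if supported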

size()

Size in bytes of the file at this path.

Return type:

int

stat()

Call stat and return the result.

Return type:

airflow.sdk.io.stat.stat_result

ukey()

Hash of file properties, to tell if it has changed.

Return type:

str

class airflow.sdk.Param(default=NOTSET, description=None, **kwargs)

A class that holds a Param’s default value and the rule set used to validate it.

Without a rule set, validation always passes and the default value is returned.

Parameters:
  • default (Any) – The value this Param object holds

  • description (str | None) – Optional help text for the Param

  • schema – The validation schema of the Param; if not given, all kwargs except default and description form the schema
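
A common illustrative pattern is to declare Params with a JSON-schema-style rule set in a DAG’s params; the extra keyword arguments become the validation schema:

from airflow.sdk import DAG, Param

with DAG(
    dag_id="param_example",
    params={
        "retries": Param(3, description="How many times to retry", type="integer", minimum=0),
        "env": Param("dev", enum=["dev", "staging", "prod"]),
    },
):
    ...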

CLASS_IDENTIFIER = '__class'
__version__: ClassVar[int] = 1
description = None
static deserialize(data, version)
Parameters:
  • data (dict[str, Any])

  • version (int)

Return type:

Param

dump()

Dump the Param as a dictionary.

Return type:

dict

property has_value: bool
Return type:

bool

resolve(value=NOTSET, suppress_exception=False)

Run the validations and return the Param’s final value.

May raise ValueError on failed validations, or TypeError if no value is passed and no value already exists. The value is first checked for JSON-serializability; if it is not serializable, a warning is issued. In a future release the value will be required to be JSON-serializable.

Parameters:
  • value (Any) – The value to be updated for the Param

  • suppress_exception (bool) – Whether to suppress the exception when validation fails. If True and validation fails, None is returned.

Return type:

Any
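
For illustration, resolve() with and without suppress_exception:

from airflow.sdk import Param

p = Param(5, type="integer")
p.resolve()                                  # 5 – the default passes validation
p.resolve(10)                                # 10
p.resolve("oops", suppress_exception=True)   # None instead of raising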

schema
serialize()
Return type:

dict

value
class airflow.sdk.PokeReturnValue(is_done, xcom_value=None)

Optional return value for poke methods.

Sensors can optionally return an instance of PokeReturnValue from the poke method. If an XCom value is supplied when the sensor is done, it is pushed as the operator’s return value.

Parameters:
  • is_done (bool) – Set to True to indicate the sensor can stop poking.

  • xcom_value (Any | None) – An optional XCom value to be returned by the operator.

is_done
xcom_value = None
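
A sketch using the @task.sensor decorator (part of the task decorator collection documented below); the probe function is a placeholder:

from airflow.sdk import PokeReturnValue, task

def condition_is_met() -> bool:
    # Placeholder probe; replace with a real check.
    return True

@task.sensor(poke_interval=30, timeout=300)
def wait_for_data() -> PokeReturnValue:
    # Once is_done is True, xcom_value is pushed as the operator's return value.
    return PokeReturnValue(is_done=condition_is_met(), xcom_value="payload")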
class airflow.sdk.Variable

A generic way to store and retrieve arbitrary content or settings as a simple key/value store.

Parameters:
  • key – The variable key.

  • value – The variable value.

  • description – The variable description.

classmethod delete(key)
Parameters:

key (str)

Return type:

None

description: str | None = None
classmethod get(key, default=NOTSET, deserialize_json=False)
Parameters:
  • key (str)

  • default (Any)

  • deserialize_json (bool)

key: str
classmethod set(key, value, description=None, serialize_json=False)
Parameters:
  • key (str)

  • value (Any)

  • description (str | None)

  • serialize_json (bool)

Return type:

None

value: Any | None = None
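
Illustrative usage of the classmethods above (these calls talk to the Airflow backend, so they are normally made from within a running task):

from airflow.sdk import Variable

Variable.set("feature_flags", {"beta": True}, serialize_json=True)
flags = Variable.get("feature_flags", deserialize_json=True)
missing = Variable.get("not_there", default=None)
Variable.delete("feature_flags")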
airflow.sdk.__version__: str
airflow.sdk.chain(*tasks)

Given a number of tasks, builds a dependency chain.

This function accepts values of BaseOperator (aka tasks), EdgeModifiers (aka Labels), XComArg, TaskGroups, or lists containing any mix of these types (or a mix in the same list). If you want to chain between two lists you must ensure they have the same length.

Using classic operators/sensors:

chain(t1, [t2, t3], [t4, t5], t6)

is equivalent to:

  / -> t2 -> t4 \
t1               -> t6
  \ -> t3 -> t5 /
t1.set_downstream(t2)
t1.set_downstream(t3)
t2.set_downstream(t4)
t3.set_downstream(t5)
t4.set_downstream(t6)
t5.set_downstream(t6)

Using task-decorated functions aka XComArgs:

chain(x1(), [x2(), x3()], [x4(), x5()], x6())

is equivalent to:

  / -> x2 -> x4 \
x1               -> x6
  \ -> x3 -> x5 /
x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x2)
x1.set_downstream(x3)
x2.set_downstream(x4)
x3.set_downstream(x5)
x4.set_downstream(x6)
x5.set_downstream(x6)

Using TaskGroups:

chain(t1, task_group1, task_group2, t2)

t1.set_downstream(task_group1)
task_group1.set_downstream(task_group2)
task_group2.set_downstream(t2)

It is also possible to mix between classic operator/sensor, EdgeModifiers, XComArg, and TaskGroups:

chain(t1, [Label("branch one"), Label("branch two")], [x1(), x2()], task_group1, x3())

is equivalent to:

  / "branch one" -> x1 \
t1                      -> task_group1 -> x3
  \ "branch two" -> x2 /
x1 = x1()
x2 = x2()
x3 = x3()
label1 = Label("branch one")
label2 = Label("branch two")
t1.set_downstream(label1)
label1.set_downstream(x1)
t1.set_downstream(label2)
label2.set_downstream(x2)
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)

# or

x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1, edge_modifier=Label("branch one"))
t1.set_downstream(x2, edge_modifier=Label("branch two"))
x1.set_downstream(task_group1)
x2.set_downstream(task_group1)
task_group1.set_downstream(x3)
Parameters:

tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – Individual and/or list of tasks, EdgeModifiers, XComArgs, or TaskGroups to set dependencies

Return type:

None
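
A self-contained sketch with @task-decorated tasks (names are placeholders):

from airflow.sdk import DAG, chain, task

@task
def start(): ...

@task
def left(): ...

@task
def right(): ...

@task
def end(): ...

with DAG(dag_id="chain_example"):
    # start fans out to left/right, which both fan in to end.
    chain(start(), [left(), right()], end())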

airflow.sdk.chain_linear(*elements)

Simplify task dependency definition.

E.g.: suppose you want precedence like so:

     ╭─op2─╮ ╭─op4─╮
op1──┤     ├─┤ op5 ├──op7
     ╰─op3─╯ ╰─op6─╯

Then you can accomplish this like so:

chain_linear(op1, [op2, op3], [op4, op5, op6], op7)
Parameters:

elements (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – a list of operators / lists of operators
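
The same precedence expressed with @task-decorated placeholders:

from airflow.sdk import DAG, chain_linear, task

@task
def op(label: str):
    print(label)

with DAG(dag_id="chain_linear_example"):
    # Every task in a group depends on every task in the previous group.
    chain_linear(op("1"), [op("2"), op("3")], [op("4"), op("5"), op("6")], op("7"))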

airflow.sdk.cross_downstream(from_tasks, to_tasks)

Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks.

Using classic operators/sensors:

cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6])

is equivalent to:

t1 ---> t4
   \ /
t2 -X -> t5
   / \
t3 ---> t6
t1.set_downstream(t4)
t1.set_downstream(t5)
t1.set_downstream(t6)
t2.set_downstream(t4)
t2.set_downstream(t5)
t2.set_downstream(t6)
t3.set_downstream(t4)
t3.set_downstream(t5)
t3.set_downstream(t6)

Using task-decorated functions aka XComArgs:

cross_downstream(from_tasks=[x1(), x2(), x3()], to_tasks=[x4(), x5(), x6()])

is equivalent to:

x1 ---> x4
   \ /
x2 -X -> x5
   / \
x3 ---> x6
x1 = x1()
x2 = x2()
x3 = x3()
x4 = x4()
x5 = x5()
x6 = x6()
x1.set_downstream(x4)
x1.set_downstream(x5)
x1.set_downstream(x6)
x2.set_downstream(x4)
x2.set_downstream(x5)
x2.set_downstream(x6)
x3.set_downstream(x4)
x3.set_downstream(x5)
x3.set_downstream(x6)

It is also possible to mix between classic operator/sensor and XComArg tasks:

cross_downstream(from_tasks=[t1, x2(), t3], to_tasks=[x1(), t2, x3()])

is equivalent to:

t1 ---> x1
   \ /
x2 -X -> t2
   / \
t3 ---> x3
x1 = x1()
x2 = x2()
x3 = x3()
t1.set_downstream(x1)
t1.set_downstream(t2)
t1.set_downstream(x3)
x2.set_downstream(x1)
x2.set_downstream(t2)
x2.set_downstream(x3)
t3.set_downstream(x1)
t3.set_downstream(t2)
t3.set_downstream(x3)
Parameters:
  • from_tasks (collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to start from.

  • to_tasks (airflow.sdk.definitions._internal.abstractoperator.DependencyMixin | collections.abc.Sequence[airflow.sdk.definitions._internal.abstractoperator.DependencyMixin]) – List of tasks or XComArgs to set as downstream dependencies.

airflow.sdk.literal(value)

Wrap a value to ensure it is rendered as-is without applying Jinja templating to its contents.

Designed for use in an operator’s template field.

Parameters:

value (Any) – The value to be rendered without templating

Return type:

airflow.sdk.definitions._internal.templater.LiteralValue
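
For example, a sketch that keeps a template field untouched (assuming the standard provider's BashOperator):

from airflow.sdk import DAG, literal
from airflow.providers.standard.operators.bash import BashOperator

with DAG(dag_id="literal_example"):
    BashOperator(
        task_id="echo_raw",
        # Rendered exactly as written; the braces are not treated as Jinja.
        bash_command=literal("echo '{{ not_a_template }}'"),
    )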

airflow.sdk.setup: collections.abc.Callable
Parameters:

func (Callable)

Return type:

Callable

airflow.sdk.task: TaskDecoratorCollection
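
A minimal sketch of the plain @task form:

from airflow.sdk import DAG, task

@task(retries=2)
def add(x: int, y: int) -> int:
    return x + y

@task
def report(total: int) -> None:
    print(f"total={total}")

with DAG(dag_id="task_decorator_example"):
    # Passing the XComArg result wires the dependency automatically.
    report(add(1, 2))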
airflow.sdk.task_group(group_id: str | None = None, prefix_group_id: bool = True, parent_group: airflow.sdk.definitions.taskgroup.TaskGroup | None = None, dag: airflow.sdk.definitions.dag.DAG | None = None, default_args: dict[str, Any] | None = None, tooltip: str = '', ui_color: str = 'CornflowerBlue', ui_fgcolor: str = '#000', add_suffix_on_collision: bool = False, group_display_name: str = '') collections.abc.Callable[[collections.abc.Callable[FParams, FReturn]], _TaskGroupFactory[FParams, FReturn]]

Python TaskGroup decorator.

This wraps a function into an Airflow TaskGroup. When used in the @task_group() form, all arguments are forwarded to the underlying TaskGroup class, so the decorator can be used to parameterize a TaskGroup.

Parameters:
  • python_callable – Function to decorate.

  • tg_kwargs – Keyword arguments for the TaskGroup object.
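
A sketch of the decorator form (task names are placeholders):

from airflow.sdk import DAG, task, task_group

@task
def extract() -> int:
    return 42

@task
def transform(value: int) -> int:
    return value * 2

@task_group(group_display_name="ETL group")
def etl():
    # Tasks created here are namespaced under the group's id.
    transform(extract())

with DAG(dag_id="task_group_example"):
    etl()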

airflow.sdk.teardown: collections.abc.Callable
Parameters:

on_failure_fail_dagrun (bool)

Return type:

Callable
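
A combined sketch of the setup and teardown decorators documented above (the resource is hypothetical):

from airflow.sdk import DAG, setup, task, teardown

@setup
def create_cluster() -> str:
    return "cluster-123"  # hypothetical resource id

@task
def run_job(cluster_id: str) -> None:
    print(f"running on {cluster_id}")

@teardown(on_failure_fail_dagrun=False)
def delete_cluster(cluster_id: str) -> None:
    print(f"deleting {cluster_id}")

with DAG(dag_id="setup_teardown_example"):
    cluster = create_cluster()
    run_job(cluster) >> delete_cluster(cluster)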
