Task and Asset Store Cleanup

Added in version 3.3.

Airflow does not purge expired task store rows on its own, and there is no built-in schedule for doing so. Cleanup (also known as “garbage collection”) is your responsibility and must be triggered explicitly via the CLI. This page explains what gets cleaned up, how to run it, and how to integrate it into a recurring maintenance workflow.

What gets cleaned up

The cleanup command operates only on task store rows in the MetastoreStoreBackend. It never touches asset store rows; those are removed only by the orphan sweep when an asset is deactivated (see Task and Asset Store Configuration).

A task store row is eligible for deletion when its expires_at timestamp is in the past. expires_at is computed on the worker at write time:

  • Keys written with an explicit retention=timedelta(...) expire after that duration from the time of the write.

  • Keys written with retention=None (the default) take their expiry from [state_store] default_retention_days. If that value is greater than 0, the key expires that many days after the write.

  • Keys written with retention=NEVER_EXPIRE have expires_at = NULL and a flag that marks them as permanent. They are never deleted by this command regardless of configuration.

If [state_store] default_retention_days = 0, keys written without an explicit retention have expires_at = NULL (no expiry) and are also skipped. Only keys with a non-null, past expires_at are removed.
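The retention rules above can be summarized as a small decision function. The following is a minimal sketch for illustration only, not Airflow's implementation: compute_expires_at, is_eligible, and the NEVER_EXPIRE stand-in defined here are hypothetical names, and the separate "permanent" flag mentioned above is not modelled.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the NEVER_EXPIRE sentinel described above.
NEVER_EXPIRE = object()


def compute_expires_at(retention, default_retention_days, now):
    """Return the expiry timestamp for a write, or None for "no expiry"."""
    if retention is NEVER_EXPIRE:
        return None  # permanent key, never deleted
    if retention is not None:
        return now + retention  # explicit timedelta wins
    if default_retention_days > 0:
        return now + timedelta(days=default_retention_days)
    return None  # default_retention_days = 0: no expiry


def is_eligible(expires_at, now):
    """Only rows with a non-null expires_at in the past are deleted."""
    return expires_at is not None and expires_at < now


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
one_hour = compute_expires_at(timedelta(hours=1), 30, now)
default = compute_expires_at(None, 30, now)
```

In this sketch, a key written with retention=None while default_retention_days is 0 and a key written with NEVER_EXPIRE both end up with no expiry, so neither is ever eligible for deletion.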

Note

Custom backends ([state_store] backend set to anything other than the default) are explicitly skipped. The cleanup command prints a message and exits cleanly without deleting anything. If your custom backend needs its own retention logic, implement it in BaseStoreBackend.cleanup() and call it from your own maintenance process.

Running cleanup

The command is:

airflow state-store cleanup-task-store

It reads [state_store] default_retention_days and [state_store] state_cleanup_batch_size from the airflow.cfg file, then deletes all eligible rows.

Dry run

Use --dry-run to preview what would be deleted without removing anything:

airflow state-store cleanup-task-store --dry-run

The output lists every row that would be deleted, grouped by dag, run, task, map index, and key.

Batching

By default (state_cleanup_batch_size = 0) all eligible rows are deleted in a single statement. On deployments with large task_store tables, set a batch size to reduce lock duration per transaction:

# airflow.cfg
[state_store]
state_cleanup_batch_size = 10000

The command then deletes rows in batches of 10,000, committing after each batch, until no eligible rows remain.
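To illustrate why batching shortens each transaction, here is a minimal sketch of a batched delete against an in-memory SQLite table. It is an assumption-laden illustration rather than the command's actual code: the table layout (id, key, expires_at stored as ISO 8601 text) and the function name delete_expired_in_batches are hypothetical.

```python
import sqlite3


def delete_expired_in_batches(conn, now_iso, batch_size):
    """Delete rows whose expires_at is non-null and in the past.

    batch_size == 0 mirrors the single-statement default; otherwise rows
    are removed batch_size at a time with a commit after each batch.
    Returns the total number of rows deleted.
    """
    if batch_size == 0:
        cur = conn.execute(
            "DELETE FROM task_store"
            " WHERE expires_at IS NOT NULL AND expires_at < ?",
            (now_iso,),
        )
        conn.commit()
        return cur.rowcount

    total = 0
    while True:
        # Pick at most batch_size eligible rows and delete only those.
        cur = conn.execute(
            "DELETE FROM task_store WHERE id IN ("
            " SELECT id FROM task_store"
            " WHERE expires_at IS NOT NULL AND expires_at < ?"
            " LIMIT ?)",
            (now_iso, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            return total  # no eligible rows remain
        total += cur.rowcount
```

Each loop iteration holds locks only for the rows in that batch, at the cost of more round trips than a single statement.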

Scheduling cleanup

How often to run cleanup depends on your write volume and the value of default_retention_days. A weekly cleanup may be sufficient for most environments. For high-throughput pipelines that write task store entries on every task execution, consider running cleanup more frequently to keep the task_store table small.
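One way to fold cleanup into recurring maintenance is a small wrapper script that a scheduler of your choice (cron, a systemd timer, or a maintenance Dag) invokes. The sketch below assumes the airflow executable is on the PATH of the account running it; build_cleanup_command and run_cleanup are hypothetical helper names, and only the command and the --dry-run flag documented on this page are used.

```python
import subprocess


def build_cleanup_command(dry_run=False):
    """Build the argv for the cleanup command documented on this page."""
    cmd = ["airflow", "state-store", "cleanup-task-store"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_cleanup(dry_run=False):
    """Run the cleanup CLI; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_cleanup_command(dry_run), check=True)
```

Calling run_cleanup(dry_run=True) first is a cheap way to check what a scheduled run would remove before enabling the real one.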
