Troubleshooting

Obscure task failures

Task state changed externally

A task’s state can be changed by a component other than the executor for many reasons, which can be confusing when you review task instance or scheduler logs.

Below are some example scenarios in which a task’s state is changed by a component other than the executor:

LocalTaskJob killed

Sometimes, Airflow or some adjacent system will kill a task instance’s LocalTaskJob, causing the task instance to fail.

Here are some examples that could cause such an event:

  • A DAG run timeout, specified by dagrun_timeout in the DAG’s definition.

  • An Airflow worker running out of memory: Usually, an Airflow worker that runs out of memory receives a SIGKILL, and the scheduler then marks the affected task instance as a zombie and fails it. However, in some scenarios, Airflow kills the task before that happens.
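
The difference between the two outcomes in the out-of-memory case can be illustrated outside Airflow. The following is a minimal standard-library sketch, not Airflow code: it assumes a POSIX system, and the child script, the run_and_signal helper, and the exit code 3 are hypothetical names and values chosen only for illustration. It also assumes that a task stopped by Airflow itself is sent a catchable signal such as SIGTERM. A process that receives SIGKILL cannot run any handler, so it cannot report its own failure, whereas a process that receives SIGTERM can still exit on its own terms.

```python
import signal
import subprocess
import sys

# Hypothetical child process: installs a SIGTERM handler, reports that it is
# ready, then sleeps as a stand-in for a long-running task.
CHILD = """
import signal, sys, time
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(3))
print("ready", flush=True)
time.sleep(60)
"""


def run_and_signal(sig):
    """Start the child, wait for its handler to be installed, send `sig`,
    and return the child's exit code."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # blocks until the child has printed "ready"
    proc.send_signal(sig)
    code = proc.wait(timeout=10)
    proc.stdout.close()
    return code


# SIGTERM is catchable: the handler runs and the child picks its own exit code.
term_code = run_and_signal(signal.SIGTERM)

# SIGKILL is not catchable: the child dies immediately, and subprocess reports
# the negated signal number as the return code.
kill_code = run_and_signal(signal.SIGKILL)

print("exit code after SIGTERM:", term_code)
print("exit code after SIGKILL:", kill_code)
```

Because the killed process leaves no trace of its own in the SIGKILL case, the failure has to be detected and recorded by another component, which is why the resulting state change can look as if it came from outside the executor.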
