Listener Plugin of Airflow
Airflow provides a listener feature, delivered through plugins, for monitoring and tracking task state.
This is a simple example listener plugin that tracks task state and collects useful metadata about the task, the dag run and the dag.
The plugin works by using SQLAlchemy’s event mechanism. It watches task instance state changes at the table level and triggers an event. Listeners are notified for all tasks across all DAGs.
In this plugin, the plugin class is derived from the base class airflow.plugins_manager.AirflowPlugin.
The listener plugin uses pluggy under the hood. Pluggy is a library for plugin management and hook calling, originally built for pytest. It enables function hooking, so you can build “pluggable” systems with your own customizations on top of those hooks.
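To make the hooking mechanism concrete, here is a minimal standalone pluggy sketch. It is independent of Airflow's own hook specifications: the project name "demo", the DemoSpec class, the two listener classes and the on_state_change hook are all hypothetical names chosen for illustration.

```python
import pluggy

# "demo" is a hypothetical project name; both markers must use the same one.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DemoSpec:
    """Hook specification: declares the hook name and its arguments."""

    @hookspec
    def on_state_change(self, name, state):
        """Called whenever a (hypothetical) item changes state."""


class FullListener:
    """A plugin that implements the hook with every declared argument."""

    @hookimpl
    def on_state_change(self, name, state):
        return f"{name} is now {state}"


class NameOnlyListener:
    """An implementation may accept only a subset of the declared arguments."""

    @hookimpl
    def on_state_change(self, name):
        return f"saw {name}"


manager = pluggy.PluginManager("demo")
manager.add_hookspecs(DemoSpec)
manager.register(FullListener())
manager.register(NameOnlyListener())

# Hooks are called with keyword arguments. Every registered implementation
# runs, and the non-None return values are collected into a list.
results = manager.hook.on_state_change(name="task_a", state="running")
print(results)
```

The caller never names a specific listener. It only calls the hook, and the plugin manager dispatches to whatever has been registered, which is the same pattern Airflow relies on for listener plugins.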
Using this plugin, the following events can be listened for:
- task instance is in running state.
- task instance is in success state.
- task instance is in failure state.
- dag run is in running state.
- dag run is in success state.
- dag run is in failure state.
- on starting, before a component such as an Airflow job or the scheduler starts.
- before stopping, before a component such as an Airflow job or the scheduler stops.
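The last two events in the list have no example elsewhere on this page, so here is a minimal sketch of them. It assumes the lifecycle hooks are named on_starting and before_stopping and receive a single component argument, which is how Airflow's listener specification defines them as far as I know; check the version you run. The lifecycle_log list is hypothetical, and the fallback decorator exists only so the sketch can be run outside an Airflow installation.

```python
try:
    from airflow.listeners import hookimpl
except ImportError:
    # Fallback so this sketch can be exercised outside an Airflow installation.
    def hookimpl(func):
        return func


# Hypothetical in-memory record of lifecycle events, for illustration only.
lifecycle_log = []


@hookimpl
def on_starting(component):
    """Called before an Airflow component, such as a scheduler job, starts."""
    entry = f"{type(component).__name__} starting"
    lifecycle_log.append(entry)
    print(entry)


@hookimpl
def before_stopping(component):
    """Called before an Airflow component, such as a scheduler job, stops."""
    entry = f"{type(component).__name__} stopping"
    lifecycle_log.append(entry)
    print(entry)
```

In a real deployment the record would more likely go to a logger or an external metadata store than to a module-level list.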
Listener Registration
A listener plugin that holds a reference to the listener object is registered as part of an Airflow plugin. The following skeleton shows how to implement a new listener:
from airflow.plugins_manager import AirflowPlugin

# The listener module, where the custom monitoring code is added via hookimpl
import listener


class MetadataCollectionPlugin(AirflowPlugin):
    name = "MetadataCollectionPlugin"
    listeners = [listener]
Next, we can look at the code added to the listener module and see the implementation methods for each of those listeners. Once implemented, the listener code is executed during every task execution across all DAGs.
For reference, here’s the plugin code within listener.py:
This example listens for a task instance entering the running state:
from __future__ import annotations

from typing import TYPE_CHECKING

from airflow.listeners import hookimpl

if TYPE_CHECKING:
    from airflow.models.dagrun import DagRun
    from airflow.models.taskinstance import TaskInstance
    from airflow.utils.state import TaskInstanceState


@hookimpl
def on_task_instance_running(previous_state: TaskInstanceState, task_instance: TaskInstance, session):
    """
    This method is called when task state changes to RUNNING.
    Through the callback, parameters like previous_state and the task_instance object can be accessed.
    This gives more information about the task_instance that is currently running: its dag_run,
    task and dag information.
    """
    print("Task instance is in running state")
    print(" Previous state of the Task instance:", previous_state)

    state: TaskInstanceState = task_instance.state
    name: str = task_instance.task_id
    start_date = task_instance.start_date

    dagrun = task_instance.dag_run
    dagrun_status = dagrun.state

    task = task_instance.task
    if TYPE_CHECKING:
        assert task

    # A task may not be attached to a dag, so guard before reading dag_id.
    dag = task.dag
    dag_name = None
    if dag:
        dag_name = dag.dag_id
    print(f"Current task name:{name} state:{state} start_date:{start_date}")
    print(f"Dag name:{dag_name} and current dag run status:{dagrun_status}")
Similarly, code to listen after task_instance success and failure can be implemented.
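As a minimal sketch of those two handlers, assuming the hooks are named on_task_instance_success and on_task_instance_failed: the describe_task_instance helper is hypothetical, the argument list of these hooks differs between Airflow versions, and the fallback decorator exists only so the sketch can be run outside an Airflow installation.

```python
from types import SimpleNamespace

try:
    from airflow.listeners import hookimpl
except ImportError:
    # Fallback so this sketch can be exercised outside an Airflow installation.
    def hookimpl(func):
        return func


def describe_task_instance(task_instance):
    """Hypothetical helper that summarises a task instance in one line."""
    return (
        f"task:{task_instance.task_id} "
        f"dag:{task_instance.dag_id} "
        f"state:{task_instance.state}"
    )


# Only two arguments are requested on purpose: pluggy lets an implementation
# accept a subset of the arguments that the hook specification declares.
@hookimpl
def on_task_instance_success(previous_state, task_instance):
    """Called when the task state changes to SUCCESS."""
    print("Task instance is in success state")
    print(" Previous state of the Task instance:", previous_state)
    print(describe_task_instance(task_instance))


@hookimpl
def on_task_instance_failed(previous_state, task_instance):
    """Called when the task state changes to FAILED."""
    print("Task instance is in failure state")
    print(" Previous state of the Task instance:", previous_state)
    print(describe_task_instance(task_instance))


if __name__ == "__main__":
    # Stand-in object with made-up values, to try the handlers without Airflow.
    fake_ti = SimpleNamespace(task_id="extract", dag_id="example_dag", state="success")
    on_task_instance_success(previous_state="running", task_instance=fake_ti)
```

Requesting only previous_state and task_instance keeps the handlers less sensitive to arguments that are added or removed between releases.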
This example listens for a dag run changing to the failed state:
@hookimpl
def on_dag_run_failed(dag_run: DagRun, msg: str):
    """
    This method is called when dag run state changes to FAILED.
    """
    print("Dag run in failure state")
    dag_id = dag_run.dag_id
    run_id = dag_run.run_id
    external_trigger = dag_run.external_trigger

    print(f"Dag information:{dag_id} Run id: {run_id} external trigger: {external_trigger}")
    print(f"Failed with message: {msg}")
Similarly, code to listen after dag_run success and during running state can be implemented.
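A minimal sketch of those two handlers follows, assuming the hooks are named on_dag_run_running and on_dag_run_success and take the same dag_run and msg arguments as the failure hook. The format_dag_run helper and its duration calculation are hypothetical additions for illustration, and the fallback decorator exists only so the sketch can be run outside an Airflow installation.

```python
try:
    from airflow.listeners import hookimpl
except ImportError:
    # Fallback so this sketch can be exercised outside an Airflow installation.
    def hookimpl(func):
        return func


def format_dag_run(event, dag_run, msg):
    """Hypothetical helper that builds a one-line summary of a dag run."""
    duration = None
    # end_date is normally unset while a run is still in progress.
    if dag_run.start_date and dag_run.end_date:
        duration = (dag_run.end_date - dag_run.start_date).total_seconds()
    return (
        f"Dag run {event}: dag_id={dag_run.dag_id} run_id={dag_run.run_id} "
        f"duration_seconds={duration} message={msg}"
    )


@hookimpl
def on_dag_run_running(dag_run, msg):
    """Called when the dag run state changes to RUNNING."""
    print(format_dag_run("running", dag_run, msg))


@hookimpl
def on_dag_run_success(dag_run, msg):
    """Called when the dag run state changes to SUCCESS."""
    print(format_dag_run("success", dag_run, msg))
```

Putting the formatting in one helper keeps the running, success and failure handlers consistent, so the same fields are reported for every dag run event.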
The listener plugin files that contain the listener implementation are added, as part of the Airflow plugin, to the $AIRFLOW_HOME/plugins/ folder and are loaded during Airflow startup.