Edge Provider Architecture¶
Airflow consists of several components, which are connected as shown in the following diagram. The Edge Worker, which is deployed outside of the central Airflow cluster, is connected via HTTP(S) to the API server of the Airflow cluster:
Workers - Execute the assigned tasks. Most standard setups use local or centralized workers, e.g. via Celery.
Edge Workers - Special workers which pull tasks via HTTP(S), a capability provided by this provider package (see the illustrative sketch after this list).
Scheduler - Responsible for adding the necessary tasks to the queue. The EdgeExecutor runs as a module inside the scheduler.
API server - The HTTP REST API server provides access to DAG/task status information. The required endpoints are provided by the Edge provider plugin. The Edge Worker uses this API to pull tasks and send back the results.
Database - Contains information about the status of tasks, Dags, Variables, connections, etc.
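Conceptually, the Edge Worker side of this architecture is a polling loop over HTTP(S). The sketch below is purely illustrative and not the provider's actual client code; the base URL, endpoint path, and token handling are assumptions made for the example (the provider currently uses a single global API token, see the backlog below).

    import time
    import requests  # illustrative HTTP client; the real worker ships its own API client

    API_URL = "https://airflow.example.com/edge_worker/v1"  # hypothetical base URL
    API_TOKEN = "change-me"  # placeholder for the global API token

    def pull_loop(poll_interval: float = 5.0) -> None:
        """Illustrative Edge Worker cycle: poll the API server for tasks over HTTP(S)."""
        while True:
            response = requests.get(
                f"{API_URL}/jobs/fetch",  # hypothetical endpoint name
                headers={"Authorization": f"Bearer {API_TOKEN}"},  # assumed auth scheme
                timeout=30,
            )
            job = response.json() if response.ok else None
            if job:
                ...  # execute the task locally, then report the result back over HTTP(S)
            time.sleep(poll_interval)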
In detail, the parts of the Edge provider are deployed as follows:
EdgeExecutor - The EdgeExecutor runs inside the core Airflow scheduler. It is responsible for scheduling tasks and sending them to the Edge job queue in the database. The EdgeExecutor is a subclass of the airflow.executors.base_executor.BaseExecutor class. To activate the EdgeExecutor, set the executor configuration option in the airflow.cfg file to airflow.providers.edge3.executors.EdgeExecutor (see the configuration sketch after this list). For more details see Edge Executor. Note that multiple executors can also be used in parallel together with the EdgeExecutor.
API server - The API server provides REST endpoints to the web UI and serves static files. The Edge provider adds a plugin that provides an additional REST API for the Edge Worker as well as UI elements to manage workers (currently Airflow 2.10 only). The API server is responsible for handling requests from the Edge Worker and sending back the results. The API endpoints for Edge are not started by default; to activate them, set the api_enabled configuration option in the edge section of the airflow.cfg file to True. For more details see Edge UI Plugin and REST API.
Database - The Airflow meta database stores the status of tasks, Dags, Variables, connections, etc. The Edge provider uses the database to store the status of the Edge Worker instances and the tasks assigned to them. The database also stores the results of the tasks executed by the Edge Worker. Setup of the needed tables and their migration is done automatically when the provider package is deployed.
Edge Worker - The Edge Worker is a lightweight process that runs on the edge device. It is responsible for pulling tasks from the API server and executing them. The Edge Worker is a standalone process that can be deployed on any machine that has access to the API server and is designed to be lightweight and easy to deploy. It is implemented as a command line tool that can be started with the airflow edge worker command (see the start-up sketch after this list). For more details see Edge Worker Deployment.
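Putting the two configuration options above together, a minimal central-side setup could look like the following airflow.cfg excerpt. The option names come from the list above; placing executor in the [core] section follows standard Airflow configuration.

    [core]
    # Run the EdgeExecutor inside the scheduler (it can run alongside other executors)
    executor = airflow.providers.edge3.executors.EdgeExecutor

    [edge]
    # The Edge REST endpoints are not started by default; enable them on the API server
    api_enabled = True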
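On the edge machine itself, the worker is started from the command line. A minimal start-up sketch follows; the airflow edge worker command is described above, while the environment variable pointing the worker at the central API server is an assumption for illustration (see Edge Worker Deployment for the exact settings).

    # Hypothetical URL for this sketch; adjust to your own API server deployment
    export AIRFLOW__EDGE__API_URL="https://airflow.example.com/edge_worker/v1/rpcapi"

    # Start the Edge Worker; it begins polling the API server for tasks
    airflow edge worker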
Edge Worker State Model¶
Each Edge Worker is tracked by the API server so that it is known which workers are currently active. This supports monitoring as well as administration, since distributed monitoring and tracking would otherwise be hard to achieve. It also allows central management for administrative maintenance.
Workers send regular heartbeats to the API server to indicate that they are still alive. The heartbeats are used to determine the state of the worker.
The states used to track the worker are defined in the provider's worker model. See https://github.com/apache/airflow/blob/main/providers/edge3/src/airflow/providers/edge3/models/edge_worker.py#L45 for documentation of the details of all states of the Edge Worker.
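The heartbeat-driven part of this state model can be sketched in Python as follows. This is a simplified illustration rather than the provider's actual code: the state names are a condensed subset chosen for the example and the timeout value is an assumption; the linked edge_worker.py module is the authoritative definition.

    from datetime import datetime, timedelta, timezone
    from enum import Enum

    class WorkerState(str, Enum):
        """Illustrative subset of worker states; see edge_worker.py for the full model."""
        STARTING = "starting"  # worker has announced itself but not yet taken work
        RUNNING = "running"    # worker is executing at least one task
        IDLE = "idle"          # worker is alive but currently has no tasks
        UNKNOWN = "unknown"    # heartbeats stopped arriving

    HEARTBEAT_TIMEOUT = timedelta(minutes=5)  # assumed value for this sketch

    def derive_state(last_heartbeat: datetime, active_jobs: int) -> WorkerState:
        """Derive a worker's state from its last heartbeat and its current job count."""
        if datetime.now(timezone.utc) - last_heartbeat > HEARTBEAT_TIMEOUT:
            return WorkerState.UNKNOWN
        return WorkerState.RUNNING if active_jobs > 0 else WorkerState.IDLE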
Feature Backlog Edge Provider¶
The current version of the EdgeExecutor is released with known limitations. It will mature over time.
The following features are known to be missing and will be implemented in increments:
API token per worker: Today only one global API token is available
Edge Worker Plugin
Make the plugin work on Airflow 3.0, depending on AIP-68
Overview of queues / jobs per queue
Allow starting the Edge Worker REST API separately from the api-server
Add some hints on how to set up an additional worker
Edge Worker CLI
Use WebSockets instead of HTTP calls for communication
Also send logs to the TaskFileHandler if external logging services are used
Integration into telemetry to send metrics from the remote site
Publish system metrics with heartbeats (CPU, Disk space, RAM, Load)
Be more liberal about versions, e.g. allow patch-version differences. Currently an exact version match is required; if the versions do not match, the worker gracefully shuts down once running jobs are completed and no new jobs are started (see the sketch below)
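A possible shape for such a relaxed check is sketched below. This illustrates the backlog item, not the provider's current behavior, which requires an exact match:

    def versions_compatible(worker_version: str, core_version: str) -> bool:
        """Illustrative relaxed check: require matching major.minor, ignore the patch level."""
        return worker_version.split(".")[:2] == core_version.split(".")[:2]

    # Example: "1.2.3" vs "1.2.7" would be accepted, "1.2.3" vs "1.3.0" would not.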
Tests
System tests in GitHub that test the deployment of the worker with a Dag execution
Test/support the Edge Worker on Windows
Scaling test - Check and define the boundaries of workers/jobs. Today the setup is known to scale to a range of 50 workers; this is not a hard limit, just reported experience.
Load tests - impact of scaled execution and code optimization
Serve incremental logs during task execution without a shared log disk on the api-server
Reduce dependencies during execution: Today the worker depends on Airflow core with a lot of transitive dependencies. The target is to reduce the dependencies to a minimum, e.g. the Task SDK and providers only.
Documentation
Provide scripts and guides to install edge components as a service (systemd)
Extend the Helm chart for the needed support
Provide an example Docker Compose setup for the worker