Audit Logs in Airflow¶
Understanding Audit Logs¶
Audit logs serve as the historical record of an Airflow system, documenting who performed what actions and when they occurred. These logs are essential for maintaining system integrity, meeting compliance requirements, and conducting forensic analysis when issues arise.
In essence, audit logs answer three fundamental questions:
Who: Which user or system component initiated an action
What: The specific operation that was performed
When: The precise timestamp of the event
The primary purposes of audit logs include:
Regulatory Compliance: Meeting requirements for data governance and audit trails
Security Monitoring: Detecting unauthorized access or suspicious activities
Operational Troubleshooting: Understanding the sequence of events leading to system issues
Change Management: Tracking modifications to critical system components
Note
Access to audit logs requires the Audit Logs.can_read permission. Users with this permission can view all audit entries regardless of their DAG-specific access rights.
Understanding Event Logs¶
Event logs represent the operational heartbeat of an Airflow system. Unlike audit logs, which focus on accountability and compliance, event logs capture the technical details of system behavior, application performance, and operational metrics.
Event logs serve several critical functions:
Debugging and Troubleshooting: Providing detailed error messages and stack traces
Performance Monitoring: Recording execution times, resource usage, and system metrics
Operational Insights: Tracking system health, component interactions, and workflow execution
Development Support: Offering detailed information for code debugging and optimization
Event logs are typically stored in log files or external logging systems and include information such as:
Task execution details and output
System errors and warnings
Performance metrics and timing information
Component startup and shutdown events
Resource utilization data
Audit Logs vs Event Logs¶
While both logging systems are crucial for system management, they serve distinct purposes and audiences:
| Characteristic | Audit Logs | Event Logs |
| --- | --- | --- |
| Primary Purpose | Accountability and compliance tracking | Operational monitoring and system debugging |
| Target Audience | Security teams, auditors, compliance officers | Developers, system administrators, operations teams |
| Content Focus | User actions and administrative changes | System behavior, errors, and performance data |
| Storage Location | Structured database table (log) | Log files, external logging systems |
| Retention Requirements | Long-term (months to years for compliance) | Short to medium-term (days to weeks) |
| Query Patterns | "Who modified this configuration?" | "Why did this task execution fail?" |
Accessing Audit Logs¶
Airflow provides multiple interfaces for accessing audit log data, each suited to different use cases and technical requirements:
- Web User Interface
The Airflow web interface provides the most accessible method for viewing audit logs. Navigate to Browse → Audit Logs to access an interface with built-in filtering, sorting, and search capabilities. This interface is ideal for ad-hoc investigations and routine monitoring.
- REST API Integration
For programmatic access and system integration, use the /eventLogs REST API endpoint. This approach enables automated monitoring, integration with external security tools, and custom reporting applications.
- Direct Database Access
Advanced users can query the log table directly using SQL. This method provides maximum flexibility for complex queries, custom reporting, and integration with business intelligence tools.
Scope of Audit Logging¶
Airflow’s audit logging system captures events across three distinct operational domains:
- User-Initiated Actions
These events occur when users interact with Airflow through any interface (web UI, REST API, or command-line tools). Examples include:
Manual DAG run triggers and modifications
Configuration changes to variables, connections, and pools
Task instance state modifications (clearing, marking as success/failed)
Administrative operations and user management activities
- System-Generated Events
These events are automatically created by Airflow’s internal processes during normal operation:
Task lifecycle state transitions (queued, running, success, failed)
System monitoring events (heartbeat timeouts, external state changes)
Automatic recovery operations (task rescheduling, retry attempts)
Resource management activities
- Command-Line Interface Operations
These events capture activities performed through Airflow’s CLI tools:
Direct task execution commands
DAG management operations
System administration and maintenance tasks
Automated script executions
Common Audit Log Scenarios¶
To facilitate audit log analysis, here are some frequently encountered scenarios and their corresponding queries:
“Who triggered this DAG?”
SELECT dttm, owner, extra
FROM log
WHERE event = 'trigger_dag_run' AND dag_id = 'example_dag'
ORDER BY dttm DESC;
“What happened to this failed task?”
SELECT dttm, event, owner, extra
FROM log
WHERE dag_id = 'example_dag' AND task_id = 'example_task'
ORDER BY dttm DESC;
“Who changed variables recently?”
SELECT dttm, event, owner, extra
FROM log
WHERE event LIKE '%variable%'
ORDER BY dttm DESC LIMIT 20;
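The query pattern behind all three scenarios can be exercised without a live Airflow deployment. The sketch below builds a minimal SQLite stand-in for the log table (the rows are invented) and runs the "Who triggered this DAG?" query against it:

```python
import sqlite3

# Minimal stand-in for Airflow's `log` table; column names mirror the
# audit log schema, the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (dttm TEXT, event TEXT, owner TEXT, dag_id TEXT, task_id TEXT, extra TEXT)"
)
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-01-15 10:00:00", "trigger_dag_run", "alice", "example_dag", None, "{}"),
        ("2024-01-15 09:00:00", "patch_variable", "bob", None, None, "{}"),
    ],
)

# "Who triggered this DAG?" -- same WHERE clause as the SQL example above.
rows = conn.execute(
    "SELECT dttm, owner, extra FROM log "
    "WHERE event = 'trigger_dag_run' AND dag_id = 'example_dag' "
    "ORDER BY dttm DESC"
).fetchall()
print(rows[0][1])  # -> alice
```

The same SELECT statements run unchanged against the real metadata database; only the connection differs.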
Event Catalog¶
The following section provides a complete reference of all events tracked by Airflow’s audit logging system. Understanding these event types will help interpret audit logs and construct effective queries for specific use cases.
Task Instance Events¶
System-generated task events:
running: Task instance started execution
success: Task instance completed successfully
failed: Task instance failed during execution
skipped: Task instance was skipped
upstream_failed: Task instance failed due to upstream failure
up_for_retry: Task instance is scheduled for retry
up_for_reschedule: Task instance is rescheduled
queued: Task instance is queued for execution
scheduled: Task instance is scheduled
deferred: Task instance is deferred (waiting for trigger)
restarting: Task instance is restarting
removed: Task instance was removed
System monitoring events:
heartbeat timeout: Task instance stopped sending heartbeats and will be terminated
state mismatch: Task instance state changed externally (outside of Airflow)
stuck in queued reschedule: Task instance was stuck in queued state and rescheduled
stuck in queued tries exceeded: Task instance exceeded maximum requeue attempts
User-initiated task events:
fail task: User manually marked task as failed
skip task: User manually marked task as skipped
action_set_failed: User set task instance as failed through UI/API
action_set_success: User set task instance as successful through UI/API
action_set_retry: User set task instance to retry
action_set_skipped: User set task instance as skipped
action_set_running: User set task instance as running
action_clear: User cleared task instance state
User Action Events¶
DAG operations:
trigger_dag_run: User triggered a DAG run
delete_dag_run: User deleted a DAG run
patch_dag_run: User modified a DAG run
clear_dag_run: User cleared a DAG run
get_dag_run: User retrieved DAG run information
get_dag_runs_batch: User retrieved multiple DAG runs
post_dag_run: User created a DAG run
patch_dag: User modified DAG configuration
get_dag: User retrieved DAG information
get_dags: User retrieved multiple DAGs
delete_dag: User deleted a DAG
Task instance operations:
post_clear_task_instances: User cleared task instances
patch_task_instance: User modified a task instance
get_task_instances_batch: User retrieved task instance information
delete_task_instance: User deleted a task instance
get_task_instance: User retrieved single task instance information
get_task_instance_tries: User retrieved task instance retry information
patch_task_instances_batch: User modified multiple task instances
Variable operations:
delete_variable: User deleted a variable
patch_variable: User modified a variable
post_variable: User created a variable
bulk_variables: User performed bulk variable operations
Connection operations:
delete_connection: User deleted a connection
post_connection: User created a connection
patch_connection: User modified a connection
bulk_connections: User performed bulk connection operations
create_default_connections: User created default connections
Pool operations:
get_pool: User retrieved pool information
get_pools: User retrieved multiple pools
post_pool: User created a pool
patch_pool: User modified a pool
delete_pool: User deleted a pool
bulk_pools: User performed bulk pool operations
Asset operations:
get_asset: User retrieved asset information
get_assets: User retrieved multiple assets
get_asset_alias: User retrieved asset alias information
get_asset_aliases: User retrieved multiple asset aliases
post_asset_events: User created asset events
get_asset_events: User retrieved asset events
materialize_asset: User triggered asset materialization
get_asset_queued_events: User retrieved queued asset events
delete_asset_queued_events: User deleted queued asset events
get_dag_asset_queued_events: User retrieved DAG asset queued events
delete_dag_asset_queued_events: User deleted DAG asset queued events
get_dag_asset_queued_event: User retrieved a specific DAG asset queued event
delete_dag_asset_queued_event: User deleted a specific DAG asset queued event
Backfill operations:
get_backfill: User retrieved backfill information
get_backfills: User retrieved multiple backfills
post_backfill: User created a backfill
pause_backfill: User paused a backfill
unpause_backfill: User unpaused a backfill
cancel_backfill: User cancelled a backfill
create_backfill_dry_run: User performed a backfill dry run
User and Role Management:
get_user: User retrieved user information
get_users: User retrieved multiple users
post_user: User created a user account
patch_user: User modified a user account
delete_user: User deleted a user account
get_role: User retrieved role information
get_roles: User retrieved multiple roles
post_role: User created a role
patch_role: User modified a role
delete_role: User deleted a role
CLI Events¶
DAG Management Commands:
cli_dags_list: List all DAGs in the system
cli_dags_show: Display DAG information and structure
cli_dags_state: Check the state of a DAG run
cli_dags_next_execution: Show next execution time for a DAG
cli_dags_trigger: Trigger a DAG run from the command line
cli_dags_delete: Delete a DAG and its metadata
cli_dags_pause: Pause a DAG
cli_dags_unpause: Unpause a DAG
cli_dags_backfill: Backfill DAG runs for a date range
cli_dags_test: Test a DAG without affecting the database
Task Management Commands:
cli_tasks_list: List tasks for a specific DAG
cli_tasks_run: Execute a specific task instance
cli_tasks_test: Test a task without affecting the database
cli_tasks_state: Check the state of a task instance
cli_tasks_failed_deps: Show failed dependencies for a task
cli_tasks_render: Render task templates
cli_tasks_clear: Clear task instance state
Database and System Commands:
cli_db_init: Initialize the Airflow database
cli_db_upgrade: Upgrade the database schema
cli_db_reset: Reset the database (dangerous operation)
cli_db_shell: Open a database shell
cli_db_check: Check database connectivity and schema
cli_db_migrate: Migrate the database schema (legacy command)
cli_migratedb: Legacy database migration command
cli_initdb: Legacy database initialization command
cli_resetdb: Legacy database reset command
cli_upgradedb: Legacy database upgrade command
User and Security Commands:
cli_users_create: Create a new user account
cli_users_delete: Delete a user account
cli_users_list: List all users in the system
cli_users_add_role: Add a role to a user
cli_users_remove_role: Remove a role from a user
Configuration and Variable Commands:
cli_variables_get: Retrieve a variable value
cli_variables_set: Set a variable value
cli_variables_delete: Delete a variable
cli_variables_list: List all variables
cli_variables_import: Import variables from a file
cli_variables_export: Export variables to a file
Connection Management Commands:
cli_connections_get: Retrieve connection details
cli_connections_add: Add a new connection
cli_connections_delete: Delete a connection
cli_connections_list: List all connections
cli_connections_import: Import connections from a file
cli_connections_export: Export connections to a file
Pool Management Commands:
cli_pools_get: Get pool information
cli_pools_set: Create or update a pool
cli_pools_delete: Delete a pool
cli_pools_list: List all pools
cli_pools_import: Import pools from a file
cli_pools_export: Export pools to a file
Service and Process Commands:
cli_webserver: Start the Airflow webserver
cli_scheduler: Start the Airflow scheduler
cli_worker: Start a Celery worker
cli_flower: Start the Flower monitoring tool
cli_triggerer: Start the triggerer process
cli_standalone: Start Airflow in standalone mode
cli_api_server: Start the Airflow API server
cli_dag_processor: Start the DAG processor service
cli_celery_worker: Start a Celery worker (alternative command)
cli_celery_flower: Start Celery Flower (alternative command)
Maintenance and Utility Commands:
cli_cheat_sheet: Display the CLI command reference
cli_version: Show Airflow version information
cli_info: Display system information
cli_config_get_value: Get a configuration value
cli_config_list: List configuration options
cli_plugins: List installed plugins
cli_rotate_fernet_key: Rotate the Fernet encryption key
cli_sync_perm: Synchronize permissions
cli_shell: Start an interactive Python shell
cli_kerberos: Start the Kerberos ticket renewer
Testing and Development Commands:
cli_test: Run tests
cli_render: Render templates
cli_dag_deps: Show DAG dependencies
cli_task_deps: Show task dependencies
Legacy Commands:
cli_run: Legacy task run command
cli_backfill: Legacy backfill command
cli_clear: Legacy clear command
cli_list_dags: Legacy DAG list command
cli_list_tasks: Legacy task list command
cli_pause: Legacy pause command
cli_unpause: Legacy unpause command
cli_trigger_dag: Legacy DAG trigger command
Each CLI command audit log entry includes:
User identification: Who executed the command
Command details: Full command with arguments
Execution context: Working directory, environment variables
Timestamp: When the command was executed
Exit status: Success or failure indication
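Because every CLI event shares the cli_ prefix, CLI activity can be isolated with a single LIKE filter. One subtlety: in SQL LIKE patterns the underscore matches any single character, so the literal prefix must be escaped. The self-contained sketch below (sample rows are invented) demonstrates this against a SQLite stand-in for the log table:

```python
import sqlite3

# Stand-in for the `log` table with invented audit rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dttm TEXT, event TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        ("2024-01-15 10:00:00", "cli_dags_trigger", "alice"),
        ("2024-01-15 10:05:00", "trigger_dag_run", "bob"),
        ("2024-01-15 10:10:00", "cli_users_create", "admin"),
    ],
)

# `_` is a single-character wildcard in LIKE, so escape it to match the
# literal `cli_` prefix; without ESCAPE, 'cli_%' would also match 'cliX...'.
rows = conn.execute(
    "SELECT event, owner FROM log WHERE event LIKE 'cli\\_%' ESCAPE '\\' ORDER BY dttm"
).fetchall()
```

Here rows contains only the cli_dags_trigger and cli_users_create entries; the API-originated trigger_dag_run row is excluded.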
Custom Events¶
Airflow allows creating custom audit log entries programmatically:
from airflow.models.log import Log
from airflow.utils.session import provide_session

@provide_session
def log_custom_event(session=None):
    log_entry = Log(event="custom_event", owner="username", extra="Additional context information")
    session.add(log_entry)
    session.commit()
Anatomy of an Audit Log Entry¶
Each audit log record contains structured information that provides a complete picture of the logged event. Understanding these fields is essential for effective log analysis:
| Field Name | Description and Usage |
| --- | --- |
| dttm | Timestamp indicating when the event occurred (UTC timezone) |
| event | Descriptive name of the action or event (e.g., trigger_dag_run) |
| owner | Identity of the actor: username for user actions, "airflow" for system events |
| dag_id | Identifier of the affected DAG (when applicable) |
| task_id | Identifier of the affected task (when applicable) |
| run_id | Specific DAG run identifier for tracking execution instances |
| try_number | Attempt number for task retries and re-executions |
| map_index | Index for dynamically mapped tasks |
| logical_date | Logical execution date of the DAG run |
| extra | JSON-formatted additional context (parameters, error details, etc.) |
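The extra field usually repays parsing as JSON before analysis. A minimal sketch, using an invented payload (the real content varies by event type):

```python
import json

# Hypothetical `extra` payload; actual keys depend on the logged event.
extra = '{"dag_id": "example_dag", "conf": {"param": "value"}, "origin": "ui"}'

details = json.loads(extra)
print(details["origin"])  # -> ui
```

Note that for some events extra may be empty or contain a non-JSON string, so production code should guard the json.loads call.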
Audit Log Query Methods¶
Effective audit log analysis requires understanding the various methods available for querying and retrieving log data. Each method has its strengths and is suited to different scenarios:
REST API Examples:
# Get all audit logs
curl -X GET "http://localhost:8080/api/v1/eventLogs"
# Filter by event type
curl -X GET "http://localhost:8080/api/v1/eventLogs?event=trigger_dag_run"
# Filter by DAG
curl -X GET "http://localhost:8080/api/v1/eventLogs?dag_id=example_dag"
# Filter by date range
curl -X GET "http://localhost:8080/api/v1/eventLogs?after=2024-01-01T00:00:00Z&before=2024-12-31T23:59:59Z"
Database Query Examples:
-- Get recent user actions
SELECT dttm, event, owner, dag_id, task_id, extra
FROM log
WHERE owner IS NOT NULL
ORDER BY dttm DESC
LIMIT 100;
-- Get task failure events
SELECT dttm, dag_id, task_id, run_id, extra
FROM log
WHERE event = 'failed'
ORDER BY dttm DESC;
-- Get user actions on specific DAG
SELECT dttm, event, owner, extra
FROM log
WHERE dag_id = 'example_dag' AND owner IS NOT NULL
ORDER BY dttm DESC;
Querying Event Logs¶
Event logs (operational logs) are typically accessed through different methods depending on the logging configuration:
Log Files:
# View scheduler logs
tail -f $AIRFLOW_HOME/logs/scheduler/latest/*.log
# View webserver logs
tail -f $AIRFLOW_HOME/logs/webserver/webserver.log
# View task logs for specific DAG run
cat $AIRFLOW_HOME/logs/dag_id/task_id/2024-01-01T00:00:00+00:00/1.log
REST API for Task Logs:
# Get task instance logs
curl -X GET "http://localhost:8080/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/logs/{try_number}"
# Get task logs with metadata
curl -X GET "http://localhost:8080/api/v1/dags/example_dag/dagRuns/2024-01-01T00:00:00+00:00/taskInstances/example_task/logs/1?full_content=true"
Python Logging Integration:
import logging

from airflow.models.baseoperator import BaseOperator
from airflow.utils.log.logging_mixin import LoggingMixin

class MyOperator(BaseOperator, LoggingMixin):
    def execute(self, context):
        # These will appear in event logs
        self.log.info("Task started")
        self.log.warning("Warning message")
        self.log.error("Error occurred")
External Logging Systems:
When using external logging systems (e.g., ELK stack, Splunk, CloudWatch):
# Example Elasticsearch query
curl -X GET "elasticsearch:9200/airflow-*/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{"match": {"dag_id": "example_dag"}},
{"range": {"@timestamp": {"gte": "2024-01-01", "lte": "2024-01-31"}}}
]
}
}
}'
Practical Query Examples¶
The following examples demonstrate practical applications of audit log queries for common operational and security scenarios. These queries serve as templates that can be adapted for specific requirements:
Security Investigation
-- Find all actions by a specific user in the last 24 hours
SELECT dttm, event, dag_id, task_id, extra
FROM log
WHERE owner = 'suspicious_user'
AND dttm > NOW() - INTERVAL '24 hours'
ORDER BY dttm DESC;
Compliance Reporting
-- Get all variable and connection changes for audit report
SELECT dttm, event, owner, extra
FROM log
WHERE event IN ('post_variable', 'patch_variable', 'delete_variable',
'post_connection', 'patch_connection', 'delete_connection')
AND dttm BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY dttm;
Troubleshooting DAG Issues
-- See all events for a problematic DAG run
SELECT dttm, event, task_id, owner, extra
FROM log
WHERE dag_id = 'example_dag'
AND run_id = '2024-01-15T10:00:00+00:00'
ORDER BY dttm;