SSHRemoteJobOperator¶
Use the SSHRemoteJobOperator to execute
commands on a remote server via SSH as a detached job. This operator is deferrable, meaning it
offloads long-running job monitoring to the triggerer, freeing up worker slots for other tasks.
It is designed to be more resilient than the traditional SSHOperator
for long-running jobs, especially in environments where network interruptions or worker restarts may occur.
Key Features¶
Deferrable: Offloads job monitoring to the triggerer process
Detached Execution: Starts remote jobs that continue running even if SSH connection drops
Incremental Log Streaming: Tails logs from remote host and displays them in Airflow
Cross-Platform: Supports both POSIX (Linux/macOS) and Windows remote hosts
Resilient: Jobs survive network interruptions and worker restarts
File-based Completion: Uses an exit code file for reliable completion detection
When to Use This Operator¶
Use SSHRemoteJobOperator when:
Running jobs that take minutes to hours on remote hosts
Network stability is a concern
You need to see incremental logs as the job progresses
The remote job should survive temporary disconnections
You want to free up worker slots during job execution
Use the traditional SSHOperator when:
Running short commands (seconds)
You need bidirectional communication during execution
The command requires an interactive TTY
How It Works¶
Job Submission: The operator connects via SSH and submits a wrapper script (sketched after this list) that:
Creates a unique job directory on the remote host
Starts your command as a detached process (nohup on POSIX, Start-Process on Windows)
Redirects output to a log file
Writes exit code to a file when complete
Deferral: The operator immediately defers to SSHRemoteJobTrigger
Monitoring: The trigger periodically:
Checks if the exit code file exists (job complete)
Reads new log content incrementally
Yields events with log chunks back to the operator
Completion: When the job finishes:
Final logs are displayed
Exit code is checked (0 = success, non-zero = failure)
Optional cleanup of remote job directory
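The snippet below sketches, purely for illustration, the kind of POSIX wrapper command the operator submits. The job directory, command, and exact shell syntax are placeholders, not the operator's actual implementation:
# Illustrative only -- a sketch of the detach-and-record pattern, not the operator's real implementation.
job_dir = "/tmp/airflow-ssh-jobs/af_mydag_mytask_run123_try1_abc12345"  # placeholder
command = "/path/to/script.sh"  # placeholder

wrapper = (
    f"mkdir -p {job_dir}; "
    # nohup keeps the job alive if the SSH session drops; combined output goes to stdout.log,
    # and the command's exit code is written to a file when it finishes.
    f"nohup sh -c '{command} > {job_dir}/stdout.log 2>&1; "
    f"echo $? > {job_dir}/exit_code' > /dev/null 2>&1 & "
    # Record the detached process ID so the job can be terminated by on_kill.
    f"echo $! > {job_dir}/pid"
)
On Windows, the same effect is achieved with Start-Process and output redirection, as noted above.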
Using the Operator¶
Basic Example¶
from airflow.providers.ssh.operators.ssh_remote_job import SSHRemoteJobOperator

run_script = SSHRemoteJobOperator(
    task_id="run_remote_script",
    ssh_conn_id="my_ssh_connection",
    command="/path/to/script.sh",
    poll_interval=5,  # Check status every 5 seconds
    cleanup="on_success",  # Clean up remote files on success
)
With Environment Variables¶
run_with_env = SSHRemoteJobOperator(
    task_id="run_with_environment",
    ssh_conn_id="my_ssh_connection",
    command="python process_data.py",
    environment={
        "DATA_PATH": "/data/input.csv",
        "OUTPUT_PATH": "/data/output.csv",
        "LOG_LEVEL": "INFO",
    },
)
Windows Remote Host¶
run_on_windows = SSHRemoteJobOperator(
    task_id="run_on_windows",
    ssh_conn_id="windows_ssh_connection",
    command="C:\\Scripts\\process.ps1",
    remote_os="windows",  # Explicitly specify Windows
    poll_interval=10,
)
With Timeout and Skip on Exit Code¶
run_with_options = SSHRemoteJobOperator(
    task_id="run_with_options",
    ssh_conn_id="my_ssh_connection",
    command="./long_running_job.sh",
    timeout=3600,  # Fail if not complete in 1 hour
    skip_on_exit_code=99,  # Skip task if job exits with code 99
    cleanup="always",  # Always clean up, even on failure
)
Parameters¶
ssh_conn_id (str, required): The Airflow connection ID for the SSH connection
command (str, required): The command or script path to execute on the remote host
remote_host (str, optional): Override the host from the connection
environment (dict, optional): Environment variables to set for the command
remote_base_dir (str, optional): Base directory for job files. Defaults to:
  POSIX: /tmp/airflow-ssh-jobs
  Windows: $env:TEMP\\airflow-ssh-jobs
poll_interval (int, optional): How often to check job status, in seconds. Default: 5
log_chunk_size (int, optional): Maximum bytes to read from the log per poll. Default: 65536
timeout (int, optional): Hard timeout for the entire task, in seconds. Default: None (no timeout)
cleanup (str, optional): When to clean up the remote job directory:
  "never" (default): Never clean up
  "on_success": Clean up only if the job succeeds
  "always": Always clean up regardless of status
remote_os (str, optional): Remote OS type ("auto", "posix", "windows"). Default: "auto"
skip_on_exit_code (int or list, optional): Exit code(s) that should cause the task to skip instead of fail
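As an illustration of the less common tuning parameters, the example below combines remote_base_dir, a longer poll_interval, and a larger log_chunk_size. The specific values and paths are only examples:
run_tuned = SSHRemoteJobOperator(
    task_id="run_tuned_job",
    ssh_conn_id="my_ssh_connection",
    command="./long_running_job.sh",
    remote_base_dir="/var/tmp/airflow-ssh-jobs",  # example path; any writable directory works
    poll_interval=30,  # poll less frequently for multi-hour jobs
    log_chunk_size=262144,  # read up to 256 KiB of new log output per poll
    cleanup="on_success",
)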
Remote OS Detection¶
The operator can automatically detect the remote OS (remote_os="auto"), but explicit specification
is more reliable:
Use remote_os="posix" for Linux and macOS hosts
Use remote_os="windows" for Windows hosts with OpenSSH Server
For Windows, ensure:
OpenSSH Server is installed and running
PowerShell is available (default on modern Windows)
SSH connection allows command execution
Job Directory Structure¶
Each job creates a directory on the remote host with these files:
/tmp/airflow-ssh-jobs/af_mydag_mytask_run123_try1_abc12345/
├── stdout.log # Combined stdout/stderr
├── exit_code # Final exit code (0 or non-zero)
├── pid # Process ID (for on_kill)
└── status # Optional status file (for user scripts)
Your command can access these via environment variables:
LOG_FILE: Path to the log file
STATUS_FILE: Path to the status file
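For example, a remote job written in Python could report coarse progress through the status file. This is only a sketch; the operator does not interpret what the script writes there:
import os

# STATUS_FILE is exposed by the operator (see above); the status file itself is optional,
# and its contents are entirely up to the user script.
status_file = os.environ.get("STATUS_FILE")
if status_file:
    with open(status_file, "w") as f:
        f.write("step 2/5: transforming input data\n")

# Anything printed to stdout ends up in stdout.log and is streamed back to the Airflow task log.
print("transform step started")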
Connection Requirements¶
The SSH connection must support:
Non-interactive authentication (password or key-based)
Command execution without PTY
File I/O on the remote host
See SSH Connection for connection configuration.
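For reference, a key-based SSH connection could be defined programmatically as shown below. The host, login, and key path are placeholders, and key_file in extra is one common way to point the SSH hook at a private key; connections can equally be created via the UI, environment variables, or a secrets backend:
from airflow.models.connection import Connection

# Placeholder values throughout -- adjust host, login, and key path for your environment.
my_ssh_connection = Connection(
    conn_id="my_ssh_connection",
    conn_type="ssh",
    host="remote.example.com",
    login="airflow",
    extra='{"key_file": "/home/airflow/.ssh/id_rsa"}',  # non-interactive, key-based auth
)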
Limitations and Considerations¶
Network Interruptions: While the operator is resilient to disconnections during monitoring, the initial job submission must succeed. If submission fails, the task will fail immediately.
Remote Process Management: Jobs are detached using nohup (POSIX) or Start-Process (Windows).
If the remote host reboots during job execution, the job will be lost.
Log Size: Large log outputs may impact performance. The log_chunk_size parameter controls
how much data is read per poll. For very large logs (GBs), consider having your script write
logs to a separate file and only log summaries to stdout.
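A minimal sketch of that pattern in Python, assuming the job can write to a local path on the remote host (the detail-log path is illustrative):
import logging
import sys

# Verbose, row-level detail stays in a local file on the remote host.
detail = logging.FileHandler("/data/logs/process_data.detail.log")
detail.setLevel(logging.DEBUG)

# Only short summaries go to stdout, which the operator tails back into the Airflow log.
summary = logging.StreamHandler(sys.stdout)
summary.setLevel(logging.INFO)

logging.basicConfig(level=logging.DEBUG, handlers=[detail, summary])

logging.debug("per-row diagnostics go only to the detail file")
logging.info("processed 1,000,000 rows")  # this line is streamed back to Airflow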
Exit Code Detection: The operator uses file-based exit code detection for reliability.
If your script uses exec to replace the shell process, ensure the exit code is still
written to the file.
Concurrent Jobs: Each task instance creates a unique job directory. Multiple concurrent tasks can run on the same remote host without conflicts.
Cleanup: Use cleanup="on_success" or cleanup="always" to avoid accumulating
job directories on the remote host. For debugging, use cleanup="never" and manually
inspect the job directory.
Comparison with SSHOperator¶
| Feature | SSHOperator | SSHRemoteJobOperator |
|---|---|---|
| Execution Model | Synchronous | Asynchronous |
| Worker Slot Usage | Entire duration | Only during submit |
| Network Resilience | Low | High |
| Long-running Jobs | Not recommended | Designed for |
| Incremental Logs | No | Yes |
| Windows Support | Limited | Full (via OpenSSH) |
| Setup Complexity | Simple | Moderate |