Java SDK
This is an experimental feature.
The Java SDK lets you implement Airflow task logic in Java, Kotlin, or any other JVM language. The Dag and its
scheduling remain in Python; individual tasks delegate to a JVM subprocess that is spawned by
JavaCoordinator for each task instance.
Prerequisites
JRE 17 or later must be available on the Airflow worker nodes.
The compiled task JAR(s) and JVM dependencies must be accessible from the worker.
The apache-airflow-task-sdk package (installed with Airflow) provides the coordinator; no additional Python packages are needed.
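One way to confirm the first prerequisite on a worker is to run a tiny check with the same java binary the coordinator will launch. The sketch below is illustrative only: JavaVersionCheck and REQUIRED_FEATURE are hypothetical names, not part of the SDK, and the only JDK call used is Runtime.version().feature().

```java
// Hypothetical helper, not part of the Airflow Java SDK.
final class JavaVersionCheck {
    // Minimum feature release stated in the prerequisites above.
    static final int REQUIRED_FEATURE = 17;

    // Returns true when the given feature release satisfies the requirement.
    static boolean isSupported(int feature) {
        return feature >= REQUIRED_FEATURE;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major release number, e.g. 17 or 21.
        int feature = Runtime.version().feature();
        if (!isSupported(feature)) {
            System.err.println("Java " + feature + " found, " + REQUIRED_FEATURE + "+ required");
            System.exit(1);
        }
        System.out.println("Java " + feature + " is new enough");
    }
}
```

Running it through an absolute path to the worker’s java binary checks the runtime that tasks would actually use, rather than whatever java happens to be first on $PATH.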
Quick start
The following example shows the minimal moving parts: a Python Dag with two stub tasks, and a Java implementation of those tasks.
Python Dag (the scheduling side)
from airflow.sdk import dag, task


@dag
def sales_pipeline():
    @task.stub(queue="java")
    def extract(): ...

    @task.stub(queue="java")
    def transform(extracted): ...

    @task()
    def load(transformed):
        print(f"Loaded: {transformed}")

    load(transform(extract()))


sales_pipeline()
Java implementation
import org.apache.airflow.sdk.*;

@Builder.Dag(id = "sales_pipeline")
public class SalesPipeline {
    @Builder.Task(id = "extract")
    public long extract(Client client) {
        var conn = client.getConnection("sales_db");
        // ... fetch data using conn.host, conn.login, conn.password ...
        return recordCount;
    }

    @Builder.Task(id = "transform")
    public long transform(
        Client client,
        @Builder.XCom(task = "extract") long recordCount
    ) {
        var threshold = (String) client.getVariable("transform_threshold");
        // ... process data ...
        return transformedCount;
    }
}
Note
Note that transform declares an argument for the upstream XCom in both Python and Java. The
Python argument declares the dependency between the tasks; the Java argument actually retrieves the value.
Java entry point
import java.util.List;

public class Main implements BundleBuilder {
    @Override
    public Iterable<Dag> getDags() {
        return List.of(SalesPipelineBuilder.build()); // SalesPipelineBuilder generated at compile time
    }

    public static void main(String[] args) {
        Server.create(args).serve(new Main().build());
    }
}
Coordinator configuration
[sdk]
coordinators = {
        "java-jdk17": {
            "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
            "kwargs": {"jars_root": ["/opt/airflow/jars"]}
        }
    }
queue_to_coordinator = {"java": "java-jdk17"}
See JavaCoordinator configuration for the full list of accepted kwargs.
Writing tasks
The Java SDK offers two APIs for implementing tasks. Both produce the same runtime behavior; the choice is a matter of style.
Annotation-based API
Annotate a plain Java class and let the SDK generate the boilerplate at compile time.
| Annotation | Purpose |
|---|---|
| @Builder.Dag | Marks the class as a task container. The |
| @Builder.Task | Marks a method as a task implementation. The |
| @Builder.XCom | Injects the |
The annotation processor generates a <ClassName>Builder class that wires up the task
registry and handles XCom injection automatically.
@Builder.Dag(id = "my_dag")
public class MyDag {
    @Builder.Task(id = "fetch")
    public String fetch(Client client) throws Exception {
        var conn = client.getConnection("my_api");
        // implement task logic
        return result;
    }

    @Builder.Task(id = "process")
    public long process(
        Client client,
        @Builder.XCom(task = "fetch") String fetched
    ) {
        var threshold = (String) client.getVariable("process_threshold");
        // implement task logic
        return count;
    }
}
A task method may declare throws Exception; any uncaught exception causes the task instance to be marked
as failed in Airflow (triggering retries if configured on the stub).
Interface-based API
Implement the Task interface directly for full control over how tasks are registered and how XComs are
read.
import org.apache.airflow.sdk.*;

public class FetchTask implements Task {
    @Override
    public void execute(Context context, Client client) throws Exception {
        var conn = client.getConnection("my_api");
        // implement task logic
        client.setXCom(result);
    }
}
Register tasks manually in a BundleBuilder:
import java.util.List;

public class MyBundle implements BundleBuilder {
    @Override
    public Iterable<Dag> getDags() {
        var dag = new Dag("my_dag");
        dag.addTask("fetch", FetchTask.class);
        dag.addTask("process", ProcessTask.class);
        return List.of(dag);
    }
}
See the Java SDK’s published JavaDoc for more details.
Logging
Task code can emit log records through any common Java logging framework. The SDK ships optional integration libraries that forward those records to Airflow’s task log store, where they appear alongside the standard task output in the Airflow UI.
Declare a logger as a static field on the task class, using the class’s own type as the name. This is the conventional pattern regardless of which logging framework you choose:
private static final System.Logger log =
    System.getLogger(SalesPipeline.class.getName());

@Builder.Task(id = "extract")
public long extract(Client client) {
    log.log(System.Logger.Level.INFO, "Starting extraction");
    return recordCount;
}
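Beyond plain messages, System.Logger also accepts message parameters, lazily built messages, and exceptions. The following standalone sketch uses only JDK calls, so it can be tried without any Airflow dependency. LoggingPatterns and its extract method are hypothetical names used for illustration.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical standalone class; no Airflow types are needed to try this.
final class LoggingPatterns {
    private static final Logger log = System.getLogger(LoggingPatterns.class.getName());

    static long extract(long recordCount) {
        // Placeholders use java.text.MessageFormat syntax ({0}, {1}, ...).
        log.log(Level.INFO, "Extracted {0} records from {1}", recordCount, "sales_db");

        // The Supplier overload builds the message only if DEBUG is enabled.
        log.log(Level.DEBUG, () -> "Record count as hex: " + Long.toHexString(recordCount));

        // Passing a Throwable attaches the exception to the log record.
        if (recordCount == 0) {
            log.log(Level.WARNING, "No records found", new IllegalStateException("empty result"));
        }
        return recordCount;
    }

    public static void main(String[] args) {
        extract(42);
    }
}
```

Without an Airflow logging artifact on the classpath, the JDK’s default backend handles these calls, which makes the sketch convenient for local experiments.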
The Gradle snippets below show the dependency declarations; all Airflow artifact versions are managed
by airflow-sdk-bom. Maven users can apply the same artifact IDs, following the pattern in the
Maven section.
System.Logger (Java Platform Logging)
The java.lang.System.Logger facade introduced in Java 9 (JEP 264), commonly abbreviated JPL, lets
libraries log without pulling in any third-party API. The airflow-sdk-jpl artifact registers an
AirflowSystemLoggerFinder via ServiceLoader, which routes all System.Logger calls directly
to Airflow’s task log store.
implementation("org.apache.airflow:airflow-sdk-jpl:${version}")
No configuration file or startup call is required. The ServiceLoader mechanism discovers the
provider automatically as long as the JAR is on the classpath.
Note
Do not add a second System.LoggerFinder implementation alongside
airflow-sdk-jpl. The JVM selects one finder via ServiceLoader; having
multiple providers on the classpath leads to unpredictable behaviour.
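If you suspect that a second finder has slipped onto the classpath through a transitive dependency, you can list the registered providers with the JDK’s ServiceLoader API. This is a diagnostic sketch only; LoggerFinderProbe is a hypothetical helper and not part of the SDK.

```java
import java.util.List;
import java.util.ServiceLoader;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper, not part of the Airflow Java SDK.
final class LoggerFinderProbe {
    // Lists the class names of every System.LoggerFinder provider visible
    // to ServiceLoader, without instantiating any of them.
    static List<String> providerNames() {
        return ServiceLoader.load(System.LoggerFinder.class).stream()
            .map(provider -> provider.type().getName())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = providerNames();
        names.forEach(System.out::println);
        if (names.size() > 1) {
            System.err.println("More than one System.LoggerFinder on the classpath");
        }
    }
}
```

Run it with the same classpath as the bundle. More than one printed class name indicates the conflict described in the note above.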
SLF4J 2.x
The SLF4J binding is discovered automatically via ServiceLoader; no configuration file or
startup call is required.
implementation("org.apache.airflow:airflow-sdk-slf4j:${version}")
The above automatically pulls in the SLF4J API, so you don’t need to add slf4j-api yourself.
Note
Do not add a second SLF4J binding (such as logback-classic or slf4j-simple) alongside
airflow-sdk-slf4j. SLF4J 2.x warns about multiple bindings and selects one unpredictably.
Log4j 2
airflow-sdk-log4j2 declares log4j-api as a transitive dependency, so you do not need to add the latter
separately. You must also place log4j-core on the runtime classpath to host the plugin loader that
discovers the custom AirflowAppender supplied by airflow-sdk-log4j2 at startup:
implementation("org.apache.airflow:airflow-sdk-log4j2:${version}")
runtimeOnly("org.apache.logging.log4j:log4j-core:${log4jVersion}")
Declare AirflowAppender in your log4j2.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
<Appenders>
<AirflowAppender name="Airflow"/>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Airflow"/>
</Root>
</Loggers>
</Configuration>
java.util.logging
Add the artifact:
implementation("org.apache.airflow:airflow-sdk-jul:${version}")
and call AirflowJulHandler.setup() on startup, before any task runs. It clears the JUL root
logger’s existing handlers (including the default ConsoleHandler, whose stderr output Airflow
would otherwise capture as task.stderr at ERROR level, duplicating each record and mislabeling
its level) and installs AirflowJulHandler in their place:
public static void main(String[] args) {
    AirflowJulHandler.setup();
    Server.create(args).serve(new MyBundle());
}
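To make the effect of setup() concrete, the sketch below performs the same two steps with plain java.util.logging: it removes the root logger’s existing handlers and installs a single replacement. CapturingHandler is a hypothetical stand-in that collects messages in memory; the internals of the real AirflowJulHandler are not described here and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical stand-in for AirflowJulHandler: collects messages in memory.
final class CapturingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    // Replaces every handler on the JUL root logger with a new CapturingHandler.
    static CapturingHandler install() {
        Logger root = LogManager.getLogManager().getLogger("");
        for (Handler existing : root.getHandlers()) {
            root.removeHandler(existing); // drops the default ConsoleHandler
        }
        CapturingHandler handler = new CapturingHandler();
        root.addHandler(handler);
        return handler;
    }
}
```

After install() returns, records logged through any JUL logger that delegates to the root logger reach only the replacement handler, so nothing is written to stderr a second time.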
Alternatively, declare the handler in a logging.properties file and point JUL at it with the
java.util.logging.config.file system property (set via jvm_args in the coordinator
configuration):
handlers = org.apache.airflow.sdk.jul.AirflowJulHandler
[sdk]
coordinators = {
        "java-jdk17": {
            "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
            "kwargs": {
                "jars_root": ["/opt/airflow/jars"],
                "jvm_args": ["-Djava.util.logging.config.file=/opt/airflow/logging.properties"]
            }
        }
    }
Other frameworks
Several commonly used logging APIs are covered without a dedicated Airflow artifact:
Logback is itself an SLF4J binding. Replace logback-classic with airflow-sdk-slf4j and no changes are needed in your task code.
Apache Commons Logging (JCL) can be bridged to SLF4J via org.slf4j:jcl-over-slf4j or to Log4j 2 via org.apache.logging.log4j:log4j-jcl.
XCom type mapping
XCom values are stored as JSON in Airflow’s metadata database. The table below shows how JSON types are
represented as Java objects when read back via getXCom.
| Python type | JSON | Java type (from getXCom) |
|---|---|---|
| int | number (integer) | |
| float | number (decimal) | |
| str | string | |
| bool | boolean | |
| None | null | |
| list | array | |
| dict | object | |
Building and packaging
The Java SDK is distributed as a JAR. The sections below show how to build a bundle with Gradle or Maven.
Gradle
Apply the Airflow SDK Gradle plugin in your build.gradle:
plugins {
    id("org.apache.airflow.sdk") version "${version}"
}

dependencies {
    annotationProcessor("org.apache.airflow:airflow-sdk-processor:${version}")
    implementation("org.apache.airflow:airflow-sdk:${version}")
}

airflowBundle {
    mainClass = "com.example.Main" // Point to your main class instead.
}
Then run:
./gradlew bundle
The build/bundle/ directory contains all required JAR(s). Copy or mount it into the directory pointed to
by jars_root in the coordinator configuration. JavaCoordinator
scans jars_root recursively and builds the classpath automatically.
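The recursive scan can be pictured as follows. This is only an illustration of the idea, written in Java for consistency with the rest of this page: JavaCoordinator itself lives in the Python task SDK, and its actual ordering and filtering rules are not specified here. ClasspathScan is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not the coordinator's real implementation.
final class ClasspathScan {
    // Walks root recursively and joins every *.jar path with the platform
    // path separator (":" on Unix-like systems, ";" on Windows).
    static String classpathFor(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .sorted() // assumption: sorted only to make the sketch deterministic
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classpathFor(Path.of(args.length > 0 ? args[0] : ".")));
    }
}
```

Pointing the sketch at build/bundle/ gives a quick preview of which JARs a recursive scan of that directory would pick up.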
Note
You only need the annotationProcessor entry if you use the annotation-based API. It is not needed for
the interface-based API.
Note
The plugin generates a fat JAR with the Shadow plugin by default. This is
generally a good idea: you deploy a single JAR file, which avoids dependency conflicts between projects. If this
does not suit you, set fatJar = false in airflowBundle to produce thin JARs instead. The rest of the
process stays the same, but you must place all dependency JARs in a directory listed in
jars_root so that Airflow can find them.
Maven
Import the airflow-sdk-bom Bill of Materials so that artifact versions and the
${airflow.supervisor.schema.version} property are managed in one place:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.airflow</groupId>
<artifactId>airflow-sdk-bom</artifactId>
<version>${version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
Add the SDK as a dependency (version is managed by the BOM):
<dependencies>
<dependency>
<groupId>org.apache.airflow</groupId>
<artifactId>airflow-sdk</artifactId>
</dependency>
</dependencies>
Wire the annotation processor through maven-compiler-plugin so it stays off the runtime classpath:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<annotationProcessorPaths>
<path>
<groupId>org.apache.airflow</groupId>
<artifactId>airflow-sdk-processor</artifactId>
<version>${version}</version>
</path>
</annotationProcessorPaths>
</configuration>
</plugin>
Option 1 (recommended): fat JAR
Use maven-shade-plugin to bundle your code and all dependencies into a single JAR. This is the
simplest deployment: one file, no dependency management at runtime.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.6.0</version>
<executions>
<execution>
<phase>package</phase>
<goals><goal>shade</goal></goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<!-- Replace with your BundleBuilder implementation. -->
<mainClass>com.example.Main</mainClass>
<manifestEntries>
<!-- Resolved from the BOM; do not hard-code this value. -->
<Airflow-Supervisor-Schema-Version>${airflow.supervisor.schema.version}</Airflow-Supervisor-Schema-Version>
</manifestEntries>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
Then run:
mvn package
The fat JAR is written to target/<artifactId>-<version>.jar. Copy it to the directory configured as
jars_root in your coordinator.
Option 2: thin JAR with separate dependencies
If a fat JAR does not suit your project, use maven-jar-plugin to set Main-Class on the regular
JAR and maven-dependency-plugin to collect all runtime dependencies alongside it. Note that
Airflow-Supervisor-Schema-Version does not need to be set here since Airflow reads it directly from the
airflow-sdk JAR on the classpath.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifestEntries>
<Main-Class>com.example.Main</Main-Class>
</manifestEntries>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy-dependencies</id>
<phase>package</phase>
<goals><goal>copy-dependencies</goal></goals>
<configuration>
<outputDirectory>${project.build.directory}/bundle</outputDirectory>
<includeScope>runtime</includeScope>
</configuration>
</execution>
<execution>
<id>copy-artifact</id>
<phase>package</phase>
<goals><goal>copy</goal></goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>${project.groupId}</groupId>
<artifactId>${project.artifactId}</artifactId>
<version>${project.version}</version>
<outputDirectory>${project.build.directory}/bundle</outputDirectory>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
Then run:
mvn package
target/bundle/ will contain the thin JAR and all runtime dependency JARs. Point jars_root at
this directory.
Note
You only need the annotationProcessorPaths entry if you use the annotation-based API.
Note
Unlike the Gradle plugin, Maven has no equivalent of the verifyBundleMainClass validation step.
A wrong <mainClass> value will not be caught until runtime.
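Because a wrong main class only surfaces at runtime, it can be worth inspecting the packaged manifest before deploying. The sketch below reads manifest attributes with the JDK’s java.util.jar API. ManifestCheck is a hypothetical helper, not part of the SDK, and it does not reproduce whatever the Gradle plugin’s verifyBundleMainClass step checks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical pre-deployment check; not part of the Airflow Java SDK.
final class ManifestCheck {
    // Returns the value of a main manifest attribute, or null if it is absent.
    static String mainAttribute(Path jar, String name) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest(); // null if the JAR has no manifest
            return manifest == null ? null : manifest.getMainAttributes().getValue(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = Path.of(args[0]);
        System.out.println("Main-Class: " + mainAttribute(jar, "Main-Class"));
        System.out.println("Airflow-Supervisor-Schema-Version: "
            + mainAttribute(jar, "Airflow-Supervisor-Schema-Version"));
    }
}
```

Pass the path of the built JAR as the only argument. A null Main-Class, or one that does not name your BundleBuilder entry point, is a sign that the plugin configuration needs another look.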
JavaCoordinator configuration
All kwargs in the coordinators config entry are passed to the
JavaCoordinator constructor:
| Parameter | Default | Description |
|---|---|---|
| jars_root | (required) | One or more directories scanned recursively for |
| java_executable | | Path to the |
| jvm_args | | Extra JVM arguments such as |
| | (auto-detect) | Explicit entry-point class. If omitted, |
| | | Seconds to wait for the JVM subprocess to connect after launch. Increase this if your JVM startup is slow (e.g. on constrained hardware or with a large classpath). |
Note
The [sdk] configuration is read at startup, so changes to coordinators or
queue_to_coordinator (for example adding jvm_args) only take effect after you restart the
scheduler (or airflow standalone). A rebuilt bundle JAR, by contrast, is picked up on the next
task launch without a restart, because a fresh JVM is spawned per task instance.
Pinning the Java executable
As a general recommendation, set java_executable to an absolute path rather than relying on
java resolving from $PATH. This pins tasks to a known JDK, which matters most in production or
corporate environments where the Airflow admin may not control the system-wide java (the same
reasoning behind pinning a Python version).
For example, if you install the JDK with Homebrew on macOS, its java is not on $PATH, so
point java_executable at it explicitly:
[sdk]
coordinators = {
        "java-jdk17": {
            "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
            "kwargs": {
                "jars_root": ["/opt/airflow/jars"],
                "java_executable": "/opt/homebrew/opt/openjdk@17/bin/java"
            }
        }
    }
queue_to_coordinator = {"java": "java-jdk17"}
Limitations
One JVM subprocess per task instance. Each task instance spawns a fresh JVM. Tasks that need to share in-process state between instances should use XCom or an external store instead.
Limited support for assets, deferral, and other Airflow features. They may be implemented in the future based on user feedback and demand.