Java SDK

This is an experimental feature.

The Java SDK lets you implement Airflow task logic in Java, Kotlin, or any other JVM language. The Dag and its scheduling remain in Python; individual tasks delegate to a JVM subprocess that is spawned by JavaCoordinator for each task instance.

API reference

The generated API reference for the Java SDK is published with the Airflow documentation at Java SDK API Reference.

Prerequisites

JRE 11 or later must be available on the Airflow worker nodes.
The compiled task JAR(s) and JVM dependencies must be accessible from the worker.
The apache-airflow-task-sdk package (installed with Airflow) provides the coordinator; no additional Python packages are needed.

Quick start

The following example shows the minimal moving parts: a Python Dag with two stub tasks, and a Java implementation of those tasks.

Python Dag (the scheduling side)

from airflow.sdk import dag, task


@dag
def sales_pipeline():
    @task.stub(queue="java")
    def extract(): ...

    @task.stub(queue="java")
    def transform(extracted): ...

    @task()
    def load(transformed):
        print(f"Loaded: {transformed}")

    load(transform(extract()))


sales_pipeline()

Java implementation

import org.apache.airflow.sdk.*;

@Builder.Dag(id = "sales_pipeline")
public class SalesPipeline {

  @Builder.Task(id = "extract")
  public long extract(Client client) {
    var conn = client.getConnection("sales_db");
    // ... fetch data using conn.host, conn.login, conn.password ...
    return recordCount;
  }

  @Builder.Task(id = "transform")
  public long transform(
    Client client,
    @Builder.XCom(task = "extract") long recordCount
  ) {
    var threshold = (String) client.getVariable("transform_threshold");
    // ... process data ...
    return transformedCount;
  }
}

Note

Note that transform declares an argument for the upstream XCom in both Python and Java. The Python argument declares the dependency between the tasks; the Java argument actually retrieves the value.

Java entry point

import java.util.List;
import org.apache.airflow.sdk.*;

public class Main implements BundleBuilder {
  @Override
  public Iterable<Dag> getDags() {
    return List.of(SalesPipelineBuilder.build());  // SalesPipelineBuilder generated at compile time
  }

  public static void main(String[] args) {
    Server.create(args).serve(new Main().build());
  }
}

Coordinator configuration

[sdk]
coordinators = {
  "java-jdk17": {
    "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
    "kwargs": {"jars_root": ["/opt/airflow/jars"]}
  }
}
queue_to_coordinator = {"java": "java-jdk17"}

See JavaCoordinator configuration for the full list of accepted kwargs.

Writing tasks

The Java SDK offers two APIs for implementing tasks. Both produce the same runtime behavior; the choice is a matter of style.

Annotation-based API

Annotate a plain Java class and let the SDK generate the boilerplate at compile time.

Annotation	Purpose
`@Builder.Dag(id = "...")`	Marks the class as a task container. The `id` must match the `dag_id` in the Python Dag.
`@Builder.Task(id = "...")`	Marks a method as a task implementation. The `id` must match the `@task.stub` function name in the Python Dag. If `id` is omitted, the method name is used.
`@Builder.XCom(task = "...")`	Injects the `return_value` XCom from the named upstream task as a method parameter. The parameter type must be compatible with the stored value (see XCom type mapping).

The annotation processor generates a <ClassName>Builder class that wires up the task registry and handles XCom injection automatically.

@Builder.Dag(id = "my_dag")
public class MyDag {

  @Builder.Task(id = "fetch")
  public String fetch(Client client) throws Exception {
    var conn = client.getConnection("my_api");
    // implement task logic
    return result;
  }

  @Builder.Task(id = "process")
  public long process(
    Client client,
    @Builder.XCom(task = "fetch") String fetched
  ) {
    var threshold = (String) client.getVariable("process_threshold");
    // implement task logic
    return count;
  }
}

A task method may declare throws Exception; any uncaught exception causes the task instance to be marked as failed in Airflow (triggering retries if configured on the stub).
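
The id rule from the table above (an explicit id wins, otherwise the method name is used) can be illustrated with a small reflection-based sketch. This is not how the SDK works internally: the real annotation processor runs at compile time, and the SketchTask annotation and taskIds helper below are hypothetical names invented for this example only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class TaskIdSketch {

  // Hypothetical stand-in annotation; NOT the SDK's @Builder.Task.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface SketchTask {
    String id() default "";
  }

  static class MyDag {
    @SketchTask(id = "fetch")
    public String fetchFromApi() { return "payload"; }

    @SketchTask // no id given: the method name "process" becomes the task id
    public long process() { return 42L; }

    public void helper() {} // not annotated, so it is not a task
  }

  // Maps each task id to the name of the method that implements it.
  static Map<String, String> taskIds(Class<?> type) {
    Map<String, String> ids = new TreeMap<>();
    for (Method m : type.getDeclaredMethods()) {
      SketchTask t = m.getAnnotation(SketchTask.class);
      if (t == null) continue;
      String id = t.id().isEmpty() ? m.getName() : t.id();
      if (ids.put(id, m.getName()) != null) {
        throw new IllegalStateException("Duplicate task id: " + id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(taskIds(MyDag.class)); // prints {fetch=fetchFromApi, process=process}
  }
}
```

The practical consequence is that renaming a Java method silently changes its task id unless the id is spelled out, so an explicit id is the safer choice when the Python stub name must stay stable.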

Interface-based API

Implement the Task interface directly for full control over how tasks are registered and how XComs are read.

import org.apache.airflow.sdk.*;

public class FetchTask implements Task {
  @Override
  public void execute(Context context, Client client) throws Exception {
    var conn = client.getConnection("my_api");
    // implement task logic
    client.setXCom(result);
  }
}

Register tasks manually in a BundleBuilder:

public class MyBundle implements BundleBuilder {
  @Override
  public Iterable<Dag> getDags() {
    var dag = new Dag("my_dag");
    dag.addTask("fetch", FetchTask.class);
    dag.addTask("process", ProcessTask.class);
    return List.of(dag);
  }
}

See the Java SDK API Reference for more details.

Logging

Task code can emit log records through any common Java logging framework. The SDK ships optional integration libraries that forward those records to Airflow’s task log store, where they appear alongside the standard task output in the Airflow UI.

Declare a logger as a static field on the task class and name it after the class itself. This is the conventional pattern regardless of which logging framework you choose:

private static final System.Logger log =
    System.getLogger(SalesPipeline.class.getName());

@Builder.Task(id = "extract")
public long extract(Client client) {
    log.log(System.Logger.Level.INFO, "Starting extraction");
    // ... fetch data and compute recordCount ...
    return recordCount;
}

The Gradle snippets below show the dependency declarations; all Airflow artifact versions are managed by airflow-sdk-bom. Maven users can declare the same artifact IDs, following the pattern shown in the Maven section.

`System.Logger` (Java Platform Logging)

The java.lang.System.Logger facade (JEP 264, added in Java 9), commonly abbreviated JPL, is part of the JDK, so libraries can log through it without pulling in any third-party API. The airflow-sdk-jpl artifact registers an AirflowSystemLoggerFinder via ServiceLoader, which routes all System.Logger calls directly to Airflow’s task log store.

implementation("org.apache.airflow:airflow-sdk-jpl:${version}")

No configuration file or startup call is required. The ServiceLoader mechanism discovers the provider automatically as long as the JAR is on the classpath.
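
As a rough illustration of the facade itself, the sketch below exercises the main System.Logger call shapes using only the JDK. The class name JplSketch and the summary helper are made up for this example. Without airflow-sdk-jpl on the classpath the JDK falls back to its default backend, so this runs anywhere; with the artifact present, the same calls would be routed as described above.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

public class JplSketch {
  private static final Logger LOG = System.getLogger(JplSketch.class.getName());

  // Potentially expensive message; built only if the level is enabled.
  static String summary(long rows) {
    return "Extracted " + rows + " rows";
  }

  public static void main(String[] args) {
    long rows = 1234;

    // Plain message.
    LOG.log(Level.INFO, "Starting extraction");

    // java.text.MessageFormat-style placeholders.
    LOG.log(Level.INFO, "Fetched {0} rows", rows);

    // Supplier overload: summary() is skipped when DEBUG is disabled.
    LOG.log(Level.DEBUG, () -> summary(rows));

    // Message plus throwable, for failures.
    try {
      throw new IllegalStateException("connection refused");
    } catch (IllegalStateException e) {
      LOG.log(Level.ERROR, "Extraction failed", e);
    }
  }
}
```

The supplier form is worth preferring for messages that are costly to build, since the lambda is only evaluated when the record would actually be logged.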

Note

Do not add a second System.LoggerFinder implementation alongside airflow-sdk-jpl. The JVM selects one finder via ServiceLoader; having multiple providers on the classpath leads to unpredictable behaviour.

SLF4J 2.x

The SLF4J binding is discovered automatically via ServiceLoader; no configuration file or startup call is required.

implementation("org.apache.airflow:airflow-sdk-slf4j:${version}")

The above automatically pulls in the SLF4J API, so you don’t need to add slf4j-api yourself.

Note

Do not add a second SLF4J binding (such as logback-classic or slf4j-simple) alongside airflow-sdk-slf4j. SLF4J 2.x warns about multiple bindings and selects one unpredictably.

Log4j 2

airflow-sdk-log4j2 declares log4j-api as a transitive dependency, so you do not need to add the latter separately. You must also place log4j-core on the runtime classpath to host the plugin loader that discovers the custom AirflowAppender supplied by airflow-sdk-log4j2 at startup:

implementation("org.apache.airflow:airflow-sdk-log4j2:${version}")
runtimeOnly("org.apache.logging.log4j:log4j-core:${log4jVersion}")

Declare AirflowAppender in your log4j2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <AirflowAppender name="Airflow"/>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Airflow"/>
    </Root>
  </Loggers>
</Configuration>

`java.util.logging`

Add the artifact:

implementation("org.apache.airflow:airflow-sdk-jul:${version}")

Then call AirflowJulHandler.setup() on startup, before any task runs. It clears the JUL root logger’s existing handlers and installs AirflowJulHandler in their place. The cleared handlers include the default ConsoleHandler, which writes to stderr; Airflow would otherwise capture that output as task.stderr at ERROR level, duplicating each record and mislabeling its level:

public static void main(String[] args) {
    AirflowJulHandler.setup();
    Server.create(args).serve(new MyBundle());
}

Alternatively, declare the handler in a logging.properties file and point JUL at it with the java.util.logging.config.file system property (set via jvm_args in the coordinator configuration):

handlers = org.apache.airflow.sdk.jul.AirflowJulHandler

[sdk]
coordinators = {
  "java-jdk17": {
    "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
    "kwargs": {
      "jars_root": ["/opt/airflow/jars"],
      "jvm_args": ["-Djava.util.logging.config.file=/opt/airflow/logging.properties"]
    }
  }
}
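
For intuition about what replacing the root handlers means, here is a JDK-only sketch of the same pattern: remove every handler from the JUL root logger, then install a single custom one. CapturingHandler is a hypothetical in-memory stand-in, not AirflowJulHandler, and the setup method below is an assumption about the general shape of such a step rather than the SDK’s implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulSetupSketch {

  // Hypothetical stand-in for AirflowJulHandler: keeps records in memory.
  static final class CapturingHandler extends Handler {
    final List<String> lines = new ArrayList<>();

    @Override
    public synchronized void publish(LogRecord record) {
      if (isLoggable(record)) {
        lines.add(record.getLevel() + " " + record.getMessage());
      }
    }

    @Override public void flush() {}
    @Override public void close() {}
  }

  // Removes all handlers from the root logger (including the default
  // ConsoleHandler), then installs ours as the only one.
  static CapturingHandler setup() {
    Logger root = Logger.getLogger("");
    for (Handler h : root.getHandlers()) {
      root.removeHandler(h);
    }
    CapturingHandler handler = new CapturingHandler();
    root.addHandler(handler);
    return handler;
  }

  public static void main(String[] args) {
    CapturingHandler handler = setup();
    // Child loggers forward to the root logger's handlers by default.
    Logger.getLogger("com.example.SalesPipeline").info("Starting extraction");
    System.out.println(handler.lines); // prints [INFO Starting extraction]
  }
}
```

Because the ConsoleHandler is gone after setup, nothing is written to stderr, which is what prevents the duplicated and mislabeled records mentioned above.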

Other frameworks

Several commonly used logging APIs are covered without a dedicated Airflow artifact:

Logback is itself an SLF4J binding. Replace logback-classic with airflow-sdk-slf4j and no changes are needed in your task code.
Apache Commons Logging (JCL) can be bridged to SLF4J via org.slf4j:jcl-over-slf4j or to Log4j 2 via org.apache.logging.log4j:log4j-jcl.

XCom type mapping

XCom values are stored as JSON in Airflow’s metadata database. The table below shows how JSON types are represented as Java objects when read back via getXCom.

Python type	JSON	Java type (from `getXCom`)
`int`	number (integer)	`Long` (for values that fit; `BigInteger` otherwise)
`float`	number (decimal)	`Double`
`str`	string	`String`
`bool`	boolean	`Boolean`
`None`	null	`null`
`list`	array	`List<Object>`
`dict`	object	`Map<String, Object>`
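
When a value arrives typed as Object (for example an element of a List<Object> or a Map<String, Object> value), task code has to narrow it by hand. The sketch below shows one way to do that for the Java types in the table, using only the JDK. The class and method names are invented for this example, and the values are constructed locally instead of being read from Airflow.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

public class XComTypesSketch {

  // Widens either integer representation (Long or BigInteger) to BigInteger,
  // so callers do not need to care which one was produced.
  static BigInteger asBigInteger(Object value) {
    if (value instanceof Long) {
      return BigInteger.valueOf((Long) value);
    }
    if (value instanceof BigInteger) {
      return (BigInteger) value;
    }
    throw new IllegalArgumentException("Not an integer value: " + value);
  }

  // Names the JSON kind behind a value, following the table's Java column.
  static String describe(Object value) {
    if (value == null) return "null";
    if (value instanceof Long || value instanceof BigInteger) return "integer";
    if (value instanceof Double) return "decimal";
    if (value instanceof String) return "string";
    if (value instanceof Boolean) return "boolean";
    if (value instanceof List) return "array";
    if (value instanceof Map) return "object";
    return "unexpected " + value.getClass().getName();
  }

  public static void main(String[] args) {
    // Shaped like a Python dict {"rows": 3, "ratio": 0.5, "tags": ["a", "b"]}.
    Map<String, Object> xcom = Map.of("rows", 3L, "ratio", 0.5, "tags", List.of("a", "b"));

    System.out.println(describe(xcom));              // prints object
    System.out.println(describe(xcom.get("rows")));  // prints integer
    System.out.println(describe(xcom.get("ratio"))); // prints decimal
    System.out.println(describe(xcom.get("tags")));  // prints array
    System.out.println(asBigInteger(xcom.get("rows")).add(BigInteger.ONE)); // prints 4
  }
}
```

Note that a Python int never shows up as Integer under this mapping, so a direct cast to Integer on such a value would fail with ClassCastException.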

Note

An @Builder.XCom parameter that reads a value which was never pushed resolves to null. A boxed parameter (Integer, Long, Boolean, …) receives null safely, but a primitive parameter (int, long, boolean, …) cannot represent null and the task fails with MissingXComException. Declare the parameter with a boxed type when the upstream XCom may be absent.
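
The underlying Java constraint can be seen without the SDK. In the JDK-only sketch below, a null Long stands in for an XCom that was never pushed. The method names are illustrative, and plain Java surfaces the problem as a NullPointerException, whereas the SDK reports it as MissingXComException as described above.

```java
public class NullXComSketch {

  // Boxed parameter: null is a legal value and can be handled explicitly.
  static long rowsOrDefault(Long recordCount) {
    return recordCount == null ? 0L : recordCount;
  }

  // Primitive parameter: the caller has to unbox before the call is made.
  static long rows(long recordCount) {
    return recordCount;
  }

  public static void main(String[] args) {
    Long missing = null; // stands in for an upstream XCom that was never pushed

    System.out.println(rowsOrDefault(missing)); // prints 0

    try {
      rows(missing); // auto-unboxing null fails before rows() even runs
    } catch (NullPointerException e) {
      System.out.println("cannot unbox null into long");
    }
  }
}
```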

Building and packaging

The Java SDK is distributed as a JAR. The sections below show how to build a bundle with Gradle or Maven.

Gradle

Apply the Airflow SDK Gradle plugin in your build.gradle:

plugins {
    id("org.apache.airflow.sdk") version "${version}"
}

dependencies {
    annotationProcessor("org.apache.airflow:airflow-sdk-processor:${version}")
    implementation("org.apache.airflow:airflow-sdk:${version}")
}

airflowBundle {
    mainClass = "com.example.Main"  // Replace with your own main class.
}

Then run:

./gradlew bundle

The build/bundle/ directory contains all required JAR(s). Copy or mount it into the directory pointed to by jars_root in the coordinator configuration. JavaCoordinator scans jars_root recursively and builds the classpath automatically.
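
To preview which files a recursive scan of a directory would pick up, a JDK-only sketch such as the following can help. It is not JavaCoordinator’s implementation (the coordinator is part of the Python package); the class name, the sorted ordering, and the use of the platform path separator are assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClasspathSketch {

  // Recursively collects regular files ending in .jar, sorted for a stable order.
  static List<Path> findJars(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".jar"))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // Joins the JAR paths with the platform separator (":" or ";").
  static String classpath(Path root) throws IOException {
    return findJars(root).stream()
        .map(Path::toString)
        .collect(Collectors.joining(File.pathSeparator));
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway directory tree that mimics a jars_root layout.
    Path root = Files.createTempDirectory("jars_root");
    Files.createDirectories(root.resolve("libs"));
    Files.createFile(root.resolve("app.jar"));
    Files.createFile(root.resolve("libs").resolve("dep.jar"));
    Files.createFile(root.resolve("README.txt")); // ignored: not a JAR

    System.out.println(classpath(root));
  }
}
```

Running something like this against the real directory is a quick way to spot stale or duplicate JARs left over from an earlier deployment.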

Note

You only need the annotationProcessor entry if you use the annotation-based API. It is not needed for the interface-based API.

Note

The plugin generates a fat JAR with the Shadow plugin by default. This is generally a good idea: you deploy a single JAR file, which avoids dependency conflicts between projects. If this does not suit you, set fatJar = false in airflowBundle to produce thin JARs instead. The rest of the process stays the same, but you must place all dependency JARs in a directory that Airflow can find through jars_root.

Maven

Import the airflow-sdk-bom Bill of Materials so that artifact versions and the ${airflow.supervisor.schema.version} property are managed in one place:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.airflow</groupId>
            <artifactId>airflow-sdk-bom</artifactId>
            <version>${version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Add the SDK as a dependency (version is managed by the BOM):

<dependencies>
    <dependency>
        <groupId>org.apache.airflow</groupId>
        <artifactId>airflow-sdk</artifactId>
    </dependency>
</dependencies>

Wire the annotation processor through maven-compiler-plugin so it stays off the runtime classpath:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
        <annotationProcessorPaths>
            <path>
                <groupId>org.apache.airflow</groupId>
                <artifactId>airflow-sdk-processor</artifactId>
                <version>${version}</version>
            </path>
        </annotationProcessorPaths>
    </configuration>
</plugin>

Option 1 (recommended): fat JAR

Use maven-shade-plugin to bundle your code and all dependencies into a single JAR. This is the simplest deployment: one file, no dependency management at runtime.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.6.0</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals><goal>shade</goal></goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <!-- Replace with your BundleBuilder implementation. -->
                        <mainClass>com.example.Main</mainClass>
                        <manifestEntries>
                            <!-- Resolved from the BOM; do not hard-code this value. -->
                            <Airflow-Supervisor-Schema-Version>${airflow.supervisor.schema.version}</Airflow-Supervisor-Schema-Version>
                        </manifestEntries>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

Then run:

mvn package

The fat JAR is written to target/<artifactId>-<version>.jar. Copy it to the directory configured as jars_root in your coordinator.

Option 2: thin JAR with separate dependencies

If a fat JAR does not suit your project, use maven-jar-plugin to set Main-Class on the regular JAR and maven-dependency-plugin to collect all runtime dependencies alongside it. Note that Airflow-Supervisor-Schema-Version does not need to be set here since Airflow reads it directly from the airflow-sdk JAR on the classpath.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
        <archive>
            <manifestEntries>
                <Main-Class>com.example.Main</Main-Class>
            </manifestEntries>
        </archive>
    </configuration>
</plugin>

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals><goal>copy-dependencies</goal></goals>
            <configuration>
                <outputDirectory>${project.build.directory}/bundle</outputDirectory>
                <includeScope>runtime</includeScope>
            </configuration>
        </execution>
        <execution>
            <id>copy-artifact</id>
            <phase>package</phase>
            <goals><goal>copy</goal></goals>
            <configuration>
                <artifactItems>
                    <artifactItem>
                        <groupId>${project.groupId}</groupId>
                        <artifactId>${project.artifactId}</artifactId>
                        <version>${project.version}</version>
                        <outputDirectory>${project.build.directory}/bundle</outputDirectory>
                    </artifactItem>
                </artifactItems>
            </configuration>
        </execution>
    </executions>
</plugin>

Then run:

mvn package

target/bundle/ will contain the thin JAR and all runtime dependency JARs. Point jars_root at this directory.

Note

You only need the annotationProcessorPaths entry if you use the annotation-based API.

Note

Unlike the Gradle plugin, Maven has no equivalent of the verifyBundleMainClass validation step. A wrong <mainClass> value will not be caught until runtime.

`JavaCoordinator` configuration

All kwargs in the coordinators config entry are passed to the JavaCoordinator constructor:

Parameter	Default	Description
`jars_root`	(required)	One or more directories scanned recursively for `.jar` files. Accepts a string, a path, or a list of strings/paths.
`java_executable`	`"java"`	Path to the `java` binary. Defaults to `java` on `$PATH`.
`jvm_args`	`[]`	Extra JVM arguments such as `["-Xmx1g", "-Dsome.property=value"]`.
`main_class`	(auto-detect)	Explicit entry-point class. If omitted, `JavaCoordinator` scans `jars_root` for a JAR whose manifest sets `Main-Class`. If multiple executable JARs are found, the result is non-deterministic; set `main_class` explicitly in that case.
`task_startup_timeout`	`10.0`	Seconds to wait for the JVM subprocess to connect after launch. Increase this if your JVM startup is slow (e.g. on constrained hardware or with a large classpath).
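
To check which JARs declare Main-Class before relying on auto-detection, the manifest can be read with the JDK’s java.util.jar API. The sketch below is a diagnostic aid under that assumption, not the coordinator’s own detection logic; the class name and the writeJar helper (which only creates demo JARs) are invented for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class MainClassSketch {

  // Returns the Main-Class attribute of a JAR, if its manifest declares one.
  static Optional<String> mainClassOf(Path jar) throws IOException {
    try (JarFile jarFile = new JarFile(jar.toFile())) {
      Manifest manifest = jarFile.getManifest(); // null when the JAR has no manifest
      if (manifest == null) {
        return Optional.empty();
      }
      return Optional.ofNullable(
          manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
    }
  }

  // Demo helper: writes a JAR containing only a manifest.
  // Pass null as mainClass to omit the Main-Class attribute.
  static Path writeJar(Path target, String mainClass) throws IOException {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // required for the manifest to be written
    if (mainClass != null) {
      attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
    }
    try (OutputStream out = Files.newOutputStream(target);
         JarOutputStream jar = new JarOutputStream(out, manifest)) {
      // No further entries; the manifest is written by the constructor.
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("jars_root");
    Path bundle = writeJar(root.resolve("bundle.jar"), "com.example.Main");
    Path dependency = writeJar(root.resolve("dependency.jar"), null);

    System.out.println(mainClassOf(bundle));     // prints Optional[com.example.Main]
    System.out.println(mainClassOf(dependency)); // prints Optional.empty
  }
}
```

If more than one JAR under jars_root reports a Main-Class, that is the situation in which setting main_class explicitly is advisable.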

Note

The [sdk] configuration is read at startup, so changes to coordinators or queue_to_coordinator (for example adding jvm_args) only take effect after you restart the scheduler (or airflow standalone). A rebuilt bundle JAR, by contrast, is picked up on the next task launch without a restart, because a fresh JVM is spawned per task instance.

Pinning the Java executable

As a general recommendation, set java_executable to an absolute path rather than relying on java resolving from $PATH. This pins tasks to a known JDK, which matters most in production or corporate environments where the Airflow admin may not control the system-wide java (the same reasoning behind pinning a Python version).

For example, if you install the JDK with Homebrew on macOS, its java is not put on $PATH by default, so point java_executable at it explicitly (the path below uses the default Homebrew prefix on Apple silicon):

[sdk]
coordinators = {
  "java-jdk17": {
    "classpath": "airflow.sdk.coordinators.java.JavaCoordinator",
    "kwargs": {
      "jars_root": ["/opt/airflow/jars"],
      "java_executable": "/opt/homebrew/opt/openjdk@17/bin/java"
    }
  }
}
queue_to_coordinator = {"java": "java-jdk17"}
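
One way to confirm which JDK a task actually runs under is to report the running JVM’s own details from inside the process. The sketch below uses only standard JDK calls; the class and method names are illustrative, and how you surface the string (for example through one of the loggers described earlier) is up to you.

```java
public class JvmInfoSketch {

  // Summarizes the JVM that is executing this code.
  static String describeJvm() {
    // Path of the launched executable, when the platform exposes it.
    String executable = ProcessHandle.current().info().command().orElse("unknown");
    return "executable=" + executable
        + " java.home=" + System.getProperty("java.home")
        + " version=" + Runtime.version()
        + " vendor=" + System.getProperty("java.vendor");
  }

  public static void main(String[] args) {
    System.out.println(describeJvm());
  }
}
```

If the reported executable or java.home differs from the path set in java_executable, the task is not running on the JDK you intended to pin.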

Limitations

One JVM subprocess per task instance. Each task instance spawns a fresh JVM. Tasks that need to share in-process state between instances should use XCom or an external store instead.
Limited support for assets, deferral, and other Airflow features. They may be implemented in the future based on user feedback and demand.