airflow.providers.apache.spark.hooks.spark_pipelines

Exceptions

SparkPipelinesException

Exception raised when spark-pipelines command fails.

Classes

SparkPipelinesHook

Hook for interacting with Spark Declarative Pipelines via the spark-pipelines CLI.

Module Contents

exception airflow.providers.apache.spark.hooks.spark_pipelines.SparkPipelinesException[source]

Bases: airflow.providers.common.compat.sdk.AirflowException

Exception raised when spark-pipelines command fails.

class airflow.providers.apache.spark.hooks.spark_pipelines.SparkPipelinesHook(pipeline_spec=None, pipeline_command='run', **kwargs)[source]

Bases: airflow.providers.apache.spark.hooks.spark_submit.SparkSubmitHook

Hook for interacting with Spark Declarative Pipelines via the spark-pipelines CLI.

Extends SparkSubmitHook to leverage existing connection management while providing pipeline-specific functionality.

Two connection modes are supported:

  • Legacy spark-submit-style (spark / yarn / k8s connection types) — invokes the spark-pipelines launcher with --master, --deploy-mode and the rest of the standard cluster-manager flags assembled by SparkSubmitHook.

  • Spark Connect (spark_connect connection type, Spark 4.x+) — sets SPARK_REMOTE from the connection’s sc:// URI and invokes the Connect-native pyspark.pipelines.cli Python module directly. The cluster-manager flags are not emitted: the Connect-native CLI rejects them with SparkException: Remote cannot be specified with master and/or deploy mode, and the spark-pipelines bash launcher itself starts a JVM SparkContext that collides with the Connect daemon’s gRPC port.

Parameters:
  • pipeline_spec (str | None) – Path to the pipeline specification file (YAML)

  • pipeline_command (str) – The spark-pipelines command to run (‘run’, ‘dry-run’)

pipeline_spec = None[source]
pipeline_command = 'run'[source]
submit_pipeline(**kwargs)[source]

Execute the spark-pipelines command.

Parameters:

kwargs (Any) – extra arguments to Popen (see subprocess.Popen)

submit(application='', **kwargs)[source]

Override submit to use pipeline-specific logic.

Was this entry helpful?