airflow.providers.microsoft.azure.operators.synapse

Module Contents

Classes

AzureSynapseRunSparkBatchOperator

Execute a Spark job on Azure Synapse.

AzureSynapsePipelineRunLink

Construct a link to monitor a pipeline run in Azure Synapse.

AzureSynapseRunPipelineOperator

Execute a Synapse Pipeline.

class airflow.providers.microsoft.azure.operators.synapse.AzureSynapseRunSparkBatchOperator(*, azure_synapse_conn_id=AzureSynapseHook.default_conn_name, wait_for_termination=True, spark_pool='', payload, timeout=60 * 60 * 24 * 7, check_interval=60, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a Spark job on Azure Synapse.

Parameters
  • azure_synapse_conn_id (str) – The connection identifier for connecting to Azure Synapse.

  • wait_for_termination (bool) – Flag to wait on a job run’s termination.

  • spark_pool (str) – The target synapse spark pool used to submit the job

  • payload (azure.synapse.spark.models.SparkBatchJobOptions) – Livy compatible payload which represents the spark job that a user wants to submit

  • timeout (int) – Time in seconds to wait for a job to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.

  • check_interval (int) – Time in seconds to check on a job run’s status for non-asynchronous waits. Used only if wait_for_termination is True.

template_fields: collections.abc.Sequence[str] = ('azure_synapse_conn_id', 'spark_pool')[source]
template_fields_renderers[source]
ui_color = '#0678d4'[source]
hook()[source]

Create and return an AzureSynapseHook (cached).

execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

Bases: airflow.models.BaseOperatorLink

Construct a link to monitor a pipeline run in Azure Synapse.

name = 'Monitor Pipeline Run'[source]
get_fields_from_url(workspace_url)[source]

Extract the workspace_name, subscription_id and resource_group from the Synapse workspace url.

Parameters

workspace_url – The workspace url.

Link to external system.

Note: The old signature of this function was (self, operator, dttm: datetime). That is still supported at runtime but is deprecated.

Parameters
Returns

link to external system

class airflow.providers.microsoft.azure.operators.synapse.AzureSynapseRunPipelineOperator(pipeline_name, azure_synapse_conn_id, azure_synapse_workspace_dev_endpoint, wait_for_termination=True, reference_pipeline_run_id=None, is_recovery=None, start_activity_name=None, parameters=None, timeout=60 * 60 * 24 * 7, check_interval=60, **kwargs)[source]

Bases: airflow.models.BaseOperator

Execute a Synapse Pipeline.

Parameters
  • pipeline_name (str) – The name of the pipeline to execute.

  • azure_synapse_conn_id (str) – The Airflow connection ID for Azure Synapse.

  • azure_synapse_workspace_dev_endpoint (str) – The Azure Synapse workspace development endpoint.

  • wait_for_termination (bool) – Flag to wait on a pipeline run’s termination.

  • reference_pipeline_run_id (str | None) – The pipeline run identifier. If this run ID is specified the parameters of the specified run will be used to create a new run.

  • is_recovery (bool | None) – Recovery mode flag. If recovery mode is set to True, the specified referenced pipeline run and the new run will be grouped under the same groupId.

  • start_activity_name (str | None) – In recovery mode, the rerun will start from this activity. If not specified, all activities will run.

  • parameters (dict[str, Any] | None) – Parameters of the pipeline run. These parameters are referenced in a pipeline via @pipeline().parameters.parameterName and will be used only if the reference_pipeline_run_id is not specified.

  • timeout (int) – Time in seconds to wait for a pipeline to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.

  • check_interval (int) – Time in seconds to check on a pipeline run’s status for non-asynchronous waits. Used only if wait_for_termination is True.

template_fields: collections.abc.Sequence[str] = ('azure_synapse_conn_id',)[source]
hook()[source]

Create and return an AzureSynapsePipelineHook (cached).

execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

execute_complete(event)[source]

Return immediately - callback for when the trigger fires.

Relies on trigger to throw an exception, otherwise it assumes execution was successful.

on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

Was this entry helpful?