airflow.providers.google.cloud.hooks.datafusion

This module contains the Google DataFusion hook.

Attributes

Operation

FAILURE_STATES

SUCCESS_STATES

Exceptions

ConflictException

Exception to catch 409 error.

Classes

PipelineStates

Data Fusion pipeline states.

DataFusionHook

Hook for Google DataFusion.

DataFusionAsyncHook

Class to get asynchronous hook for DataFusion.

Module Contents

airflow.providers.google.cloud.hooks.datafusion.Operation[source]
exception airflow.providers.google.cloud.hooks.datafusion.ConflictException[source]

Bases: airflow.exceptions.AirflowException

Exception to catch 409 error.

class airflow.providers.google.cloud.hooks.datafusion.PipelineStates[source]

Data Fusion pipeline states.

PENDING = 'PENDING'[source]
STARTING = 'STARTING'[source]
RUNNING = 'RUNNING'[source]
SUSPENDED = 'SUSPENDED'[source]
RESUMING = 'RESUMING'[source]
COMPLETED = 'COMPLETED'[source]
FAILED = 'FAILED'[source]
KILLED = 'KILLED'[source]
REJECTED = 'REJECTED'[source]
airflow.providers.google.cloud.hooks.datafusion.FAILURE_STATES[source]
airflow.providers.google.cloud.hooks.datafusion.SUCCESS_STATES[source]
class airflow.providers.google.cloud.hooks.datafusion.DataFusionHook(api_version='v1beta1', gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseHook

Hook for Google DataFusion.

api_version = 'v1beta1'[source]
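
For illustration, a minimal sketch of constructing the hook directly; the connection ID and API version are the documented defaults, and the impersonation chain is a hypothetical service account:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    # gcp_conn_id and api_version use their documented defaults;
    # the impersonation chain below is a placeholder service account.
    hook = DataFusionHook(
        api_version="v1beta1",
        gcp_conn_id="google_cloud_default",
        impersonation_chain="sa-datafusion@my-project.iam.gserviceaccount.com",
    )
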
wait_for_operation(operation)[source]

Wait for a long-running operation to complete.

wait_for_pipeline_state(pipeline_name, pipeline_id, instance_url, pipeline_type=DataFusionPipelineType.BATCH, namespace='default', success_states=None, failure_states=None, timeout=5 * 60)[source]

Poll for pipeline state and raise an exception if the state fails or times out.
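
A hedged sketch of polling a pipeline run until it reaches an accepted state; the instance URL and run ID are placeholders:

    from airflow.providers.google.cloud.hooks.datafusion import (
        SUCCESS_STATES,
        DataFusionHook,
        PipelineStates,
    )

    hook = DataFusionHook()
    # instance_url is the CDAP API endpoint of the Data Fusion instance (placeholder here).
    hook.wait_for_pipeline_state(
        pipeline_name="my_pipeline",
        pipeline_id="0123-fake-run-id",
        instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
        success_states=[*SUCCESS_STATES, PipelineStates.RUNNING],
        timeout=10 * 60,
    )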

get_conn()[source]

Retrieve connection to DataFusion.

restart_instance(instance_name, location, project_id)[source]

Restart a single Data Fusion instance.

At the end of the operation the instance is fully restarted.

Parameters:
  • instance_name (str) – The name of the instance to restart.

  • location (str) – The Cloud Data Fusion location in which to handle the request.

  • project_id (str) – The ID of the Google Cloud project that the instance belongs to.
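
A minimal sketch, assuming restart_instance returns the long-running operation that wait_for_operation accepts; all names below are placeholders:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    operation = hook.restart_instance(
        instance_name="my-instance",   # placeholder instance name
        location="europe-west1",       # placeholder region
        project_id="my-project",       # placeholder project
    )
    hook.wait_for_operation(operation)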

delete_instance(instance_name, location, project_id)[source]

Delete a single Data Fusion instance.

Parameters:
  • instance_name (str) – The name of the instance to delete.

  • location (str) – The Cloud Data Fusion location in which to handle the request.

  • project_id (str) – The ID of the Google Cloud project that the instance belongs to.

create_instance(instance_name, instance, location, project_id=PROVIDE_PROJECT_ID)[source]

Create a new Data Fusion instance in the specified project and location.

Parameters:
  • instance_name (str) – The name of the instance to create.

  • instance (dict[str, Any]) – The instance resource body describing the instance to create, as defined by the Cloud Data Fusion REST API.

  • location (str) – The Cloud Data Fusion location in which to handle the request.

  • project_id (str) – The ID of the Google Cloud project that the instance belongs to.
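
A hedged sketch of creating an instance and waiting for the operation to finish; the names are placeholders and the instance body is an assumed minimal Instance resource:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    operation = hook.create_instance(
        instance_name="my-instance",
        instance={"type": "BASIC"},  # assumed minimal Instance body
        location="europe-west1",
        project_id="my-project",
    )
    hook.wait_for_operation(operation)
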
get_instance(instance_name, location, project_id)[source]

Get details of a single Data Fusion instance.

Parameters:
  • instance_name (str) – The name of the instance.

  • location (str) – The Cloud Data Fusion location in which to handle the request.

  • project_id (str) – The ID of the Google Cloud project that the instance belongs to.

get_instance_artifacts(instance_url, namespace='default', scope='SYSTEM')[source]
patch_instance(instance_name, instance, update_mask, location, project_id=PROVIDE_PROJECT_ID)[source]

Update a single Data Fusion instance.

Parameters:
  • instance_name (str) – The name of the instance to update.

  • instance (dict[str, Any]) – The instance resource body containing the updated values.

  • update_mask (str) – Field mask specifying which fields of the instance resource the update will overwrite.

  • location (str) – The Cloud Data Fusion location in which to handle the request.

  • project_id (str) – The ID of the Google Cloud project that the instance belongs to.
create_pipeline(pipeline_name, pipeline, instance_url, namespace='default')[source]

Create a batch Cloud Data Fusion pipeline.

Parameters:
  • pipeline_name (str) – Your pipeline name.

  • pipeline (dict[str, Any]) – The pipeline definition, e.g. an exported CDAP pipeline configuration.

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.
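
A minimal sketch that loads a pipeline definition from a hypothetical local JSON export and deploys it to a placeholder instance URL:

    import json

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    with open("my_pipeline.json") as f:  # hypothetical exported pipeline definition
        pipeline = json.load(f)

    hook.create_pipeline(
        pipeline_name="my_pipeline",
        pipeline=pipeline,
        instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
    )
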
delete_pipeline(pipeline_name, instance_url, version_id=None, namespace='default')[source]

Delete a batch Cloud Data Fusion pipeline.

Parameters:
  • pipeline_name (str) – Your pipeline name.

  • version_id (str | None) – Version of the pipeline to delete.

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.
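
A short sketch with placeholder values; version_id is optional and omitted here:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    hook.delete_pipeline(
        pipeline_name="my_pipeline",
        instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
    )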

list_pipelines(instance_url, artifact_name=None, artifact_version=None, namespace='default')[source]

List Cloud Data Fusion pipelines.

Parameters:
  • artifact_version (str | None) – Artifact version to filter instances.

  • artifact_name (str | None) – Artifact name to filter instances.

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.
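
A sketch of listing pipelines with an optional artifact filter; the artifact name used is illustrative and both filters may be left unset:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    pipelines = hook.list_pipelines(
        instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
        artifact_name="cdap-data-pipeline",  # illustrative artifact filter
    )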

get_pipeline_workflow(pipeline_name, instance_url, pipeline_id, pipeline_type=DataFusionPipelineType.BATCH, namespace='default')[source]
start_pipeline(pipeline_name, instance_url, pipeline_type=DataFusionPipelineType.BATCH, namespace='default', runtime_args=None)[source]

Start a Cloud Data Fusion pipeline. Works for both batch and stream pipelines.

Parameters:
  • pipeline_name (str) – Your pipeline name.

  • pipeline_type (airflow.providers.google.cloud.utils.datafusion.DataFusionPipelineType) – Optional pipeline type (BATCH by default).

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • runtime_args (dict[str, Any] | None) – Optional runtime JSON args to be passed to the pipeline.

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.
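
A hedged sketch of starting a batch pipeline and then polling it, assuming start_pipeline returns the run ID expected by wait_for_pipeline_state; the runtime argument and URL are illustrative:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook
    from airflow.providers.google.cloud.utils.datafusion import DataFusionPipelineType

    hook = DataFusionHook()
    instance_url = "https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api"
    pipeline_id = hook.start_pipeline(
        pipeline_name="my_pipeline",
        instance_url=instance_url,
        pipeline_type=DataFusionPipelineType.BATCH,
        runtime_args={"my.runtime.arg": "value"},  # illustrative runtime arguments
    )
    hook.wait_for_pipeline_state(
        pipeline_name="my_pipeline",
        pipeline_id=pipeline_id,
        instance_url=instance_url,
    )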

stop_pipeline(pipeline_name, instance_url, namespace='default')[source]

Stop a Cloud Data Fusion pipeline. Works for both batch and stream pipelines.

Parameters:
  • pipeline_name (str) – Your pipeline name.

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.
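
A short sketch with a placeholder instance URL:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook

    hook = DataFusionHook()
    hook.stop_pipeline(
        pipeline_name="my_pipeline",
        instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
    )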

static cdap_program_type(pipeline_type)[source]

Retrieve CDAP Program type depending on the pipeline type.

Parameters:

pipeline_type (airflow.providers.google.cloud.utils.datafusion.DataFusionPipelineType) – Pipeline type.

static cdap_program_id(pipeline_type)[source]

Retrieve CDAP Program id depending on the pipeline type.

Parameters:

pipeline_type (airflow.providers.google.cloud.utils.datafusion.DataFusionPipelineType) – Pipeline type.
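
These static helpers can be called without instantiating the hook, for example when composing CDAP REST paths by hand; a minimal sketch:

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionHook
    from airflow.providers.google.cloud.utils.datafusion import DataFusionPipelineType

    program_type = DataFusionHook.cdap_program_type(DataFusionPipelineType.STREAM)
    program_id = DataFusionHook.cdap_program_id(DataFusionPipelineType.STREAM)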

class airflow.providers.google.cloud.hooks.datafusion.DataFusionAsyncHook(**kwargs)[source]

Bases: airflow.providers.google.common.hooks.base_google.GoogleBaseAsyncHook

Class to get asynchronous hook for DataFusion.

sync_hook_class[source]
scopes = ['https://www.googleapis.com/auth/cloud-platform'][source]
async get_pipeline(instance_url, namespace, pipeline_name, pipeline_id, session, pipeline_type=DataFusionPipelineType.BATCH)[source]
async get_pipeline_status(pipeline_name, instance_url, pipeline_id, pipeline_type=DataFusionPipelineType.BATCH, namespace='default', success_states=None)[source]

Get a Cloud Data Fusion pipeline status asynchronously.

Parameters:
  • pipeline_name (str) – Your pipeline name.

  • instance_url (str) – Endpoint on which the REST APIs are accessible for the instance.

  • pipeline_id (str) – Unique pipeline ID associated with a specific pipeline.

  • pipeline_type (airflow.providers.google.cloud.utils.datafusion.DataFusionPipelineType) – Optional pipeline type (by default batch).

  • namespace (str) – If your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.

  • success_states (list[str] | None) – If provided, the operator will wait for the pipeline to be in one of the provided states.
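
A hedged sketch of checking a run's status from asynchronous code (for example inside a trigger); all identifiers are placeholders and the result is assumed to be the status reported by the instance:

    import asyncio

    from airflow.providers.google.cloud.hooks.datafusion import DataFusionAsyncHook

    async def check_status():
        hook = DataFusionAsyncHook()
        return await hook.get_pipeline_status(
            pipeline_name="my_pipeline",
            instance_url="https://my-instance-my-project-dot-usw1.datafusion.googleusercontent.com/api",
            pipeline_id="0123-fake-run-id",
        )

    status = asyncio.run(check_status())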
