airflow.providers.google.cloud.sensors.dataplex

This module contains Google Dataplex sensors.

Module Contents

Classes

TaskState

Dataplex Task states.

DataplexTaskStateSensor

Check the status of the Dataplex task.

DataplexDataQualityJobStatusSensor

Check the status of the Dataplex DataQuality job.

DataplexDataProfileJobStatusSensor

Check the status of the Dataplex DataProfile job.

class airflow.providers.google.cloud.sensors.dataplex.TaskState[source]

Dataplex Task states.

STATE_UNSPECIFIED = 0[source]
ACTIVE = 1[source]
CREATING = 2[source]
DELETING = 3[source]
ACTION_REQUIRED = 4[source]
class airflow.providers.google.cloud.sensors.dataplex.DataplexTaskStateSensor(project_id, region, lake_id, dataplex_task_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, *args, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Check the status of the Dataplex task.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the task belongs to.

  • lake_id (str) – Required. The ID of the Google Cloud lake that the task belongs to.

  • dataplex_task_id (str) – Required. Task identifier.

  • api_version (str) – The version of the api that will be requested for example ‘v3’.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields = ['dataplex_task_id'][source]
poke(context)[source]

Override when deriving this class.

class airflow.providers.google.cloud.sensors.dataplex.DataplexDataQualityJobStatusSensor(project_id, region, data_scan_id, job_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, fail_on_dq_failure=False, result_timeout=60.0 * 10, start_sensor_time=None, *args, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Check the status of the Dataplex DataQuality job.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the task belongs to.

  • data_scan_id (str) – Required. Data Quality scan identifier.

  • job_id (str) – Required. Job ID.

  • api_version (str) – The version of the api that will be requested for example ‘v3’.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • result_timeout (float) – Value in seconds for which operator will wait for the Data Quality scan result. Throws exception if there is no result found after specified amount of seconds.

  • fail_on_dq_failure (bool) – If set to true and not all Data Quality scan rules have been passed, an exception is thrown. If set to false and not all Data Quality scan rules have been passed, execution will finish with success.

Returns

Boolean indicating if the job run has reached the DataScanJob.State.SUCCEEDED.

template_fields = ['job_id'][source]
poke(context)[source]

Override when deriving this class.

class airflow.providers.google.cloud.sensors.dataplex.DataplexDataProfileJobStatusSensor(project_id, region, data_scan_id, job_id, api_version='v1', retry=DEFAULT, metadata=(), gcp_conn_id='google_cloud_default', impersonation_chain=None, result_timeout=60.0 * 10, start_sensor_time=None, *args, **kwargs)[source]

Bases: airflow.sensors.base.BaseSensorOperator

Check the status of the Dataplex DataProfile job.

Parameters
  • project_id (str) – Required. The ID of the Google Cloud project that the task belongs to.

  • region (str) – Required. The ID of the Google Cloud region that the task belongs to.

  • data_scan_id (str) – Required. Data Quality scan identifier.

  • job_id (str) – Required. Job ID.

  • api_version (str) – The version of the api that will be requested for example ‘v3’.

  • retry (google.api_core.retry.Retry | google.api_core.gapic_v1.method._MethodDefault) – A retry object used to retry requests. If None is specified, requests will not be retried.

  • metadata (collections.abc.Sequence[tuple[str, str]]) – Additional metadata that is provided to the method.

  • gcp_conn_id (str) – The connection ID to use when fetching connection info.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

  • result_timeout (float) – Value in seconds for which operator will wait for the Data Quality scan result. Throws exception if there is no result found after specified amount of seconds.

Returns

Boolean indicating if the job run has reached the DataScanJob.State.SUCCEEDED.

template_fields = ['job_id'][source]
poke(context)[source]

Override when deriving this class.

Was this entry helpful?