airflow.providers.google.cloud.sensors.bigquery

This module contains Google BigQuery sensors.

Classes

BigQueryTableExistenceSensor

Checks for the existence of a table in Google BigQuery.

BigQueryRoutineExistenceSensor

Checks for the existence of a routine (UDF, procedure, or TVF) in a BigQuery dataset.

BigQueryTablePartitionExistenceSensor

Checks for the existence of a partition within a table in Google BigQuery.

BigQueryStreamingBufferEmptySensor

Wait for the streaming buffer of a BigQuery table to be empty.

Module Contents

class airflow.providers.google.cloud.sensors.bigquery.BigQueryTableExistenceSensor(*, project_id, dataset_id, table_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: airflow.providers.common.compat.sdk.BaseSensorOperator

Checks for the existence of a table in Google BigQuery.

Parameters:
  • project_id (str) – The Google Cloud project in which to look for the table. The connection supplied to the hook must provide access to the specified project.

  • dataset_id (str) – The name of the dataset in which to look for the table.

  • table_id (str) – The name of the table to check the existence of.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
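The grant rules described for impersonation_chain can be spelled out with a small helper. This is only an illustrative sketch: the function name and the example account addresses are hypothetical and are not part of the provider. It lists which identity has to grant the Service Account Token Creator role to which other identity, following the wording above.

```python
from __future__ import annotations

from collections.abc import Sequence


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: str | Sequence[str],
) -> list[tuple[str, str]]:
    """Return (grantor, grantee) pairs implied by an impersonation chain.

    Each pair means: grantor must grant the Service Account Token Creator
    role to grantee.
    """
    # A plain string is treated as a chain that contains a single account.
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants = []
    previous = originating_account
    for account in chain:
        # Every identity grants the role to the identity directly before it;
        # the first one grants it to the originating account.
        grants.append((account, previous))
        previous = account
    return grants


# Hypothetical accounts, used only to show the shape of the result.
for grantor, grantee in required_token_creator_grants(
    "airflow@example.iam.gserviceaccount.com",
    ["hop@example.iam.gserviceaccount.com", "target@example.iam.gserviceaccount.com"],
):
    print(f"{grantor} grants Token Creator to {grantee}")
```

The last account in the chain is the one that ends up being impersonated in the request.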

template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'impersonation_chain')[source]
ui_color = '#f0eee4'[source]
project_id[source]
dataset_id[source]
table_id[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
deferrable[source]
poke(context)[source]

Override when deriving this class.

execute(context)[source]

Airflow runs this method on the worker and defers using the trigger if deferrable is True.

execute_complete(context, event=None)[source]

Act as a callback for when the trigger fires - returns immediately.

Relies on trigger to throw an exception, otherwise it assumes execution was successful.
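The default for deferrable in the constructor signature is read from the Airflow configuration (section operators, key default_deferrable) and falls back to False. The sketch below mimics that lookup with the standard library's configparser as a stand-in. Airflow's own conf object is a different class, so this only illustrates the fallback behaviour under the assumption that the semantics are comparable. The helper name is hypothetical.

```python
import configparser


def default_deferrable(config_text: str) -> bool:
    """Resolve operators.default_deferrable, falling back to False."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback is returned when the section or the option is missing.
    return parser.getboolean("operators", "default_deferrable", fallback=False)


# No [operators] section at all: the fallback applies.
print(default_deferrable(""))

# The option is set explicitly, so the configured value wins.
print(default_deferrable("[operators]\ndefault_deferrable = True\n"))
```

Passing deferrable explicitly to the sensor overrides whatever the configuration provides.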

class airflow.providers.google.cloud.sensors.bigquery.BigQueryRoutineExistenceSensor(*, project_id, dataset_id, routine_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)[source]

Bases: airflow.providers.common.compat.sdk.BaseSensorOperator

Checks for the existence of a routine (UDF, procedure, or TVF) in a BigQuery dataset.

Parameters:
  • project_id (str) – The Google Cloud project that owns the dataset.

  • dataset_id (str) – The dataset that owns the routine.

  • routine_id (str) – The identifier of the routine to check.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'routine_id', 'impersonation_chain')[source]
ui_color = '#f0eee4'[source]
project_id[source]
dataset_id[source]
routine_id[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
poke(context)[source]

Override when deriving this class.

class airflow.providers.google.cloud.sensors.bigquery.BigQueryTablePartitionExistenceSensor(*, project_id, dataset_id, table_id, partition_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: airflow.providers.common.compat.sdk.BaseSensorOperator

Checks for the existence of a partition within a table in Google BigQuery.

Parameters:
  • project_id (str) – The Google Cloud project in which to look for the table. The connection supplied to the hook must provide access to the specified project.

  • dataset_id (str) – The name of the dataset in which to look for the table.

  • table_id (str) – The name of the table to check the existence of.

  • partition_id (str) – The name of the partition to check the existence of.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
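For time-partitioned tables, the partition_id value usually follows BigQuery's partition ID format: YYYYMMDD for daily partitions and YYYYMMDDHH for hourly ones. The helpers below are a minimal sketch for building such strings. Their names are hypothetical, and they assume time-unit partitioning rather than integer-range partitioning.

```python
from datetime import date, datetime


def daily_partition_id(day: date) -> str:
    """Format a date as a daily partition ID (YYYYMMDD)."""
    return day.strftime("%Y%m%d")


def hourly_partition_id(moment: datetime) -> str:
    """Format a datetime as an hourly partition ID (YYYYMMDDHH)."""
    return moment.strftime("%Y%m%d%H")


print(daily_partition_id(date(2024, 3, 7)))           # 20240307
print(hourly_partition_id(datetime(2024, 3, 7, 9)))   # 2024030709
```

Because partition_id is listed in template_fields, a Jinja expression such as {{ ds_nodash }} can supply the daily form at run time instead of a precomputed string.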

template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'partition_id', 'impersonation_chain')[source]
ui_color = '#f0eee4'[source]
project_id[source]
dataset_id[source]
table_id[source]
partition_id[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
deferrable[source]
poke(context)[source]

Override when deriving this class.

execute(context)[source]

Airflow runs this method on the worker and defers using the trigger if deferrable is True.

execute_complete(context, event=None)[source]

Act as a callback for when the trigger fires - returns immediately.

Relies on trigger to throw an exception, otherwise it assumes execution was successful.

class airflow.providers.google.cloud.sensors.bigquery.BigQueryStreamingBufferEmptySensor(*, project_id, dataset_id, table_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]

Bases: airflow.providers.common.compat.sdk.BaseSensorOperator

Wait for the streaming buffer of a BigQuery table to be empty.

BigQuery DML statements (UPDATE, DELETE, MERGE) cannot run against rows that are still in the streaming buffer; the buffer is flushed within ~90 minutes. Use this sensor between a streaming insert and a DML step to avoid errors such as "UPDATE/MERGE/DELETE statement over table ... would affect rows in the streaming buffer".

Warning

The sensor reads table.streaming_buffer from BigQuery’s table metadata, which is eventually consistent. For a short window right after a streaming insert the buffer metadata is still absent, so the sensor may report the buffer empty before it actually is. Known limitation tracked at https://github.com/apache/airflow/issues/66963
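The limitation in the warning can be illustrated without calling BigQuery. The sketch below assumes the check amounts to testing whether table.streaming_buffer is None; that is an assumption about the sensor's logic, not a copy of its implementation. The table objects are plain stand-ins with hypothetical values.

```python
from types import SimpleNamespace


def streaming_buffer_is_empty(table) -> bool:
    """Treat absent streaming buffer metadata as an empty buffer."""
    return table.streaming_buffer is None


# Stand-ins for the table metadata at three moments.
just_after_insert = SimpleNamespace(streaming_buffer=None)  # metadata not visible yet
while_buffered = SimpleNamespace(streaming_buffer=SimpleNamespace(estimated_rows=120))
after_flush = SimpleNamespace(streaming_buffer=None)

# Reports True even though rows are still buffered: the false positive
# described in the warning.
print(streaming_buffer_is_empty(just_after_insert))
print(streaming_buffer_is_empty(while_buffered))
print(streaming_buffer_is_empty(after_flush))
```

The first and third cases are indistinguishable from the metadata alone, which is why a check made immediately after a streaming insert can succeed too early.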

Parameters:
  • project_id (str) – Google Cloud project containing the table.

  • dataset_id (str) – Dataset of the table to monitor.

  • table_id (str) – Table to monitor.

  • gcp_conn_id (str) – Airflow connection ID for GCP.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate, or a chained list of accounts. See the Google provider docs for details.

  • deferrable (bool) – Run in deferrable mode using BigQueryStreamingBufferEmptyTrigger.

template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'impersonation_chain')[source]
ui_color = '#f0eee4'[source]
project_id[source]
dataset_id[source]
table_id[source]
gcp_conn_id = 'google_cloud_default'[source]
impersonation_chain = None[source]
deferrable[source]
execute(context)[source]

Derive when creating an operator.

The main method to execute the task. Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

execute_complete(context, event=None)[source]
poke(context)[source]

Override when deriving this class.
