airflow.providers.google.cloud.sensors.bigquery
This module contains Google BigQuery sensors.
Classes
BigQueryTableExistenceSensor | Checks for the existence of a table in Google BigQuery.
BigQueryRoutineExistenceSensor | Checks for the existence of a routine (UDF, procedure, or TVF) in a BigQuery dataset.
BigQueryTablePartitionExistenceSensor | Checks for the existence of a partition within a table in Google BigQuery.
BigQueryStreamingBufferEmptySensor | Wait for the streaming buffer of a BigQuery table to be empty.
Module Contents
- class airflow.providers.google.cloud.sensors.bigquery.BigQueryTableExistenceSensor(*, project_id, dataset_id, table_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: airflow.providers.common.compat.sdk.BaseSensorOperator
Checks for the existence of a table in Google BigQuery.
- Parameters:
project_id (str) – The Google Cloud project in which to look for the table. The connection supplied to the hook must provide access to the specified project.
dataset_id (str) – The name of the dataset in which to look for the table.
table_id (str) – The name of the table to check the existence of.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'impersonation_chain')
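A minimal usage sketch for the parameters above, assuming the Google provider package is installed. The project, dataset, and table names and the task name are hypothetical placeholders. `poke_interval` and `timeout` are standard BaseSensorOperator arguments passed through `**kwargs`. The Airflow import is deferred into a function so the argument-building part can be read and run on its own.

```python
from typing import Any


def table_sensor_kwargs(project_id: str, dataset_id: str, table_id: str) -> dict[str, Any]:
    """Collect keyword arguments for BigQueryTableExistenceSensor."""
    return {
        "task_id": f"wait_for_{table_id}",      # hypothetical task name
        "project_id": project_id,
        "dataset_id": dataset_id,
        "table_id": table_id,
        "gcp_conn_id": "google_cloud_default",  # default from the signature above
        "poke_interval": 60,                    # seconds between checks
        "timeout": 60 * 60,                     # fail the task after one hour
    }


def make_table_sensor(**kwargs: Any):
    """Instantiate the sensor; needs apache-airflow-providers-google."""
    # Imported lazily so this module still loads where Airflow is absent.
    from airflow.providers.google.cloud.sensors.bigquery import (
        BigQueryTableExistenceSensor,
    )

    return BigQueryTableExistenceSensor(**kwargs)


# Placeholder identifiers, not real resources.
example_kwargs = table_sensor_kwargs("my-project", "analytics", "daily_events")
```

Inside a DAG definition, `make_table_sensor(**example_kwargs)` would create the task. Because `project_id`, `dataset_id`, and `table_id` are template fields, any of them may also be given as a Jinja expression.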
- class airflow.providers.google.cloud.sensors.bigquery.BigQueryRoutineExistenceSensor(*, project_id, dataset_id, routine_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, **kwargs)
Bases: airflow.providers.common.compat.sdk.BaseSensorOperator
Checks for the existence of a routine (UDF, procedure, or TVF) in a BigQuery dataset.
- Parameters:
project_id (str) – The Google Cloud project that owns the dataset.
dataset_id (str) – The dataset that owns the routine.
routine_id (str) – The identifier of the routine to check.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'routine_id', 'impersonation_chain')
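The grant ordering described for impersonation_chain can be easier to follow when spelled out. The sketch below is a hypothetical helper, not part of the provider. It lists which account must grant the Service Account Token Creator role to which, for either a single string or a sequence. All account names are placeholders.

```python
from __future__ import annotations

from collections.abc import Sequence


def required_token_creator_grants(
    originating_account: str,
    impersonation_chain: str | Sequence[str],
) -> list[tuple[str, str]]:
    """Return (granting_account, receiving_account) pairs.

    Each account in the chain grants Service Account Token Creator to the
    identity directly before it; the first one grants it to the originating
    account. The last account in the chain is the one impersonated.
    """
    if isinstance(impersonation_chain, str):
        chain = [impersonation_chain]
    else:
        chain = list(impersonation_chain)

    grants: list[tuple[str, str]] = []
    previous = originating_account
    for account in chain:
        grants.append((account, previous))
        previous = account
    return grants


# Placeholder service accounts for illustration only.
single = required_token_creator_grants("airflow-sa", "target-sa")
chained = required_token_creator_grants("airflow-sa", ["hop-sa", "target-sa"])
```

Here `chained` contains two pairs: `hop-sa` grants the role to `airflow-sa`, and `target-sa` grants it to `hop-sa`.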
- class airflow.providers.google.cloud.sensors.bigquery.BigQueryTablePartitionExistenceSensor(*, project_id, dataset_id, table_id, partition_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: airflow.providers.common.compat.sdk.BaseSensorOperator
Checks for the existence of a partition within a table in Google BigQuery.
- Parameters:
project_id (str) – The Google Cloud project in which to look for the table. The connection supplied to the hook must provide access to the specified project.
dataset_id (str) – The name of the dataset in which to look for the table.
table_id (str) – The name of the table to check the existence of.
partition_id (str) – The name of the partition to check the existence of.
gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).
- template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'partition_id', 'impersonation_chain')
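For time-unit partitioned tables, BigQuery partition IDs are date strings such as `YYYYMMDD` (daily) or `YYYYMMDDHH` (hourly). The sketch below, assuming such a table, shows two hypothetical helpers that build a `partition_id` value. Because `partition_id` is a template field, the Airflow macro `{{ ds_nodash }}` is a common alternative for daily partitions.

```python
from datetime import date, datetime


def daily_partition_id(day: date) -> str:
    """Format a date as a daily partition ID (YYYYMMDD)."""
    return day.strftime("%Y%m%d")


def hourly_partition_id(moment: datetime) -> str:
    """Format a datetime as an hourly partition ID (YYYYMMDDHH)."""
    return moment.strftime("%Y%m%d%H")


# Rendered by Airflow at run time to the logical date without dashes.
TEMPLATED_DAILY_PARTITION_ID = "{{ ds_nodash }}"

daily_id = daily_partition_id(date(2024, 1, 31))
hourly_id = hourly_partition_id(datetime(2024, 1, 31, 7))
```

Integer-range or ingestion-time partitioned tables may use different IDs, so it is worth checking the table's actual partition list before relying on these formats.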
- class airflow.providers.google.cloud.sensors.bigquery.BigQueryStreamingBufferEmptySensor(*, project_id, dataset_id, table_id, gcp_conn_id='google_cloud_default', impersonation_chain=None, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: airflow.providers.common.compat.sdk.BaseSensorOperator
Wait for the streaming buffer of a BigQuery table to be empty.
BigQuery DML statements (UPDATE, DELETE, MERGE) cannot run against rows that are still in the streaming buffer; the buffer is flushed within ~90 minutes. Use this sensor between a streaming insert and a DML step to avoid "UPDATE/MERGE/DELETE statement over table ... would affect rows in the streaming buffer" errors.
Warning
The sensor reads table.streaming_buffer from BigQuery’s table metadata, which is eventually consistent. For a short window right after a streaming insert the buffer metadata is still absent, so the sensor may report the buffer empty before it actually is. This is a known limitation tracked at https://github.com/apache/airflow/issues/66963
- Parameters:
project_id (str) – Google Cloud project containing the table.
dataset_id (str) – Dataset of the table to monitor.
table_id (str) – Table to monitor.
gcp_conn_id (str) – Airflow connection ID for GCP.
impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate, or a chained list of accounts. See the Google provider docs for details.
deferrable (bool) – Run in deferrable mode using BigQueryStreamingBufferEmptyTrigger.
- template_fields: collections.abc.Sequence[str] = ('project_id', 'dataset_id', 'table_id', 'impersonation_chain')
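Given the eventual-consistency warning above, one possible mitigation is to require several consecutive empty readings before treating the buffer as drained. The sketch below is a hypothetical, standalone illustration of that idea. It is not how the provider's sensor is implemented, and it cannot guarantee correctness if the metadata stays absent longer than the observation window. The `read_streaming_buffer` callable is an assumption: it stands in for any function that returns `None` when no streaming buffer is reported.

```python
import time
from typing import Callable, Optional


def wait_until_stably_empty(
    read_streaming_buffer: Callable[[], Optional[object]],
    required_consecutive_empty: int = 3,
    poke_interval: float = 30.0,
    max_pokes: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the buffer reads empty several times in a row.

    Returns False if max_pokes readings pass without reaching the
    required number of consecutive empty readings.
    """
    consecutive = 0
    for attempt in range(max_pokes):
        if read_streaming_buffer() is None:
            consecutive += 1
            if consecutive >= required_consecutive_empty:
                return True
        else:
            # A non-empty reading resets the streak.
            consecutive = 0
        if attempt < max_pokes - 1:
            sleep(poke_interval)
    return False


# Simulated readings: empty, then buffered rows appear, then three empty reads.
_readings = iter([None, {"estimated_rows": 5}, None, None, None])
drained = wait_until_stably_empty(
    lambda: next(_readings),
    required_consecutive_empty=3,
    poke_interval=0,
    max_pokes=10,
    sleep=lambda seconds: None,  # skip real waiting in the demonstration
)
```

The first empty reading in the simulation is the misleading one the warning describes; requiring a streak of three keeps it from ending the wait early.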