Google Cloud Bigtable Operators¶
Prerequisite Tasks¶
To use these operators, you must do a few things:
Select or create a Cloud Platform project using the Cloud Console.
Enable billing for your project, as described in the Google Cloud documentation.
Enable the API, as described in the Cloud Console documentation.
Install API libraries via pip.
pip install 'apache-airflow[google]'

Detailed information is available for Installation.
BigtableCreateInstanceOperator¶
Use the BigtableCreateInstanceOperator
to create a Google Cloud Bigtable instance.
This operator provisions a Bigtable instance along with one or more clusters. It is typically used during environment setup or infrastructure provisioning before running tasks that depend on Bigtable.
If the Cloud Bigtable instance with the given ID exists, the operator does not compare its configuration and immediately succeeds. No changes are made to the existing instance.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
create_instance_task = BigtableCreateInstanceOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    main_cluster_id=CBT_CLUSTER_ID,
    main_cluster_zone=CBT_CLUSTER_ZONE,
    instance_display_name=CBT_INSTANCE_DISPLAY_NAME,
    instance_type=CBT_INSTANCE_TYPE,  # type: ignore[arg-type]
    instance_labels=CBT_INSTANCE_LABELS,
    cluster_nodes=None,
    cluster_storage_type=CBT_CLUSTER_STORAGE_TYPE,  # type: ignore[arg-type]
    task_id="create_instance_task",
)
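For the variant without an explicit project_id, the operator falls back to the project configured on the Google Cloud connection. A minimal sketch, assuming the provider package is installed and the CBT_* constants are defined as in the example above (the task_id create_instance_task2 is illustrative):

```python
from airflow.providers.google.cloud.operators.bigtable import (
    BigtableCreateInstanceOperator,
)

# project_id is omitted; it is retrieved from the Google Cloud
# connection used by the operator (by default, google_cloud_default).
create_instance_task2 = BigtableCreateInstanceOperator(
    instance_id=CBT_INSTANCE_ID_1,
    main_cluster_id=CBT_CLUSTER_ID,
    main_cluster_zone=CBT_CLUSTER_ZONE,
    task_id="create_instance_task2",
)
```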
BigtableUpdateInstanceOperator¶
Use the BigtableUpdateInstanceOperator
to update an existing Google Cloud Bigtable instance.
This operator allows modifying instance properties such as display name, instance type, and labels without recreating the instance. It is useful for configuration updates while keeping the existing data and clusters intact.
Only the following configuration can be updated for an existing instance: instance_display_name, instance_type and instance_labels.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
update_instance_task = BigtableUpdateInstanceOperator(
    instance_id=CBT_INSTANCE_ID_1,
    instance_display_name=CBT_INSTANCE_DISPLAY_NAME_UPDATED,
    instance_type=CBT_INSTANCE_TYPE_PROD,
    instance_labels=CBT_INSTANCE_LABELS_UPDATED,
    task_id="update_instance_task",
)
BigtableDeleteInstanceOperator¶
Use the BigtableDeleteInstanceOperator
to delete a Google Cloud Bigtable instance.
This operator permanently removes a Bigtable instance and all associated clusters and tables. Use it carefully, typically during cleanup or infrastructure teardown tasks.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
delete_instance_task = BigtableDeleteInstanceOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    task_id="delete_instance_task",
)
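Because deletion is permanent, a common pattern is to run this operator as a teardown step that executes even when upstream tasks fail. A sketch of that pattern using Airflow's trigger rules (the pattern is an assumption about your DAG layout, not a requirement of the operator):

```python
from airflow.providers.google.cloud.operators.bigtable import (
    BigtableDeleteInstanceOperator,
)
from airflow.utils.trigger_rule import TriggerRule

# ALL_DONE runs the cleanup regardless of upstream task outcomes,
# so the instance is removed even after a failed pipeline run.
delete_instance_task = BigtableDeleteInstanceOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    task_id="delete_instance_task",
    trigger_rule=TriggerRule.ALL_DONE,
)
```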
BigtableUpdateClusterOperator¶
Use the BigtableUpdateClusterOperator
to modify the number of nodes in a Cloud Bigtable cluster.
This operator updates the size of an existing cluster by increasing or decreasing the number of nodes. It helps scale Bigtable capacity up or down based on workload requirements.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
cluster_update_task = BigtableUpdateClusterOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    cluster_id=CBT_CLUSTER_ID,
    nodes=CBT_CLUSTER_NODES_UPDATED,
    task_id="update_cluster_task",
)
BigtableCreateTableOperator¶
Use the BigtableCreateTableOperator
to create a table in a Cloud Bigtable instance.
This operator creates a new table with specified column families and optional split keys. It is typically used when initializing schema or preparing storage for application data.
If the table with the given ID exists in the Cloud Bigtable instance, the operator compares the Column Families. If the Column Families are identical, the operator succeeds. Otherwise, the operator fails with the appropriate error message.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
create_table_task = BigtableCreateTableOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    table_id=CBT_TABLE_ID,
    task_id="create_table",
)
Advanced¶
When creating a table, you can specify the optional initial_split_keys and column_families.
Please refer to the Python Client for Google Cloud Bigtable documentation
for Table and for Column
Families.
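As a sketch of these advanced options, column_families maps family names to garbage-collection rules from the google.cloud.bigtable client library, and initial_split_keys pre-splits the table at the given row keys. The family name "cf1" and the split keys below are illustrative values, not defaults:

```python
from google.cloud.bigtable.column_family import MaxVersionsGCRule

from airflow.providers.google.cloud.operators.bigtable import (
    BigtableCreateTableOperator,
)

create_table_task = BigtableCreateTableOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    table_id=CBT_TABLE_ID,
    # Keep at most two cell versions per column in the "cf1" family.
    column_families={"cf1": MaxVersionsGCRule(2)},
    # Pre-split the table at these row keys (illustrative values).
    initial_split_keys=[b"row-1000", b"row-2000"],
    task_id="create_table_task",
)
```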
BigtableDeleteTableOperator¶
Use the BigtableDeleteTableOperator
to delete a table in Google Cloud Bigtable.
This operator removes a table from an instance. It is commonly used for cleanup tasks or when decommissioning unused datasets.
Using the operator¶
You can create the operator with or without a project id. If the project id is missing, it will be retrieved from the Google Cloud connection used:
delete_table_task = BigtableDeleteTableOperator(
    project_id=PROJECT_ID,
    instance_id=CBT_INSTANCE_ID_1,
    table_id=CBT_TABLE_ID,
    task_id="delete_table_task",
)
BigtableTableReplicationCompletedSensor¶
Use the BigtableTableReplicationCompletedSensor
to wait for the table to replicate fully.
This sensor periodically checks the replication status and blocks execution until replication is complete. It is useful in workflows that depend on data being fully available across clusters.
This sensor accepts the same arguments as the BigtableCreateTableOperator.
Note: If the table or the Cloud Bigtable instance does not exist, this sensor waits for the table until the timeout is reached and does not raise an exception.
Using the operator¶
wait_for_table_replication_task = BigtableTableReplicationCompletedSensor(
    instance_id=CBT_INSTANCE_ID_2,
    table_id=CBT_TABLE_ID,
    poke_interval=CBT_POKE_INTERVAL,
    timeout=180,
    task_id="wait_for_table_replication_task2",
)
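In a DAG, the sensor typically runs downstream of the task that creates the table, so later tasks only start once the data is available across clusters. A minimal sketch of that wiring, assuming the create_table_task and wait_for_table_replication_task objects defined in the examples above:

```python
# Block downstream processing until replication has completed.
create_table_task >> wait_for_table_replication_task
```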
Reference¶
For further information, look at: