Google Cloud Storage to Azure Blob Storage transfer
Google Cloud Storage and Azure Blob Storage are object stores commonly used for data lakes and file exchange. This guide describes copying objects from GCS into an Azure Blob container.
Install the optional dependency when using this operator:
pip install 'apache-airflow-providers-microsoft-azure[google]'
Prerequisite Tasks
To use these operators, you must do a few things:
Create the necessary resources using the Azure portal or the Azure CLI.
Install API libraries via pip.
pip install 'apache-airflow[azure]'
Detailed information is available in Installation of Airflow®.
Operator
Use GCSToAzureBlobStorageOperator to list objects under a GCS prefix and upload them to a container, using blob_prefix as the base path.
The keep_directory_structure and flatten_structure parameters behave the same way as in
GCSToS3Operator (flatten_structure takes precedence when both apply).
Object keys ending with / (GCS console folder markers) are not copied.
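The naming rules above can be illustrated with a small standalone sketch. This is not the operator's implementation: destination_blob_name is a hypothetical helper, and it assumes that without flattening the full object name is appended to blob_prefix. It covers only flatten_structure and the folder-marker rule, and leaves out keep_directory_structure.

```python
import posixpath
from typing import Optional


def destination_blob_name(
    object_name: str,
    blob_prefix: str,
    flatten_structure: bool = False,
) -> Optional[str]:
    """Hypothetical helper: map a GCS object name to a destination blob name.

    Assumption: without flattening, the full object name is kept under
    blob_prefix. The real operator may build the path differently.
    """
    # Keys ending with "/" are folder markers and are skipped.
    if object_name.endswith("/"):
        return None
    # Flattening keeps only the file name and drops any subdirectories.
    relative = posixpath.basename(object_name) if flatten_structure else object_name
    return posixpath.join(blob_prefix, relative)


if __name__ == "__main__":
    for name in ["exports/daily/", "exports/daily/2024/a.csv"]:
        print(name, "->", destination_blob_name(name, "imports/daily", flatten_structure=True))
```

Under these assumptions, flattening places every object directly under blob_prefix, so objects that share a file name in different subdirectories would map to the same blob name.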
Example:
copy_gcs_to_azure = GCSToAzureBlobStorageOperator(
    task_id="gcs_to_azure_blob",
    gcs_bucket="my-gcs-bucket",
    prefix="exports/daily/",
    container_name="my-container",
    blob_prefix="imports/daily",
    gcp_conn_id="google_cloud_default",
    wasb_conn_id="wasb_default",
    replace=True,
)