airflow.providers.microsoft.azure.transfers.gcs_to_wasb

This module contains the Google Cloud Storage to Azure Blob Storage operator.

Classes

GCSToAzureBlobStorageOperator

Synchronizes objects from a Google Cloud Storage bucket to Azure Blob Storage.

Module Contents

class airflow.providers.microsoft.azure.transfers.gcs_to_wasb.GCSToAzureBlobStorageOperator(*, gcs_bucket, container_name, blob_prefix='', prefix=None, gcp_conn_id='google_cloud_default', google_impersonation_chain=None, wasb_conn_id='wasb_default', replace=False, keep_directory_structure=True, flatten_structure=False, match_glob=None, gcp_user_project=None, create_container=False, **kwargs)

Bases: airflow.providers.common.compat.sdk.BaseOperator

Synchronizes objects from a Google Cloud Storage bucket to Azure Blob Storage.

Note

When flatten_structure=True, it takes precedence over keep_directory_structure. For example, with flatten_structure=True, folder/subfolder/file.txt becomes file.txt regardless of the keep_directory_structure setting.

Objects whose names end with / (GCS console folder markers) and keys that become an empty destination path after flatten_structure are skipped.
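The naming rules in this note can be illustrated with a small, standalone sketch. It is not the provider's implementation: sketch_destination is a hypothetical helper, and joining blob_prefix to the object name with a single / is an assumption made for illustration.

```python
from __future__ import annotations

import posixpath


def sketch_destination(
    object_name: str,
    blob_prefix: str = "",
    flatten_structure: bool = False,
) -> str | None:
    """Hypothetical helper: return a destination blob name, or None if skipped."""
    # Names ending with "/" are folder markers (as created by the GCS console).
    if object_name.endswith("/"):
        return None

    # With flatten_structure=True only the file name is kept.
    relative = posixpath.basename(object_name) if flatten_structure else object_name

    # A key that ends up empty has no usable destination path.
    if not relative:
        return None

    # Assumption: blob_prefix and the relative path are joined with one "/".
    if blob_prefix:
        return f"{blob_prefix.rstrip('/')}/{relative}"
    return relative
```

Under these assumptions, folder/subfolder/file.txt maps to file.txt when flatten_structure=True, and folder/ is skipped in either mode.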

See also

For more information on how to use this operator, take a look at the guide: Operator

Parameters:
  • gcs_bucket (str) – The GCS bucket to list objects from. (templated)

  • prefix (str | None) – Prefix to filter object names under the bucket. (templated)

  • gcp_conn_id (str) – Airflow connection ID for Google Cloud.

  • google_impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional impersonation chain for GCP credentials.

  • gcp_user_project (str | None) – Requester-pays billing project for GCS requests, if required.

  • match_glob (str | None) – Optional glob filter for object names (requires apache-airflow-providers-google>=10.3.0).

  • container_name (str) – Azure Blob container to upload into. (templated)

  • blob_prefix (str) – Base blob path for uploaded objects. (templated)

  • wasb_conn_id (str) – Airflow connection ID for Azure Blob Storage.

  • replace (bool) – If True, overwrite existing blobs (overwrite=True on upload) and upload all listed objects. If False, skip objects that already exist under blob_prefix with the same relative path and pass overwrite=False on upload.

  • keep_directory_structure (bool) – When False and prefix is set (and flatten_structure is False), append prefix to blob_prefix.

  • flatten_structure (bool) – If True, upload each object using only its file name under blob_prefix. Takes precedence over keep_directory_structure.

  • create_container (bool) – If True, create the container when missing before upload.
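A minimal usage sketch built from the parameters above. The bucket, container, prefix, and task_id values are placeholders, and make_transfer_task is a hypothetical helper; the import is deferred into the function so the snippet loads even where the provider package is not installed.

```python
# Keyword arguments for the operator; every value here is a placeholder.
TRANSFER_KWARGS = {
    "task_id": "gcs_to_azure_blob",
    "gcs_bucket": "example-gcs-bucket",
    "prefix": "exports/",
    "container_name": "example-container",
    "blob_prefix": "from-gcs",
    "gcp_conn_id": "google_cloud_default",
    "wasb_conn_id": "wasb_default",
    "replace": False,            # skip blobs that already exist
    "flatten_structure": False,  # keep the object paths as listed
    "create_container": True,    # create the container if it is missing
}


def make_transfer_task():
    """Hypothetical helper: build the operator from TRANSFER_KWARGS."""
    # Imported lazily so this module can be loaded without Airflow installed.
    from airflow.providers.microsoft.azure.transfers.gcs_to_wasb import (
        GCSToAzureBlobStorageOperator,
    )

    return GCSToAzureBlobStorageOperator(**TRANSFER_KWARGS)
```

In a real DAG file the operator would normally be instantiated inside a DAG context, with the connection IDs pointing at connections configured in Airflow.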

template_fields: collections.abc.Sequence[str] = ('gcs_bucket', 'prefix', 'blob_prefix', 'container_name', 'google_impersonation_chain',...
ui_color = '#f0eee4'
gcs_bucket
prefix = None
gcp_conn_id = 'google_cloud_default'
google_impersonation_chain = None
container_name
blob_prefix = ''
wasb_conn_id = 'wasb_default'
replace = False
keep_directory_structure = True
flatten_structure = False
gcp_user_project = None
create_container = False
match_glob = None
execute(context)

Override this method when deriving an operator.

This is the main method that executes the task. context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more details.
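As a rough illustration of how the replace parameter described for this operator affects which objects are uploaded during a run, the sketch below filters a listing against the blobs already present. select_objects_to_upload is a hypothetical helper, not the operator's code, and it assumes both inputs are relative paths that can be compared directly.

```python
from __future__ import annotations

from collections.abc import Iterable


def select_objects_to_upload(
    listed: Iterable[str],
    existing: Iterable[str],
    replace: bool,
) -> list[str]:
    """Hypothetical helper: pick the listed objects that should be uploaded."""
    if replace:
        # replace=True: upload every listed object and overwrite on upload.
        return list(listed)

    # replace=False: skip objects whose relative path already exists.
    already_there = set(existing)
    return [name for name in listed if name not in already_there]
```

With replace=False, only objects missing from the destination are selected; with replace=True, the full listing is returned.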

get_openlineage_facets_on_start()
