airflow.providers.samba.transfers.gcs_to_samba

This module contains the Google Cloud Storage to Samba operator.

Module Contents

Classes

GCSToSambaOperator

Transfer files from a Google Cloud Storage bucket to an SMB server.

Attributes

WILDCARD

airflow.providers.samba.transfers.gcs_to_samba.WILDCARD = '*'
class airflow.providers.samba.transfers.gcs_to_samba.GCSToSambaOperator(*, source_bucket, source_object, destination_path, keep_directory_structure=True, move_object=False, gcp_conn_id='google_cloud_default', samba_conn_id='samba_default', impersonation_chain=None, buffer_size=None, **kwargs)

Bases: airflow.models.BaseOperator

Transfer files from a Google Cloud Storage bucket to an SMB server.

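from datetime import datetime

from airflow import models
from airflow.providers.samba.transfers.gcs_to_samba import GCSToSambaOperator

with models.DAG(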
    "example_gcs_to_smb",
    start_date=datetime(2020, 6, 19),
    schedule=None,
) as dag:
    # downloads file to media/folder/subfolder/file.txt
    copy_file_from_gcs_to_smb = GCSToSambaOperator(
        task_id="file-copy-gcs-to-smb",
        source_bucket="test-gcs-sftp-bucket-name",
        source_object="folder/subfolder/file.txt",
        destination_path="media",
    )

    # moves file to media/data.txt
    move_file_from_gcs_to_smb = GCSToSambaOperator(
        task_id="file-move-gcs-to-smb",
        source_bucket="test-gcs-sftp-bucket-name",
        source_object="folder/subfolder/data.txt",
        destination_path="media",
        move_object=True,
        keep_directory_structure=False,
    )
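The destination paths noted in the comments above can be reproduced with a small sketch. It assumes a single object (no wildcard) and POSIX-style separators. The helper resolve_destination is hypothetical and written for illustration only; it is not part of the provider's API and may differ from what the operator does internally.

```python
import posixpath


def resolve_destination(
    source_object: str,
    destination_path: str,
    keep_directory_structure: bool = True,
) -> str:
    """Return the path a single object would be written to (illustrative only)."""
    if keep_directory_structure:
        # Recreate the object's full path below destination_path.
        relative_path = source_object
    else:
        # Keep only the file name and drop the directory part.
        relative_path = posixpath.basename(source_object)
    return posixpath.join(destination_path, relative_path)


# First task in the example: the directory structure is kept (the default).
print(resolve_destination("folder/subfolder/file.txt", "media"))
# prints "media/folder/subfolder/file.txt"

# Second task in the example: keep_directory_structure=False.
print(resolve_destination("folder/subfolder/data.txt", "media", keep_directory_structure=False))
# prints "media/data.txt"
```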

See also

For more information on how to use this operator, take a look at the guide: Operator

Parameters
  • source_bucket (str) – The source Google Cloud Storage bucket where the object is. (templated)

  • source_object (str) – The source name of the object to copy in the Google Cloud Storage bucket. (templated) You can use only one wildcard for objects (filenames) within your bucket. The wildcard can appear inside the object name or at the end of the object name. Appending a wildcard to the bucket name is unsupported.

  • destination_path (str) – The SMB remote path. This is the specified directory path in the SMB share name for uploading files to the SMB server.

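  • keep_directory_structure (bool) – (Optional) When set to True (the default), the path of the file in the bucket is recreated within the path passed in destination_path. When set to False, the directory part is dropped, as in the example above where folder/subfolder/data.txt is written to media/data.txt.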

  • move_object (bool) – When move_object is True, the object is moved instead of copied to the new location. This is the equivalent of a mv command as opposed to a cp command.

  • gcp_conn_id (str) – (Optional) The connection ID used to connect to Google Cloud.

  • samba_conn_id (str) – The SMB connection ID: the name or identifier used to establish a connection to the SMB server.

  • impersonation_chain (str | collections.abc.Sequence[str] | None) – Optional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account from the list granting this role to the originating account. (templated)

  • buffer_size (int | None) – Optional size, in bytes, of the chunks sent to Samba. Larger buffer sizes may decrease the time to upload large files. The default is the one used by shutil, which is 64 KB.
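
The single-wildcard rule for source_object can be illustrated with a minimal in-memory sketch. It assumes simple prefix and suffix matching over a list of object names. The helper match_objects and the sample names are hypothetical, and the operator's real listing goes through Google Cloud Storage, so its behaviour may differ in detail.

```python
WILDCARD = "*"


def match_objects(object_names: list[str], source_object: str) -> list[str]:
    """Select the object names matched by a source_object pattern (illustrative only)."""
    if WILDCARD not in source_object:
        # No wildcard: only an exact name matches.
        return [name for name in object_names if name == source_object]
    if source_object.count(WILDCARD) > 1:
        raise ValueError("Only one wildcard ('*') is supported in source_object")
    # Split the pattern around its single wildcard.
    prefix, suffix = source_object.split(WILDCARD, 1)
    return [
        name
        for name in object_names
        if name.startswith(prefix)
        and name.endswith(suffix)
        # Guard against the prefix and suffix overlapping inside a short name.
        and len(name) >= len(prefix) + len(suffix)
    ]


names = ["folder/a.txt", "folder/b.csv", "folder/sub/c.txt", "other/d.txt"]

# Wildcard inside the object name.
print(match_objects(names, "folder/*.txt"))
# prints ['folder/a.txt', 'folder/sub/c.txt']

# Wildcard at the end of the object name.
print(match_objects(names, "folder/*"))
# prints ['folder/a.txt', 'folder/b.csv', 'folder/sub/c.txt']
```

A pattern with two wildcards, such as "folder/*/*.txt", raises ValueError in this sketch, mirroring the documented one-wildcard limit.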

template_fields: collections.abc.Sequence[str] = ('source_bucket', 'source_object', 'destination_path', 'impersonation_chain')
ui_color = '#f0eee4'
execute(context)

Derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.
