airflow.providers.amazon.aws.transfers.s3_to_dynamodb

Module Contents

Classes

AttributeDefinition

Attribute Definition Type.

KeySchema

Key Schema Type.

S3ToDynamoDBOperator

Load data from S3 into a DynamoDB table.

class airflow.providers.amazon.aws.transfers.s3_to_dynamodb.AttributeDefinition[source]

Bases: TypedDict

Attribute Definition Type.

AttributeName: str[source]
AttributeType: Literal['S', 'N', 'B'][source]
class airflow.providers.amazon.aws.transfers.s3_to_dynamodb.KeySchema[source]

Bases: TypedDict

Key Schema Type.

AttributeName: str[source]
KeyType: Literal['HASH', 'RANGE'][source]
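
Both classes are TypedDicts, so key schemas and attribute definitions are written as plain dict literals; a minimal sketch with a hypothetical single-attribute primary key named item_id:

from airflow.providers.amazon.aws.transfers.s3_to_dynamodb import (
    AttributeDefinition,
    KeySchema,
)

# Hypothetical hash key "item_id" of type String.
key_schema: list[KeySchema] = [
    {"AttributeName": "item_id", "KeyType": "HASH"},
]
attribute_definitions: list[AttributeDefinition] = [
    {"AttributeName": "item_id", "AttributeType": "S"},
]

These lists are passed to S3ToDynamoDBOperator as dynamodb_key_schema and dynamodb_attributes, respectively.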
class airflow.providers.amazon.aws.transfers.s3_to_dynamodb.S3ToDynamoDBOperator(*, s3_bucket, s3_key, dynamodb_table_name, dynamodb_key_schema, dynamodb_attributes=None, dynamodb_tmp_table_prefix='tmp', delete_on_error=False, use_existing_table=False, input_format='DYNAMODB_JSON', billing_mode='PAY_PER_REQUEST', import_table_kwargs=None, import_table_creation_kwargs=None, wait_for_completion=True, check_interval=30, max_attempts=240, aws_conn_id='aws_default', **kwargs)[source]

Bases: airflow.models.BaseOperator

Load data from S3 into a DynamoDB table.

Data stored in S3 can be uploaded to a new or existing DynamoDB table. Supported file formats are CSV, DynamoDB JSON, and Amazon ION.

Parameters
  • s3_bucket (str) – The S3 bucket from which the data is imported

  • s3_key (str) – Key prefix selecting one or more S3 objects to import

  • dynamodb_table_name (str) – Name of the target DynamoDB table (created by the import unless use_existing_table is set)

  • dynamodb_key_schema (list[KeySchema]) – Primary key and sort key definitions. Each element represents one key attribute: AttributeName is the name of the attribute and KeyType is its role. Valid values for KeyType are HASH and RANGE

  • dynamodb_attributes (list[AttributeDefinition] | None) – Attribute definitions for the table: AttributeName is the name of the attribute and AttributeType is its data type. Valid values for AttributeType are S (String), N (Number), and B (Binary)

  • dynamodb_tmp_table_prefix (str) – Prefix for the temporary DynamoDB table

  • delete_on_error (bool) – If set, the new DynamoDB table will be deleted in case of import errors

  • use_existing_table (bool) – Whether to import into an existing DynamoDB table rather than a new one. If set to true, data is first loaded into a temporary DynamoDB table (using the AWS ImportTable service), then read back in chunks and written into the target table. If set to false, a new DynamoDB table is created and the S3 data is bulk loaded by the AWS ImportTable service. See the usage sketch after this parameter list.

  • input_format (Literal['CSV', 'DYNAMODB_JSON', 'ION']) – The format of the imported data. Valid values for InputFormat are CSV, DYNAMODB_JSON, or ION

  • billing_mode (Literal['PROVISIONED', 'PAY_PER_REQUEST']) – Billing mode for the table. Valid values are PROVISIONED or PAY_PER_REQUEST

  • on_demand_throughput – Extra options specifying the maximum number of on-demand read and write request units for the table

  • import_table_kwargs (dict[str, Any] | None) – Any additional optional import table parameters to pass, such as ClientToken, InputCompressionType, or InputFormatOptions. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/import_table.html

  • import_table_creation_kwargs (dict[str, Any] | None) – Any additional optional import table creation parameters to pass, such as ProvisionedThroughput, SSESpecification, or GlobalSecondaryIndexes. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/import_table.html

  • wait_for_completion (bool) – Whether to wait for the import to complete

  • check_interval (int) – Time in seconds to wait between status checks

  • max_attempts (int) – Maximum number of attempts to check for job completion

  • aws_conn_id (str | None) – The reference to the AWS connection details
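
A minimal usage sketch of the operator, assuming a CSV export under a hypothetical bucket and key prefix; the bucket, key, table, and attribute names are placeholders:

from airflow.providers.amazon.aws.transfers.s3_to_dynamodb import S3ToDynamoDBOperator

# Hypothetical names; adjust to your bucket, key prefix, and table.
s3_to_dynamodb = S3ToDynamoDBOperator(
    task_id="s3_to_dynamodb",
    s3_bucket="my-import-bucket",
    s3_key="exports/items/",
    dynamodb_table_name="items",
    dynamodb_key_schema=[
        {"AttributeName": "item_id", "KeyType": "HASH"},
    ],
    dynamodb_attributes=[
        {"AttributeName": "item_id", "AttributeType": "S"},
    ],
    input_format="CSV",
    # Optional extras forwarded to the DynamoDB ImportTable API call,
    # e.g. CSV header names for the imported file.
    import_table_kwargs={
        "InputFormatOptions": {"Csv": {"HeaderList": ["item_id", "item_name"]}},
    },
    aws_conn_id="aws_default",
)

With use_existing_table=True the same call targets an already existing table; the operator then imports into a temporary table prefixed with dynamodb_tmp_table_prefix and copies the rows into the target.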

property tmp_table_name[source]

Temporary table name.

template_fields: collections.abc.Sequence[str] = ('s3_bucket', 's3_key', 'dynamodb_table_name', 'dynamodb_key_schema', 'dynamodb_attributes',...[source]
ui_color = '#e2e8f0'[source]
execute(context)[source]

Execute S3 to DynamoDB Job from Airflow.

Parameters

context (airflow.utils.context.Context) – The current context of the task instance

Returns

The Amazon Resource Name (ARN) of the import

Return type

str
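
Because execute() returns the import ARN, downstream tasks can read it through XCom; a minimal sketch using the TaskFlow API (the task and variable names are hypothetical):

from airflow.decorators import task

@task
def report_import(arn: str) -> None:
    # Log the ARN returned by the S3ToDynamoDBOperator task, received via XCom.
    print(f"DynamoDB import ARN: {arn}")

# Assuming `s3_to_dynamodb` is an S3ToDynamoDBOperator task in the same DAG:
# report_import(s3_to_dynamodb.output)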
