Amazon S3 to SQL

Use the S3ToSqlOperator transfer to copy data from an Amazon Simple Storage Service (S3) file into an existing SQL table. By providing a parser function that is applied to the downloaded file, this operator can handle a variety of file formats.

Prerequisite Tasks

To use these operators, you must do a few things:

- Create the necessary resources using the AWS Console or the AWS CLI.
- Install the API libraries via pip (see the command below).
- Set up an Airflow connection to AWS.
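
For example, the Amazon provider package can be installed with pip:

pip install 'apache-airflow[amazon]'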

Operators

Amazon S3 To SQL Transfer Operator

To get more information about this operator, visit: S3ToSqlOperator

Example usage with a parser for a CSV file. This parser loads the entire file into memory and returns a list of rows:

tests/system/amazon/aws/example_s3_to_sql.py

#
# This operator requires a parser method. The parser should take a filename
# as input and return an iterable of rows.
# This example parser uses the built-in csv library and returns a list of rows.
#
def parse_csv_to_list(filepath):
    import csv

    with open(filepath, newline="") as file:
        return list(csv.reader(file))

transfer_s3_to_sql = S3ToSqlOperator(
    task_id="transfer_s3_to_sql",
    s3_bucket=s3_bucket_name,
    s3_key=s3_key,
    table=SQL_TABLE_NAME,
    column_list=SQL_COLUMN_LIST,
    parser=parse_csv_to_list,
    sql_conn_id=conn_id_name,
)
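
If the CSV file begins with a header row that should not be inserted into the table, the parser can discard it before returning the rows. A minimal sketch, assuming the same file layout as above (the function name parse_csv_skip_header is illustrative, not part of the provider):

#
# Hypothetical variant of the parser above that discards the first row,
# for CSV files that start with a header line.
#
def parse_csv_skip_header(filepath):
    import csv

    with open(filepath, newline="") as file:
        reader = csv.reader(file)
        next(reader, None)  # skip the header row
        return list(reader)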

Example usage with a parser function that returns a generator:

tests/system/amazon/aws/example_s3_to_sql.py

#
# As the parser can return any kind of iterable, a generator is also allowed.
# This example parser returns a generator, which prevents Python from loading
# the whole file into memory.
#

def parse_csv_to_generator(filepath):
    import csv

    with open(filepath, newline="") as file:
        yield from csv.reader(file)

transfer_s3_to_sql_generator = S3ToSqlOperator(
    task_id="transfer_s3_to_sql_paser_to_generator",
    s3_bucket=s3_bucket_name,
    s3_key=s3_key,
    table=SQL_TABLE_NAME,
    column_list=SQL_COLUMN_LIST,
    parser=parse_csv_to_generator,
    sql_conn_id=conn_id_name,
)
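
Since the operator only requires that the parser return an iterable of rows, other file formats can be handled the same way. As a sketch only, a parser for a JSON Lines file in which each line is a JSON array of column values might look like this (the function name and file layout are assumptions, not part of the provider):

#
# Hypothetical parser for a JSON Lines file where each line is a JSON
# array of column values, e.g. ["1", "Alice"]. Rows are yielded one at
# a time, so the whole file is never loaded into memory.
#
def parse_jsonl_to_generator(filepath):
    import json

    with open(filepath) as file:
        for line in file:
            if line.strip():
                yield json.loads(line)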

Reference

AWS boto3 library documentation for Amazon S3
