airflow.providers.amazon.aws.transfers.mongo_to_s3
¶
Module Contents¶
Classes¶
Move data from MongoDB to S3. |
- class airflow.providers.amazon.aws.transfers.mongo_to_s3.MongoToS3Operator(*, mongo_conn_id='mongo_default', aws_conn_id='aws_default', mongo_collection, mongo_query, s3_bucket, s3_key, mongo_db=None, mongo_projection=None, replace=False, allow_disk_use=False, compression=None, **kwargs)[source]¶
Bases:
airflow.models.BaseOperator
Move data from MongoDB to S3.
See also
For more information on how to use this operator, take a look at the guide: MongoDB To Amazon S3 transfer operator
- Parameters
mongo_conn_id (str) – reference to a specific mongo connection
aws_conn_id (str | None) – reference to a specific S3 connection
mongo_collection (str) – reference to a specific collection in your mongo db
mongo_query (list | dict) – query to execute. A list including a dict of the query
mongo_projection (list | dict | None) – optional parameter to filter the returned fields by the query. It can be a list of fields names to include or a dictionary for excluding fields (e.g
projection={"_id": 0}
)s3_bucket (str) – reference to a specific S3 bucket to store the data
s3_key (str) – in which S3 key the file will be stored
mongo_db (str | None) – reference to a specific mongo database
replace (bool) – whether or not to replace the file in S3 if it previously existed
allow_disk_use (bool) – enables writing to temporary files in the case you are handling large dataset. This only takes effect when mongo_query is a list - running an aggregate pipeline
compression (str | None) – type of compression to use for output file in S3. Currently only gzip is supported.
- template_fields: collections.abc.Sequence[str] = ('s3_bucket', 's3_key', 'mongo_query', 'mongo_collection')[source]¶
- static transform(docs)[source]¶
Transform the data for transfer.
This method is meant to be extended by child classes to perform transformations unique to those operators needs. Processes pyMongo cursor and returns an iterable with each element being a JSON serializable dictionary
The default implementation assumes no processing is needed, i.e. input is a pyMongo cursor of documents and just needs to be passed through.
Override this method for custom transformations.