airflow.providers.amazon.aws.operators.glue_databrew
Module Contents
Classes
GlueDataBrewStartJobOperator | Start an AWS Glue DataBrew job.
- class airflow.providers.amazon.aws.operators.glue_databrew.GlueDataBrewStartJobOperator(job_name, wait_for_completion=True, delay=None, waiter_delay=30, waiter_max_attempts=60, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)
Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.glue_databrew.GlueDataBrewHook]
Start an AWS Glue DataBrew job.
AWS Glue DataBrew is a visual data preparation tool that makes it easier for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning (ML).
See also
For more information on how to use this operator, take a look at the guide: Start an AWS Glue DataBrew job
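A minimal usage sketch, based only on the signature above. The job name `my-databrew-job`, the `task_id` naming scheme, and both helper functions are hypothetical. The Airflow import is deferred into `make_start_job_task` so that the argument dictionary can be inspected without Airflow installed; in a real DAG file the import would normally sit at the top of the module.

```python
from typing import Any


def databrew_task_kwargs(job_name: str, deferrable: bool = False) -> dict[str, Any]:
    """Collect keyword arguments for GlueDataBrewStartJobOperator.

    The waiter values repeat the defaults shown in the class signature.
    """
    return {
        # task_id is the standard Airflow operator argument, passed via **kwargs.
        "task_id": f"start_{job_name.replace('-', '_')}",  # hypothetical naming scheme
        "job_name": job_name,
        "wait_for_completion": True,
        "waiter_delay": 30,
        "waiter_max_attempts": 60,
        "deferrable": deferrable,
    }


def make_start_job_task(job_name: str):
    """Instantiate the operator; needs the Amazon provider package installed."""
    from airflow.providers.amazon.aws.operators.glue_databrew import (
        GlueDataBrewStartJobOperator,
    )

    return GlueDataBrewStartJobOperator(**databrew_task_kwargs(job_name))


kwargs = databrew_task_kwargs("my-databrew-job")  # placeholder job name
print(kwargs["task_id"])
```

Calling `make_start_job_task("my-databrew-job")` inside a DAG definition would then create the task itself.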
- Parameters
job_name (str) – Unique job name per AWS account.
wait_for_completion (bool) – Whether to wait for the job run to complete. (default: True)
deferrable (bool) – If True, the operator waits asynchronously for the job to complete. This implies waiting for completion. This mode requires the aiobotocore module to be installed. (default: False)
waiter_delay (int) – Time in seconds to wait between status checks. (default: 30)
waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 60)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
- Returns
A dictionary with the key run_id, whose value is the run ID of the started job run.
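The interaction between waiter_delay and waiter_max_attempts can be pictured with a small polling loop. This is an illustrative sketch, not the operator's actual waiter: `wait_for_job`, `get_state`, and the state strings used here are assumptions made for the example. With the defaults, such a loop gives up after roughly 30 × 60 = 1800 seconds (about 30 minutes).

```python
import time
from typing import Callable


def wait_for_job(
    get_state: Callable[[], str],
    waiter_delay: int = 30,
    waiter_max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it reports a terminal state or attempts run out."""
    # Illustrative terminal states only; not the full list of DataBrew run states.
    terminal_states = {"SUCCEEDED", "FAILED"}
    for attempt in range(1, waiter_max_attempts + 1):
        state = get_state()
        if state in terminal_states:
            return state
        # No need to sleep after the final status check.
        if attempt < waiter_max_attempts:
            sleep(waiter_delay)
    raise TimeoutError(f"job not finished after {waiter_max_attempts} status checks")


# Simulated job that finishes on the third status check; sleeping is stubbed out
# by recording the requested delays instead of actually waiting.
states = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
slept: list[float] = []
result = wait_for_job(lambda: next(states), sleep=slept.append)
print(result, sum(slept))
```

Raising waiter_max_attempts or waiter_delay extends the total time budget; lowering waiter_delay makes completion visible sooner at the cost of more status requests.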
- template_fields: collections.abc.Sequence[str]