airflow.providers.amazon.aws.hooks.glue_crawler

Classes

GlueCrawlerHook

Interacts with AWS Glue Crawler.

Module Contents

class airflow.providers.amazon.aws.hooks.glue_crawler.GlueCrawlerHook(*args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interacts with AWS Glue Crawler.

Provide a thin wrapper around boto3.client("glue").

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

property glue_client[source]
Returns:

AWS Glue client

has_crawler(crawler_name)[source]

Check if the crawler already exists.

Parameters:

crawler_name (str) – unique crawler name per AWS account

Returns:

True if the crawler already exists, False otherwise.

Return type:

bool

get_crawler(crawler_name)[source]

Get crawler configurations.

Parameters:

crawler_name (str) – unique crawler name per AWS account

Returns:

Nested dictionary of crawler configurations

Return type:

dict
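
As an illustration of how has_crawler and get_crawler might be combined, the sketch below returns a short summary of a crawler, or None when it does not exist. The helper names describe_crawler and example_usage, the crawler name "my-crawler", and the response keys "State" and "DatabaseName" are assumptions: the keys are taken to mirror the boto3 Glue response and are not confirmed by this page.

```python
from typing import Any, Optional


def describe_crawler(hook: Any, crawler_name: str) -> Optional[dict]:
    """Return a short summary of a crawler, or None if it does not exist."""
    # Check existence first so a missing crawler is reported as None.
    if not hook.has_crawler(crawler_name):
        return None
    config = hook.get_crawler(crawler_name)
    # Assumed key names (boto3 Glue response); .get() tolerates absent keys.
    return {
        "name": crawler_name,
        "state": config.get("State"),
        "database": config.get("DatabaseName"),
    }


def example_usage() -> Optional[dict]:
    # Imported lazily so the helper above can be used without Airflow installed.
    from airflow.providers.amazon.aws.hooks.glue_crawler import GlueCrawlerHook

    hook = GlueCrawlerHook(aws_conn_id="aws_default")
    return describe_crawler(hook, "my-crawler")  # hypothetical crawler name
```

Passing the hook in as an argument keeps the helper easy to exercise with a stand-in object in unit tests.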

update_crawler(**crawler_kwargs)[source]

Update crawler configurations.

Parameters:

crawler_kwargs – Keyword args that define the configurations used for the crawler

Returns:

True if the crawler was updated, False otherwise

Return type:

bool

update_tags(crawler_name, crawler_tags)[source]

Update crawler tags.

Parameters:
  • crawler_name (str) – Name of the crawler for which to update tags

  • crawler_tags (dict) – Dictionary of new tags. If empty, all tags will be deleted

Returns:

True if the tags were updated, False otherwise

Return type:

bool
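
A minimal sketch of calling update_tags follows. The wrapper name replace_tags is hypothetical, and the string check reflects the fact that AWS tag keys and values are strings; it is not something the hook is documented to enforce.

```python
from typing import Any


def replace_tags(hook: Any, crawler_name: str, tags: dict) -> bool:
    """Set the crawler's tags to `tags`; an empty dict removes every tag."""
    # Fail early on non-string keys or values instead of at the AWS API call.
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in tags.items()):
        raise TypeError("tag keys and values must be strings")
    # The boolean result reports whether any tag was actually changed.
    return hook.update_tags(crawler_name, tags)
```

Here hook would be a GlueCrawlerHook instance; calling replace_tags(hook, name, {}) is the explicit way to clear all tags.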

create_crawler(**crawler_kwargs)[source]

Create an AWS Glue Crawler.

Parameters:

crawler_kwargs – Keyword args that define the configurations used to create the crawler

Returns:

Name of the crawler

Return type:

str
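
One common way to use has_crawler, update_crawler and create_crawler together is a create-or-update step, sketched below. The helper name ensure_crawler and every value in EXAMPLE_CONFIG are hypothetical, and the keyword names (Name, Role, DatabaseName, Targets) are assumed to follow the parameters of boto3's Glue create_crawler call; this page does not list them.

```python
from typing import Any


def ensure_crawler(hook: Any, config: dict) -> str:
    """Create the crawler if it is missing, otherwise update it.

    Returns "created", "updated" or "unchanged" to describe what happened.
    """
    name = config["Name"]
    if hook.has_crawler(name):
        # update_crawler returns True only when something was changed.
        return "updated" if hook.update_crawler(**config) else "unchanged"
    hook.create_crawler(**config)
    return "created"


# Hypothetical configuration; keys assumed to match boto3's create_crawler.
EXAMPLE_CONFIG = {
    "Name": "example-crawler",
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "DatabaseName": "example_db",
    "Targets": {"S3Targets": [{"Path": "s3://example-bucket/data/"}]},
}
```

With a real hook this could be called as ensure_crawler(GlueCrawlerHook(aws_conn_id="aws_default"), EXAMPLE_CONFIG); the sketch deliberately ignores the return value of create_crawler.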

start_crawler(crawler_name)[source]

Trigger the AWS Glue Crawler.

Parameters:

crawler_name (str) – unique crawler name per AWS account

Returns:

Empty dictionary

Return type:

dict

wait_for_crawler_completion(crawler_name, poll_interval=5)[source]

Wait until the Glue crawler completes, then return the status of the latest crawl or raise AirflowException.

Parameters:
  • crawler_name (str) – unique crawler name per AWS account

  • poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check crawler status

Returns:

Crawler’s status

Return type:

str
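
The start-then-wait sequence could be sketched as follows. The helper name run_crawler is hypothetical, and the comparison against "SUCCEEDED" assumes the returned string is the Glue last-crawl status; the check is a defensive extra on top of whatever the hook itself raises.

```python
from typing import Any


def run_crawler(hook: Any, crawler_name: str, poll_interval: int = 5) -> str:
    """Start a crawler, block until it finishes, and return its final status."""
    hook.start_crawler(crawler_name)
    # Blocks, checking the crawler every `poll_interval` seconds.
    status = hook.wait_for_crawler_completion(crawler_name, poll_interval=poll_interval)
    # Assumed status value; treat anything else as a failed crawl.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Crawler {crawler_name!r} finished with status {status!r}")
    return status
```

A larger poll_interval reduces the number of status calls made to AWS for long-running crawls, at the cost of noticing completion later.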
