AWS Glue Data Catalog

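The AWS Glue Data Catalog is a centralized metadata repository for your data assets. It stores databases, tables, and partitions that describe where data lives (for example, in Amazon S3) and how it is structured. Use the operators below to create and delete these Data Catalog resources.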

Create a Catalog Database

To create a database in the AWS Glue Data Catalog, use GlueCatalogCreateDatabaseOperator.

tests/system/amazon/aws/example_glue_catalog.py

create_database = GlueCatalogCreateDatabaseOperator(
    task_id="create_database",
    database_name=db_name,
    description="Test database for Glue Catalog",
)
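The `database_name` and `description` arguments correspond to the `Name` and `Description` fields of the `DatabaseInput` structure in the Glue `CreateDatabase` API. The sketch below shows that request shape, assuming the operator forwards these two fields unchanged; `build_database_input` is a hypothetical helper for illustration, not part of the provider. Glue folds database names to lowercase for Hive compatibility, so the sketch lowercases the name up front to match what the catalog stores.

```python
from typing import Optional


def build_database_input(database_name: str, description: Optional[str] = None) -> dict:
    """Build a Glue DatabaseInput dict (hypothetical helper, for illustration only)."""
    if not database_name:
        raise ValueError("database_name must be non-empty")
    # Glue stores database names in lowercase for Hive compatibility.
    database_input = {"Name": database_name.lower()}
    if description is not None:
        database_input["Description"] = description
    return database_input


print(build_database_input("Sales_DB", "Test database for Glue Catalog"))
# {'Name': 'sales_db', 'Description': 'Test database for Glue Catalog'}
```

Lowercasing locally avoids surprises when a later task looks the database up by the exact name it passed in.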

Reference

AWS boto3 Library Documentation for Glue

Create a Table

To create a table in an AWS Glue Data Catalog database, use GlueCatalogCreateTableOperator.

tests/system/amazon/aws/example_glue_catalog.py

create_table = GlueCatalogCreateTableOperator(
    task_id="create_table",
    database_name=db_name,
    table_name=table_name,
    table_input=table_input,
)
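The example passes a `table_input` variable whose definition is not shown. It follows the `TableInput` structure of the Glue `CreateTable` API. The sketch below builds one for a text-format external table partitioned by a `dt` string column. The helper name, column list, and S3 location are hypothetical placeholders, and the format and SerDe classes mirror those used in the partition example on this page. Whether the operator fills in `Name` from `table_name` is not stated here, so the sketch sets it explicitly.

```python
from typing import Dict, List


def build_table_input(
    table_name: str,
    columns: List[Dict[str, str]],
    location: str,
    partition_keys: List[Dict[str, str]],
) -> dict:
    """Build a Glue TableInput dict for an external text table (hypothetical helper)."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        # Partition columns are declared separately from the data columns.
        "PartitionKeys": partition_keys,
        "StorageDescriptor": {
            "Columns": columns,
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
            },
        },
    }


# Placeholder values; a partition with Values ["2024-01-01"] maps to the "dt" key.
table_input = build_table_input(
    table_name="example_table",
    columns=[{"Name": "id", "Type": "int"}],
    location="s3://test-bucket/",
    partition_keys=[{"Name": "dt", "Type": "string"}],
)
```

Declaring `PartitionKeys` on the table is what gives meaning to the `Values` list passed when partitions are created later.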

Delete a Catalog Database

To delete a database from the AWS Glue Data Catalog, use GlueCatalogDeleteDatabaseOperator.

tests/system/amazon/aws/example_glue_catalog.py

delete_database = GlueCatalogDeleteDatabaseOperator(
    task_id="delete_database",
    database_name=db_name,
    trigger_rule=TriggerRule.ALL_DONE,
)
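The cleanup tasks on this page set `trigger_rule=TriggerRule.ALL_DONE`, so they run once every upstream task has finished, whether it succeeded or failed, rather than only when all upstream tasks succeed (Airflow's default `all_success` rule). This keeps a failed earlier step from leaving test resources behind. The toy model below contrasts the two rules. It is a simplified illustration that assumes a small set of finished states, not Airflow's actual dependency-evaluation code.

```python
from typing import Iterable

# Simplified set of finished task states (assumption; Airflow defines more).
FINISHED_STATES = {"success", "failed", "skipped", "upstream_failed"}


def all_success(upstream_states: Iterable[str]) -> bool:
    """Run only if every upstream task succeeded (Airflow's default rule)."""
    return all(state == "success" for state in upstream_states)


def all_done(upstream_states: Iterable[str]) -> bool:
    """Run once every upstream task has finished, regardless of outcome."""
    return all(state in FINISHED_STATES for state in upstream_states)


upstream = ["success", "failed"]  # e.g. create_table succeeded, a later step failed
print(all_success(upstream))  # False: cleanup would not run
print(all_done(upstream))     # True: cleanup still runs
```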

Delete a Table

To delete a table from an AWS Glue Data Catalog database, use GlueCatalogDeleteTableOperator.

tests/system/amazon/aws/example_glue_catalog.py

delete_table = GlueCatalogDeleteTableOperator(
    task_id="delete_table",
    database_name=db_name,
    table_name=table_name,
    trigger_rule=TriggerRule.ALL_DONE,
)

Create a Partition

To create a partition in an AWS Glue Data Catalog table, use GlueCatalogCreatePartitionOperator.

tests/system/amazon/aws/example_glue_catalog.py

create_partition = GlueCatalogCreatePartitionOperator(
    task_id="create_partition",
    database_name=db_name,
    table_name=table_name,
    partition_input={
        "Values": ["2024-01-01"],
        "StorageDescriptor": {
            "Columns": [{"Name": "id", "Type": "int"}],
            "Location": "s3://test-bucket/dt=2024-01-01/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"},
        },
    },
)
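In `partition_input`, each entry in `Values` supplies the value for the corresponding partition key of the table, in the same order as the table's `PartitionKeys`, and `Location` typically points at a Hive-style `key=value/` prefix. The sketch below derives both from an ordered mapping of partition key to value. The helper name and bucket are hypothetical, and the Hive-style path layout is a common convention rather than something Glue enforces.

```python
from typing import Dict, List


def build_partition_input(
    base_location: str,
    partition_values: Dict[str, str],
    columns: List[Dict[str, str]],
) -> dict:
    """Build a Glue PartitionInput dict (hypothetical helper).

    partition_values must be ordered like the table's PartitionKeys,
    because Glue matches Values to partition keys by position.
    """
    # Hive-style layout: s3://bucket/prefix/key1=val1/key2=val2/
    suffix = "".join(f"{key}={value}/" for key, value in partition_values.items())
    location = base_location.rstrip("/") + "/" + suffix
    return {
        "Values": list(partition_values.values()),
        "StorageDescriptor": {
            "Columns": columns,
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
            },
        },
    }


partition_input = build_partition_input(
    "s3://test-bucket/", {"dt": "2024-01-01"}, [{"Name": "id", "Type": "int"}]
)
print(partition_input["Values"])                         # ['2024-01-01']
print(partition_input["StorageDescriptor"]["Location"])  # s3://test-bucket/dt=2024-01-01/
```

For a single `dt` key this reproduces the `Values` and `Location` used in the example above.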

Batch Delete Partitions

To delete one or more partitions from an AWS Glue Data Catalog table, use GlueCatalogBatchDeletePartitionOperator.

tests/system/amazon/aws/example_glue_catalog.py

batch_delete_partition = GlueCatalogBatchDeletePartitionOperator(
    task_id="batch_delete_partition",
    database_name=db_name,
    table_name=table_name,
    partitions_to_delete=[{"Values": ["2024-01-01"]}],
    trigger_rule=TriggerRule.ALL_DONE,
)
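`partitions_to_delete` has the same shape as the `PartitionsToDelete` parameter of the Glue `BatchDeletePartition` API: a list of `{"Values": [...]}` entries. That API accepts at most 25 partitions per request. Whether the operator splits longer lists for you is not covered here, so if you assemble large deletions yourself, a sketch like the following (the helper is hypothetical) splits partition values into batches that respect the limit.

```python
from typing import Iterator, List

MAX_PARTITIONS_PER_REQUEST = 25  # BatchDeletePartition limit per call


def batch_partitions_to_delete(
    partition_values: List[List[str]],
    batch_size: int = MAX_PARTITIONS_PER_REQUEST,
) -> Iterator[List[dict]]:
    """Yield PartitionsToDelete-style lists no longer than batch_size (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(partition_values), batch_size):
        chunk = partition_values[start : start + batch_size]
        yield [{"Values": values} for values in chunk]


# 30 daily partitions -> one batch of 25 and one batch of 5
days = [[f"2024-01-{day:02d}"] for day in range(1, 31)]
batches = list(batch_partitions_to_delete(days))
print([len(batch) for batch in batches])  # [25, 5]
```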