AWS Glue Data Catalog
The AWS Glue Data Catalog is a centralized metadata repository for data assets. Use the operators below to manage Glue Data Catalog resources.
Create a Catalog Database
To create a database in the AWS Glue Data Catalog, use GlueCatalogCreateDatabaseOperator.
create_database = GlueCatalogCreateDatabaseOperator(
    task_id="create_database",
    database_name=db_name,
    description="Test database for Glue Catalog",
)
Create a Table
To create a table in an AWS Glue Data Catalog database, use GlueCatalogCreateTableOperator.
create_table = GlueCatalogCreateTableOperator(
    task_id="create_table",
    database_name=db_name,
    table_name=table_name,
    table_input=table_input,
)
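The snippet above passes a table_input value without showing its shape. The following is a minimal sketch of what such a dictionary might look like, assuming it follows the TableInput structure of the Glue CreateTable API. The names example_db and example_table, the dt partition key, and the bucket location are hypothetical. Whether the operator fills in "Name" from table_name is not stated here, so the sketch sets it explicitly.

```python
# Hypothetical names and values, for illustration only.
db_name = "example_db"
table_name = "example_table"

# Shaped like the TableInput structure of the Glue CreateTable API.
table_input = {
    "Name": table_name,
    "Description": "Example table partitioned by date",
    "TableType": "EXTERNAL_TABLE",
    # Partition columns are declared separately from the data columns.
    "PartitionKeys": [{"Name": "dt", "Type": "string"}],
    "StorageDescriptor": {
        "Columns": [{"Name": "id", "Type": "int"}],
        "Location": "s3://test-bucket/",
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
        },
    },
}
```

The storage formats mirror the ones used in the partition example on this page, so a partition created later describes its data the same way as the table.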
Delete a Catalog Database
To delete a database from the AWS Glue Data Catalog, use GlueCatalogDeleteDatabaseOperator.
delete_database = GlueCatalogDeleteDatabaseOperator(
    task_id="delete_database",
    database_name=db_name,
    trigger_rule=TriggerRule.ALL_DONE,
)
Delete a Table
To delete a table from an AWS Glue Data Catalog database, use GlueCatalogDeleteTableOperator.
delete_table = GlueCatalogDeleteTableOperator(
    task_id="delete_table",
    database_name=db_name,
    table_name=table_name,
    trigger_rule=TriggerRule.ALL_DONE,
)
Create a Partition
To create a partition in an AWS Glue Data Catalog table, use GlueCatalogCreatePartitionOperator.
create_partition = GlueCatalogCreatePartitionOperator(
    task_id="create_partition",
    database_name=db_name,
    table_name=table_name,
    partition_input={
        "Values": ["2024-01-01"],
        "StorageDescriptor": {
            "Columns": [{"Name": "id", "Type": "int"}],
            "Location": "s3://test-bucket/dt=2024-01-01/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"},
        },
    },
)
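When partitions are created for many dates, the partition_input dictionary can be generated rather than written by hand. The helper below is a hypothetical sketch, not part of the provider: build_partition_input and its parameters are illustrative names, and it assumes a single date partition key laid out under a dt=YYYY-MM-DD prefix, as in the example above.

```python
from datetime import date


def build_partition_input(day: date, base_location: str) -> dict:
    """Build a partition_input dict for one date partition (hypothetical helper)."""
    value = day.isoformat()  # e.g. "2024-01-01"
    return {
        # One value per partition key, in the order the table declares them.
        "Values": [value],
        "StorageDescriptor": {
            "Columns": [{"Name": "id", "Type": "int"}],
            # Normalize the base so the result has exactly one separating slash.
            "Location": f"{base_location.rstrip('/')}/dt={value}/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
            },
        },
    }


partition_input = build_partition_input(date(2024, 1, 1), "s3://test-bucket")
```

The resulting dictionary matches the literal passed to GlueCatalogCreatePartitionOperator above and could be supplied as its partition_input argument.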