Connecting to Iceberg

The Iceberg connection type connects to an Iceberg REST catalog using pyiceberg. The hook provides catalog introspection (list namespaces, list tables, read schemas, inspect partitions and snapshots) and OAuth2 token generation for external engines like Spark, Trino, and Flink.

After installing the Iceberg provider in your Airflow environment, the corresponding connection type of iceberg will be available.

Default Connection IDs

Iceberg Hook uses the parameter iceberg_conn_id for Connection IDs and the value of the parameter as iceberg_default by default. You can create multiple connections in case you want to switch between environments.

Configuring the Connection

Catalog URI (Host)

The URL of the Iceberg REST catalog endpoint. Example: https://your-catalog.example.com/ws/v1

Client ID (Login)

The OAuth2 Client ID for authenticating with the catalog. Leave empty for catalogs that don’t require OAuth2 credentials (e.g., local catalogs).

Client Secret (Password)

The OAuth2 Client Secret for authenticating with the catalog.

Extra (Optional)

A JSON object with additional catalog properties passed to pyiceberg.catalog.load_catalog(). Common properties:

{
    "warehouse": "s3://my-warehouse/",
    "s3.endpoint": "https://s3.us-east-1.amazonaws.com",
    "s3.region": "us-east-1",
    "s3.access-key-id": "AKIA...",
    "s3.secret-access-key": "..."
}

For AWS/GCP/Azure deployments, prefer using IAM roles or environment-based credentials and pass only the warehouse path in extra.

Migration from 1.x

In version 2.0.0, get_conn() now returns a pyiceberg.catalog.Catalog instance instead of a token string. If you were using get_conn() to obtain OAuth2 tokens, switch to get_token():

# Before (1.x)
token = IcebergHook().get_conn()

# After (2.0)
token = IcebergHook().get_token()

The get_token_macro() method has been updated to use get_token() automatically, so Jinja2 templates continue to work without changes.

Was this entry helpful?