airflow.providers.clickhousedb.hooks.clickhouse¶
Classes¶
Minimal DB-API 2.0 connection adapter wrapping a |
|
Interact with ClickHouse via the HTTP interface ( |
Module Contents¶
- class airflow.providers.clickhousedb.hooks.clickhouse.ClickHouseConnection(client)[source]¶
Minimal DB-API 2.0 connection adapter wrapping a
clickhouse_connectClient.SQL execution is delegated to
clickhouse_connect.dbapi.cursor.Cursor, which routes each statement toclient.query()orclient.command()automatically by inspecting the SQL keyword after stripping comments — the same logic used by theclickhouse-connectSQLAlchemy dialect.ClickHouse has no multi-statement transactions. Every statement is effectively auto-committed, so
commit()androllback()are intentional no-ops and theautocommitattribute is alwaysTrue.DbApiHook.run()checksconn.autocommitviaClickHouseHook.get_autocommit()and skips theconn.commit()call when it is truthy.
- class airflow.providers.clickhousedb.hooks.clickhouse.ClickHouseHook(*args, database=None, session_settings=None, client_kwargs=None, **kwargs)[source]¶
Bases:
airflow.providers.common.sql.hooks.sql.DbApiHookInteract with ClickHouse via the HTTP interface (
clickhouse-connect).This hook wraps
clickhouse_connect.get_client()behind a thin DB-API 2.0 adapter (ClickHouseConnection), so all standardSQLExecuteQueryOperatorfeatures work out of the box (templating,handler,split_statements, etc.).- Parameters:
database (str | None) – Optional database name. Overrides the
schemafield of the Airflow connection. Useful when one connection points to a ClickHouse cluster and individual tasks need to target different databases.session_settings (dict[str, Any] | None) – Optional dict of ClickHouse session-level settings (e.g.
{"max_execution_time": 60, "max_threads": 4}). Values supplied here are merged on top of anysession_settingsdict already present in the connection’sextraJSON field, with the constructor argument taking precedence. For a full list of available session settings visit https://clickhouse.com/docs/operations/settings/settingsclient_kwargs (dict[str, Any] | None) – Optional dict of additional keyword arguments forwarded verbatim to
clickhouse_connect.get_client(). Use this to pass any parameter not otherwise exposed by the hook (e.g.http_proxy,pool_mgr_params). Values supplied here are merged on top of anyclient_kwargsdict already present in the connection’sextraJSON field, with the constructor argument taking precedence on conflicting keys. Keys that the hook manages itself (host,port,username,password,database,secure,verify,client_name,settings) are silently ignored so that hook-managed values always take precedence.
See also
- get_client()[source]¶
Return the raw
clickhouse_connectClient for ClickHouse-specific operations.Use this for bulk inserts, streaming queries, or any operation that benefits from the native
clickhouse_connectAPI rather than DB-API cursors. The caller is responsible for closing the client.
- bulk_insert_rows(table, rows, column_names, batch_size=None)[source]¶
Insert rows into a ClickHouse table using the native columnar insert.
Uses
clickhouse_connect’s optimized insert path, which is significantly faster than row-by-row cursor inserts for large datasets.By default (
batch_size=None) all rows are sent in a single call. Setbatch_sizeto bound peak memory usage on very large inputs; an insert context is created once and reused across chunks to avoid a repeatedDESCRIBE TABLEround-trip per batch.
- get_uri()[source]¶
Return a SQLAlchemy-compatible URI for
get_sqlalchemy_engine().Always uses the
clickhousedb://scheme registered byclickhouse-connect. TLS and tuning parameters are forwarded as query-string arguments so that SQLAlchemy-path users get the same settings as DB-API-path users:secure— enables HTTPS/TLS (?secure=true)verify— TLS certificate verification (?verify=falseto disable)connect_timeout,send_receive_timeout,compress— forwarded when explicitly set in the connection’sextraJSON