airflow.providers.databricks.hooks.databricks_sql

Attributes

T

Classes

DatabricksSqlHook

Hook to interact with Databricks SQL.

Functions

create_timeout_thread(cur, execution_timeout)

Module Contents

airflow.providers.databricks.hooks.databricks_sql.T

airflow.providers.databricks.hooks.databricks_sql.create_timeout_thread(cur, execution_timeout)
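
The page gives no description for this function. Judging only by its name and arguments, it plausibly ties a cursor to a timer that fires once execution_timeout has elapsed. The following is a minimal sketch of that general pattern built on threading.Timer; it is not the provider's implementation. The helper names make_timeout_timer and execute_with_timeout are hypothetical, and the sketch assumes the cursor exposes a cancel() method.

```python
from __future__ import annotations

import threading
from datetime import timedelta


def make_timeout_timer(cur, execution_timeout: timedelta | None) -> threading.Timer | None:
    """Return an unstarted Timer that cancels `cur` after the timeout, or None."""
    if execution_timeout is None:
        return None
    # Assumption: `cur.cancel()` aborts the statement that is currently running.
    return threading.Timer(execution_timeout.total_seconds(), cur.cancel)


def execute_with_timeout(cur, sql: str, execution_timeout: timedelta | None = None) -> None:
    """Execute `sql`, cancelling the cursor if the timeout elapses first."""
    timer = make_timeout_timer(cur, execution_timeout)
    if timer is not None:
        timer.start()
    try:
        cur.execute(sql)
    finally:
        # Stop the timer so it cannot fire after the statement has finished.
        if timer is not None:
            timer.cancel()
```

Cancelling the timer in a finally block keeps a late timer from interrupting whatever the cursor does next.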

class airflow.providers.databricks.hooks.databricks_sql.DatabricksSqlHook(databricks_conn_id=BaseDatabricksHook.default_conn_name, http_path=None, sql_endpoint_name=None, session_configuration=None, http_headers=None, catalog=None, schema=None, caller='DatabricksSqlHook', **kwargs)

Bases: airflow.providers.databricks.hooks.databricks_base.BaseDatabricksHook, airflow.providers.common.sql.hooks.sql.DbApiHook

Hook to interact with Databricks SQL.

Parameters:

databricks_conn_id (str) – Reference to the Databricks connection.
http_path (str | None) – Optional HTTP path of the Databricks SQL Endpoint or cluster. If omitted, either the path must be set in the Databricks connection’s extra parameters or sql_endpoint_name must be specified.
sql_endpoint_name (str | None) – Optional name of the Databricks SQL Endpoint. If omitted, http_path must be provided as described above.
session_configuration (dict[str, str] | None) – An optional dictionary of Spark session parameters. Defaults to None. If omitted, it can instead be set in the Databricks connection’s extra parameters.
http_headers (list[tuple[str, str]] | None) – An optional list of (k, v) pairs that will be set as HTTP headers on every request.
catalog (str | None) – An optional initial catalog to use. Requires DBR version 9.0+.
schema (str | None) – An optional initial schema to use. Requires DBR version 9.0+.
kwargs – Additional parameters internal to the Databricks SQL Connector.
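
As an illustration of the parameters above, the sketch below collects them into a dictionary and passes them to the constructor. Every value is a placeholder: the connection ID, HTTP path, session setting, header, catalog and schema are assumptions made for the example, and make_hook is a hypothetical helper. The import is deferred so the snippet loads even where the provider package is not installed.

```python
# Placeholder values only; replace them with the settings of a real workspace.
HOOK_KWARGS = {
    "databricks_conn_id": "databricks_default",  # assumed connection ID
    "http_path": "/sql/1.0/warehouses/0123456789abcdef",  # hypothetical path
    "session_configuration": {"spark.sql.session.timeZone": "UTC"},
    "http_headers": [("X-Request-Source", "airflow-example")],  # hypothetical header
    "catalog": "main",  # hypothetical catalog
    "schema": "default",  # hypothetical schema
}


def make_hook(**overrides):
    """Build a DatabricksSqlHook from HOOK_KWARGS, with optional overrides."""
    # Imported lazily so this module can be loaded without Airflow installed.
    from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook

    return DatabricksSqlHook(**{**HOOK_KWARGS, **overrides})
```

Because http_path is supplied here, sql_endpoint_name is left out; per the parameter descriptions, only one of the two needs to be given.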

hook_name = 'Databricks SQL'

supports_autocommit = True

session_config = None

http_headers = None

catalog = None

schema = None

additional_params

query_ids: list[str] = []

get_conn()

Return a Databricks SQL connection object.

property sqlalchemy_url: sqlalchemy.engine.URL

Return a sqlalchemy.engine.URL object built from the connection.

Returns: the extracted sqlalchemy.engine.URL object.
Return type: sqlalchemy.engine.URL

get_uri()

Extract the URI from the connection.

Returns: the extracted URI.
Return type: str

run(sql: str | collections.abc.Iterable[str], autocommit: bool = ..., parameters: collections.abc.Iterable | collections.abc.Mapping[str, Any] | None = ..., handler: None = ..., split_statements: bool = ..., return_last: bool = ..., execution_timeout: datetime.timedelta | None = None) → None

Run a command or a list of commands.

Pass a list of SQL statements to the sql parameter to execute them sequentially.

Parameters:

sql – The SQL statement to execute (str), or a list of SQL statements to execute.
autocommit – The connection’s autocommit setting to apply before executing the query. Databricks SQL currently has no commit functionality, so this flag has no effect.
parameters – The parameters to render the SQL query with.
handler – The result handler, which is called with the result of each statement.
split_statements – Whether to split a single SQL string into statements and run them separately.
return_last – Whether to return the result of only the last statement or of all statements after splitting.
execution_timeout – Maximum time allowed for the execution of this task instance; if it is exceeded, the task raises an error and fails.

Returns:

The result of only the last SQL statement if a handler was provided, unless return_last is set to False.
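
The parameters above can be combined as in the following sketch, which runs two statements in sequence and asks for every result rather than only the last one. The statements, the fetch_all handler and the run_report helper are illustrative assumptions; hook stands for any object that offers a run method with the keyword arguments documented here.

```python
from datetime import timedelta


def fetch_all(cursor):
    """Result handler: return every row of the statement that just ran."""
    return cursor.fetchall()


def run_report(hook, timeout=timedelta(minutes=10)):
    """Run two placeholder statements and return one result per statement."""
    statements = ["SELECT 1", "SELECT 2"]  # placeholder SQL
    return hook.run(
        sql=statements,
        handler=fetch_all,  # without a handler nothing is returned
        return_last=False,  # keep the results of all statements
        execution_timeout=timeout,
    )
```

Leaving return_last at True would instead yield only the result of the final statement.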

abstract bulk_dump(table, tmp_file)

Dump a database table into a tab-delimited file.

Parameters:

table – The name of the source table
tmp_file – The path of the target file

abstract bulk_load(table, tmp_file)

Load a tab-delimited file into a database table.

Parameters:

table – The name of the target table
tmp_file – The path of the file to load into the table
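
Both bulk methods are marked abstract, so a concrete implementation would have to come from elsewhere. To make the tab-delimited contract concrete, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in DB-API connection. It is not Databricks-specific; the free functions, table names and round_trip demo are assumptions for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def bulk_dump(conn, table, tmp_file):
    """Write every row of `table` to `tmp_file` as tab-delimited text."""
    # Table names cannot be bound as query parameters; `table` must be trusted.
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(tmp_file, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(cur)


def bulk_load(conn, table, tmp_file):
    """Insert the rows of a tab-delimited file into `table`."""
    with open(tmp_file, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        conn.commit()


def round_trip():
    """Dump a small table to a temporary file and load it into a second table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE dst (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])
    fd, path = tempfile.mkstemp(suffix=".tsv")
    os.close(fd)
    try:
        bulk_dump(conn, "src", path)
        bulk_load(conn, "dst", path)
        return conn.execute("SELECT id, name FROM dst ORDER BY id").fetchall()
    finally:
        os.remove(path)
        conn.close()
```

Values read back from the file are strings; in this demo SQLite's column affinity converts the id column back to integers, but another database may need explicit type handling.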

get_openlineage_database_info(connection)

Return database specific information needed to generate and parse lineage metadata.

This includes information that helps construct the information schema query and create the correct namespace.

Parameters: connection – Airflow connection, passed in to reduce calls to the get_connection method.

get_openlineage_database_dialect(_)

Return database dialect used for SQL parsing.

For a list of supported dialects, see https://openlineage.io/docs/development/sql#sql-dialects

get_openlineage_default_schema()

Return default schema specific to database.