airflow.providers.apache.drill.hooks.drill

Module Contents

Classes

DrillHook

Interact with Apache Drill via sqlalchemy-drill.

class airflow.providers.apache.drill.hooks.drill.DrillHook(*args, schema=None, log_sql=True, **kwargs)

Bases: airflow.providers.common.sql.hooks.sql.DbApiHook

Interact with Apache Drill via sqlalchemy-drill.

In the extras field of your connection, you can specify the SQLAlchemy dialect and driver that sqlalchemy-drill uses to communicate with Drill, e.g. {"dialect_driver": "drill+sadrill"} for communication over Drill’s REST API. See the sqlalchemy-drill documentation for descriptions of the supported dialects and drivers.

You can also set the default storage_plugin for the sqlalchemy-drill connection in the extras field, e.g. {"storage_plugin": "dfs"}.
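As a rough illustration of the two extras described above, the sketch below defines a connection through an environment variable. It assumes Airflow 2.3 or later, which accepts JSON-formatted AIRFLOW_CONN_<CONN_ID> variables; the host and port values are hypothetical placeholders for a local Drillbit.

```python
import json
import os

# Extras read by the hook; both keys are optional.
extras = {
    "dialect_driver": "drill+sadrill",  # dialect and driver for Drill's REST API
    "storage_plugin": "dfs",            # default storage plugin
}

# Hypothetical connection details; adjust host and port to your deployment.
connection = {
    "conn_type": "drill",
    "host": "localhost",
    "port": 8047,
    "extra": extras,
}

# Assumption: Airflow 2.3+ parses JSON values in AIRFLOW_CONN_* variables.
# The suffix DRILL_DEFAULT corresponds to the default connection id 'drill_default'.
os.environ["AIRFLOW_CONN_DRILL_DEFAULT"] = json.dumps(connection)
```

The same extras can instead be entered as a JSON object in the Extra field of the connection form in the Airflow UI.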

conn_name_attr = 'drill_conn_id'
default_conn_name = 'drill_default'
conn_type = 'drill'
hook_name = 'Drill'
supports_autocommit = False

get_conn()

Establish a connection to Drillbit.

get_uri()

Return the connection URI.

e.g. drill://localhost:8047/dfs
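To make the shape of that URI concrete, here is a minimal sketch that composes a string of the same form from a host, a port, and a storage plugin. The helper build_drill_uri is hypothetical and is not part of the provider; the real get_uri() reads these values from the Airflow connection and may append further parameters.

```python
from __future__ import annotations


def build_drill_uri(host: str, port: int | None = None, storage_plugin: str = "dfs") -> str:
    """Hypothetical helper: compose a URI of the form drill://host:port/storage_plugin."""
    # Only append the port when one is given.
    netloc = host if port is None else f"{host}:{port}"
    return f"drill://{netloc}/{storage_plugin}"


uri = build_drill_uri("localhost", 8047, "dfs")
print(uri)  # drill://localhost:8047/dfs
```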

abstract set_autocommit(conn, autocommit)

Set the autocommit flag on the connection.

abstract insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, **kwargs)

Insert a collection of tuples into a table.

Rows are inserted in chunks of at most commit_every rows, and each chunk is committed in its own transaction.

Parameters
  • table (str) – Name of the target table

  • rows (collections.abc.Iterable[tuple[str]]) – The rows to insert into the table

  • target_fields (collections.abc.Iterable[str] | None) – The names of the columns to fill in the table

  • commit_every (int) – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.

  • replace (bool) – Whether to replace instead of insert

  • executemany – If True, all rows are inserted at once in chunks defined by the commit_every parameter. This only works if all rows have the same number of columns, but it leads to better performance.

  • fast_executemany – If True, the fast_executemany parameter is set on the cursor used by executemany, which leads to better performance if the driver supports it.

  • autocommit – What to set the connection’s autocommit setting to before executing the query.
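The chunking behaviour described for commit_every can be sketched independently of any database. The generator below is a hypothetical illustration, not the hook's code: it splits rows into batches of at most commit_every rows and treats 0 as "everything in one batch". Because insert_rows is marked abstract on this hook, the sketch shows only the batching semantics and does not call the hook.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def chunked(rows: Iterable[tuple], commit_every: int = 1000) -> Iterator[list[tuple]]:
    """Hypothetical helper: yield batches of at most commit_every rows."""
    iterator = iter(rows)
    if commit_every <= 0:
        # 0 means a single batch, i.e. one transaction for all rows.
        batch = list(iterator)
        if batch:
            yield batch
        return
    while True:
        batch = list(islice(iterator, commit_every))
        if not batch:
            return
        yield batch


rows = [("a",), ("b",), ("c",), ("d",), ("e",)]
batches = list(chunked(rows, commit_every=2))
print(len(batches))  # 3
```

In a hook that implements insert_rows, each such batch would correspond to one transaction.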
