SQLExecuteQueryOperator to connect to Apache Hive

Use the SQLExecuteQueryOperator to execute Hive commands in an Apache Hive database.

Note

Previously, HiveOperator was used to perform this kind of operation. It has since been deprecated and removed; use SQLExecuteQueryOperator instead.

Note

Make sure you have installed the apache-airflow-providers-apache-hive package to enable Hive support.

Using the Operator

Use the conn_id argument to reference the Airflow connection for your Apache Hive instance. The connection metadata is structured as follows:

Hive Airflow Connection Metadata

Host (string): HiveServer2 hostname or IP address
Schema (string): default database name (optional)
Login (string): Hive username (if applicable)
Password (string): Hive password (if applicable)
Port (int): HiveServer2 port (default: 10000)
Extra (JSON): additional connection configuration, such as the authentication method: {"auth": "NOSASL"}
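These fields can also be supplied as an Airflow connection URI (for example through an AIRFLOW_CONN_<CONN_ID> environment variable). A minimal sketch of building such a URI with the standard library, assuming hypothetical host and credential values and the hiveserver2 connection type:

```python
from urllib.parse import quote, urlencode

# Hypothetical values mirroring the metadata fields above.
host = "hiveserver2.example.com"
schema = "default"
login = "hive_user"
password = "hive_pw"
port = 10000
extra = {"auth": "NOSASL"}  # Extra (JSON) field

# Airflow connection URI form:
# <conn_type>://<login>:<password>@<host>:<port>/<schema>?<extra key-value pairs>
uri = (
    f"hiveserver2://{quote(login)}:{quote(password)}@{host}:{port}/{schema}"
    f"?{urlencode(extra)}"
)
print(uri)
# → hiveserver2://hive_user:hive_pw@hiveserver2.example.com:10000/default?auth=NOSASL
```

Quoting the login and password guards against characters that are not URI-safe; the extra dictionary is flattened into query parameters.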

An example usage of the SQLExecuteQueryOperator to connect to Apache Hive is as follows:

tests/system/apache/hive/example_hive.py


    create_table_hive_task = SQLExecuteQueryOperator(
        task_id="create_table_hive",
        sql="create table hive_example(a string, b int) partitioned by(c int)",
    )
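The snippet above relies on the conn_id resolving to a registered connection at runtime. One common way to register it is an environment variable; a sketch, assuming a hypothetical connection id my_hive_conn and placeholder credentials:

```shell
# Hypothetical connection id "my_hive_conn": Airflow maps the
# AIRFLOW_CONN_MY_HIVE_CONN variable to conn_id "my_hive_conn".
export AIRFLOW_CONN_MY_HIVE_CONN="hiveserver2://hive_user:hive_pw@hiveserver2.example.com:10000/default?auth=NOSASL"
echo "$AIRFLOW_CONN_MY_HIVE_CONN"
```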

Reference

For further information, see the apache-airflow-providers-apache-hive provider documentation.

Note

Parameters provided directly to SQLExecuteQueryOperator() take precedence over those specified in the Airflow connection metadata (such as schema, login, password, etc.).
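This precedence rule can be pictured as a dictionary merge where operator-level values win. An illustrative sketch only, not Airflow's actual implementation:

```python
# Illustrative only: fields stored in the Airflow connection...
connection_fields = {"schema": "default", "login": "hive_user", "port": 10000}

# ...versus values passed directly to the operator.
operator_overrides = {"schema": "analytics"}

# Right-hand entries win in a dict merge, so operator-level
# values override the corresponding connection fields.
effective = {**connection_fields, **operator_overrides}
print(effective["schema"])
# → analytics
```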
