SQLExecuteQueryOperator to connect to Apache Hive¶
Use the SQLExecuteQueryOperator to execute Hive commands in an Apache Hive database.
Note
Previously, the HiveOperator was used to perform this kind of operation. It has since been deprecated and removed. Please use SQLExecuteQueryOperator instead.
Note
Make sure you have installed the apache-airflow-providers-apache-hive package to enable Hive support.
Using the Operator¶
Use the conn_id argument to connect to your Apache Hive instance, where the connection metadata is structured as follows:
Parameter | Input
---|---
Host: string | HiveServer2 hostname or IP address
Schema: string | Default database name (optional)
Login: string | Hive username (if applicable)
Password: string | Hive password (if applicable)
Port: int | HiveServer2 port (default: 10000)
Extra: JSON | Additional connection configuration, such as the authentication method
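One common way to supply these fields is through an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a connection URI. The sketch below builds such a URI from the table's fields using only the standard library; the host, credentials, connection-type scheme, and the `auth_mechanism` extra are illustrative assumptions, so adjust them to match your HiveServer2 deployment.

```python
import json
from urllib.parse import quote

# Hypothetical values -- replace with your own HiveServer2 details.
host, port, schema = "hive.example.com", 10000, "default"
login, password = "hive_user", "s3cret"
extra = {"auth_mechanism": "LDAP"}  # assumption: LDAP authentication is configured

# Airflow resolves connections named AIRFLOW_CONN_<CONN_ID> from the
# environment; the value is a URI assembled from the fields above.
conn_uri = (
    f"hiveserver2://{quote(login)}:{quote(password)}@{host}:{port}/{schema}"
    f"?__extra__={quote(json.dumps(extra))}"
)
print(conn_uri)
```

Exporting this value as, for example, `AIRFLOW_CONN_HIVE_DEFAULT` makes it available under the conn_id `hive_default` without touching the metadata database.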
An example usage of the SQLExecuteQueryOperator to connect to Apache Hive is as follows:
tests/system/apache/hive/example_hive.py
create_table_hive_task = SQLExecuteQueryOperator(
task_id="create_table_hive",
sql="create table hive_example(a string, b int) partitioned by(c int)",
)
Reference¶
For further information, look at:
Note
Parameters provided directly via SQLExecuteQueryOperator() take precedence over those specified in the Airflow connection metadata (such as schema, login, password, etc.).
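This precedence rule can be illustrated with a plain-Python sketch (no Airflow required); the dictionaries and field names below are illustrative only, not Airflow's internal API.

```python
# Connection metadata as it might be stored in Airflow (illustrative values).
connection_meta = {"schema": "default", "login": "hive_user", "port": 10000}

# Arguments passed directly to the operator override the stored metadata.
operator_kwargs = {"schema": "analytics"}

# Later keys win in a dict merge, mirroring the precedence described above.
effective = {**connection_meta, **operator_kwargs}
print(effective["schema"])  # the operator-level value, not the connection's
```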