SQLExecuteQueryOperator to connect to Apache Impala

Use the SQLExecuteQueryOperator to execute SQL queries against an Apache Impala cluster.

Note

Earlier setups may have used a dedicated Impala operator. That operator is deprecated; use the SQLExecuteQueryOperator instead.

Note

Make sure you have installed the apache-airflow-providers-apache-impala package to enable Impala support.

Using the Operator

Use the conn_id argument to connect to your Apache Impala instance. The connection metadata is structured as follows:

Impala Airflow Connection Metadata

Parameter           Input
Host: string        Impala daemon hostname or IP address
Schema: string      The default database name (optional)
Login: string       Username for authentication (if applicable)
Password: string    Password for authentication (if applicable)
Port: int           Impala service port (default: 21050)
Extra: JSON         Additional connection configuration, such as: {"use_ssl": false, "auth": "NOSASL"}
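One way to supply this metadata without the UI is an environment variable. The sketch below is illustrative only: it assumes an Airflow version that accepts JSON-formatted connection environment variables (named AIRFLOW_CONN_<CONN_ID>), that the Impala connection type is "impala", and that the connection ID is impala_default. The hostname, username, and password are hypothetical placeholders.

```python
import json
import os

# Hypothetical values; replace them with your cluster's details.
impala_connection = {
    "conn_type": "impala",            # assumed connection type name
    "host": "impala.example.com",     # Impala daemon hostname or IP address
    "schema": "default",              # default database name (optional)
    "login": "airflow_user",          # only if authentication is enabled
    "password": "change-me",          # only if authentication is enabled
    "port": 21050,                    # default Impala service port
    "extra": {"use_ssl": False, "auth": "NOSASL"},
}

# Serialize the fields to JSON and expose them under the variable name
# that corresponds to the connection ID "impala_default".
os.environ["AIRFLOW_CONN_IMPALA_DEFAULT"] = json.dumps(impala_connection)
```

The variable must be set in the environment of the Airflow scheduler and workers before the task runs. In practice, secrets such as the password are better kept in a secrets backend than in source code.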

The following example uses the SQLExecuteQueryOperator to create a table in Apache Impala:

tests/system/apache/impala/example_impala.py


    create_table_impala_task = SQLExecuteQueryOperator(
        task_id="create_table_impala",
        sql="""
            CREATE TABLE IF NOT EXISTS impala_example (
                a STRING,
                b INT
            )
            PARTITIONED BY (c INT)
        """,
    )

Reference

For further information, see:

Note

Parameters passed directly to SQLExecuteQueryOperator() take precedence over those specified in the Airflow connection metadata (such as schema, login, and password).
