SQLExecuteQueryOperator to connect to Apache Druid¶
Use the SQLExecuteQueryOperator
to execute SQL queries against an
Apache Druid cluster.
Note
Previously, a dedicated operator for Druid might have been used.
After deprecation, please use the SQLExecuteQueryOperator
instead.
Note
Make sure you have installed the apache-airflow-providers-apache-druid
package to enable Druid support.
Using the Operator¶
Use the conn_id
argument to connect to your Apache Druid instance where
the connection metadata is structured as follows:
Parameter |
Input |
---|---|
Host: string |
Druid broker hostname or IP address |
Schema: string |
Not applicable (leave blank) |
Login: string |
Not applicable (leave blank) |
Password: string |
Not applicable (leave blank) |
Port: int |
Druid broker port (default: 8082) |
Extra: JSON |
Additional connection configuration, such as:
|
An example usage of the SQLExecuteQueryOperator to connect to Apache Druid is as follows:
tests/system/apache/druid/example_druid.py
# Task: List all published datasources in Druid.
list_datasources_task = SQLExecuteQueryOperator(
task_id="list_datasources",
sql="SELECT DISTINCT datasource FROM sys.segments WHERE is_published = 1",
)
# Task: Describe the schema for the 'wikipedia' datasource.
# Note: This query returns column information if the datasource exists.
describe_wikipedia_task = SQLExecuteQueryOperator(
task_id="describe_wikipedia",
sql=dedent("""
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'wikipedia'
""").strip(),
)
# Task: Count rows for the 'wikipedia' datasource.
# Here we count the segments for 'wikipedia'. If the datasource is not ingested, it returns 0.
select_count_from_datasource = SQLExecuteQueryOperator(
task_id="select_count_from_datasource",
sql="SELECT COUNT(*) FROM sys.segments WHERE datasource = 'wikipedia'",
)
Reference¶
For further information, see:
Note
Parameters provided directly via SQLExecuteQueryOperator() take precedence over those specified
in the Airflow connection metadata (such as schema
, login
, password
, etc).