airflow.providers.common.ai.operators.llm_sql

Operator for generating SQL queries from natural language using LLMs.

Classes

LLMSQLQueryOperator

Generate SQL queries from natural language using an LLM.

Module Contents

class airflow.providers.common.ai.operators.llm_sql.LLMSQLQueryOperator(*, db_conn_id=None, table_names=None, schema_context=None, validate_sql=True, allowed_sql_types=DEFAULT_ALLOWED_TYPES, dialect=None, datasource_config=None, **kwargs)[source]

Bases: airflow.providers.common.ai.operators.llm.LLMOperator

Generate SQL queries from natural language using an LLM.

Inherits from LLMOperator for LLM access and optionally uses a DbApiHook for schema introspection. The operator generates SQL but does not execute it — the generated SQL is returned as XCom and can be passed to SQLExecuteQueryOperator or used in downstream tasks.

When system_prompt is provided, it is appended to the built-in SQL safety instructions — use it for domain-specific guidance (e.g. “prefer CTEs over subqueries”, “always use LEFT JOINs”).

Parameters:
  • prompt – Natural language description of the desired query.

  • llm_conn_id – Connection ID for the LLM provider.

  • model_id – Model identifier (e.g. "openai:gpt-4o"). Overrides the model stored in the connection’s extra field.

  • system_prompt – Additional instructions appended to the built-in SQL safety prompt. Use for domain-specific guidance.

  • agent_params – Additional keyword arguments passed to the pydantic-ai Agent constructor (e.g. retries, model_settings).

  • db_conn_id (str | None) – Connection ID for database schema introspection. The connection must resolve to a DbApiHook.

  • table_names (list[str] | None) – Tables to include in the LLM’s schema context. Used with db_conn_id for automatic introspection.

  • schema_context (str | None) – Manual schema context string. When provided, this is used instead of db_conn_id introspection.

  • validate_sql (bool) – Whether to validate generated SQL via AST parsing. Default True (safe by default).

  • allowed_sql_types (tuple[type[sqlglot.exp.Expression], ...]) – SQL statement types to allow. Default: (Select, Union, Intersect, Except).

  • dialect (str | None) – SQL dialect for parsing (postgres, mysql, etc.). Auto-detected from the database hook if not set.
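To make the validate_sql / allowed_sql_types allowlist concrete, here is a deliberately simplified toy version. The operator itself parses the SQL into a sqlglot AST and checks the expression type against allowed_sql_types; this stdlib-only sketch inspects just the leading keyword, which is not a safe substitute in production and is shown only to illustrate the allowlist idea.

```python
# Toy illustration of the statement-type allowlist behind validate_sql.
# The real check is AST-based (sqlglot); matching on the first keyword,
# as done here, is NOT equivalent and is for illustration only.
ALLOWED_KEYWORDS = {"SELECT", "WITH"}  # stand-in for (Select, Union, Intersect, Except)

def toy_validate(sql: str) -> bool:
    """Return True if the statement starts with an allowed keyword."""
    stripped = sql.strip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in ALLOWED_KEYWORDS

print(toy_validate("SELECT * FROM orders"))  # True
print(toy_validate("DROP TABLE orders"))     # False
```

Because the default allowlist contains only read-style statement types, mutating statements (INSERT, UPDATE, DELETE, DDL) are rejected before the SQL ever reaches a downstream execution task.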

Human-in-the-Loop approval parameters are inherited from LLMOperator (require_approval, approval_timeout, allow_modifications). When allow_modifications=True and the reviewer edits the SQL, the modified query is re-validated against the same safety rules before being returned.

template_fields: collections.abc.Sequence[str] = ('prompt', 'llm_conn_id', 'model_id', 'system_prompt', 'agent_params', 'db_conn_id',...[source]
db_conn_id = None[source]
table_names = None[source]
schema_context = None[source]
validate_sql = True[source]
allowed_sql_types[source]
dialect = None[source]
datasource_config = None[source]
property db_hook: airflow.providers.common.sql.hooks.sql.DbApiHook | None[source]

Return DbApiHook for the configured database connection, or None.

execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

execute_complete(context, generated_output, event)[source]

Resume after human review, re-validating if the reviewer modified the SQL.
