airflow.providers.common.ai.operators.llm_sql

Operator for generating SQL queries from natural language using LLMs.

Classes

LLMSQLQueryOperator

Generate SQL queries from natural language using an LLM.

Module Contents

class airflow.providers.common.ai.operators.llm_sql.LLMSQLQueryOperator(*, db_conn_id=None, table_names=None, schema_context=None, validate_sql=True, allowed_sql_types=DEFAULT_ALLOWED_TYPES, dialect=None, datasource_config=None, **kwargs)[source]

Bases: airflow.providers.common.ai.operators.llm.LLMOperator

Generate SQL queries from natural language using an LLM.

Inherits from LLMOperator for LLM access and optionally uses a DbApiHook for schema introspection. The operator generates SQL but does not execute it — the generated SQL is returned as XCom and can be passed to SQLExecuteQueryOperator or used in downstream tasks.

When system_prompt is provided, it is appended to the built-in SQL safety instructions — use it for domain-specific guidance (e.g. “prefer CTEs over subqueries”, “always use LEFT JOINs”).

Parameters:
  • prompt – Natural language description of the desired query.

  • llm_conn_id – Connection ID for the LLM provider.

  • model_id – Model identifier (e.g. "openai:gpt-4o"). Overrides the model stored in the connection’s extra field.

  • system_prompt – Additional instructions appended to the built-in SQL safety prompt. Use for domain-specific guidance.

  • agent_params – Additional keyword arguments passed to the pydantic-ai Agent constructor (e.g. retries, model_settings).

  • db_conn_id (str | None) – Connection ID for database schema introspection. The connection must resolve to a DbApiHook.

  • table_names (list[str] | None) – Tables to include in the LLM’s schema context. Used with db_conn_id for automatic introspection.

  • schema_context (str | None) – Manual schema context string. When provided, this is used instead of db_conn_id introspection.

  • validate_sql (bool) – Whether to validate generated SQL via AST parsing. Default True (safe by default).

  • allowed_sql_types (tuple[type[sqlglot.exp.Expression], ...]) – SQL statement types to allow. Default: (Select, Union, Intersect, Except).

  • dialect (str | None) – SQL dialect for parsing (postgres, mysql, etc.). Auto-detected from the database hook if not set.
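To make the validate_sql / allowed_sql_types allowlist concrete, here is a deliberately simplified toy version. The operator itself parses the SQL into a sqlglot AST and checks the expression type against allowed_sql_types; this stdlib-only sketch inspects just the leading keyword, which is not a safe substitute in production and is shown only to illustrate the allowlist idea.

```python
# Toy illustration of the statement-type allowlist behind validate_sql.
# The real check is AST-based (sqlglot); matching on the first keyword,
# as done here, is NOT equivalent and is for illustration only.
ALLOWED_KEYWORDS = {"SELECT", "WITH"}  # stand-in for (Select, Union, Intersect, Except)

def toy_validate(sql: str) -> bool:
    """Return True if the statement starts with an allowed keyword."""
    stripped = sql.strip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in ALLOWED_KEYWORDS

print(toy_validate("SELECT * FROM orders"))  # True
print(toy_validate("DROP TABLE orders"))     # False
```

Because the default allowlist contains only read-style statement types, mutating statements (INSERT, UPDATE, DELETE, DDL) are rejected before the SQL ever reaches a downstream execution task.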

Human-in-the-Loop approval parameters are inherited from LLMOperator (require_approval, approval_timeout, allow_modifications). When allow_modifications=True and the reviewer edits the SQL, the modified query is re-validated against the same safety rules before being returned.

template_fields: collections.abc.Sequence[str] = ('prompt', 'llm_conn_id', 'model_id', 'system_prompt', 'agent_params', 'db_conn_id',...[source]
db_conn_id = None[source]
table_names = None[source]
schema_context = None[source]
validate_sql = True[source]
allowed_sql_types[source]
dialect = None[source]
datasource_config = None[source]
property db_hook: airflow.providers.common.sql.hooks.sql.DbApiHook | None[source]

Return DbApiHook for the configured database connection, or None.

execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

execute_complete(context, generated_output, event)[source]

Resume after human review, re-validating if the reviewer modified the SQL.
