airflow.providers.common.ai.toolsets.datafusion
Curated SQL toolset wrapping DataFusionEngine for agentic object-store workflows.
Attributes
Classes
DataFusionToolset | Curated toolset that gives an LLM agent SQL access to object-storage data via Apache DataFusion.
Module Contents
- class airflow.providers.common.ai.toolsets.datafusion.DataFusionToolset(datasource_configs, *, allow_writes=False, max_rows=50)
Bases: pydantic_ai.toolsets.abstract.AbstractToolset[Any]
Curated toolset that gives an LLM agent SQL access to object-storage data via Apache DataFusion.
Provides three tools (list_tables, get_schema, and query) backed by DataFusionEngine.
Each DataSourceConfig entry registers a table backed by Parquet, CSV, Avro, or Iceberg data on S3 or local storage. Multiple configs can be registered so that SQL queries can join across tables.
Requires the datafusion extra of apache-airflow-providers-common-sql.
- Parameters:
  - datasource_configs (list[airflow.providers.common.sql.config.DataSourceConfig]) – One or more DataFusion data-source configurations.
  - allow_writes (bool) – Allow data-modifying SQL (CREATE TABLE, CREATE VIEW, INSERT INTO, etc.). Default False: only SELECT-family statements are permitted.
  - max_rows (int) – Maximum number of rows returned from the query tool. Default 50.
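The effect of allow_writes and max_rows can be illustrated with a minimal, self-contained sketch. It does not use the provider's code: the helper names is_read_only and guard_and_cap and the keyword allow-list are hypothetical, and the real toolset may classify statements differently.

```python
import re

# Hypothetical allow-list of "SELECT-family" keywords; the toolset's actual
# rules are not shown on this page.
READ_ONLY_KEYWORDS = {"SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN"}


def is_read_only(sql: str) -> bool:
    """Naive check: is the statement's first keyword on the allow-list?"""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match is not None and match.group(1).upper() in READ_ONLY_KEYWORDS


def guard_and_cap(
    sql: str, rows: list, *, allow_writes: bool = False, max_rows: int = 50
) -> list:
    """Reject write statements unless allowed, then cap the returned rows.

    `rows` stands in for whatever the SQL engine produced for `sql`.
    """
    if not allow_writes and not is_read_only(sql):
        raise PermissionError(
            "Only read-only statements are permitted when allow_writes=False"
        )
    return rows[:max_rows]


# Usage: a 120-row result is truncated to the default cap.
rows = [{"n": i} for i in range(120)]
capped = guard_and_cap("SELECT n FROM numbers", rows)
print(len(capped))
```

A first-keyword check like this is easy to reason about but deliberately crude; a production guard would more likely inspect the parsed statement.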
- property id: str
An ID for the toolset that is unique among all toolsets registered with the same agent.
If you’re writing a concrete toolset that users can instantiate more than once, let them optionally pass a custom ID to the constructor and return it here.
A toolset needs to have an ID in order to be used in a durable execution environment like Temporal, in which case the ID will be used to identify the toolset’s activities within the workflow.
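The pattern of an optional custom ID with a derived default might look like the sketch below. The class NamedToolset, the toolset_id argument, and the "datafusion-" prefix are all hypothetical; this page does not say how DataFusionToolset builds its ID.

```python
from typing import Iterable, Optional


class NamedToolset:
    """Hypothetical toolset that exposes a stable, overridable ID."""

    def __init__(self, table_names: Iterable[str], *, toolset_id: Optional[str] = None):
        self._table_names = sorted(table_names)
        # Fall back to an ID derived from the registered tables (an assumption),
        # so two instances over different tables do not collide by default.
        self._id = toolset_id or "datafusion-" + "-".join(self._table_names)

    @property
    def id(self) -> str:
        return self._id


# Usage: one default ID and one explicit ID on the same agent.
default_ts = NamedToolset(["orders", "customers"])
custom_ts = NamedToolset(["orders"], toolset_id="orders-readonly")
print(default_ts.id, custom_ts.id)
```

Sorting the table names keeps the derived ID independent of registration order, which matters when the ID identifies activities in a durable workflow.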
- async call_tool(name, tool_args, ctx, tool)
Call a tool with the given arguments.
- Args:
  - name: The name of the tool to call.
  - tool_args: The arguments to pass to the tool.
  - ctx: The run context.
  - tool: The tool definition returned by get_tools (pydantic_ai.toolsets.AbstractToolset.get_tools) that was called.
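A call_tool implementation typically dispatches on the tool name. The following is a rough, self-contained sketch of that idea, not the provider's code: SketchToolset, its in-memory schema dictionary, the table_name argument key, and the JSON return format are assumptions, and the query tool is left out for brevity.

```python
import asyncio
import json
from typing import Any, Dict


class SketchToolset:
    """Hypothetical stand-in that dispatches tool calls by name."""

    def __init__(self, schemas: Dict[str, Dict[str, str]]):
        # Maps table name -> {column name: column type}.
        self._schemas = schemas

    async def call_tool(
        self, name: str, tool_args: Dict[str, Any], ctx: Any = None, tool: Any = None
    ) -> str:
        if name == "list_tables":
            return json.dumps(sorted(self._schemas))
        if name == "get_schema":
            # "table_name" is an assumed argument key.
            return json.dumps(self._schemas[tool_args["table_name"]])
        raise ValueError(f"Unknown tool: {name!r}")


# Usage: drive the coroutine from synchronous code.
toolset = SketchToolset({"users": {"id": "Int64"}, "orders": {"total": "Float64"}})
print(asyncio.run(toolset.call_tool("list_tables", {})))
```

Raising on an unknown name, rather than returning an empty result, makes a misrouted call visible to the agent framework.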