airflow.providers.common.ai.toolsets.datafusion

Curated SQL toolset wrapping DataFusionEngine for agentic object-store workflows.

Attributes

log

Classes

DataFusionToolset

Curated toolset that gives an LLM agent SQL access to object-storage data via Apache DataFusion.

Module Contents

airflow.providers.common.ai.toolsets.datafusion.log
class airflow.providers.common.ai.toolsets.datafusion.DataFusionToolset(datasource_configs, *, allow_writes=False, max_rows=50)

Bases: pydantic_ai.toolsets.abstract.AbstractToolset[Any]

Curated toolset that gives an LLM agent SQL access to object-storage data via Apache DataFusion.

Provides three tools backed by DataFusionEngine: list_tables, get_schema, and query.

Each DataSourceConfig entry registers a table backed by Parquet, CSV, Avro, or Iceberg data on S3 or local storage. Multiple configs can be registered so that SQL queries can join across tables.
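To make the three tools and the cross-table join concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for DataFusionEngine. It is not how the toolset is implemented: SQLite replaces DataFusion purely for illustration, the tables `orders` and `customers` are hypothetical, and the functions only mirror the tool names. Their bodies and return shapes are assumptions.

```python
import sqlite3

# Stand-in for DataFusionEngine: an in-memory SQLite database (NOT DataFusion).
# The two hypothetical tables play the role of two registered DataSourceConfig
# entries, e.g. a Parquet dataset on S3 and a local CSV file.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 15.5), (3, 20, 40.0);
    INSERT INTO customers VALUES (10, 'alice'), (20, 'bob');
    """
)


def list_tables() -> list[str]:
    """Mimics the list_tables tool: names of all registered tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def get_schema(table: str) -> list[tuple[str, str]]:
    """Mimics the get_schema tool: (column, type) pairs for one table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]


def query(sql: str) -> list[tuple]:
    """Mimics the query tool: run SQL and return the result rows."""
    return conn.execute(sql).fetchall()


# A join across two registered tables, as an agent might issue it.
totals = query(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id
    GROUP BY c.name ORDER BY c.name
    """
)
print(list_tables())         # ['customers', 'orders']
print(get_schema("orders"))  # [('order_id', 'INTEGER'), ('customer_id', 'INTEGER'), ('amount', 'REAL')]
print(totals)                # [('alice', 40.5), ('bob', 40.0)]
```

The typical agent loop is the same regardless of engine: discover tables, inspect a schema, then issue a query that references tables by their registered names.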

Requires the datafusion extra of apache-airflow-providers-common-sql.

Parameters:
  • datasource_configs (list[airflow.providers.common.sql.config.DataSourceConfig]) – One or more DataFusion data-source configurations.

  • allow_writes (bool) – Allow data-modifying SQL (CREATE TABLE, CREATE VIEW, INSERT INTO, etc.). Default False, in which case only SELECT-family statements are permitted.

  • max_rows (int) – Maximum number of rows returned from the query tool. Default 50.
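The effect of `allow_writes` and `max_rows` can be illustrated with a small statement guard and row cap. This is a hypothetical sketch, not the provider's code: the set of "SELECT-family" keywords (`SELECT`, `WITH`, `EXPLAIN`, `SHOW`, `DESCRIBE`) and the helper names `check_statement` and `cap_rows` are assumptions. A leading-keyword check is only illustrative; a real guard should parse the full statement, since a check like this would accept input such as `SELECT 1; DROP TABLE t`.

```python
import re

# Hypothetical set of keywords treated as read-only ("SELECT-family").
READ_ONLY_KEYWORDS = {"SELECT", "WITH", "EXPLAIN", "SHOW", "DESCRIBE"}


def check_statement(sql: str, allow_writes: bool = False) -> None:
    """Raise PermissionError if sql is not read-only and writes are disallowed."""
    if allow_writes:
        return
    # Naive heuristic: inspect only the first keyword of the statement.
    match = re.match(r"\s*([A-Za-z]+)", sql)
    keyword = match.group(1).upper() if match else ""
    if keyword not in READ_ONLY_KEYWORDS:
        raise PermissionError(f"{keyword or 'empty'} statements require allow_writes=True")


def cap_rows(rows: list, max_rows: int = 50) -> tuple[list, bool]:
    """Return at most max_rows rows plus a flag saying whether rows were dropped."""
    return rows[:max_rows], len(rows) > max_rows


check_statement("  select * from orders")  # passes: read-only
try:
    check_statement("INSERT INTO orders VALUES (4, 30, 9.99)")
except PermissionError as exc:
    print(exc)  # INSERT statements require allow_writes=True
check_statement("CREATE VIEW v AS SELECT 1", allow_writes=True)  # passes

rows, truncated = cap_rows(list(range(120)), max_rows=50)
print(len(rows), truncated)  # 50 True
```

Returning a truncation flag alongside the rows lets the agent tell the model that more data exists, so it can refine the query (for example with `LIMIT` or an aggregate) instead of assuming it saw everything.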

property id: str

An ID for the toolset that is unique among all toolsets registered with the same agent.

If you’re writing a concrete toolset class that users can instantiate more than once, you should let them optionally pass a custom ID to the constructor and return that here.

A toolset needs to have an ID in order to be used in a durable execution environment like Temporal, in which case the ID will be used to identify the toolset’s activities within the workflow.
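The pattern this docstring recommends, an optional user-supplied ID with a sensible default, can be sketched in plain Python. `SketchToolset`, its `toolset_id` keyword, and the default value are hypothetical; the `DataFusionToolset` constructor signature does not include an ID argument, and how it derives its own ID is not specified on this page.

```python
from __future__ import annotations


class SketchToolset:
    """Hypothetical toolset illustrating the optional-custom-ID pattern."""

    def __init__(self, *, toolset_id: str | None = None) -> None:
        self._id = toolset_id

    @property
    def id(self) -> str:
        # Fall back to a fixed default. Users who register several instances
        # with one agent pass distinct IDs so a durable-execution engine such
        # as Temporal can tell the instances' activities apart.
        return self._id or "sketch_toolset"


sales = SketchToolset(toolset_id="sales_lake")
logs = SketchToolset(toolset_id="logs_lake")
print(sales.id, logs.id, SketchToolset().id)  # sales_lake logs_lake sketch_toolset
```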

async get_tools(ctx)

The tools that are available in this toolset.
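An agent framework turns what `get_tools` returns into the function schemas it sends to the model. The sketch below shows the kind of information each of the three tools needs to carry: a name, a description, and a JSON Schema for its arguments. `ToolSpec`, `sketch_get_tools`, and the parameter names `table_name` and `sql` are hypothetical. The real method is async, takes `ctx`, and works with pydantic-ai's own tool types rather than this dataclass.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a tool definition: name, description, JSON schema."""

    name: str
    description: str
    parameters: dict = field(default_factory=dict)


def sketch_get_tools() -> dict[str, ToolSpec]:
    """Return the three tools keyed by name; parameter names are assumptions."""
    specs = [
        ToolSpec("list_tables", "List the registered table names.",
                 {"type": "object", "properties": {}}),
        ToolSpec("get_schema", "Return column names and types for one table.",
                 {"type": "object",
                  "properties": {"table_name": {"type": "string"}},
                  "required": ["table_name"]}),
        ToolSpec("query", "Run a SQL statement and return up to max_rows rows.",
                 {"type": "object",
                  "properties": {"sql": {"type": "string"}},
                  "required": ["sql"]}),
    ]
    return {spec.name: spec for spec in specs}


tools = sketch_get_tools()
print(sorted(tools))  # ['get_schema', 'list_tables', 'query']
```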

async call_tool(name, tool_args, ctx, tool)

Call a tool with the given arguments.

Parameters:

  • name – The name of the tool to call.

  • tool_args – The arguments to pass to the tool.

  • ctx – The run context.

  • tool – The definition of the tool being called, as returned by get_tools().
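`call_tool` receives the tool's name and the arguments the model produced. Here is a minimal dispatch sketch, assuming name-keyed async handlers. `HANDLERS`, `sketch_call_tool`, the handler stubs, their argument names, and the column types are all hypothetical. The real method also receives `ctx` and `tool`, which are omitted here.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


# Hypothetical async stubs standing in for the three DataFusion-backed tools.
async def _list_tables() -> list[str]:
    return ["customers", "orders"]


async def _get_schema(table_name: str) -> dict[str, str]:
    schemas = {"orders": {"order_id": "int64", "amount": "double"}}
    if table_name not in schemas:
        raise ValueError(f"unknown table: {table_name}")
    return schemas[table_name]


async def _query(sql: str) -> dict[str, Any]:
    # Stub: a real handler would execute sql and cap the result at max_rows.
    return {"sql": sql, "rows": []}


HANDLERS: dict[str, Callable[..., Awaitable[Any]]] = {
    "list_tables": _list_tables,
    "get_schema": _get_schema,
    "query": _query,
}


async def sketch_call_tool(name: str, tool_args: dict[str, Any]) -> Any:
    """Dispatch a tool call by name, passing the model-supplied args as keywords."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool: {name}")
    return await handler(**tool_args)


async def main() -> None:
    print(await sketch_call_tool("list_tables", {}))  # ['customers', 'orders']
    print(await sketch_call_tool("get_schema", {"table_name": "orders"}))
    # {'order_id': 'int64', 'amount': 'double'}


asyncio.run(main())
```

Raising on an unknown tool name or table, rather than returning an empty result, gives the agent framework a clear error it can surface back to the model.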
