airflow.providers.common.ai.hooks.llamaindex

Hook for LlamaIndex integration with Airflow connections.

Classes

LlamaIndexHook

Bridge an Airflow connection to LlamaIndex chat and embedding models.

Module Contents

class airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook(llm_conn_id=None, embed_conn_id=None, embed_model=None, llm_model=None, **kwargs)

Bases: airflow.providers.common.compat.sdk.BaseHook

Bridge an Airflow connection to LlamaIndex chat and embedding models.

The hook resolves credentials (API key, optional API base URL) from the Airflow connection and returns native LlamaIndex objects ready to pass to VectorStoreIndex(..., embed_model=...), load_index_from_storage(..., embed_model=...), or index.as_retriever(..., llm=...).

LlamaIndex does not ship a universal equivalent of init_chat_model / init_embedding_model: each vendor is a separate package under llama-index-llms-* / llama-index-embeddings-* with its own constructor kwargs. The hook therefore covers only the OpenAI-compatible surface, which matches LlamaIndex’s own resolve_embed_model("default") behaviour. For other vendors (Cohere, Bedrock, Vertex, HuggingFace, …), instantiate the LlamaIndex class directly in your @task and pass it to the operator’s embed_model= / llm= parameter. Both LlamaIndexEmbeddingOperator and LlamaIndexRetrievalOperator accept a pre-built BaseEmbedding / LLM instance and bypass the hook in that case.
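As a rough illustration of the non-OpenAI path, the sketch below builds a HuggingFace embedding model directly. It assumes the llama-index-embeddings-huggingface package is installed; the helper name and the model identifier are illustrative choices, not part of this provider. The import is deferred into the function body so the module still parses where that package is absent.

```python
def build_huggingface_embedding():
    """Hypothetical helper: build a non-OpenAI embedding model without the hook."""
    # Deferred import: requires llama-index-embeddings-huggingface at call time.
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    # The model name is an example; any sentence-embedding checkpoint should work.
    return HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```

The returned object is a BaseEmbedding instance, so it is the kind of value the operators' embed_model= parameter is documented to accept.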

Note

The hook deliberately does not mutate LlamaIndex’s global Settings singleton. Operators pass the resolved model directly to LlamaIndex constructors so concurrent tasks in the same worker don’t race on shared state.

Connection fields:

password: API key passed as api_key=.
host: Optional base URL passed as api_base= (custom endpoints, Ollama, vLLM).
extra (JSON): default model identifiers stored on the connection, e.g. {"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"}.
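One way to supply these fields is Airflow's JSON-serialized connection environment variable. The sketch below is a minimal example under that assumption: the API key and host are placeholders, and the variable name is derived from the default connection ID "llamaindex_default".

```python
import json
import os

# Placeholder values: replace the key and base URL with real ones.
connection = {
    "conn_type": "llamaindex",
    "password": "sk-placeholder",            # passed to LlamaIndex as api_key=
    "host": "https://example.invalid/v1",    # optional; passed as api_base=
    "extra": {
        "embed_model": "text-embedding-3-small",
        "llm_model": "gpt-4o",
    },
}

# Airflow reads connections from AIRFLOW_CONN_<CONN_ID in upper case>.
os.environ["AIRFLOW_CONN_LLAMAINDEX_DEFAULT"] = json.dumps(connection)

# Round-trip check that the payload is valid JSON.
parsed = json.loads(os.environ["AIRFLOW_CONN_LLAMAINDEX_DEFAULT"])
```

Omit "host" to use the provider's default endpoint.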

Parameters:

llm_conn_id (str | None) – Airflow connection ID for the LLM provider. Falls back to default_conn_name ("llamaindex_default") when not provided.
embed_conn_id (str | None) – Optional separate Airflow connection ID for the embedding provider. Falls back to llm_conn_id when not set.
embed_model (str | None) – Embedding model name (e.g. "text-embedding-3-small"). Overrides extra["embed_model"] on the connection.
llm_model (str | None) – LLM model name (e.g. "gpt-4o"). Overrides extra["llm_model"] on the connection. Required when calling get_llm().
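The fallback and override rules above can be summarized in a small pure-Python sketch. It is an illustration of the documented precedence, not the hook's actual implementation: resolve_settings and extras_by_conn are hypothetical names, and the sketch assumes the embedding model default is read from the embedding connection's extra.

```python
DEFAULT_CONN_NAME = "llamaindex_default"


def resolve_settings(extras_by_conn, llm_conn_id=None, embed_conn_id=None,
                     embed_model=None, llm_model=None):
    """Hypothetical helper mirroring the documented parameter precedence."""
    # llm_conn_id falls back to the default connection name.
    llm_conn = llm_conn_id or DEFAULT_CONN_NAME
    # embed_conn_id falls back to llm_conn_id.
    embed_conn = embed_conn_id or llm_conn

    llm_extra = extras_by_conn.get(llm_conn, {})
    embed_extra = extras_by_conn.get(embed_conn, {})

    return {
        "llm_conn_id": llm_conn,
        "embed_conn_id": embed_conn,
        # Explicit arguments override the values stored in the connection extra.
        "llm_model": llm_model or llm_extra.get("llm_model"),
        "embed_model": embed_model or embed_extra.get("embed_model"),
    }


extras = {
    "llamaindex_default": {
        "embed_model": "text-embedding-3-small",
        "llm_model": "gpt-4o",
    }
}
defaults = resolve_settings(extras)
overridden = resolve_settings(extras, llm_model="gpt-4o-mini")
```

Here defaults takes both model names from the connection extra, while overridden replaces only the LLM model and keeps the embedding model from the connection.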

conn_name_attr = 'llm_conn_id'

default_conn_name = 'llamaindex_default'

conn_type = 'llamaindex'

hook_name = 'LlamaIndex'

llm_conn_id

embed_conn_id

embed_model = None

llm_model = None

static get_ui_field_behaviour()

Return custom field behaviour for the Airflow connection form.

get_embedding_model()

Return a LlamaIndex embedding model configured from the Airflow connection.

Credentials come from embed_conn_id, falling back to llm_conn_id when it is not set. Returns an OpenAIEmbedding instance; for other vendors, instantiate the LlamaIndex class directly and pass it to the operator’s embed_model= parameter.
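A minimal usage sketch, assuming a connection named "llamaindex_default" exists and that llama-index-core is installed. The build_index helper is hypothetical; imports are deferred into the function body, as is common inside Airflow tasks.

```python
def build_index(texts):
    """Hypothetical helper: embed plain strings into an in-memory vector index."""
    # Deferred imports: resolved only when the function runs on a worker.
    from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook
    from llama_index.core import Document, VectorStoreIndex

    hook = LlamaIndexHook(
        llm_conn_id="llamaindex_default",
        embed_model="text-embedding-3-small",  # overrides extra["embed_model"]
    )
    embed_model = hook.get_embedding_model()

    documents = [Document(text=t) for t in texts]
    # The model is passed explicitly rather than set on the global Settings object.
    return VectorStoreIndex.from_documents(documents, embed_model=embed_model)
```

Passing embed_model= explicitly keeps the example in line with the hook's design of leaving LlamaIndex's global Settings untouched.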

get_llm()

Return a LlamaIndex LLM configured from the Airflow connection.

Returns an OpenAI LLM instance; for other vendors, instantiate the LlamaIndex class directly and pass it to the operator’s llm= parameter.
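A minimal usage sketch, assuming a connection named "llamaindex_default" with a valid API key. The summarize helper and its prompt are illustrative only.

```python
def summarize(text: str) -> str:
    """Hypothetical helper: ask the connection's LLM for a one-sentence summary."""
    # Deferred import: resolved only when the function runs on a worker.
    from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook

    hook = LlamaIndexHook(
        llm_conn_id="llamaindex_default",
        llm_model="gpt-4o",  # overrides extra["llm_model"]
    )
    llm = hook.get_llm()

    # complete() is the standard LlamaIndex text-completion entry point.
    response = llm.complete(f"Summarize in one sentence:\n\n{text}")
    return response.text
```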