airflow.providers.common.ai.hooks.llamaindex
Hook for LlamaIndex integration with Airflow connections.
Classes
LlamaIndexHook | Bridge an Airflow connection to LlamaIndex chat and embedding models.
Module Contents
- class airflow.providers.common.ai.hooks.llamaindex.LlamaIndexHook(llm_conn_id=None, embed_conn_id=None, embed_model=None, llm_model=None, **kwargs)
Bases: airflow.providers.common.compat.sdk.BaseHook
Bridge an Airflow connection to LlamaIndex chat and embedding models.
The hook resolves credentials (API key, optional API base URL) from the Airflow connection and returns native LlamaIndex objects ready to pass to VectorStoreIndex(..., embed_model=...), load_index_from_storage(..., embed_model=...), or index.as_retriever(..., llm=...).
LlamaIndex does not ship a universal init_chat_model / init_embedding_model equivalent: each vendor is a separate package under llama-index-llms-* / llama-index-embeddings-* with its own constructor kwargs. The hook therefore covers the OpenAI-compatible surface that matches LlamaIndex’s own resolve_embed_model("default") behaviour.
For other vendors (Cohere, Bedrock, Vertex, HuggingFace, …), instantiate the LlamaIndex class directly in your @task and pass it to the operator’s embed_model= / llm= parameter. Both LlamaIndexEmbeddingOperator and LlamaIndexRetrievalOperator accept a pre-built BaseEmbedding / LLM instance and bypass the hook in that case.
Note
The hook deliberately does not mutate LlamaIndex’s global Settings singleton. Operators pass the resolved model directly to LlamaIndex constructors so concurrent tasks in the same worker don’t race on shared state.
Connection fields:
password: API key passed as api_key=.
host: Optional base URL passed as api_base= (custom endpoints, Ollama, vLLM).
extra JSON: {"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"} – default model identifiers stored on the connection.
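The connection fields above fit Airflow's JSON connection format, so one way to define the connection is through an AIRFLOW_CONN_<CONN_ID> environment variable. The following is a minimal sketch, not the provider's own tooling: build_llamaindex_conn_json is a hypothetical helper, the conn_type value "llamaindex" is an assumption this page does not confirm, and the API key and base URL are placeholders.

```python
import json
import os


def build_llamaindex_conn_json(api_key, api_base=None, embed_model=None,
                               llm_model=None, conn_type="llamaindex"):
    """Serialize the documented connection fields to Airflow's JSON format.

    Hypothetical helper for illustration; the conn_type default is an assumption.
    """
    conn = {"conn_type": conn_type, "password": api_key}  # password -> api_key=
    if api_base:
        conn["host"] = api_base  # host -> api_base=
    extra = {}
    if embed_model:
        extra["embed_model"] = embed_model
    if llm_model:
        extra["llm_model"] = llm_model
    if extra:
        conn["extra"] = extra  # default model identifiers
    return json.dumps(conn)


conn_json = build_llamaindex_conn_json(
    api_key="sk-placeholder",                # placeholder, not a real key
    api_base="http://localhost:11434/v1",    # placeholder OpenAI-compatible endpoint
    embed_model="text-embedding-3-small",
    llm_model="gpt-4o",
)

# Airflow looks up connections in AIRFLOW_CONN_<CONN_ID>, with the ID upper-cased.
os.environ["AIRFLOW_CONN_LLAMAINDEX_DEFAULT"] = conn_json
```

Omitting api_base leaves host unset, which corresponds to the "optional base URL" case described above.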
- Parameters:
llm_conn_id (str | None) – Airflow connection ID for the LLM provider. Falls back to default_conn_name ("llamaindex_default") when not provided.
embed_conn_id (str | None) – Optional separate Airflow connection ID for the embedding provider. Falls back to llm_conn_id when not set.
embed_model (str | None) – Embedding model name (e.g. "text-embedding-3-small"). Overrides extra["embed_model"] on the connection.
llm_model (str | None) – LLM model name (e.g. "gpt-4o"). Overrides extra["llm_model"] on the connection. Required when calling get_llm().
- static get_ui_field_behaviour()
Return custom field behaviour for the Airflow connection form.
- get_embedding_model()
Return a LlamaIndex embedding model configured from the Airflow connection.
Uses embed_conn_id (falls back to llm_conn_id) for credentials. Returns an OpenAIEmbedding instance; for other vendors, instantiate the LlamaIndex class directly and pass it to the operator’s embed_model= parameter.
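A typical call pattern might look like the sketch below. It assumes the provider package and the LlamaIndex OpenAI embedding integration are installed and that a connection named "llamaindex_default" exists. build_index is a hypothetical helper; only LlamaIndexHook, get_embedding_model() and the embed_model= keyword come from this page, while VectorStoreIndex.from_documents is standard LlamaIndex API.

```python
def build_index(documents, conn_id="llamaindex_default"):
    """Embed documents with a model resolved from an Airflow connection.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module loads even where the optional
    # Airflow provider or LlamaIndex packages are not installed.
    from airflow.providers.common.ai.hooks.llamaindex import LlamaIndexHook
    from llama_index.core import VectorStoreIndex

    hook = LlamaIndexHook(
        llm_conn_id=conn_id,
        embed_model="text-embedding-3-small",  # overrides extra["embed_model"]
    )
    embed_model = hook.get_embedding_model()

    # Pass the model explicitly instead of relying on the global Settings singleton.
    return VectorStoreIndex.from_documents(documents, embed_model=embed_model)
```

Inside a DAG this would normally be called from a @task function, so the connection lookup happens on the worker at run time.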