`LlamaIndexHook`¶

Use LlamaIndexHook to bridge an Airflow connection to LlamaIndex chat and embedding models. The hook reads credentials (API key, optional base URL) from a connection of type llamaindex and returns native LlamaIndex objects ready to pass to VectorStoreIndex(..., embed_model=...), load_index_from_storage(..., embed_model=...), or index.as_retriever(..., llm=...).

The hook deliberately does not mutate LlamaIndex’s global Settings singleton. Operators pass the resolved model directly to LlamaIndex constructors, so concurrent tasks in the same worker don’t race on shared state.

OpenAI by default, BYO for other vendors¶

LlamaIndex does not ship a universal init_chat_model / init_embedding_model equivalent (each vendor is a separate package under llama-index-llms-* / llama-index-embeddings-* with its own constructor kwargs). The hook therefore covers the OpenAI-compatible surface that matches LlamaIndex’s own resolve_embed_model("default") behaviour:

hook.get_embedding_model() returns an OpenAIEmbedding configured from the connection.
hook.get_llm() returns an OpenAI LLM configured from the connection.
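As a rough sketch of how these two methods might be used together, the helper below takes an already-constructed hook and passes the resolved models straight into LlamaIndex instead of assigning them to the global Settings object. The helper name build_query_engine and the in-memory index are illustrative assumptions, not part of the provider; Document, VectorStoreIndex.from_documents and as_query_engine come from llama-index-core.

```python
from typing import Any


def build_query_engine(hook: Any, texts: list[str]) -> Any:
    """Build an in-memory index and query engine from hook-resolved models.

    ``hook`` is assumed to be a LlamaIndexHook instance, or any object that
    exposes get_embedding_model() and get_llm().
    """
    # Imported lazily, mirroring the in-task imports used in the example DAG.
    from llama_index.core import Document, VectorStoreIndex

    embed_model = hook.get_embedding_model()  # OpenAIEmbedding by default
    llm = hook.get_llm()  # OpenAI LLM by default

    documents = [Document(text=text) for text in texts]

    # The models are passed explicitly, so the global Settings singleton is
    # never touched and concurrent tasks cannot overwrite each other's model.
    index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
    return index.as_query_engine(llm=llm)
```

Taking the hook as an argument keeps the helper independent of how the hook is constructed, which also makes it easy to exercise with a stub in unit tests.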

For other vendors (Cohere, Bedrock, Vertex AI, HuggingFace, …), instantiate the LlamaIndex class directly in a @task and pass it to the operator’s embed_model= / llm= parameter – both LlamaIndexEmbeddingOperator and LlamaIndexRetrievalOperator accept a pre-built BaseEmbedding / LLM instance and bypass the hook:

airflow/providers/common/ai/example_dags/example_llamaindex_hook.py

@dag(schedule=None, tags=["example"])
def example_llamaindex_byo_embed_model():
    """Use a non-OpenAI embedding by instantiating the LlamaIndex class directly.

    LlamaIndex doesn't ship a universal init helper, so the operator accepts
    a pre-built ``BaseEmbedding`` instance and bypasses the hook entirely.
    Install the matching extra:
    ``pip install llama-index-embeddings-cohere``.
    """

    @task
    def build_cohere_embedder():
        from llama_index.embeddings.cohere import CohereEmbedding

        from airflow.providers.common.compat.sdk import BaseHook

        conn = BaseHook.get_connection("cohere_default")
        return CohereEmbedding(model_name="embed-english-v3.0", cohere_api_key=conn.password)

    @task
    def sample_doc_list() -> list[dict]:
        # One sample document; the list is not empty, so the name says "sample".
        return [{"text": "Cohere demo content", "metadata": {}}]

    embed = LlamaIndexEmbeddingOperator(
        task_id="embed",
        documents=sample_doc_list(),
        embed_model=build_cohere_embedder(),
        persist_dir="/opt/airflow/data/cohere_index",
    )

    embed

Install the per-vendor LlamaIndex integration package separately: pip install llama-index-embeddings-cohere, ...-bedrock, ...-huggingface, llama-index-llms-anthropic, etc.

Connection Configuration¶

The hook reads credentials and default model names from an Airflow connection of type llamaindex:

password – API key (passed as api_key to OpenAIEmbedding / OpenAI).
host – Optional base URL (passed as api_base; useful for custom OpenAI-compatible endpoints, Ollama, vLLM).
extra JSON – {"embed_model": "text-embedding-3-small", "llm_model": "gpt-4o"} – default model identifiers stored on the connection.
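The field mapping above can be sketched in plain Python. This is only an illustration of the documented mapping, not the hook's actual implementation: ConnectionFields, openai_client_kwargs and default_models are hypothetical names, and the endpoint URL in the usage note is a placeholder.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class ConnectionFields:
    """Hypothetical stand-in for the connection fields the hook reads."""

    password: str | None = None  # API key
    host: str | None = None  # optional base URL
    extra: str | None = None  # JSON string with default model names


def openai_client_kwargs(conn: ConnectionFields) -> dict[str, str]:
    """Map connection fields to the api_key / api_base keyword arguments."""
    kwargs: dict[str, str] = {}
    if conn.password:
        kwargs["api_key"] = conn.password
    if conn.host:
        # Only set when a custom OpenAI-compatible endpoint is configured.
        kwargs["api_base"] = conn.host
    return kwargs


def default_models(conn: ConnectionFields) -> dict[str, str]:
    """Read the default model identifiers stored in the extra JSON, if any."""
    extra = json.loads(conn.extra) if conn.extra else {}
    return {key: extra[key] for key in ("embed_model", "llm_model") if key in extra}
```

For example, a connection with password "sk-demo", host "http://localhost:8000/v1" and the extra JSON shown above would yield both api_key and api_base, plus the two default model names.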

Parameters¶

Parameter	Default	Description
`llm_conn_id`	`llamaindex_default`	Airflow connection ID for the LLM/embedding provider.
`embed_conn_id`	`None` (falls back to `llm_conn_id`)	Optional separate Airflow connection ID for the embedding provider.
`embed_model`	`None` (falls back to `extra["embed_model"]`)	Embedding model name, e.g. `text-embedding-3-small`.
`llm_model`	`None` (falls back to `extra["llm_model"]`)	LLM model name, e.g. `gpt-4o`. Required when calling `get_llm()`.
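The fallback rules in the table can be read as a simple precedence order. The sketch below is an assumption about how that precedence could be expressed, not the hook's source; resolve_conn_ids and resolve_model are hypothetical helper names.

```python
from __future__ import annotations


def resolve_conn_ids(
    llm_conn_id: str = "llamaindex_default",
    embed_conn_id: str | None = None,
) -> tuple[str, str]:
    """Return (llm_conn_id, embed_conn_id); embeddings reuse the LLM connection by default."""
    return llm_conn_id, embed_conn_id or llm_conn_id


def resolve_model(
    explicit: str | None,
    extra: dict[str, str],
    key: str,
    required: bool = False,
) -> str | None:
    """Prefer the explicit parameter, then fall back to the connection's extra JSON."""
    model = explicit or extra.get(key)
    if required and not model:
        # Mirrors "Required when calling get_llm()" from the table above.
        raise ValueError(f"No model given and extra[{key!r}] is not set")
    # May be None for optional models; what the hook does in that case is
    # not specified on this page.
    return model
```

With extra = {"llm_model": "gpt-4o"}, calling resolve_model(None, extra, "llm_model", required=True) falls back to "gpt-4o", while an explicit argument always takes precedence over the stored default.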

Dependencies¶

Install the llamaindex extra:

pip install "apache-airflow-providers-common-ai[llamaindex]"

That extra installs llama-index-core, llama-index-embeddings-openai, and llama-index-llms-openai – enough to back the hook’s default OpenAI return values. For any other vendor, install the matching LlamaIndex integration package separately.
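A quick way to confirm that the three packages are importable in a worker environment is to probe for their modules. This is a minimal sketch using only the standard library; missing_modules is a hypothetical helper, and the module names are the import paths that correspond to the three distributions listed above.

```python
from importlib.util import find_spec

# Import paths provided by llama-index-core, llama-index-embeddings-openai
# and llama-index-llms-openai respectively.
REQUIRED_MODULES = (
    "llama_index.core",
    "llama_index.embeddings.openai",
    "llama_index.llms.openai",
)


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the module names from ``modules`` that cannot be located."""
    missing = []
    for name in modules:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

An empty list from missing_modules() suggests the extra is installed; any names returned point at the package that still needs installing.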
