LangChainHook

Use LangChainHook to bridge an Airflow connection to LangChain chat and embedding models. The hook reads credentials (API key, optional base URL) from the connection and returns configured LangChain model objects via two universal entry-point functions:

  • langchain.chat_models.init_chat_model for chat models, dispatching to the right vendor based on the provider:name prefix.

  • langchain.embeddings.init_embeddings for embedding models, same dispatch story.

The hook owns its own langchain connection type so the UI is honest about which framework a connection configures.

Chat model usage

Pass llm_model to the constructor (or set extra["model"] on the connection) and call get_chat_model():

airflow/providers/common/ai/example_dags/example_langchain_hook.py[source]

@dag(schedule=None, tags=["example"])
def example_langchain_chat():
    @task
    def summarize(text: str) -> str:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            llm_model="openai:gpt-4o",
        )
        llm = hook.get_chat_model()
        # LangChain BaseMessage.content is `str | list[...]` (multi-modal union);
        # coerce to str for the text-only path this example demonstrates.
        return str(llm.invoke(f"Summarize concisely: {text}").content)

    summarize("Apache Airflow is a platform for authoring, scheduling, and monitoring workflows.")


The returned model is a LangChain BaseChatModel, so it composes with the rest of LangChain’s runnable surface (ChatPromptTemplate / StrOutputParser / RunnableSequence / …).

Supported chat providers

Any model identifier accepted by langchain.chat_models.init_chat_model works out of the box. Common identifiers:

  • openai:gpt-4o, openai:gpt-4o-mini – requires langchain-openai

  • anthropic:claude-3-7-sonnet – requires langchain-anthropic

  • groq:llama-3.3-70b-versatile – requires langchain-groq

  • mistralai:mistral-large-latest – requires langchain-mistralai

  • ollama:llama3 – requires langchain-ollama (point host at the Ollama URL)

  • deepseek:deepseek-chat – requires langchain-deepseek

Cloud providers with non-standard auth (AWS Bedrock, Google Vertex AI, Azure OpenAI) are not covered by the api_key + base_url surface here and are deferred to per-vendor hooks (mirroring the pydantic-ai cloud-auth subclass pattern).

Embedding model usage

Pass embed_model to the constructor (or set extra["embed_model"] on the connection) and call get_embedding_model():

airflow/providers/common/ai/example_dags/example_langchain_hook.py[source]

@dag(schedule=None, tags=["example"])
def example_langchain_embedding():
    @task
    def embed_documents(texts: list[str]) -> int:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            embed_model="openai:text-embedding-3-small",
        )
        embeddings = hook.get_embedding_model()
        vectors = embeddings.embed_documents(texts)
        return len(vectors[0])

    embed_documents(
        [
            "Apache Airflow is a workflow orchestrator.",
            "Workflows are defined as Python DAGs.",
        ]
    )


The same hook instance can serve both chat and embedding models when both identifiers are set:

airflow/providers/common/ai/example_dags/example_langchain_hook.py[source]

@dag(schedule=None, tags=["example"])
def example_langchain_chat_and_embedding():
    """One hook instance serves both chat and embeddings when both models are set."""

    @task
    def use_both() -> dict:
        hook = LangChainHook(
            llm_conn_id="langchain_default",
            llm_model="openai:gpt-4o",
            embed_model="openai:text-embedding-3-small",
        )
        chat = hook.get_chat_model()
        embeddings = hook.get_embedding_model()
        return {
            "answer": str(chat.invoke("In one sentence: what does Airflow do?").content),
            "embedding_dim": len(embeddings.embed_query("Airflow")),
        }

    use_both()


Supported embedding providers

The hook passes api_key and (optional) base_url from the connection to langchain.embeddings.init_embeddings. Providers whose embedding classes accept this kwarg shape work directly:

  • openai:text-embedding-3-small, openai:text-embedding-3-large – requires langchain-openai

  • openai:<model> against an OpenAI-compatible endpoint (point host at Ollama / vLLM / LM Studio) – requires langchain-openai

init_embeddings advertises more providers (Cohere, Mistral AI, HuggingFace, Bedrock, Vertex AI, Azure OpenAI, …), but their embedding classes expect provider-specific credential kwargs (cohere_api_key, AWS auth chain, GCP service-account, …) rather than the generic api_key / base_url this hook forwards. Those are deferred to per-vendor subclasses mirroring the pydantic-ai pattern (PydanticAIBedrockHook / PydanticAIVertexHook / PydanticAIAzureHook).

Different connections for chat and embeddings

If chat and embeddings live on different API keys (e.g. premium chat key vs free-tier embeddings key), pass an explicit embed_conn_id. When unset it falls back to llm_conn_id, so the common one-provider case stays simple:

airflow/providers/common/ai/example_dags/example_langchain_hook.py[source]

@dag(schedule=None, tags=["example"])
def example_langchain_different_conns():
    """Use separate connections when chat and embeddings live on different API keys."""

    @task
    def use_separate_conns() -> dict:
        hook = LangChainHook(
            llm_conn_id="openai_chat",
            embed_conn_id="openai_embed",
            llm_model="openai:gpt-4o",
            embed_model="openai:text-embedding-3-small",
        )
        chat = hook.get_chat_model()
        embeddings = hook.get_embedding_model()
        return {
            "answer": str(chat.invoke("In one sentence: what does Airflow do?").content),
            "embedding_dim": len(embeddings.embed_query("Airflow")),
        }

    use_separate_conns()


Connection Configuration

The hook reads credentials from the Airflow connection of type langchain:

  • password – API key (passed as api_key to init_chat_model and init_embeddings).

  • host – Optional base URL (passed as base_url; useful for custom OpenAI-compatible endpoints, Ollama, vLLM).

  • extra JSON – {"model": "openai:gpt-4o", "embed_model": "openai:text-embedding-3-small"} to set default chat and embedding model identifiers on the connection.

Parameters

Parameter

Default

Description

llm_conn_id

langchain_default

Airflow connection ID for the LLM provider.

embed_conn_id

None (falls back to llm_conn_id)

Optional separate Airflow connection ID for the embedding provider. Useful when chat and embeddings live on different API keys; in the common one-provider case, leave unset and the hook reuses llm_conn_id.

llm_model

None (falls back to extra["model"] on the connection)

Chat model identifier in provider:name form, e.g. openai:gpt-4o. Only required when calling get_chat_model().

embed_model

None (falls back to extra["embed_model"] on the connection)

Embedding model identifier in provider:name form, e.g. openai:text-embedding-3-small. Only required when calling get_embedding_model().

Dependencies

Install the langchain extra to use this hook:

pip install apache-airflow-providers-common-ai[langchain]

That extra installs only langchain itself, since the framework is vendor-agnostic. Install the LangChain integration package for whichever provider(s) you intend to use:

  • langchain-openai – OpenAI and OpenAI-compatible endpoints (Ollama, vLLM)

  • langchain-anthropic – Anthropic

  • langchain-groq, langchain-mistralai, langchain-deepseek, langchain-ollama, …

Was this entry helpful?