Models

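Large language models (LLMs) are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize text, and answer questions without specialized training for each task. Beyond text generation, many models also support: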

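Tool calling - calling external tools (such as database queries or API calls) and using the results in the response.
Structured output - constraining the model's response to follow a defined format.
Multimodality - processing and returning data other than text, such as images, audio, and video.
Reasoning - performing multi-step reasoning to arrive at a conclusion.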

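Models are the reasoning engine of agents. They drive an agent's decision-making process, determining which tools to call, how to interpret results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect your agent's reliability and performance. Different models excel at different tasks: some are better at following complex instructions, others at structured reasoning, and some support larger context windows for handling more information. LangChain's standard model interface gives you access to many provider integrations, which makes it easy to experiment with different models and find the best fit for your use case.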

For provider-specific integration information and capabilities, see the provider's integration page.

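Basic usage

Models can be used in two ways:

With agents - the model can be specified dynamically when creating an agent.
Standalone - the model can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction, without an agent framework.

The same model interface works in both contexts, so you can start simple and scale up to more complex agent-based workflows as needed.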

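Initialize a model

The easiest way to get started with a standalone model in LangChain is to use init_chat_model to initialize one from a provider of your choice (examples below):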

OpenAI
Anthropic
Azure
Google Gemini
AWS Bedrock

👉 Read the OpenAI chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("openai:gpt-4.1")

👉 Read the Anthropic chat model integration docs

pip install -U "langchain[anthropic]"

import os
from langchain.chat_models import init_chat_model

os.environ["ANTHROPIC_API_KEY"] = "sk-..."

model = init_chat_model("anthropic:claude-sonnet-4-5")

👉 Read the Azure chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2025-03-01-preview"

model = init_chat_model(
    "azure_openai:gpt-4.1",
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

👉 Read the Google GenAI chat model integration docs

pip install -U "langchain[google-genai]"

import os
from langchain.chat_models import init_chat_model

os.environ["GOOGLE_API_KEY"] = "..."

model = init_chat_model("google_genai:gemini-2.5-flash-lite")

👉 Read the AWS Bedrock chat model integration docs

pip install -U "langchain[aws]"

from langchain.chat_models import init_chat_model

# Follow the steps here to configure your credentials:
# https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

model = init_chat_model(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_provider="bedrock_converse",
)

response = model.invoke("Why do parrots talk?")

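For more details, including how to pass model parameters, see init_chat_model.

Key methods

Invoke

The model takes messages as input and outputs a message after generating a complete response.

Stream

Invoke the model, but stream the output as it is generated in real time.

Batch

Send multiple requests to the model in a batch for more efficient processing.

In addition to chat models, LangChain supports other related technologies, such as embedding models and vector stores. See the integrations page for details.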

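Parameters

A chat model accepts parameters that configure its behavior. The full set of supported parameters varies by model and provider, but standard ones include: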

model

string

required

The name or identifier of the specific model you want to use with a provider.

api_key

string

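The key required to authenticate with the model's provider. It is usually issued when you sign up for access to the model, and is often supplied by setting an environment variable.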

temperature

number

Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic.

timeout

number

The maximum time (in seconds) to wait for a response from the model before canceling the request.

max_tokens

number

Limits the total number of tokens in the response, effectively controlling how long the output can be.

max_retries

number

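The maximum number of times the system will try to resend a request if it fails due to issues such as network timeouts or rate limits.

When using init_chat_model, pass these parameters as inline keyword arguments (**kwargs):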

Initialize using model parameters

model = init_chat_model(
    "anthropic:claude-sonnet-4-5",
    # Kwargs passed to the model:
    temperature=0.7,
    timeout=30,
    max_tokens=1000,
)

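Each chat model integration may have additional parameters that control provider-specific functionality. For example, ChatOpenAI has use_responses_api, which dictates whether to use the OpenAI Responses API or the Completions API. To find all the parameters supported by a given chat model, see the chat model integrations page.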
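To make the max_retries semantics described above concrete, here is a minimal, library-independent sketch of a retry loop with exponential backoff. It is not LangChain's implementation; `call_with_retries`, `TransientError`, and `flaky_request` are hypothetical names used only for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a network timeout or rate-limit error."""


def call_with_retries(request, max_retries=2, base_delay=0.01, sleep=time.sleep):
    """Call request(); on TransientError, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return request()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # exponential backoff
            attempt += 1


# Hypothetical request that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


result = call_with_retries(flaky_request, max_retries=2, sleep=lambda s: None)
print(result, attempts["count"])
```

With max_retries=2 the request is attempted at most three times in total: the initial call plus two retries.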

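Invocation

A chat model must be invoked to generate an output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.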

Single message

response = model.invoke("Why do parrots have colorful feathers?")
print(response)

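A list of messages can be provided to a model to represent conversation history. Each message has a role that the model uses to tell who sent the message in the conversation. For more detail on roles, types, and content, see the messages guide.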

Dictionary format

conversation = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Translate: I love programming."},
    {"role": "assistant", "content": "J'adore la programmation."},
    {"role": "user", "content": "Translate: I love building applications."}
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

Message objects

from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    SystemMessage("You are a helpful assistant that translates English to French."),
    HumanMessage("Translate: I love programming."),
    AIMessage("J'adore la programmation."),
    HumanMessage("Translate: I love building applications.")
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

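Streaming

Most models can stream their output content while it is being generated. By displaying output progressively, streaming significantly improves the user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time: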

for chunk in model.stream("Why do parrots have colorful feathers?"):
    print(chunk.text, end="|", flush=True)

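Unlike invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be gathered into a full message via summation: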

Construct an AIMessage

full = None  # None | AIMessageChunk
for chunk in model.stream("What color is the sky?"):
    full = chunk if full is None else full + chunk
    print(full.text)

# The
# The sky
# The sky is
# The sky is typically
# The sky is typically blue
# ...

print(full.content_blocks)
# [{"type": "text", "text": "The sky is typically blue..."}]

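The resulting message can be treated the same as one generated with invoke(). For example, it can be aggregated into a message history and passed back to the model as conversational context.

Streaming only works if every step in the program knows how to process a stream of chunks. For instance, an application that needs to store the entire output in memory before it can be processed is not streaming-capable.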
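The point about every step handling a stream can be illustrated with plain Python generators, independent of LangChain. In this sketch, `fake_stream` is a hypothetical stand-in for model.stream(); `uppercase_stream` transforms chunks as they arrive, while `buffered_upper` has to wait for the whole output.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for model.stream(): yields text chunks."""
    yield from ["The", " sky", " is", " blue"]


def uppercase_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Streaming-capable step: transforms each chunk as soon as it arrives."""
    for chunk in chunks:
        yield chunk.upper()


def buffered_upper(chunks: Iterable[str]) -> str:
    """Non-streaming step: needs the full output before it can do anything."""
    return "".join(chunks).upper()


streamed = list(uppercase_stream(fake_stream()))
buffered = buffered_upper(fake_stream())
print(streamed)
print(buffered)
```

Both paths produce the same text in the end, but only the generator-based step can hand partial output to the next stage while the model is still generating.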

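Advanced streaming topics

"Auto-streaming" chat models

LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you are not explicitly calling the streaming methods. This is particularly useful when you use the non-streaming invoke method but still want to stream the entire application, including intermediate results from the chat model. In LangGraph agents, for example, you can call model.invoke() within nodes, and LangChain will automatically delegate to streaming if running in streaming mode.

How it works

When you invoke() a chat model, LangChain automatically switches to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the invocation is the same as far as the calling code is concerned; however, while the chat model is being streamed, LangChain takes care of invoking on_llm_new_token events in LangChain's callback system. These callback events allow LangGraph stream() and astream_events() to surface the chat model's output in real time.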
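The mechanism described above can be pictured with a toy model written in plain Python. This is only a sketch of the idea, not LangChain's code: `ToyChatModel` and `TokenCollector` are hypothetical classes, and only the event name on_llm_new_token is taken from the text.

```python
class ToyChatModel:
    """Toy model (not a LangChain class) that mimics auto-streaming."""

    def __init__(self, tokens):
        self._tokens = tokens

    def _stream(self):
        yield from self._tokens

    def invoke(self, prompt, callbacks=None):
        # Generate token by token and notify any attached listeners.
        pieces = []
        for token in self._stream():
            for callback in callbacks or []:
                callback.on_llm_new_token(token)
            pieces.append(token)
        return "".join(pieces)  # the caller sees the same full result either way


class TokenCollector:
    """Hypothetical handler that records tokens as they are produced."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token):
        self.tokens.append(token)


toy_model = ToyChatModel(["Hi", " there", "!"])
collector = TokenCollector()
plain = toy_model.invoke("Hello")
streamed = toy_model.invoke("Hello", callbacks=[collector])
print(plain == streamed, collector.tokens)
```

The calling code receives an identical return value in both cases; the only difference is that a listener, when present, observes the tokens as they are produced.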

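Streaming events

LangChain chat models can also stream semantic events using astream_events(). This simplifies filtering based on event types and other metadata, and aggregates the full message in the background. See below for an example.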

async for event in model.astream_events("Hello"):

    if event["event"] == "on_chat_model_start":
        print(f"Input: {event['data']['input']}")

    elif event["event"] == "on_chat_model_stream":
        print(f"Token: {event['data']['chunk'].text}")

    elif event["event"] == "on_chat_model_end":
        print(f"Full message: {event['data']['output'].text}")

    else:
        pass

Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?

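For event types and other details, see the astream_events() reference.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, because the processing can be done in parallel: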

Batch

responses = model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
])
for response in responses:
    print(response)

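This section describes the chat model method batch(), which parallelizes model calls client-side. It is distinct from the batch APIs supported by inference providers such as OpenAI or Anthropic.

By default, batch() only returns the final output for the entire batch. If you want to receive the output for each individual input as it finishes generating, you can stream results with batch_as_completed():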

Yield batch responses upon completion

for response in model.batch_as_completed([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
]):
    print(response)

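When using batch_as_completed(), results may arrive out of order. Each result includes the input index, so you can match results to inputs and reconstruct the original order if needed.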
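Restoring the original order from indexed results can be sketched with the standard library alone. Here the results are modeled as (index, result) tuples, which is an assumption of this sketch, and `fake_model_call` and `as_completed_with_index` are hypothetical stand-ins rather than LangChain functions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

inputs = ["parrots", "airplanes", "quantum"]


def fake_model_call(text: str) -> str:
    """Hypothetical stand-in for a single model call."""
    return f"answer about {text}"


def as_completed_with_index(items):
    """Yield (index, result) pairs in completion order, which may vary."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(fake_model_call, item): i for i, item in enumerate(items)}
        for future in as_completed(futures):
            yield futures[future], future.result()


# Sort by the input index to restore the original order.
ordered = [result for _, result in sorted(as_completed_with_index(inputs))]
print(ordered)
```

Sorting on the index is enough here because every index is unique, so the result values are never compared.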

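When processing a large number of inputs with batch() or batch_as_completed(), you may want to control the maximum number of parallel calls. You can do this by setting the max_concurrency attribute in the RunnableConfig dictionary.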

Batch with max concurrency

model.batch(
    list_of_inputs,
    config={
        'max_concurrency': 5,  # Limit to 5 parallel calls
    }
)

For a full list of supported attributes, see the RunnableConfig reference.

For more details on batching, see the reference.
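The effect of capping parallel calls can be observed with a small standard-library experiment. This is a sketch of the general idea, not of how batch() is implemented: `fake_model_call` is a hypothetical stand-in, and the thread pool's max_workers plays the role that max_concurrency plays in the configuration described above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
stats = {"active": 0, "peak": 0}


def fake_model_call(text: str) -> str:
    """Hypothetical stand-in for a model call that tracks concurrency."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # simulate network latency
    with lock:
        stats["active"] -= 1
    return text.upper()


inputs = [f"question {i}" for i in range(10)]
# At most two calls are in flight at any time.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_model_call, inputs))  # map preserves input order

print(stats["peak"], results[0])
```

A lower cap trades throughput for a smaller chance of hitting provider rate limits.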

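Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. A tool is a pairing of:

A schema, including the tool's name, a description, and/or argument definitions (often a JSON schema)
A function or coroutine to execute.

You may hear the term "function calling". We use it interchangeably with "tool calling".

To make tools that you have defined available to a model, you must bind them using bind_tools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled via model or invocation parameters (e.g. ChatOpenAI, ChatAnthropic). Check the respective provider reference for details.

For details and other options for creating tools, see the tools guide.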

Binding user tools

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return f"It's sunny in {location}."


model_with_tools = model.bind_tools([get_weather])  

response = model_with_tools.invoke("What's the weather like in Boston?")
for tool_call in response.tool_calls:
    # View tool calls made by the model
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")

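When you bind user-defined tools, the model's response includes a request to execute a tool. When using a model separately from an agent, it is up to you to perform the requested action and return the result to the model for use in subsequent reasoning. Note that when using an agent, the agent loop handles the tool execution loop for you. Below, we show some common ways to use tool calling.

Tool execution loop

When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use the tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here is a simple example of doing it by hand: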

Tool execution loop

# Bind (potentially multiple) tools to the model
model_with_tools = model.bind_tools([get_weather])

# Step 1: Model generates tool calls
messages = [{"role": "user", "content": "What's the weather in Boston?"}]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
print(final_response.text)
# "The current weather in Boston is 72°F and sunny."

Each ToolMessage returned by the tool includes a tool_call_id that matches the original tool call, helping the model correlate results with requests.
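The id matching can be shown without a model at all. The sketch below dispatches tool calls in the shape used in this guide (name, args, id) through a plain dictionary registry. The `TOOLS` registry and the result dictionary shape (modeled on the OpenAI chat format) are assumptions of this sketch, not LangChain objects.

```python
def get_weather(location: str) -> str:
    """Plain-function version of the weather tool used in this guide."""
    return f"It's sunny in {location}."


TOOLS = {"get_weather": get_weather}  # hypothetical registry: tool name -> callable

# Tool calls in the shape shown in this guide (name, args, id).
tool_calls = [
    {"name": "get_weather", "args": {"location": "Boston"}, "id": "call_1"},
    {"name": "get_weather", "args": {"location": "Tokyo"}, "id": "call_2"},
]

tool_messages = []
for call in tool_calls:
    output = TOOLS[call["name"]](**call["args"])
    # Echo the call id so each result can be matched to its request.
    tool_messages.append(
        {"role": "tool", "tool_call_id": call["id"], "content": output}
    )

print(tool_messages)
```

Because every result carries the id of the call that produced it, the order in which tools finish does not matter.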

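Forcing tool calls

By default, the model is free to choose which bound tool to use based on the user's input. However, you may want to force the choice, ensuring that the model uses either a particular tool or any tool from a given list: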

model_with_tools = model.bind_tools([tool_1], tool_choice="any")

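Parallel tool calls

Many models support calling multiple tools in parallel when appropriate. This lets the model gather information from different sources simultaneously.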

Parallel tool calls

model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke(
    "What's the weather in Boston and Tokyo?"
)


# The model may generate multiple tool calls
print(response.tool_calls)
# [
#   {'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': 'call_1'},
#   {'name': 'get_weather', 'args': {'location': 'Tokyo'}, 'id': 'call_2'},
# ]


# Execute all tools (can be done in parallel with async)
results = []
for tool_call in response.tool_calls:
    if tool_call['name'] == 'get_weather':
        result = get_weather.invoke(tool_call)
    ...
    results.append(result)

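The model decides when parallel execution is appropriate based on whether the requested operations are independent.

Most models that support tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) allow you to disable this feature. To do so, set parallel_tool_calls=False: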

model.bind_tools([get_weather], parallel_tool_calls=False)

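Streaming tool calls

When streaming responses, tool calls are built up progressively through ToolCallChunk. This lets you see tool calls as they are being generated rather than waiting for the complete response.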

Streaming tool calls

for chunk in model_with_tools.stream(
    "What's the weather in Boston and Tokyo?"
):
    # Tool call chunks arrive progressively
    for tool_chunk in chunk.tool_call_chunks:
        if name := tool_chunk.get("name"):
            print(f"Tool: {name}")
        if id_ := tool_chunk.get("id"):
            print(f"ID: {id_}")
        if args := tool_chunk.get("args"):
            print(f"Args: {args}")

# Output:
# Tool: get_weather
# ID: call_SvMlU1TVIZugrFLckFE2ceRE
# Args: {"lo
# Args: catio
# Args: n": "B
# Args: osto
# Args: n"}
# Tool: get_weather
# ID: call_QMZdy6qInx13oWKE7KhuhOLR
# Args: {"lo
# Args: catio
# Args: n": "T
# Args: okyo
# Args: "}

You can accumulate chunks to build complete tool calls:

Accumulate tool calls

gathered = None
for chunk in model_with_tools.stream("What's the weather in Boston?"):
    gathered = chunk if gathered is None else gathered + chunk
    print(gathered.tool_calls)
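What accumulation does to the streamed argument fragments can be reproduced with plain strings. The sketch below concatenates the fragments from the sample streaming output for the Boston call and parses them once they form complete JSON. It illustrates the idea only; the chunk addition in LangChain handles this for you.

```python
import json

# Argument fragments for one tool call, as in the sample streaming output.
fragments = ['{"lo', 'catio', 'n": "B', 'osto', 'n"}']

buffer = ""
parsed = None
for fragment in fragments:
    buffer += fragment
    try:
        parsed = json.loads(buffer)  # succeeds only once the JSON is complete
    except json.JSONDecodeError:
        continue  # still partial; keep accumulating

print(parsed)
```

Until the final fragment arrives, the buffer is not valid JSON, which is why partial tool-call arguments should not be executed early.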

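Structured output

Models can be asked to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.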

Pydantic
TypedDict
JSON Schema

Pydantic models provide the richest feature set, including field validation, descriptions, and nested structures.

from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # Movie(title="Inception", year=2010, director="Christopher Nolan", rating=8.8)

TypedDict provides a simpler alternative using Python's built-in typing, ideal when you don't need runtime validation.

from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    """A movie with details."""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]
    director: Annotated[str, ..., "The director of the movie"]
    rating: Annotated[float, ..., "The movie's rating out of 10"]

model_with_structure = model.with_structured_output(MovieDict)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.8}

For maximum control or interoperability, you can provide a raw JSON Schema.

import json

json_schema = {
    "title": "Movie",
    "description": "A movie with details",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title of the movie"
        },
        "year": {
            "type": "integer",
            "description": "The year the movie was released"
        },
        "director": {
            "type": "string",
            "description": "The director of the movie"
        },
        "rating": {
            "type": "number",
            "description": "The movie's rating out of 10"
        }
    },
    "required": ["title", "year", "director", "rating"]
}

model_with_structure = model.with_structured_output(
    json_schema,
    method="json_schema",
)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, ...}

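Key considerations for structured output:

Method parameter: some providers support different methods ('json_schema', 'function_calling', 'json_mode')
- 'json_schema' typically refers to dedicated structured output features offered by a provider
- 'function_calling' derives structured output by forcing a tool call that follows the given schema
- 'json_mode' is a precursor to 'json_schema' offered by some providers; it generates valid JSON, but the schema must be described in the prompt
Include raw: use include_raw=True to get both the parsed output and the raw AI message
Validation: Pydantic models provide automatic validation, while TypedDict and JSON Schema require manual validation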
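For the TypedDict and JSON Schema cases, manual validation can be as simple as checking keys and types on the returned dictionary. The following is a minimal sketch for the Movie fields used in this guide; `EXPECTED_TYPES` and `validate_movie` are hypothetical names, and a real project might prefer a dedicated schema validator.

```python
EXPECTED_TYPES = {"title": str, "year": int, "director": str, "rating": (int, float)}


def validate_movie(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], expected):
            # bool is rejected explicitly because it is a subclass of int.
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


good = {"title": "Inception", "year": 2010, "director": "Christopher Nolan", "rating": 8.8}
bad = {"title": "Inception", "year": "2010", "rating": 8.8}
print(validate_movie(good))
print(validate_movie(bad))
```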

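Example: message output alongside parsed structure

It can be useful to return the raw AIMessage object alongside the parsed representation in order to access response metadata such as token counts. To do this, set include_raw=True when calling with_structured_output: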

from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie, include_raw=True)  
response = model_with_structure.invoke("Provide details about the movie Inception")
response
# {
#     "raw": AIMessage(...),
#     "parsed": Movie(title=..., year=..., ...),
#     "parsing_error": None,
# }

Example: nested structures

Schemas can be nested:

from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)

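Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For a full list of models supported in LangChain, see the integrations page.

Advanced topics

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.

All LangChain chat models with underlying multimodal capabilities support:

Data in the cross-provider standard format (see our messages guide)
The OpenAI chat completions format
Any format native to that specific provider (e.g., Anthropic models accept Anthropic's native format)

See the multimodal section of the messages guide for details. Some models can also return multimodal data as part of their response. If invoked to do so, the resulting AIMessage will have content blocks with multimodal types.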

Multimodal output

response = model.invoke("Create a picture of a cat")
print(response.content_blocks)
# [
#     {"type": "text", "text": "Here's a picture of a cat"},
#     {"type": "image", "base64": "...", "mime_type": "image/jpeg"},
# ]

See the integrations page for details on specific providers.
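Once a response contains image blocks in the shape shown above, extracting the binary data is a matter of base64 decoding. The sketch below works on a hand-built list of content blocks; the payload is placeholder bytes rather than a real image.

```python
import base64

# Content blocks in the shape shown above; the payload here is placeholder
# bytes, not a real image.
content_blocks = [
    {"type": "text", "text": "Here's a picture of a cat"},
    {
        "type": "image",
        "base64": base64.b64encode(b"placeholder-bytes").decode("ascii"),
        "mime_type": "image/jpeg",
    },
]

# Collect (mime type, raw bytes) for every image block.
images = [
    (block["mime_type"], base64.b64decode(block["base64"]))
    for block in content_blocks
    if block["type"] == "image"
]
print(images)
```

The decoded bytes can then be written to a file whose extension matches the reported MIME type.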

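Reasoning

Newer models can perform multi-step reasoning to arrive at a conclusion. This involves breaking a complex problem down into smaller, more manageable steps. If the underlying model supports it, you can surface this reasoning process to better understand how the model arrived at its final answer.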

for chunk in model.stream("Why do parrots have colorful feathers?"):
    reasoning_steps = [r for r in chunk.content_blocks if r["type"] == "reasoning"]
    print(reasoning_steps if reasoning_steps else chunk.text)

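Depending on the model, you can sometimes specify how much effort it should put into reasoning. Similarly, you can request that the model turn reasoning off entirely. This may take the form of categorical "tiers" of reasoning (e.g., 'low' or 'high') or integer token budgets. For details, see the integration page or reference for your chat model.

Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you want to invoke a custom model, or when you want to avoid the costs of a cloud-based model. Ollama is one of the easiest ways to run models locally. See the integrations page for the full list of local integrations.

Prompt caching

Many providers offer prompt caching features to reduce latency and cost when the same tokens are processed repeatedly. These features can be implicit or explicit:

Implicit prompt caching: the provider automatically passes on cost savings if a request hits the cache. Examples: OpenAI and Gemini (Gemini 2.5 and above).
Explicit caching: the provider lets you manually indicate cache points for finer control or to guarantee cost savings. Examples: ChatOpenAI (via prompt_cache_key), Anthropic's AnthropicPromptCachingMiddleware and cache_control options, AWS Bedrock, Gemini.

Prompt caching is often only engaged above a minimum input token threshold. See the provider pages for details.

Cache usage is reflected in the usage metadata of the model response.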
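As a rough way to see whether caching is taking effect, you can compute the share of input tokens that were read from cache. The sketch below assumes a usage dictionary shaped like the usage metadata examples elsewhere in this guide, and assumes that input_tokens counts cached tokens too. `cache_read_fraction` is a hypothetical helper and the numbers are illustrative.

```python
def cache_read_fraction(usage: dict) -> float:
    """Fraction of input tokens served from cache (0.0 if unknown)."""
    input_tokens = usage.get("input_tokens", 0)
    cache_read = usage.get("input_token_details", {}).get("cache_read", 0)
    return cache_read / input_tokens if input_tokens else 0.0


# Illustrative numbers, not real measurements.
usage = {
    "input_tokens": 2000,
    "output_tokens": 50,
    "total_tokens": 2050,
    "input_token_details": {"cache_read": 1500},
}
print(cache_read_fraction(usage))
```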

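Server-side tool use

Some providers support server-side tool-calling loops: the model can interact with web search, code interpreters, and other tools and analyze the results in a single conversational turn. If a model invokes a tool server-side, the content of the response message includes content representing the tool call and its result. Accessing the response's content blocks returns the server-side tool calls and results in a provider-agnostic format: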

Invoke with server-side tool use

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-4.1-mini")

tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])

response = model_with_tools.invoke("What was a positive news story from today?")
response.content_blocks

Result

[
    {
        "type": "server_tool_call",
        "name": "web_search",
        "args": {
            "query": "positive news stories today",
            "type": "search"
        },
        "id": "ws_abc123"
    },
    {
        "type": "server_tool_result",
        "tool_call_id": "ws_abc123",
        "status": "success"
    },
    {
        "type": "text",
        "text": "Here are some positive news stories from today...",
        "annotations": [
            {
                "end_index": 410,
                "start_index": 337,
                "title": "article title",
                "type": "citation",
                "url": "..."
            }
        ]
    }
]

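This represents a single conversational turn; there are no associated ToolMessage objects that need to be passed in, as there are in client-side tool calling. See the integration page for your provider for available tools and usage details.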
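Because calls and results arrive as separate content blocks, it can be handy to pair them by id. The sketch below does this for a hand-built list in the shape of the result shown above, with annotations omitted; it needs no model access.

```python
content_blocks = [
    {"type": "server_tool_call", "name": "web_search",
     "args": {"query": "positive news stories today"}, "id": "ws_abc123"},
    {"type": "server_tool_result", "tool_call_id": "ws_abc123", "status": "success"},
    {"type": "text", "text": "Here are some positive news stories from today..."},
]

# Index results by the id of the call they answer.
results_by_id = {
    block["tool_call_id"]: block
    for block in content_blocks
    if block["type"] == "server_tool_result"
}

# Pair each call with the status of its result, if one was returned.
summary = [
    (block["name"], results_by_id.get(block["id"], {}).get("status", "missing"))
    for block in content_blocks
    if block["type"] == "server_tool_call"
]
print(summary)
```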

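Rate limiting

Many chat model providers impose a limit on the number of invocations that can be made in a given time period. If you hit a rate limit, you will typically receive a rate limit error response from the provider and will need to wait before making more requests. To help manage rate limits, chat model integrations accept a rate_limiter parameter at initialization that controls the rate at which requests are made.

Initialize and use a rate limiter

LangChain comes with an optional built-in InMemoryRateLimiter. This limiter is thread-safe and can be shared by multiple threads in the same process.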

Define a rate limiter

from langchain.chat_models import init_chat_model
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,  # 1 request every 10s
    check_every_n_seconds=0.1,  # Check every 100ms whether allowed to make a request
    max_bucket_size=10,  # Controls the maximum burst size.
)

model = init_chat_model(
    model="gpt-5",
    model_provider="openai",
    rate_limiter=rate_limiter  
)

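The provided rate limiter can only limit the number of requests per unit time. It will not help if you also need to limit based on the size of the requests.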
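The parameters requests_per_second and max_bucket_size suggest a token-bucket scheme. The toy class below sketches that general algorithm so the two knobs are easier to reason about. It is an illustration, not the source of InMemoryRateLimiter, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Toy token bucket; time is passed in explicitly to keep it deterministic."""

    def __init__(self, requests_per_second: float, max_bucket_size: float):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.tokens = 0.0
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(requests_per_second=0.5, max_bucket_size=2)
decisions = [bucket.try_acquire(t) for t in (0, 1, 2, 3, 4, 20, 20, 20)]
print(decisions)
```

After the long idle gap the bucket holds only two tokens, so two back-to-back requests pass and the third is refused: the bucket size bounds the burst, while the rate bounds the steady state.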

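Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which lets you use model providers that have OpenAI-compatible APIs or route requests through a proxy server.

Base URL

Many model providers offer OpenAI-compatible APIs (e.g., Together AI, vLLM). You can use init_chat_model with these providers by specifying the appropriate base_url parameter: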

model = init_chat_model(
    model="MODEL_NAME",
    model_provider="openai",
    base_url="BASE_URL",
    api_key="YOUR_API_KEY",
)

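When instantiating a chat model class directly, the parameter name may vary by provider. Check the respective reference for details.

Proxy configuration

For deployments that require an HTTP proxy, some model integrations support proxy configuration: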

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",
    openai_proxy="http://proxy.example.com:8080"
)

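Proxy support varies by integration. Check the reference for the specific model provider for proxy configuration options.

Log probabilities

Certain models can be configured to return token-level log probabilities, which represent the likelihood of a given token, by setting the logprobs parameter on the model: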

model = init_chat_model(
    model="gpt-4o",
    model_provider="openai"
).bind(logprobs=True)

response = model.invoke("Why do parrots talk?")
print(response.response_metadata["logprobs"])
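Log probabilities are easier to read once converted to probabilities with the exponential function. The sketch below assumes an OpenAI-style layout in which the metadata holds a "content" list of entries with "token" and "logprob" keys. That layout is an assumption, and the values are illustrative rather than real model output.

```python
import math

# Assumed shape, modeled on OpenAI-style metadata; values are illustrative.
logprobs = {
    "content": [
        {"token": "Par", "logprob": -0.5},
        {"token": "rots", "logprob": 0.0},
    ]
}

# p = exp(logprob): a log probability of 0.0 means probability 1.0.
probabilities = [
    (entry["token"], math.exp(entry["logprob"])) for entry in logprobs["content"]
]
print(probabilities)
```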

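Token usage

Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. For more details, see the messages guide.

Some provider APIs, notably the OpenAI and Azure OpenAI chat completions APIs, require users to opt in to receiving token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.

You can track aggregate token counts across models in an application using either a callback or a context manager, as shown below:

Callback handler
Context manager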

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

model_1 = init_chat_model(model="openai:gpt-4o-mini")
model_2 = init_chat_model(model="anthropic:claude-3-5-haiku-latest")

callback = UsageMetadataCallbackHandler()
result_1 = model_1.invoke("Hello", config={"callbacks": [callback]})
result_2 = model_2.invoke("Hello", config={"callbacks": [callback]})
callback.usage_metadata

{
    'gpt-4o-mini-2024-07-18': {
        'input_tokens': 8,
        'output_tokens': 10,
        'total_tokens': 18,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    },
    'claude-3-5-haiku-20241022': {
        'input_tokens': 8,
        'output_tokens': 21,
        'total_tokens': 29,
        'input_token_details': {'cache_read': 0, 'cache_creation': 0}
    }
}

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

model_1 = init_chat_model(model="openai:gpt-4o-mini")
model_2 = init_chat_model(model="anthropic:claude-3-5-haiku-latest")

with get_usage_metadata_callback() as cb:
    model_1.invoke("Hello")
    model_2.invoke("Hello")
    print(cb.usage_metadata)

{
    'gpt-4o-mini-2024-07-18': {
        'input_tokens': 8,
        'output_tokens': 10,
        'total_tokens': 18,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    },
    'claude-3-5-haiku-20241022': {
        'input_tokens': 8,
        'output_tokens': 21,
        'total_tokens': 29,
        'input_token_details': {'cache_read': 0, 'cache_creation': 0}
    }
}
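The per-model dictionaries above can be rolled up into application-wide totals with a few lines of plain Python. This sketch uses the token counts from the sample output and omits the detail fields.

```python
usage_metadata = {
    "gpt-4o-mini-2024-07-18": {"input_tokens": 8, "output_tokens": 10, "total_tokens": 18},
    "claude-3-5-haiku-20241022": {"input_tokens": 8, "output_tokens": 21, "total_tokens": 29},
}

# Sum each counter across all models.
totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
for per_model in usage_metadata.values():
    for key in totals:
        totals[key] += per_model.get(key, 0)

print(totals)
```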

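Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig dictionary. This provides run-time control over execution behavior, callbacks, and metadata tracking. Common configuration options include: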

Invocation with config

response = model.invoke(
    "Tell me a joke",
    config={
        "run_name": "joke_generation",      # Custom name for this run
        "tags": ["humor", "demo"],          # Tags for categorization
        "metadata": {"user_id": "123"},     # Custom metadata
        "callbacks": [my_callback_handler], # Callback handlers
    }
)

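These configuration values are particularly useful when:

Debugging with LangSmith tracing
Implementing custom logging or monitoring
Controlling resource usage in production
Tracking invocations across complex pipelines

Key configuration attributes

run_name

string

Identifies this specific invocation in logs and traces. Not inherited by sub-calls.

Configurable models

You can also create a runtime-configurable model by specifying configurable_fields. If you don't specify a model value, then 'model' and 'model_provider' will be configurable by default.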

from langchain.chat_models import init_chat_model

configurable_model = init_chat_model(temperature=0)

configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "gpt-5-nano"}},  # Run with GPT-5-Nano
)
configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "claude-sonnet-4-5"}},  # Run with Claude
)

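Configurable model with default values

We can create a configurable model with default model values, specify which parameters are configurable, and add prefixes to configurable params: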

first_model = init_chat_model(
    model="gpt-4.1-mini",
    temperature=0,
    configurable_fields=("model", "model_provider", "temperature", "max_tokens"),
    config_prefix="first",  # Useful when you have a chain with multiple models
)

first_model.invoke("what's your name")

first_model.invoke(
    "what's your name",
    config={
        "configurable": {
            "first_model": "claude-sonnet-4-5",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)
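The naming convention in the example above is that each configurable key is the prefix, an underscore, and the parameter name. The sketch below shows how such keys could be mapped back to plain parameter names for one model. `strip_prefix` is a hypothetical helper written for illustration, not a LangChain function, and the second model entry is invented for the example.

```python
def strip_prefix(configurable: dict, prefix: str) -> dict:
    """Hypothetical helper: pick out the keys that belong to one model."""
    marker = f"{prefix}_"
    return {
        key[len(marker):]: value
        for key, value in configurable.items()
        if key.startswith(marker)
    }


configurable = {
    "first_model": "claude-sonnet-4-5",
    "first_temperature": 0.5,
    "second_model": "gpt-4.1-mini",  # hypothetical second model in the same chain
}
print(strip_prefix(configurable, "first"))
```

Prefixes keep the settings of several configurable models apart when they share a single config dictionary.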

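Using a configurable model declaratively

We can call declarative operations such as bind_tools, with_structured_output, and with_configurable on a configurable model, and chain a configurable model in the same way we would a regularly instantiated chat model object.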

from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

model = init_chat_model(temperature=0)
model_with_tools = model.bind_tools([GetWeather, GetPopulation])

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC", config={"configurable": {"model": "gpt-4.1-mini"}}
).tool_calls

[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'call_Ga9m8FAArIyEjItHmztPYA22',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York, NY'},
        'id': 'call_jh2dEvBaAHRaw5JUDthOs7rt',
        'type': 'tool_call'
    }
]

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC",
    config={"configurable": {"model": "claude-sonnet-4-5"}},
).tool_calls

[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'toolu_01JMufPf4F4t2zLj7miFeqXp',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York City, NY'},
        'id': 'toolu_01RQBHcE8kEEbYTuuS8WqY1u',
        'type': 'tool_call'
    }
]
