Building an AI Agent from Scratch in Python: A Hands-On Guide to the Tool Use Pattern

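A large language model is more than a chatbot. With the Tool Use pattern, a few dozen lines of Python can turn an LLM into a real agent that checks the weather, reads databases, and calls APIs. This article goes from the underlying principle to complete code to explain what is arguably the most important AI programming paradigm of 2026.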

Why Tool Use Is the Core of an AI Agent

You may already have used ChatGPT's web search, code execution, or file analysis. All of these features rest on the same pattern: Tool Use (also called Function Calling).

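The principle is simple:

1. You tell the LLM: "Here are some tools, with each tool's name, parameters, and purpose."
2. The LLM analyzes the user's request and decides whether it needs a tool.
3. If it does, the LLM emits a structured tool call (JSON).
4. Your code executes the call and returns the result to the LLM.
5. The LLM generates the final answer based on the result.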

User request → LLM reasons → tool call → execution → result returned → LLM composes the answer
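To make steps 3 and 4 more concrete, here is a rough sketch of the two payloads that travel between the model and your code, written as plain Python dicts. The field names follow the shapes used in the agent loop later in this article; the id value, the helper name make_tool_result, and the weather numbers are made up for illustration.

```python
import json

# What the model emits when it wants a tool (step 3).
# The id is a made-up placeholder; real ids are generated by the API.
tool_call = {
    "type": "tool_use",
    "id": "toolu_example_001",
    "name": "get_weather",
    "input": {"city": "Beijing", "unit": "celsius"},
}


def make_tool_result(call: dict, result: dict) -> dict:
    """Wrap a tool's return value so it can be sent back to the model (step 4)."""
    return {
        "type": "tool_result",
        "tool_use_id": call["id"],  # ties the result to the call it answers
        "content": json.dumps(result, ensure_ascii=False),
    }


# Illustrative numbers only, not real weather data
reply = make_tool_result(tool_call, {"temperature": 22, "humidity": 45})
print(reply["tool_use_id"])
```

The tool_use_id is what lets the model match each result to the call that produced it, which matters once several calls arrive in one response.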

This is not just a prompting trick. It turns an LLM from a model that can only talk into an agent that can act.

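Environment Setup

pip install anthropic requests  # this article uses Claude as the example; OpenAI is similar

You need an API key. This article uses Anthropic's Claude API, whose Python client reads the key from the ANTHROPIC_API_KEY environment variable. The same pattern carries over to other major providers such as OpenAI, Google Gemini, and AWS Bedrock.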

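Step 1: Define the Tools

A tool definition consists of a name, a description, and a JSON Schema for the input. Together they tell the LLM what the tool does and which parameters it needs: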

import anthropic
import json

# Define the available tools
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city, including temperature, humidity, and conditions",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Beijing', 'Shanghai', 'Tokyo'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit, defaults to celsius"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "search_database",
        "description": "Search the product database for item information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search keyword"
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return, defaults to 5"
                }
            },
            "required": ["query"]
        }
    }
]

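Key point: the description field matters enormously. The LLM relies on it to decide when to call which tool. The clearer it is, the more accurate the calls.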
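Since so much rides on the descriptions, it can help to lint tool definitions before sending them to the model. The checker below is a minimal sketch, not part of any SDK: the function name check_tool_definition and the 20-character threshold are assumptions chosen for illustration.

```python
# Arbitrary threshold chosen for this sketch; tune it for your own tools
MIN_DESCRIPTION_LENGTH = 20


def check_tool_definition(tool: dict) -> list[str]:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    for key in ("name", "description", "input_schema"):
        if key not in tool:
            problems.append(f"missing key: {key}")
    if problems:
        return problems  # the remaining checks need all three keys

    if len(tool["description"].strip()) < MIN_DESCRIPTION_LENGTH:
        problems.append("description is too short to guide the model")

    schema = tool["input_schema"]
    properties = schema.get("properties", {})
    # Every required parameter must actually be declared
    for param in schema.get("required", []):
        if param not in properties:
            problems.append(f"required parameter not declared: {param}")
    # Every parameter should explain itself to the model
    for param, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"parameter has no description: {param}")
    return problems


vague_tool = {
    "name": "lookup",
    "description": "Lookup",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["query"],
    },
}
for problem in check_tool_definition(vague_tool):
    print(problem)
```

Running a check like this in a unit test catches typos in required lists and empty descriptions before they show up as puzzling model behavior.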

Step 2: Implement the Tool Functions

import requests

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Look up the current weather for a city via a real weather API"""
    # Uses the free Open-Meteo API as an example;
    # swap in any weather service in a real project
    geo_resp = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},  # params= URL-encodes the city name
        timeout=10,
    ).json()

    if not geo_resp.get("results"):
        return {"error": f"City not found: {city}"}

    lat = geo_resp["results"][0]["latitude"]
    lon = geo_resp["results"][0]["longitude"]

    weather_resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": lat,
            "longitude": lon,
            "current": "temperature_2m,relative_humidity_2m,weather_code",
            "temperature_unit": unit,  # so the value matches the unit we report
        },
        timeout=10,
    ).json()
    current = weather_resp["current"]

    return {
        "city": city,
        "temperature": current["temperature_2m"],
        "humidity": current["relative_humidity_2m"],
        "unit": unit,
        "weather_code": current["weather_code"]
    }


def search_database(query: str, max_results: int = 5) -> dict:
    """Simulated database search"""
    # A real project would query an actual database here
    mock_products = [
        {"name": "Mechanical Keyboard K8 Pro", "price": 599, "stock": 42},
        {"name": "Wireless Mouse M750", "price": 299, "stock": 108},
        {"name": "USB-C Docking Station", "price": 399, "stock": 15},
    ]
    results = [p for p in mock_products if query.lower() in p["name"].lower()]
    return {"results": results[:max_results], "total": len(results)}


# Tool dispatcher: maps tool names to the functions that implement them
TOOL_FUNCTIONS = {
    "get_weather": get_weather,
    "search_database": search_database,
}
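The dispatcher is just a dictionary lookup followed by a call with keyword arguments. The sketch below shows that step in isolation; dispatch_tool, echo_tool, and DEMO_TOOL_FUNCTIONS are hypothetical names introduced here so the example runs without network access.

```python
import json


def echo_tool(text: str) -> dict:
    """Stand-in tool used only for this sketch."""
    return {"echo": text}


# Same shape as TOOL_FUNCTIONS above, but with a stub so the sketch runs on its own
DEMO_TOOL_FUNCTIONS = {"echo_tool": echo_tool}


def dispatch_tool(name: str, params: dict, registry: dict) -> str:
    """Look up a tool by name, call it, and return a JSON string for the model."""
    func = registry.get(name)
    if func is None:
        # Report unknown tools as data instead of raising
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(func(**params), ensure_ascii=False)


print(dispatch_tool("echo_tool", {"text": "hi"}, DEMO_TOOL_FUNCTIONS))
print(dispatch_tool("missing", {}, DEMO_TOOL_FUNCTIONS))
```

Passing the registry as an argument rather than reading a global makes the dispatcher easy to test with stub tools.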

Step 3: The Agent Loop

This is the core: a loop that lets the LLM call tools repeatedly until it decides it can give a final answer:

def run_agent(user_message: str) -> str:
    """Run the agent, supporting multiple rounds of tool calls"""
    client = anthropic.Anthropic()

    messages = [{"role": "user", "content": user_message}]

    while True:
        # Call the LLM
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
            system="You are a practical assistant. Use the provided tools when you need real-time information."
        )

        # Check whether a tool call is needed
        if response.stop_reason == "end_turn":
            # The LLM considers itself ready to answer
            text_blocks = [b.text for b in response.content if b.type == "text"]
            return "\n".join(text_blocks)

        if response.stop_reason == "tool_use":
            # Collect every tool call in this response
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # Execute the tool
                    func = TOOL_FUNCTIONS.get(block.name)
                    if func:
                        result = func(**block.input)
                    else:
                        result = {"error": f"Unknown tool: {block.name}"}

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })

            # Append both the LLM's reply and the tool results to the message history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Other stop reasons (less common), e.g. max_tokens
            return f"Agent stopped unexpectedly (stop_reason={response.stop_reason})"
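The loop above runs until the model stops asking for tools, so a model that keeps calling tools would loop forever. One possible guard is a turn limit. The sketch below is an assumption-laden illustration, not the article's implementation: ScriptedClient is a hypothetical stand-in that replays canned responses so the loop can run offline, and run_agent_bounded and max_turns are names invented here.

```python
import json
from types import SimpleNamespace


class ScriptedClient:
    """Hypothetical stand-in for the API client: replays canned responses."""

    def __init__(self, responses):
        self._responses = iter(responses)
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        return next(self._responses)


def run_agent_bounded(client, user_message, tool_functions, max_turns=5):
    """Same loop shape as above, but it gives up after max_turns model calls."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(messages=messages)
        if response.stop_reason != "tool_use":
            return "\n".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = tool_functions[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output),
                })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    return "Stopped: turn limit reached"


# Canned conversation: one tool call, then a final answer
call = SimpleNamespace(type="tool_use", id="t1", name="add", input={"a": 2, "b": 3})
final = SimpleNamespace(type="text", text="2 + 3 = 5")
script = [
    SimpleNamespace(stop_reason="tool_use", content=[call]),
    SimpleNamespace(stop_reason="end_turn", content=[final]),
]
demo_tools = {"add": lambda a, b: {"sum": a + b}}
answer = run_agent_bounded(ScriptedClient(script), "What is 2 + 3?", demo_tools)
print(answer)
```

A scripted client like this is also a convenient way to unit-test the loop without spending tokens.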

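Running It

# Test
answer = run_agent("What's the weather like in Beijing today? Is it a good day for a run?")
print(answer)

The output looks something like:

It is currently 22°C in Beijing with 45% humidity and clear skies. Great conditions for a run!
The temperature is comfortable and the humidity is low. Remember sun protection.

The LLM decides on its own to call get_weather, fetches live data, and bases its advice on that data. The user never sees the intermediate steps.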

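Advanced: Parallel Tool Calls

Modern LLMs can emit several tool calls in a single response (Parallel Tool Use). For example, if the user asks to compare the weather in Beijing and Tokyo, the LLM may emit two get_weather calls at once. The code above already handles this case, because the loop processes every tool_use block in the response, although it executes them one after another.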
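When several calls arrive together and the tools are I/O-bound (HTTP requests, database queries), running them in a thread pool may cut latency. The following is a sketch under that assumption; run_tool_calls_concurrently and fake_weather are hypothetical names, and the blocks are simulated with SimpleNamespace rather than real API objects.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_tool_calls_concurrently(blocks, tool_functions, max_workers=4):
    """Execute every tool_use block in a thread pool, keeping the original order."""
    calls = [b for b in blocks if b.type == "tool_use"]

    def run_one(block):
        output = tool_functions[block.name](**block.input)
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(output, ensure_ascii=False),
        }

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of finish order
        return list(pool.map(run_one, calls))


def fake_weather(city: str) -> dict:
    """Stub tool with made-up data so the sketch runs offline."""
    return {"city": city, "temperature": 20}


blocks = [
    SimpleNamespace(type="tool_use", id="t1", name="get_weather", input={"city": "Beijing"}),
    SimpleNamespace(type="tool_use", id="t2", name="get_weather", input={"city": "Tokyo"}),
]
parallel_results = run_tool_calls_concurrently(blocks, {"get_weather": fake_weather})
print([r["tool_use_id"] for r in parallel_results])
```

Threads suit this case because the work is mostly waiting on the network; tools with side effects that depend on each other should still run sequentially.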

Advanced: Error Handling and Retries

A production environment has to handle tool execution failures:

def execute_tool_safely(func, params: dict) -> str:
    """Execute a tool safely, catching any exception"""
    try:
        result = func(**params)
        return json.dumps(result, ensure_ascii=False)
    except Exception as e:
        # Return the error to the LLM and let it decide what to do
        return json.dumps({
            "error": str(e),
            "suggestion": "The tool call failed; try another approach or inform the user"
        }, ensure_ascii=False)

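Returning the error to the LLM instead of crashing is a key difference between an agent and an ordinary script. The LLM can adjust its strategy based on the error message, for example by retrying with different parameters or by telling the user that the service is currently unavailable.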
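For transient failures such as network timeouts, the code itself can also retry before reporting anything to the model. Below is a minimal sketch with exponential backoff; execute_with_retry, its parameters, and the flaky_lookup stub are assumptions made for this example. The sleep function is injectable so the demo does not actually wait.

```python
import json
import time


def execute_with_retry(func, params, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call a tool, retrying on exceptions with exponential backoff."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return json.dumps(func(**params), ensure_ascii=False)
        except Exception as exc:  # broad on purpose: any failure is reported as data
            last_error = exc
            if attempt < max_attempts - 1:
                # With the defaults: wait 0.5s after the first failure, 1s after the second
                sleep(base_delay * 2 ** attempt)
    return json.dumps({"error": str(last_error), "attempts": max_attempts})


# Stub tool that fails twice, then succeeds
attempts = {"count": 0}


def flaky_lookup(key: str) -> dict:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"key": key, "value": 42}


delays = []  # record the requested delays instead of sleeping
outcome = execute_with_retry(flaky_lookup, {"key": "a"}, sleep=delays.append)
print(outcome, delays)
```

Retrying blindly is only safe for idempotent tools; a tool that sends an email or writes a record should probably fail fast and let the model or the user decide.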

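Advanced: Safety Boundaries for Tool Calls

The risk of Tool Use is that the LLM decides what gets called. If your tools include something like delete_record or send_email, you must add a safety layer: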

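# Dangerous operations require confirmation
DANGEROUS_TOOLS = {"delete_record", "send_email", "execute_sql"}

def run_tool_with_confirmation(block, confirm_callback=None) -> dict:
    """Handle one tool_use block; call this from the loop shown earlier"""
    if block.name in DANGEROUS_TOOLS:
        # Fail closed: with no callback nobody can approve, so refuse
        if confirm_callback is None or not confirm_callback(block.name, block.input):
            return {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps({"error": "This operation was not approved by the user"})
            }
    # Safe or approved: execute normally
    func = TOOL_FUNCTIONS.get(block.name)
    if func is None:
        content = json.dumps({"error": f"Unknown tool: {block.name}"})
    else:
        content = execute_tool_safely(func, block.input)
    return {"type": "tool_result", "tool_use_id": block.id, "content": content}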

Rule of thumb: read operations run automatically; write operations require confirmation.
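The confirm_callback in the snippet above is left to the caller. One possible shape, assuming a console application, is a function that asks a human and treats anything other than an explicit yes as a refusal. The names make_console_confirm and ask are hypothetical; ask defaults to the built-in input and is injectable so the sketch can be tested without a terminal.

```python
def make_console_confirm(ask=input):
    """Build a confirm callback that asks a human on the console."""

    def confirm(tool_name: str, params: dict) -> bool:
        answer = ask(f"Allow {tool_name} with {params}? [y/N] ")
        # Default to "no": only an explicit yes approves the call
        return answer.strip().lower() in ("y", "yes")

    return confirm


# Simulated user input so the example runs unattended
approve_all = make_console_confirm(ask=lambda prompt: "y")
reject_all = make_console_confirm(ask=lambda prompt: "")
print(approve_all("send_email", {"to": "a@example.com"}))
print(reject_all("delete_record", {"id": 7}))
```

In a web application the same callback signature could instead post the pending call to a review queue and wait for a decision.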

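Tips for Designing Good Tools

Keep tool granularity moderate: too fine (open_file, read_line, close_file) and the LLM's call chains get too long; too coarse (do_everything) and you lose flexibility.
Constrain parameters with enums: "enum": ["celsius", "fahrenheit"] is less error-prone than free text.
Write the description for the LLM: it is not documentation for humans, it is what the LLM bases its decisions on.
Return structured data: let the LLM extract and combine information instead of returning long passages of natural language.
Limit the number of tools: generally stay within 10 to 20. Too many tools lower selection accuracy.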
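Enum constraints pay off most when your code also enforces them before running a tool. The validator below is a simplified sketch that covers only required fields, a few basic types, and enums; validate_input and JSON_TYPES are names invented here, and a dedicated JSON Schema validation library would be the more complete choice in a real project.

```python
# Simplified mapping from JSON Schema type names to Python types.
# Caveat of this sketch: Python's bool is a subclass of int, so True passes as "integer".
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_input(schema: dict, params: dict) -> list[str]:
    """Return validation errors for params against a tool's input_schema."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    for name, value in params.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unexpected parameter: {name}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{name} should be of type {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors


weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}
print(validate_input(weather_schema, {"city": "Beijing", "unit": "kelvin"}))
```

Returning the error list to the model as a tool result gives it a chance to correct the arguments and try again.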

Differences from OpenAI

If you use OpenAI's API, the tool definition format is slightly different:

# OpenAI format
tools_openai = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {  # note: this key is called parameters, not input_schema
            "type": "object",
            "properties": { ... },
            "required": ["city"]
        }
    }
}]

The core pattern is exactly the same; only the wrapping layer of the JSON structure differs.
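Because only the wrapper differs, one set of definitions can be converted mechanically. The helper below is a small sketch based solely on the two formats shown in this article; anthropic_to_openai is a hypothetical name, not a function from either SDK.

```python
def anthropic_to_openai(tool: dict) -> dict:
    """Rewrap a tool definition from the input_schema style into the function style."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],  # same JSON Schema, different key
        },
    }


weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}
converted = anthropic_to_openai(weather_tool)
print(converted["function"]["parameters"]["required"])
```

Keeping a single source of truth and converting at the edge avoids the two definitions drifting apart.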

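Agent Architecture in the Real World

A production-grade agent usually also needs:

Memory management: compress conversation history or store it in a vector database so the context window does not overflow.
Tool routing: load tool sets dynamically based on user intent (instead of sending all 20 tools every time).
Observability: record the inputs, outputs, latency, and token usage of every tool call.
Streaming output: emit text while the model is still generating, and show a loading state during tool calls.
Multi-agent collaboration: split complex tasks across agents with different specialties.

These are topics for later, but the core loop is still the roughly 40 lines of code shown above.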
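As a taste of the observability item, a decorator can record each tool call without touching the tools themselves. This is a minimal sketch: observed, call_log, and lookup_price are hypothetical names, the log is an in-memory list rather than a real logging or tracing backend, and token usage is not captured because it comes from the API response rather than from the tool.

```python
import functools
import time

call_log = []  # stand-in for a logger or tracing backend


def observed(func):
    """Wrap a tool so each call records its name, inputs, status, and duration."""

    @functools.wraps(func)
    def wrapper(**params):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(**params)
        except Exception:
            status = "error"
            raise  # keep the original exception for the caller
        finally:
            # Runs on both success and failure
            call_log.append({
                "tool": func.__name__,
                "params": params,
                "status": status,
                "seconds": time.perf_counter() - start,
            })

    return wrapper


@observed
def lookup_price(sku: str) -> dict:
    """Stub tool with made-up data."""
    return {"sku": sku, "price": 599}


lookup_price(sku="K8")
print(call_log[0]["tool"], call_log[0]["status"])
```

The wrapper accepts keyword arguments only, which matches how the agent loop calls tools with func(**block.input).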

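Summary

The Tool Use pattern is the foundational paradigm for today's AI agents:

1. Define the tools (JSON Schema)
2. Implement the tools (ordinary Python functions)
3. Run the loop (call the LLM → execute tools → return results → repeat until done)

Once you have this pattern down, you can use Python to connect any API, database, or system command to an LLM and build an AI agent that is actually useful. No framework and no magic required, only an understanding of this loop.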

Further reading:

Anthropic Tool Use documentation
OpenAI Function Calling guide
OpenClaw Skills system: a production-grade reference implementation of Tool Use