"One of the hardest parts of building an agent harness is constructing its action space." 构建 Agent 框架最难的部分之一,是设计它的动作空间。 —— Anthropic, Lessons from Building Claude Code: Seeing like an Agent
"Claude Code currently has ~20 tools, and we are constantly asking ourselves if we need all of them. The bar to add a new tool is high, because this gives the model one more option to think about." Claude Code 目前有约 20 个工具,我们一直在问自己是否每一个都有必要。添加新工具的门槛很高,因为每多一个工具,就多给模型一个需要思考的选项。
"Experiment often, read your outputs, try new things. See like an agent." 多做实验,审视输出,尝试新方法。学会站在 Agent 的视角看问题。
根源:大模型的输出是概率性的
要理解为什么给大模型设计工具和给程序设计 API 是两件完全不同的事,需要先回到一个根本事实:大模型的本质是一个概率函数。
Anthropic 在构建 Claude Code 的过程中反复踩坑、反复重构,最终沉淀出一句话:See like an agent——站在模型的位置看问题。当你理解了模型是一个概率系统,你就知道工具设计的目标不是"暴露更多能力",而是"让每一步选择都尽可能确定"。
以下四条原则,都是这一根源的具体推论。
一、输出克制:给对的数据,给够用的量
"To put myself in the mind of the model I like to imagine being given a difficult math problem. What tools would you want in order to solve it?" 为了站在模型的角度思考,我喜欢想象自己面对一道数学难题。你会想要什么工具来解题?
"This worked but we found that Claude would load a lot of results into context to find the right answer when really all you needed was the answer." 这个方法有效,但我们发现 Claude 会把大量结果加载到上下文中来寻找正确答案,而实际上你只需要那个答案本身。
"By giving Claude a Grep tool, we could let it search for files and build context itself." 通过给 Claude 一个 Grep 工具,我们可以让它自己搜索文件、自己构建上下文。
Claude Code 的搜索设计经历了一次关键转变:从 RAG 自动喂上下文,到给 Grep 工具让模型自己搜。区别不在于技术实现,在于谁来做判断。RAG 替模型做了"什么信息是相关的"这个判断,但这个判断可能是错的。而 Grep 只是一个搜索工具,模型自己决定搜什么、怎么搜、搜到的结果是否有用。
"This is a pattern we've seen as Claude gets smarter, it becomes increasingly good at building its context if it's given the right tools." 我们观察到一个规律:随着 Claude 变得更聪明,只要给它合适的工具,它构建自身上下文的能力就越来越强。
"While Claude could just ask questions in plain text, we found answering those questions felt like they took an unnecessary amount of time. How could we lower this friction?" 虽然 Claude 可以用纯文本提问,但我们发现回答这些问题花了不必要的时间。如何降低这种摩擦?
Claude Code 早期的 TodoWrite 工具就是一个有状态设计的教训。模型需要在开头写入 Todo 列表,执行过程中逐项勾选,还要时刻维护列表的一致性。为了防止模型忘记列表内容,团队甚至每 5 轮插入一次系统提醒。
"Being sent reminders of the todo list made Claude think that it had to stick to the list instead of modifying it." 不断收到待办列表的提醒,反而让 Claude 觉得自己必须严格遵守列表,而不是根据实际情况去修改它。
"Claude Code currently has ~20 tools, and we are constantly asking ourselves if we need all of them. The bar to add a new tool is high, because this gives the model one more option to think about." Claude Code 目前有约 20 个工具,我们一直在问自己是否每一个都有必要。添加新工具的门槛很高,因为每多一个工具,就多给模型一个需要思考的选项。
"Paper would be the minimum, but you'd be limited by manual calculations. A calculator would be better, but you would need to know how to operate the more advanced options. The fastest and most powerful option would be a computer, but you would have to know how to use it to write and execute code." 纸笔是最低配置,但你会受限于手工计算。计算器更好,但你需要会用高级功能。最快最强的是电脑,但你得会写代码。
"This is a useful framework for designing your agent. You want to give it tools that are shaped to its own abilities." 这是设计 Agent 的一个实用框架:给它与自身能力匹配的工具。
"As model capabilities increase, the tools that your models once needed might now be constraining them. It's important to constantly revisit previous assumptions on what tools are needed." 随着模型能力的提升,曾经需要的工具可能正在限制它们。不断回顾"需要哪些工具"这一假设,至关重要。