Agent 框架 OWL 原理详解

文章深入剖析了开源 Agent 框架 OWL 的技术实现细节。OWL 在 agent benchmark GAIA 上是表现最好的开源 agent 框架。首先介绍了 OWL 作为基于 camel 库的多 Agent 架构，包含 User Agent 和 Assistant Agent 的交互机制，以及 Critic model 的评估功能。然后详细解析了任务改写(TaskSpecifyAgent)和任务计划(TaskPlannerAgent)两个预处理环节的实现原理，以及记忆管理机制(ChatHistoryMemory、VectorDBMemory 等)。通过具体示例展示了 User Agent 和 Assistant Agent 的交互流程，包括系统提示词设计、工具调用机制和任务完成判断标准。最后全面介绍了 OWL 框架中的各类 Agent 子类和丰富的工具组件，特别是 BrowserToolkit 等复杂工具的实现细节。

OWL是一个优秀的开源agent框架，在agent benchmark GAIA上，OWL是表现最好的开源agent框架。虽然deerflow star也不少，但是基于langgraph的实现不太好细究agent底层实现原理。因此本文以OWL代码为例，讲解agent框架的细节实现。OWL其实是对camel库的进一步封装，比如像agent、toolkit等核心代码内容都是在camel库中实现的。

github仓库地址如下：

https://github.com/camel-ai/camelhttps://github.com/camel-ai/owl

官方提供的系统架构图如下所示，可以看到OWL是一个多agent架构，user agent和assistant agent不断的交互，去完善问题的答案。assistant agent比较好理解，去调用各种工具解决用户提出的问题，这些工具自己本身也可以是一个agent，比如web agent，采用了ReACT架构，比assistant Agent还要复杂。aser Agent则是人类用户的一个代理，帮助人类用户去监督assistant agent更好的解决问题。

OWL在具体的代码实现上流程更加复杂一些，笔者看的代码入口为：owl\examples\run.py，应该是最简单的一个启动demo了，所有agent使用的模型都是GPT4O，结构图如下所示。

框架中涉及到agent都是通过设置system prompt告诉他们需要做什么工作的，所有角色的system prompt在camel/camel/prompts/ai_society.py路径下。

除了上文提到的assiatant agent和user agent不断交互外，在任务开始时候，还有任务改写和任务计划两个环节，这两个环节是非必须的。

任务改写由TaskSpecifyAgent类完成，目的是借助LLM充分理解人类用户的意图，并细化任务，实现原理也很简单，只是去调用一次LLM，没有什么复杂的流程，使用的system prompt如下。细化后的任务描述会替换掉原来的人类提供的任务描述，供下游User Agent和Assistant Agent使用。

Here is a task that {assistant_role} will help {user_role} to complete: {task}.Please make it more specific. Be creative and imaginative.Please reply with the specified task in {word_limit} words or less. Do not add anything else.=============================================中文翻译分割线=============================================这是一个任务，{助理角色} 将帮助 {用户角色} 完成：{任务}。请使其更具体。发挥创意和想象力。请用不超过 {字数限制} 的文字回复指定任务，不要添加其他内容。

任务计划由TaskPlannerAgent类完成，使用的input prompt如下。任务计划生成后会和人类的任务描述（或细化后的任务描述）拼接起来，供下游User Agent和Assistant Agent使用。

Divide this task into subtasks: {task}. Be concise.=============================================中文翻译分割线=============================================请将此任务分解为子任务：{task}。要求简洁。

user agent和assistant交互过程中，critic model作为第三者，用来选择一个user / assistant agent的最佳答案给到assistant / user agent。critic model可以是人类的判断，让人工介入进行选择，在Human类中实现。OWL是camel库的更高层的封装，Human类在camel库的camel/human.py中实现。critic model也可以是一个LLM，由CriticAgent类实现，使用system prompt如下：

You are a {critic_role} who teams up with a {user_role} and a {assistant_role} to solve a task: {task}.Your job is to select an option from their proposals and provides your explanations.Your selection criteria are {criteria}.You always have to choose an option from the proposals.=============================================中文翻译分割线=============================================你是一位 {批评者角色}，与一位 {用户角色} 和一位 {助手角色} 合作来解决一项任务：{任务}。你的工作是从他们的提议中选择一个选项，并给出你的解释。你的选择标准是 {criteria（标准内容）}。你必须始终从这些提议中选择一个选项。

User Agent和Assistant Agent之间的交互

本小节笔者用一个例子来展示User Agent和Assistant Agent之间的交互细节，具体任务如下。由于任务比较简单，没涉及TaskSpecifyAgent和TaskPlannerAgent。

帮我查询一下作家王小波的3部作品

首先看下user agent和assistant agent的system prompt，这是理解这两个agent需要干什么的核心，也同时给出了中文翻译方便读者阅读。

user agent的system prompt：

===== RULES OF USER =====Never forget you are a {user_role} and I am a {assistant_role}. Never flip roles! You will always instruct me.We share a common interest in collaborating to successfully complete a task.I must help you to complete the task.Here is the task: {task}. Never forget our task!You must instruct me based on my expertise and your needs to solve the task ONLY in the following two ways:
1. Instruct with a necessary input:Instruction: <YOUR_INSTRUCTION>Input: <YOUR_INPUT>
2. Instruct without any input:Instruction: <YOUR_INSTRUCTION>Input: None
The "Instruction" describes a task or question. The paired "Input" provides further context or information for the requested "Instruction".
You must give me one instruction at a time.I must write a response that appropriately solves the requested instruction.I must decline your instruction honestly if I cannot perform the instruction due to physical, moral, legal reasons or my capability and explain the reasons.You should instruct me not ask me questions.Now you must start to instruct me using the two ways described above.Do not add anything else other than your instruction and the optional corresponding input!Keep giving me instructions and necessary inputs until you think the task is completed.When the task is completed, you must only reply with a single word <CAMEL_TASK_DONE>.Never say <CAMEL_TASK_DONE> unless my responses have solved your task."""=============================================中文翻译分割线================================================== 用户规则 =====永远不要忘记你是 {用户角色}，而我是 {助手角色}。永远不要颠倒角色！你将始终指导我。我们有着共同的兴趣，即通过协作成功完成任务。我必须帮助你完成任务。以下是任务：{任务}。永远不要忘记我们的任务！你必须仅通过以下两种方式，基于我的专业知识和你的需求来指导我解决任务：带必要输入的指导：指令：<你的指令>输入：< 你的输入 >不带任何输入的指导：指令：<你的指令>输入：无“指令” 描述一项任务或问题。配对的 “输入” 为所请求的 “指令” 提供进一步的上下文或信息。你必须一次只给我一条指令。我必须写出适当解决所请求指令的回复。如果由于物理、道德、法律原因或我的能力限制而无法执行指令，我必须诚实地拒绝并解释原因。你应该指导我，而不是问我问题。现在你必须开始使用上述两种方式指导我。除了你的指令和可选的相应输入外，不要添加任何其他内容！继续给我指令和必要的输入，直到你认为任务完成。当任务完成时，你必须只回复一个词 <CAMEL_TASK_DONE>。除非我的回复已经解决了你的任务，否则永远不要说 < CAMEL_TASK_DONE>。

Assistant Agent的System Prompt：

"""===== RULES OF ASSISTANT =====Never forget you are a {assistant_role} and I am a {user_role}. Never flip roles! Never instruct me!We share a common interest in collaborating to successfully complete a task.You must help me to complete the task.Here is the task: {task}. Never forget our task!I must instruct you based on your expertise and my needs to complete the task.
I must give you one instruction at a time.You must write a specific solution that appropriately solves the requested instruction and explain your solutions.You must decline my instruction honestly if you cannot perform the instruction due to physical, moral, legal reasons or your capability and explain the reasons.Unless I say the task is completed, you should always start with:
Solution: <YOUR_SOLUTION>
<YOUR_SOLUTION> should be very specific, include detailed explanations and provide preferable detailed implementations and examples and lists for task-solving.Always end <YOUR_SOLUTION> with: Next request."""=============================================中文翻译分割线================================================== 助手规则 =====永远不要忘记你是 {助手角色}，而我是 {用户角色}。永远不要颠倒角色！永远不要指导我！我们有着共同的兴趣，即通过协作成功完成任务。你必须帮助我完成任务。以下是任务：{任务}。永远不要忘记我们的任务！我必须基于你的专业知识和我的需求来指导你完成任务。
我必须一次只给你一条指令。你必须写出能恰当解决所请求指令的具体解决方案，并解释你的解决方案。如果由于物理、道德、法律原因或你的能力限制而无法执行指令，你必须诚实地拒绝并解释原因。除非我说任务已完成，否则你应该始终以以下内容开头：
解决方案：<你的解决方案>
<你的解决方案> 应非常具体，包括详细的解释，并最好提供用于解决任务的详细实现、示例和列表。始终以 “下一步请求。” 结束 < 你的解决方案 >

user和assistant agent的system prompt开头都是“Never forget you are a {xxx} and I am a {xxx}.”，因为是两个agent交互，这里的“I”和人类用户无关，指的是对面的角色。人类下达的任务task则是以第三人称视角放到了两个Agent的system prompt里边。Assistant Agent负责解决问题，可以调用工具。User Agent负责监督Assistant Agent干活，每轮给Assistant Agent一些指导意见。同时User Agent也负责判断Assistant Agent是否完成了任务，如果完成了，给出 < CAMEL_TASK_DONE>的信号，结束交互。

接下来，就详细看下“帮我查询一下作家王小波的3部作品”（tips：人类的问题在system prompt里，并不在agent的输入输出中）这个case下owl框架是如何运作的。assistant/user agent的system prompt每轮都一样，就不重复粘贴了，只展示agent的输入&输出部分。

ROUND 1

在user&assistant agent交互过程中，是user agent先产生一个回答当作assistant agent的输入，user agent的输入在round 0是固定的内容，以 assistant agent的口吻让user agent给一些指导。

user agent 输入：这块强调了尽量使用工具来回答任务，不然模型会优先用自身知识进行回答。

Now please give me instructions to solve over overall task step by step. If the task requires some specific knowledge, please instruct me to use tools to complete the task.=============================================中文翻译分割线=============================================现在请逐步给我下达指令，以解决整个任务。如果该任务需要一些特定知识，请指示我使用工具来完成任务。

user agent 输出 & assistant agent输入

Instruction: Search for the titles of three works by the writer Wang Xiaobo.  \nInput: None =============================================中文翻译分割线=============================================说明：搜索作家王小波三部作品的标题。\n输入：无

assistant agent输出：assistant agent触发了工具调用，尝试调用duckduckgo搜索工具，但是因为网络问题没有检索成功，最终反馈自己没有检索成功。

# 工具调用{'role': 'assistant', 'content': '', 'tool_calls': [{'id': 'call_pr82X9unq1AukguVVH8OfnBH', 'type': 'function', 'function': {'name': 'search_duckduckgo', 'arguments': '{"max_results": 3, "query": "王小波 作品", "source": "text"}'}}]}, {'role': 'tool', 'content': '{\'error\': "Error executing tool \'search_duckduckgo\': Execution of function search_duckduckgo failed with arguments () and {\'max_results\': 3, \'query\': \'王小波 作品\', \'source\': \'text\'}. Error: https://duckduckgo.com RuntimeError: error sending request for url (https://duckduckgo.com/?q=%E7%8E%8B%E5%B0%8F%E6%B3%A2+%E4%BD%9C%E5%93%81): operation timed out\\n\\nCaused by:\\n    operation timed out"}', 'tool_call_id': 'call_pr82X9unq1AukguVVH8OfnBH'}]# 回答Solution: It appears that online search tools are currently inaccessible or experiencing technical difficulties, hence I cannot complete the search using external sources. \n\nTo proceed, let me know if you would like alternative assistance or another approach. For example, if Wang Xiaobo's works are available in a document or specific content you have, I can analyze that instead. Next request.=============================================中文翻译分割线=============================================解决方案：似乎在线搜索工具目前无法访问或遇到技术困难，因此我无法使用外部资源完成搜索。\n\n要继续，请告诉我您是否需要其他帮助或其他方法。例如，如果王小波的作品存在于你的文档或特定内容中，我可以进行分析。下一个请求。

ROUND 2

user agent 输入：user agent输入为上一轮assistant的输出，同时agent保存了历史的对话内容，也会一并输入到模型里（这块省略了没有粘贴）。

Solution: It appears that online search tools are currently inaccessible or experiencing technical difficulties, hence I cannot complete the search using external sources. \n\nTo proceed, let me know if you would like alternative assistance or another approach. For example, if Wang Xiaobo's works are available in a document or specific content you have, I can analyze that instead. Next request.=============================================中文翻译分割线=============================================解决方案：似乎在线搜索工具目前无法访问或遇到技术困难，因此我无法使用外部资源完成搜索。\n\n要继续，请告诉我您是否需要其他帮助或其他方法。例如，如果王小波的作品存在于你的文档或特定内容中，我可以进行分析。下一个请求。

user agent输出 & assistant agent的输入：既然检索失败，user agent指导assistant agent根据自己的内在知识进行回答。

Instruction: Provide three titles of Wang Xiaobo’s works based on knowledge available up to October 2023.  \nInput: None =============================================中文翻译分割线=============================================说明：根据截至2023年10月的知识，提供王小波作品的三个标题。\n输入：无

assistant agent的输出：assiant agent根据自身知识回答了王小波的三部作品是啥。

Solution: Based on knowledge available up to October 2023, Wang Xiaobo was a renowned Chinese writer whose works are recognized for their intellectual depth, humor, and reflections on society and human nature. Here are three of his notable works:\n\n1. 《黄金时代》 (The Golden Age)  \n   This novella is one of Wang Xiaobo's most iconic works. It explores themes of individual freedom, societal constraints, and existentialism, set against the backdrop of China's Cultural Revolution.\n\n2. 《白银时代》 (The Silver Age)  \n   This is part of the "Age Trilogy," which includes The Golden Age, The Silver Age, and The Bronze Age. The book continues to delve into philosophical questions about life, human desires, and the absurdity of existence.\n\n3. 《沉默的大多数》 (The Silent Majority)  \n   This collection of essays exemplifies Wang Xiaobo's sharp wit and critical thinking. The essays reflect on culture, politics, and the human condition in modern society.\n\nThese works highlight Wang Xiaobo's unique literary style and his influence on contemporary Chinese literature. Let me know if you need further details about these or other works. Next request=============================================中文翻译分割线=============================================解决方案：根据截至2023年10月的知识，王小波是一位著名的中国作家，他的作品以其知识深度、幽默以及对社会和人性的反思而闻名。以下是他的三部著名作品：\n\n1。《黄金年代》\n这部中篇小说是王小波最具代表性的作品之一。它以中国文化大革命为背景，探讨了个人自由、社会约束和存在主义的主题。\n\n2。《白银时代》（The Silver Age）\n这是“时代三部曲”的一部分，其中包括黄金时代、白银时代和青铜时代。这本书继续深入探讨关于生命、人类欲望和存在荒谬性的哲学问题。\n\n3。《沉默的大多数》这本散文集体现了王小波敏锐的智慧和批判性思维。这些散文反映了现代社会的文化、政治和人类状况。\这些作品凸显了王小波独特的文学风格及其对中国当代文学的影响。如果您需要有关这些或其他作品的更多详细信息，请告诉我。下一个请求

ROUND 3

user agent输入：round 2 assistant的输出，同时也有历史的交互记录。

Solution: Based on knowledge available up to October 2023, Wang Xiaobo was a renowned Chinese writer whose works are recognized for their intellectual depth, humor, and reflections on society and human nature. Here are three of his notable works:\n\n1. 《黄金时代》 (The Golden Age)  \n   This novella is one of Wang Xiaobo's most iconic works. It explores themes of individual freedom, societal constraints, and existentialism, set against the backdrop of China's Cultural Revolution.\n\n2. 《白银时代》 (The Silver Age)  \n   This is part of the "Age Trilogy," which includes The Golden Age, The Silver Age, and The Bronze Age. The book continues to delve into philosophical questions about life, human desires, and the absurdity of existence.\n\n3. 《沉默的大多数》 (The Silent Majority)  \n   This collection of essays exemplifies Wang Xiaobo's sharp wit and critical thinking. The essays reflect on culture, politics, and the human condition in modern society.\n\nThese works highlight Wang Xiaobo's unique literary style and his influence on contemporary Chinese literature. Let me know if you need further details about these or other works. Next request=============================================中文翻译分割线=============================================解决方案：根据截至2023年10月的知识，王小波是一位著名的中国作家，他的作品以其知识深度、幽默以及对社会和人性的反思而闻名。以下是他的三部著名作品：\n\n1。《黄金年代》\n这部中篇小说是王小波最具代表性的作品之一。它以中国文化大革命为背景，探讨了个人自由、社会约束和存在主义的主题。\n\n2。《白银时代》（The Silver Age）\n这是“时代三部曲”的一部分，其中包括黄金时代、白银时代和青铜时代。这本书继续深入探讨关于生命、人类欲望和存在荒谬性的哲学问题。\n\n3。《沉默的大多数》这本散文集体现了王小波敏锐的智慧和批判性思维。这些散文反映了现代社会的文化、政治和人类状况。\这些作品凸显了王小波独特的文学风格及其对中国当代文学的影响。如果您需要有关这些或其他作品的更多详细信息，请告诉我。下一个请求

user agent的输出：user agent判断问题已经被成功解答，输出<CAMEL_TASK_DONE>，交互跳出循环。

<CAMEL_TASK_DONE>

OWL的agent及工具

agent父类的实现

所有的Agent的实现都是继承自ChatAgent类，路径在camel/camel/agents/chat_agent.py。ChatAgent类除了负责调用LLM接口外还主要负责如下两个功能：

1.记忆管理：agent之间的历史对话由AgentMemory类的子类管理，实现路径在camel/camel/memories/agent_memories.py，主要有如下三种管理模型记忆的类：

ChatHistoryMemory：返回最近的_window_size条（不是轮数）历史对话。
VectorDBMemory：将历史对话进行向量化，存到向量数据库中，默认使用的embedding模型是OpenAIEmbedding。需要历史记忆时，使用向量召回和query相关性最大的retrieve_limit条历史记录。
LongtermAgentMemory：结合了ChatHistoryMemory和VectorDBMemory的记忆召回特性，既返回最近时间窗口的内容也返回向量化召回的内容，具体实现如下：

def retrieve(self) -> List[ContextRecord]:        chat_history = self.chat_history_block.retrieve()        vector_db_retrieve = self.vector_db_block.retrieve(            self._current_topic,            self.retrieve_limit,        )        # chat_history[:1]为system prompt        return chat_history[:1] + vector_db_retrieve + chat_history[1:]

2.工具调用：传入ChatAgent的工具会存在self._internal_tools变量中，后续传入openai的api请求，如果需要进行工具调用，执行_execute_tool函数进行工具调用，工具调用基本原理见我的往期文章：大模型工具调用(function call)原理及实现。

agent子类

以ChatAgent为父类的所有agent实现都在camel库的camel/agents路径下，按照功能主要大致分为如下两类：

1.专门化Agent类：

CriticAgent：专门用于评估和批评响应或解决方案的智能体
EmbodiedAgent：为具身AI场景设计的智能体，能够理解和响应物理世界上下文
KnowledgeGraphAgent：专门构建和利用知识图谱进行增强推理的智能体
MCPAgent：支持Model Context Protocol的智能体
RepoAgent：用于代码仓库操作的智能体
RoleAssignmentAgent：负责角色分配的智能体
SearchAgent：专注于信息检索和搜索任务的智能体

2.任务管理Agent类：

TaskCreationAgent ：任务创建智能体
TaskPlannerAgent ：任务规划智能体
TaskPrioritizationAgent ：任务优先级排序智能体
TaskSpecifyAgent：任务细化智能体

工具组件

OWL的工具组件在camel/camel/toolkits路径下实现，大致可分为以下11类，各家agent框架整体流程花样并不多，效果的核心壁垒主要在于工具性能是否强大，可以看到camel-ai提供的工具组件非常全面：

1.搜索与网络工具：

SearchToolkit ：支持多种搜索引擎（Google、DuckDuckGo、Wikipedia、Bing、百度、Wolfram Alpha）
BrowserToolkit ：网页浏览和交互，包括浏览器模拟和内容提取
SearxNGToolkit ：基于SearxNG的搜索工具
BrowserNonVisualToolkit ：非可视化浏览器工具
Crawl4AIToolkit：网页爬取工具

2.学术研究工具

ArxivToolkit ：与arXiv API交互，搜索和下载学术论文
GoogleScholarToolkit：从Google Scholar检索作者和出版物信息
SemanticScholarToolkit：与Semantic Scholar API交互获取论文和作者数据
PubMedToolkit：与PubMed的E-utilities API交互访问MEDLINE数据

3.开发与代码工具

CodeExecutionToolkit ：代码执行，支持多种沙箱环境（Python、Jupyter、Docker、subprocess、e2b）
GitHubToolkit ：与GitHub仓库交互，包括检索issues和创建pull requests
TerminalToolkit ：终端操作，如按名称或内容搜索文件、执行shell命令
OpenAPIToolkit ：处理OpenAPI规范和REST APIs

4.社交媒体与通信工具

TwitterToolkit ：Twitter操作，包括创建推文、删除推文和检索用户资料
LinkedInToolkit ：LinkedIn操作，包括创建帖子、删除帖子和检索用户资料
SlackToolkit ：Slack操作，包括创建频道、加入频道和管理频道成员
WhatsAppToolkit ：与WhatsApp Business API交互
RedditToolkit ：Reddit操作，包括收集热门帖子、评论情感分析

5.文档处理工具

ExcelToolkit ：从Excel文件提取和处理内容，包括转换为markdown表格
PPTXToolkit ：创建和操作PowerPoint文件，包括添加幻灯片、文本和图像
FileWriteToolkit ：创建、写入和修改文本文件
MinerUToolkit ：使用MinerU API提取和处理文档内容，支持OCR、公式识别和表格检测
MarkItDownToolkit ：文档转换工具

6.AI与多媒体工具

DalleToolkit ：使用OpenAI的DALL-E模型生成图像
ImageAnalysisToolkit：使用视觉语言模型进行综合图像分析和理解
VideoAnalysisToolkit ：使用视觉语言模型分析视频内容，包括帧提取和视频问答
AudioAnalysisToolkit ：音频处理和分析，包括转录和音频内容问答
VideoDownloaderToolkit ：下载视频并可选择分割成块

7.数据分析与计算工具

MathToolkit ：执行基本数学运算，如加法、减法和乘法
SymPyToolkit ：使用SymPy进行符号计算，包括代数操作、微积分和线性代数
NetworkXToolkit ：使用NetworkX库进行图操作和分析
OpenBBToolkit ：通过OpenBB平台访问和分析金融市场数据
DataCommonsToolkit ：查询和检索Data Commons知识图谱数据

8.云服务与API集成工具

GoogleMapsToolkit ：访问Google Maps服务，包括地址验证、海拔数据和时区信息
GoogleCalendarToolkit ：创建、检索、更新和删除Google日历事件
NotionToolkit ：使用Notion API从Notion页面和工作区检索信息
StripeToolkit ：通过Stripe处理支付和管理金融交易
ZapierToolkit ：与Zapier的NLA API交互，通过自然语言命令执行操作

9.新闻与信息工具

AskNewsToolkit ：使用AskNews API根据用户查询获取新闻、故事和其他内容
DappierToolkit ：搜索实时数据并获取跨关键垂直领域的AI推荐
WeatherToolkit ：使用OpenWeatherMap API获取城市天气数据

10.协议与框架工具

MCPToolkit ：使用Model Context Protocol与外部工具交互
RetrievalToolkit ：基于指定查询从本地向量存储系统检索信息
MemoryToolkit ：保存、加载和清除ChatAgent的记忆
HumanToolkit ：促进AI系统中的人机交互和反馈

11.专业化工具

MeshyToolkit ：处理3D网格数据和操作
JinaRerankerToolkit ：基于与给定查询的相关性对文档进行重新排序
KlavisToolkit ：与Kavis AI的API交互创建远程托管的生产就绪MCP服务器

这些toolkit中大部分还是比较简单的，就是各种接口的集成。但是个别的tool的实现也是很复杂的，甚至比用于多轮交互的user/assitant agent还要复杂，比如BrowserToolkit（camel/camel/toolkits/browser_toolkit.py，用于网页浏览）。BrowserToolkit会去调用其他的toolkit，并且BrowserToolkit的实现也是个agent系统，有两个agent：

1.web_agent：负责操作浏览器，web_agent可以做到访问网页、截图、分析视频（调用VideoAnalysisToolkit）、滚动网页、下载等非常丰富的操作，使用的system prompt如下：

You are a helpful web agent that can assist users in browsing the web.Given a high-level task, you can leverage predefined browser tools to helpusers achieve their goals.=============================================中文翻译分割线=============================================你是一个乐于助人的网络代理，可以协助用户浏览网页。给定一个高级任务，你可以利用预定义的浏览器工具来帮助用户实现他们的目标。

2.planning_agent：用于计划浏览器的操作步骤，使用的system prompt如下：

You are a helpful planning agent that can assist users in planning complextasks which need multi-step browser interaction.=============================================中文翻译分割线=============================================你是一个实用的规划助手，能够帮助用户规划需要多步浏览器交互的复杂任务。

除此之外，在交互流程上，BrowserToolkit使用了ReAct框架，主流程中的user agent和assistant agent都没有实现的这么复杂。

{{userData.name}}已认证

准确率 92.7%逼近 Claude 3.5、成本降低 86%，开源代码定位新神器 LocAgent 来了｜机器之心

付费用户突破 1000 万，All in AI 的多邻国，是怎么用 AI 的？

Netflix 如何存储每日 1.4 亿小时观影数据：存储架构深度解析

RAG 彻底爆了！一文读懂其架构演进及核心要点

Anthropic 联创曼恩：超级智能可能最早 2028 年出现

AI 操作网页：browser-use 和 AI 大模型互动解析

借助 Skills 提升前端设计 | Claude

MiniMax 创始人闫俊杰×罗永浩！大山并非无法翻越

{{userData.name}}已认证

准确率 92.7%逼近 Claude 3.5、成本降低 86%，开源代码定位新神器 LocAgent 来了 ｜ 机器之心

付费用户突破 1000 万，All in AI 的多邻国，是怎么用 AI 的？

准确率 92.7%逼近 Claude 3.5、成本降低 86%，开源代码定位新神器 LocAgent 来了｜机器之心