DEV Community

caofan99521

My Journey Building a Multi-agent System

This is a complete, hands-on course for developing a multi-agent collaboration system. It teaches you how to build an industrial-grade Agent application framework from scratch (based on Manus), from low-level infrastructure to upper-layer services.

  1. LLM Module Development: Make Agents "Think"
    Dynamically configure LLM providers (OpenAI, Anthropic, etc.)
    Use PostgreSQL + Redis for state management and message queuing
    Implement task decomposition and a background Task module with separation of concerns
    Let Agents call different LLMs reliably and handle complex task flows
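
One way to picture dynamic provider configuration is a small registry that maps a provider name to a factory. This is only a sketch: `LLMConfig`, `ProviderRegistry`, and `echo_factory` are hypothetical names, and the factory returns a stand-in function rather than calling any vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A provider is modeled as a plain function: prompt in, completion text out.
CompletionFn = Callable[[str], str]


@dataclass(frozen=True)
class LLMConfig:
    provider: str              # e.g. "openai" or "anthropic"
    model: str                 # model name passed through to the provider
    temperature: float = 0.0


class ProviderRegistry:
    """Maps provider names to factories so the active LLM can change at runtime."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[LLMConfig], CompletionFn]] = {}

    def register(self, name: str, factory: Callable[[LLMConfig], CompletionFn]) -> None:
        self._factories[name] = factory

    def build(self, config: LLMConfig) -> CompletionFn:
        if config.provider not in self._factories:
            raise KeyError(f"unknown LLM provider: {config.provider}")
        return self._factories[config.provider](config)


def echo_factory(config: LLMConfig) -> CompletionFn:
    # Hypothetical stand-in; a real factory would wrap the vendor's SDK client.
    return lambda prompt: f"[{config.provider}:{config.model}] {prompt}"


registry = ProviderRegistry()
registry.register("openai", echo_factory)
registry.register("anthropic", echo_factory)

# Switching providers only requires a different config value.
llm = registry.build(LLMConfig(provider="anthropic", model="example-model"))
print(llm("hello"))
```

Because callers only see a `CompletionFn`, the rest of the Agent code would not need to change when the configured provider does.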

  2. Agent Module Development: Make Agents "Plan & Act"
    Core logic: a Plan & ReAct architecture in which Agents plan first, then execute step by step, reflect, and adjust
    Design multi-agent planning processes and memory models
    Structured LLM output (JSON repair, format validation)
    Develop tool base classes and decorators to enable tool use (search, calculation, etc.)
    General Agent configuration model and state rollback for observability and debuggability
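
The JSON repair and format validation step might look roughly like the following. It is a best-effort sketch, not the course's implementation: the plan-step schema (`step`, `tool`, `args`) and both function names are assumptions made for illustration.

```python
import json
import re
from typing import Any, Dict


def repair_json(raw: str) -> Dict[str, Any]:
    """Best-effort cleanup of common LLM output defects before json.loads."""
    text = raw.strip()
    # Keep only the outermost {...} span; this drops chatter or code-fence
    # markers that the model may have wrapped around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    text = text[start : end + 1]
    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this naive pattern would also touch such text inside strings.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)


def validate_step(data: Dict[str, Any]) -> Dict[str, Any]:
    """Check the hypothetical schema: {"step": str, "tool": str, "args": dict}."""
    required = {"step": str, "tool": str, "args": dict}
    for key, expected in required.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} must be {expected.__name__}")
    return data


raw_reply = 'Here is the plan:\n{"step": "find docs", "tool": "search", "args": {"q": "A2A"},}\nLet me know!'
print(validate_step(repair_json(raw_reply)))
```

When repair or validation fails, a common follow-up is to send the error message back to the LLM and ask it to try again.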

  3. Tool Module Development: Equip Agents with "Hands & Feet"
    Develop search engine tools (Bing, Jina.ai)
    Integrate the MCP (Model Context Protocol) tool ecosystem
    Implement dynamic tool API registration/updates/deletion for flexible capability expansion
    Enable Agents to use external tools the way a person would to complete complex tasks
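
A decorator-based tool registry with runtime add, replace, and remove could be sketched as below. `ToolRegistry` and the two example tools are hypothetical; `fake_search` returns canned text instead of calling Bing or Jina.ai.

```python
import inspect
from typing import Any, Callable, Dict


class ToolRegistry:
    """Holds callable tools by name; supports add, replace, and remove at runtime."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers (or replaces) a function under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def remove(self, name: str) -> None:
        self._tools.pop(name, None)

    def describe(self) -> Dict[str, str]:
        """Name -> signature text, the kind of listing a prompt could include."""
        return {n: f"{n}{inspect.signature(f)}" for n, f in self._tools.items()}

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


tools = ToolRegistry()


@tools.tool("add")
def add(a: float, b: float) -> float:
    return a + b


@tools.tool("search")
def fake_search(query: str) -> list:
    # Placeholder; a real tool would call a search API here.
    return [f"result for {query}"]


print(tools.call("add", a=2, b=3))
tools.remove("search")              # tools can be dropped while the Agent runs
print(tools.describe())
```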

  4. Playwright & BrowserUse: Make Agents "Browse the Web"
    Implement browser automation with Playwright
    Enable Agents to navigate pages, extract interactive elements, and execute JavaScript
    Support webpage screenshots, scrolling, text input, and web-based task completion
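
As a rough illustration of these browser operations, the sketch below uses Playwright's synchronous Python API to open a page, scroll, take a screenshot, and run JavaScript that lists interactive elements. The selector list, the function name, and the shape of the returned records are assumptions; it needs the `playwright` package and an installed Chromium build to actually run.

```python
from typing import Any, Dict, List

# CSS selector for elements an Agent can usually click or type into (assumption).
INTERACTIVE_SELECTOR = "a, button, input, select, textarea"


def inspect_page(url: str, screenshot_path: str = "page.png") -> List[Dict[str, Any]]:
    """Open `url`, scroll once, save a screenshot, and list interactive elements."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.mouse.wheel(0, 600)                 # scroll down by 600 px
        page.screenshot(path=screenshot_path)    # capture the current viewport
        # Run JavaScript in the page to summarize each interactive element.
        elements = page.evaluate(
            """(selector) => Array.from(document.querySelectorAll(selector)).map(
                   (el, i) => ({index: i, tag: el.tagName.toLowerCase(),
                                text: (el.innerText || el.value || '').slice(0, 80)})
               )""",
            INTERACTIVE_SELECTOR,
        )
        browser.close()
        return elements
```

Text input would follow the same pattern with `page.fill(selector, value)`. Giving each element an index is one way to let the LLM refer to "element 3" instead of producing a CSS selector itself.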

  5. Sandbox Module Development: Build a Safe "Runtime Environment"
    Build isolated sandboxes with Docker to prevent risky operations
    Support shell execution, file I/O, and process management
    Allow Agents to run code and interact with the OS in a secure environment
    Deploy the sandbox as an independent service that supports concurrent tasks
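
A minimal sketch of Docker-based isolation: build a `docker run` command line that drops network access and caps resources, then execute it with `subprocess`. The image name, the specific limits, and both function names are assumptions chosen for illustration, and a hardened sandbox would need considerably more than this.

```python
import subprocess
from typing import List


def build_sandbox_argv(command: str, image: str = "python:3.12-slim") -> List[str]:
    """Build a `docker run` argv that executes `command` in a locked-down container."""
    return [
        "docker", "run",
        "--rm",                  # delete the container when the command exits
        "--network", "none",     # no network access
        "--memory", "256m",      # cap RAM
        "--cpus", "0.5",         # cap CPU time
        "--pids-limit", "64",    # limit process count
        "--read-only",           # root filesystem is read-only
        image,
        "sh", "-c", command,
    ]


def run_in_sandbox(command: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run the command in the container; requires a local Docker daemon."""
    return subprocess.run(
        build_sandbox_argv(command),
        capture_output=True, text=True, timeout=timeout,
    )


# Printing the argv shows what would be executed without needing Docker.
print(" ".join(build_sandbox_argv("echo hello")))
```

Passing the arguments as a list rather than a shell string keeps the Agent-supplied command from being interpreted by the host shell.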

  6. A2A Protocol Integration: Enable Multi-Agent Collaboration
    Agent-to-Agent (A2A) protocol for communication and cooperation
    Distributed Agent system design for division-of-labor task completion
    Client/server management for cross-Agent invocation and information sharing
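
The client/server idea can be illustrated with two in-process Agents exchanging JSON messages. This sketch does not follow the actual A2A wire format; the envelope fields (`id`, `skill`, `input`), the class names, and the `word_count` skill are all hypothetical, and `transport` stands in for what would be an HTTP call.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentServer:
    """An Agent that exposes named skills to other Agents."""
    name: str
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def handle(self, request_json: str) -> str:
        request = json.loads(request_json)
        skill = self.skills.get(request["skill"])
        if skill is None:
            reply = {"id": request["id"], "error": f"unknown skill {request['skill']!r}"}
        else:
            reply = {"id": request["id"], "result": skill(request["input"])}
        return json.dumps(reply)


@dataclass
class AgentClient:
    """Sends requests to a peer Agent through a pluggable transport."""
    transport: Callable[[str], str]

    def ask(self, skill: str, text: str) -> str:
        request = {"id": str(uuid.uuid4()), "skill": skill, "input": text}
        reply = json.loads(self.transport(json.dumps(request)))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


# A worker Agent offering one skill, and a planner Agent that delegates to it.
worker = AgentServer("worker", {"word_count": lambda t: str(len(t.split()))})
planner = AgentClient(transport=worker.handle)
print(planner.ask("word_count", "plan then act"))
```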

  7. Context Engineering: Let Agents "Remember the Past"
    Session database design for persistent memory
    Message storage & retrieval for long-context dialogue management
    Task flow and Agent state rollback for post-run review and execution optimization
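
Message storage and retrieval for long conversations could be sketched as a single table plus a "last N messages" query. The course names PostgreSQL; this example substitutes the standard-library `sqlite3` module so it runs without a server, and the table layout and `MessageStore` class are assumptions.

```python
import sqlite3
from typing import List, Tuple


class MessageStore:
    """Persists chat messages per session and returns a recent window of them."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS messages (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   session_id TEXT NOT NULL,
                   role TEXT NOT NULL,
                   content TEXT NOT NULL
               )"""
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int) -> List[Tuple[str, str]]:
        """Return the last `limit` messages of a session in chronological order."""
        rows = self.db.execute(
            "SELECT role, content FROM messages WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return rows[::-1]


store = MessageStore()
for i in range(5):
    store.append("s1", "user", f"message {i}")
store.append("s2", "user", "other session")
print(store.recent("s1", limit=2))
```

Fetching only a recent window is the simplest way to keep the prompt within the model's context limit; summarizing older messages is a common refinement.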

  8. Frontend & Deployment: Turn Agents Into Usable Products
    Develop interactive frontend with Next.js
    Implement chat UI, task management, file preview, and controls
    Deploy the full system with Docker for commercialization and high availability
    Integrate VNC for real-time Agent operation monitoring

  9. Summary & Extension: Industrial-Grade Agent Development
    Review of system architecture and key development challenges
    Extend with tools like OpenClaw for stronger automation capabilities

Underlying System Logic

At its core, the system is a complete closed loop: LLM + Tools + Memory + Planning + Collaboration.

  1. Perception & Planning Layer (Plan)
    Receive user requests → decompose into executable subtasks → generate execution plans
    Follow the ReAct paradigm: Reason → Act → Observe → reason again, repeating until the task is complete
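
The Reason → Act → Observe cycle can be written as a short loop. In this sketch the LLM is replaced by a scripted `decide` function so the example runs offline; the `Decision` tuple, the `"finish"` action, and the `add` tool are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Each decision is (thought, action, argument); the action "finish" ends the loop.
Decision = Tuple[str, str, str]


def react_loop(
    task: str,
    decide: Callable[[str, List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        thought, action, argument = decide(task, observations)    # Reason
        if action == "finish":
            return argument
        result = tools[action](argument)                           # Act
        observations.append(f"{action}({argument}) -> {result}")   # Observe
    raise RuntimeError("step budget exhausted before the task finished")


def scripted_decide(task: str, observations: List[str]) -> Decision:
    # Hypothetical stand-in for an LLM call: use one tool, then answer.
    if not observations:
        return ("I should add the numbers.", "add", "2,3")
    return ("I have the result.", "finish", observations[-1].split(" -> ")[1])


def add_tool(arg: str) -> str:
    return str(sum(int(x) for x in arg.split(",")))


answer = react_loop("Add 2 and 3", scripted_decide, {"add": add_tool})
print(answer)
```

The `max_steps` budget is there so that a model which never emits `"finish"` cannot loop forever.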

  2. Execution & Tool Layer (Act)
    Call the LLM to decide the next action
    Invoke tools (search, browser, code execution, etc.) to complete operations
    Use sandbox isolation for safe execution

  3. Memory & State Layer (Memory)
    Store conversation history, task status, and tool results in databases
    Enable Agents to recall prior work and avoid duplication
    Support state rollback for debugging and optimization
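
State rollback can be approximated by snapshotting the Agent's state after every step and truncating the history on rollback. This is a minimal in-memory sketch with hypothetical names (`StateHistory`, the `done` list); a persistent version would write the snapshots to the database instead.

```python
import copy
from typing import Any, Dict, List


class StateHistory:
    """Keeps a deep-copied snapshot of Agent state after every step."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._snapshots: List[Dict[str, Any]] = [copy.deepcopy(initial)]

    @property
    def current(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate stored history by accident.
        return copy.deepcopy(self._snapshots[-1])

    def commit(self, state: Dict[str, Any]) -> int:
        """Record a new snapshot and return its step index."""
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1

    def rollback(self, step: int) -> Dict[str, Any]:
        """Discard every snapshot after `step` and return the restored state."""
        if not 0 <= step < len(self._snapshots):
            raise IndexError(f"no snapshot for step {step}")
        del self._snapshots[step + 1 :]
        return self.current


history = StateHistory({"done": []})
state = history.current
state["done"].append("search")
history.commit(state)                 # step 1
state["done"].append("bad step")
history.commit(state)                 # step 2
print(history.rollback(1))            # back to the state after step 1
```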

  4. Collaboration & Distribution Layer (A2A)
    Multi-Agent division of labor: planning, execution, tool calling
    Communicate via A2A protocol to complete complex multi-step tasks
