Introduction
"If AI agents could evolve like biological organisms — autonomously discovering problems, accumulating experience, and optimizing strategies — they would no longer be static tools, but truly 'growing' intelligent entities."
This is Part 10 of the "Open Source Project of the Day" series. Today we explore AgentEvolver (GitHub: https://github.com/modelscope/AgentEvolver).
Traditional AI agent training requires large amounts of manually annotated datasets — expensive and hard to scale. AgentEvolver uses three self-evolving mechanisms — Self-Questioning, Self-Navigating, and Self-Attributing — to enable AI agents to autonomously generate tasks, accumulate experience, and optimize strategies, achieving true self-evolution.
What You'll Learn
- Core self-evolution mechanisms and how AgentEvolver works
- How the three mechanisms (Self-Questioning, Self-Navigating, Self-Attributing) work together
- How to set up and train a self-evolving agent system
- Service-oriented data flow architecture design
- Outstanding performance on AppWorld and BFCL-v3 benchmarks
- Comparative analysis with other agent training frameworks
Prerequisites
- Basic understanding of AI agents and reinforcement learning
- Familiarity with Python programming
- Understanding of basic LLM concepts
- Basic knowledge of reinforcement learning training pipelines (optional)
Project Background
Project Introduction
AgentEvolver is an efficient self-evolving agent system that enables AI agents to autonomously learn and evolve through three core mechanisms:
- Self-Questioning: Agents autonomously explore environments and generate diverse tasks, eliminating the cost of expensive manual dataset construction
- Self-Navigating: Summarizes and reuses cross-task experience to guide higher-quality exploration and improve exploration efficiency
- Self-Attributing: Handles long trajectories, discovers causal contributions of intermediate steps, and enables fine-grained and efficient policy optimization
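Taken together, the three mechanisms form an evolution loop: generate tasks, explore with retrieved experience, then assign step-level credit. The sketch below is a minimal, hypothetical illustration of one such cycle — the function names, data shapes, and placeholder rollout are my own assumptions, not AgentEvolver's actual API:

```python
# Hypothetical sketch of one self-evolution cycle. Names and structure are
# illustrative only; see the AgentEvolver repo for the real implementation.

def self_questioning(env_states):
    """Turn observed environment states into candidate tasks."""
    return [f"task:{s}" for s in env_states]

def self_navigating(task, experience_pool):
    """Retrieve prior experience whose task tag matches the new task."""
    return [e for e in experience_pool if e["task"] == task]

def self_attributing(trajectory, reward):
    """Spread a trajectory-level reward over steps; later steps are weighted
    more heavily here as a stand-in for learned causal attribution."""
    n = len(trajectory)
    total = sum(range(1, n + 1))
    return [reward * (i + 1) / total for i in range(n)]

def evolution_cycle(env_states, experience_pool):
    results = []
    for task in self_questioning(env_states):
        hints = self_navigating(task, experience_pool)   # experience-guided
        trajectory = ["explore", "act", "finish"]        # placeholder rollout
        reward = 1.0 if hints else 0.5                   # placeholder outcome
        results.append((task, self_attributing(trajectory, reward)))
        experience_pool.append({"task": task, "steps": trajectory})
    return results
```

Each pass through `evolution_cycle` grows the experience pool, so later tasks can benefit from earlier rollouts — the core feedback loop the paper describes.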
Core problems the project solves:
- Agent training requires large amounts of expensive, manually annotated data
- A lack of autonomous exploration makes it hard to discover new tasks
- Experience cannot be effectively reused, so exploration efficiency stays low
- Credit assignment over long trajectories is imprecise, making policy optimization inefficient
- Integrating different environments is difficult, and a unified training framework is lacking
Target user groups:
- AI agent researchers and developers
- Researchers needing to train autonomous agents
- Enterprises looking to reduce agent training costs
- Technical professionals interested in self-evolving systems
Author/Team Introduction
Team: ModelScope
- Background: Alibaba DAMO Academy ModelScope team, focused on AI model and system development
- Contributors: 10 contributors including @YunpengZhai, @TaoShuchang, @Xinji-Mai, and others
- Philosophy: Building efficient, autonomous, evolvable AI agent systems
- Website: modelscope.github.io/AgentEvolver
Project created: 2024 (based on GitHub activity; actively maintained)
Project Stats
- ⭐ GitHub Stars: 1.1k+ (continuously growing)
- 🍴 Forks: 128+
- 📦 Version: Latest version (continuously updated)
- 📄 License: Apache-2.0 (fully open source, free to use)
- 🌐 Website: modelscope.github.io/AgentEvolver
- 📚 Documentation: Includes complete usage guides and API documentation
- 💬 Community: Active GitHub Issues
- 📊 Paper: arXiv:2511.10395
Project development history:
- 2024: Project created, started building core self-evolution mechanisms
- 2024-2025: Refined the three mechanisms, added multi-environment support
- 2025: Published paper, achieved outstanding performance on AppWorld and BFCL-v3 benchmarks
- 2026: Continuous optimization, added Game Arena multi-agent scenario support
Main Features
Core Purpose
AgentEvolver's core purpose is to build an efficient self-evolving agent system that enables AI agents to:
- Autonomously generate tasks: Through Self-Questioning, agents autonomously explore environments and generate diverse tasks
- Experience-guided exploration: Through Self-Navigating, summarize and reuse cross-task experience to improve exploration efficiency
- Fine-grained credit assignment: Through Self-Attributing, precisely identify the contributions of key steps in long trajectories
- Efficient policy optimization: Based on fine-grained credit assignment, achieve more efficient policy optimization
Use Cases
- Agent training and research
  - Training autonomously exploring AI agents
  - Researching the effectiveness of self-evolution mechanisms
  - Reducing agent training costs
- Complex environment interaction
  - AppWorld application operation tasks
  - BFCL-v3 complex reasoning tasks
  - Multi-agent social games (Avalon, Diplomacy)
- Automatic task generation
  - Automatically discovering new tasks in the environment
  - Generating diverse training data
  - Reducing manual annotation costs
- Experience reuse and optimization
  - Cross-task experience summarization and reuse
  - Improved exploration efficiency
  - Accelerated agent learning
Quick Start
Installation
AgentEvolver requires conda and CUDA toolkit:
```bash
# Step 1: Install base dependencies
bash install.sh

# Step 2: Set up the environment service (AppWorld as an example)
cd env_service/environments/appworld && bash setup.sh

# Step 3: Set up ReMe (optional, for experience management)
bash external/reme/install_reme.sh

# Step 4: Start training
conda activate agentevolver

# Method 1: Basic example (without ReMe)
python launcher.py --conf examples/basic.yaml --with-appworld

# Method 2: Full example (with ReMe: questioning + navigating + attributing)
python launcher.py --conf examples/overall.yaml --with-appworld --with-reme
```
Prerequisites
- conda: For environment management
- CUDA toolkit: For GPU acceleration
- Python 3.x: Primary programming language
Simplest Usage Example
```bash
# Copy the config file
cp example.env .env

# Edit .env to set your API key and conda path, then run training.

# Basic training (using the built-in environment dataset)
python launcher.py --conf examples/basic.yaml --with-appworld

# Full self-evolving training
python launcher.py --conf examples/overall.yaml --with-appworld --with-reme
```
Core Features
- Self-Questioning: Agents autonomously explore environments, generate diverse tasks, eliminating manual dataset construction costs
- Self-Navigating: Summarizes and reuses cross-task experience to guide high-quality exploration, improving exploration efficiency
- Self-Attributing: Handles long trajectories, discovers causal contributions of intermediate steps, enables fine-grained policy optimization
- Environment compatibility: Standardized interfaces for seamless integration with various external environments and tool APIs
- Flexible context management: Built-in tools for managing multi-turn context and complex interaction logic
- Modular architecture: Decoupled components, easy to customize, extend, and upgrade algorithms
- Game Arena support: Extended to multi-agent social game environments, supporting interaction, evaluation, and training
Project Advantages
| Comparison | AgentEvolver | Traditional agent training | Other self-evolving frameworks |
|---|---|---|---|
| Task generation | ✅ Autonomous generation | ❌ Requires manual annotation | ⚠️ Partial support |
| Experience reuse | ✅ Cross-task experience summary | ❌ Cannot reuse | ⚠️ Limited reuse |
| Credit assignment | ✅ Fine-grained attribution | ⚠️ Coarse-grained | ⚠️ Moderate precision |
| Training efficiency | ✅ Highly efficient | ❌ Expensive | ⚠️ Moderate |
| Environment support | ✅ Standardized interface | ⚠️ Needs adaptation | ⚠️ Limited support |
| Multi-agent | ✅ Game Arena | ❌ Not supported | ⚠️ Partial support |
Why choose AgentEvolver?
Compared to traditional agent training methods, AgentEvolver's three self-evolving mechanisms enable autonomous task generation, experience reuse, and fine-grained credit assignment. This significantly reduces training costs, improves efficiency, and yields strong results on the AppWorld and BFCL-v3 benchmarks.
Detailed Project Analysis
Architecture Design
AgentEvolver adopts a service-oriented data flow architecture, seamlessly integrating environment sandboxes, LLMs, and experience management into modular services.
Core Architecture
```
AgentEvolver System
├── Environment Service
│   ├── AppWorld environment
│   ├── BFCL-v3 environment
│   ├── Game Arena (Avalon, Diplomacy)
│   └── Custom environment interface
├── LLM Service
│   ├── Qwen2.5-7B/14B
│   ├── Other LLM support
│   └── API call wrapper
├── Experience Manager
│   ├── ReMe integration
│   ├── Experience pool management
│   └── Experience summarization and reuse
├── Task Manager
│   ├── Task exploration
│   ├── Synthetic task generation
│   └── Training data management
└── Advantage Processor
    ├── ADCA-GRPO algorithm
    ├── Credit assignment
    └── Policy optimization
```
Self-Questioning Mechanism
Self-Questioning enables agents to autonomously explore environments and generate diverse tasks:
Workflow:
- Agent autonomously explores the environment
- Discovers new tasks and challenges in the environment
- Automatically generates task descriptions and training data
- Eliminates expensive manual dataset construction costs
Advantages:
- High task diversity, covering various scenarios in the environment
- No manual annotation needed, significantly reduces costs
- High task quality, based on actual environment exploration
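One practical concern in autonomous task generation is avoiding near-duplicate tasks. The sketch below is a simplified, hypothetical deduplication pass — `propose_tasks`, `similar`, and the word-overlap threshold are illustrative assumptions, not AgentEvolver's actual filtering logic:

```python
# Hypothetical diversity filter for self-generated tasks: drop candidates
# that overlap too heavily with tasks already proposed.

def similar(a, b, threshold=0.8):
    """Word-level Jaccard similarity check between two task strings."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1) >= threshold

def propose_tasks(observations, seen):
    """Turn raw environment observations into task candidates,
    skipping near-duplicates of tasks already proposed."""
    tasks = []
    for obs in observations:
        candidate = f"Complete the action: {obs}"
        if not any(similar(candidate, t) for t in seen + tasks):
            tasks.append(candidate)
    return tasks
```

A real system would use embedding similarity rather than word overlap, but the principle is the same: diversity is enforced at generation time, not by manual curation.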
Self-Navigating Mechanism
Self-Navigating improves exploration efficiency through experience summarization and reuse:
Workflow:
- Summarize successful cross-task experiences
- Build an experience knowledge base
- Reuse relevant experience in new tasks
- Guide higher-quality exploration
Advantages:
- Significantly improves exploration efficiency
- Experience is reusable, avoiding repeated exploration
- Guides higher-quality strategies
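The retrieval step at the heart of this mechanism can be pictured as a top-k lookup against the experience pool. This is a minimal sketch under my own assumptions (keyword Jaccard similarity, a `task`/`tip` record shape); the actual retrieval lives in the ReMe integration:

```python
# Hypothetical experience retrieval: score stored experiences against the
# new task by keyword overlap, return the top-k matches above a threshold.

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve_experience(task, pool, k=2, min_sim=0.2):
    """Return up to k stored experiences similar enough to the new task."""
    scored = sorted(pool, key=lambda e: jaccard(task, e["task"]), reverse=True)
    return [e for e in scored[:k] if jaccard(task, e["task"]) >= min_sim]
```

Retrieved tips would then be injected into the agent's context before rollout, which is what "experience-guided exploration" amounts to in practice.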
Self-Attributing Mechanism
Self-Attributing achieves efficient policy optimization through fine-grained credit assignment:
Workflow:
- Analyze intermediate steps in long trajectories
- Identify causal contributions of key steps
- Assign credit based on contributions
- Implement fine-grained policy optimization
Advantages:
- Precise credit assignment, avoids incorrect attribution
- High policy optimization efficiency
- Supports long trajectory processing
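The architecture lists ADCA-GRPO as the advantage processor, and the general shape of such an algorithm can be sketched as: compute a GRPO-style group-normalized advantage per trajectory, then distribute it over steps using attribution scores. The details below (reward values, attribution weights) are illustrative assumptions, not the paper's exact algorithm:

```python
# Illustrative step-level credit assignment in a GRPO-style setup:
# group-normalize trajectory rewards, then scale one trajectory's
# advantage by per-step attribution scores (assumed to sum to 1).

from statistics import mean, pstdev

def group_advantages(rewards):
    """GRPO-style advantage: normalize rewards within a rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

def step_credits(advantage, attribution_scores):
    """Spread a trajectory advantage over steps by attribution score."""
    return [advantage * s for s in attribution_scores]

rewards = [1.0, 0.0, 0.5, 0.5]        # rewards for 4 sampled trajectories
advs = group_advantages(rewards)
credits = step_credits(advs[0], [0.1, 0.3, 0.6])  # 3-step trajectory
```

The key property is that a single scalar outcome becomes a per-step signal, so steps that mattered receive proportionally more of the policy-gradient update.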
Performance
AgentEvolver achieves outstanding performance on AppWorld and BFCL-v3 benchmarks:
AppWorld Benchmark
- Qwen2.5-7B + AgentEvolver: avg@8: 32.4%, best@8: 51.2%
- Qwen2.5-14B + AgentEvolver: avg@8: 48.7%, best@8: 69.4%
Significant performance improvements over baseline models:
- 7B model: Improved from 1.8% to 32.4% (avg@8)
- 14B model: Improved from 18.0% to 48.7% (avg@8)
BFCL-v3 Benchmark
- Qwen2.5-7B + AgentEvolver: avg@8: 57.9%, best@8: 69.0%
- Qwen2.5-14B + AgentEvolver: avg@8: 66.5%, best@8: 76.7%
Significant performance improvements over baseline models:
- 7B model: Improved from 29.8% to 57.9% (avg@8)
- 14B model: Improved from 41.6% to 66.5% (avg@8)
Mechanism Ablation Study
Experiments show that the three mechanisms achieve the best results when combined:
- +Questioning: Significant performance improvement
- +Questioning&Navigating: Further improves exploration efficiency
- +Questioning&Attributing: Fine-grained optimization brings additional gains
- AgentEvolver (Full): All three mechanisms together, best performance
Game Arena Multi-agent Scenarios
AgentEvolver Game Arena extends AgentEvolver to multi-agent social game environments:
Core Capabilities
- Web interface interaction: Real-time observation of AI agent reasoning and communication, or participate as a human player
- Scalable evaluation: Run large-scale self-play or mixed-model tournaments, supports configuration and leaderboards
- End-to-end training: Directly train LLM agents using reinforcement learning methods (like GRPO) in social game environments
Supported Games
- Avalon: Social reasoning game, tests agents' reasoning and communication abilities
- Diplomacy: Complex multi-agent strategy game, tests long-term planning and collaboration abilities
Training Example
The training curve for the Assassin role in Avalon shows that AgentEvolver can effectively improve agent performance on complex social-reasoning tasks.
Environment Compatibility
AgentEvolver provides standardized interfaces supporting seamless integration with various external environments:
Environment Interface
- Standardized interface: Unified environment interface specification
- Tool API integration: Supports integration with various tools and APIs
- Custom environments: Easy to add custom environments
Supported Environments
- AppWorld: Application operation task environment
- BFCL-v3: Complex reasoning task environment
- Game Arena: Multi-agent social game environment
- Custom environments: Integrated through standard interfaces
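A custom environment typically boils down to implementing a small reset/step contract. The class below is a hypothetical illustration of that idea — the actual AgentEvolver interface may differ, so treat the method names and return shapes as assumptions and check `env_service/environments` in the repo for real examples:

```python
# Hypothetical minimal custom environment following a reset/step contract.
# Method names and return shapes are illustrative, not AgentEvolver's spec.

class EchoEnv:
    """Toy environment: the agent must reply with the target word."""

    def __init__(self, target="hello"):
        self.target = target
        self.done = False

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.done = False
        return {"observation": f"Say the word '{self.target}'."}

    def step(self, action):
        """Apply one action; reward 1.0 for an exact match, else 0.0."""
        reward = 1.0 if action.strip() == self.target else 0.0
        self.done = True
        return {"observation": "episode over", "reward": reward, "done": True}
```

Once an environment exposes this kind of contract, the same training loop can drive AppWorld, BFCL-v3, Game Arena, or your own sandbox without modification.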
Experience Management (ReMe)
AgentEvolver integrates ReMe for experience management:
Features
- Experience summarization: Summarize successful cross-task experiences
- Experience pool management: Manage storage and retrieval of the experience pool
- Experience reuse: Reuse relevant experience in new tasks
Usage
```bash
# Install ReMe
bash external/reme/install_reme.sh

# Train with ReMe
python launcher.py --conf examples/overall.yaml --with-appworld --with-reme
```
Project Resources
Official Resources
- 🌟 GitHub: https://github.com/modelscope/AgentEvolver
- 🌐 Website: modelscope.github.io/AgentEvolver
- 📄 Paper: arXiv:2511.10395
Who Should Use This
AgentEvolver is especially suitable for: AI agent researchers and developers, researchers needing to train autonomous agents, enterprises looking to reduce agent training costs, technical professionals interested in self-evolving systems, research teams needing multi-agent training.
Not suitable for: Users who only need simple agents, scenarios that don't require autonomous learning, developers lacking reinforcement learning background.
Visit my personal homepage for more useful knowledge and interesting products.