The most interesting thing to happen in AI this week wasn't a benchmark score or a product launch. It was a philosophical shift disguised as a model release.
MiniMax just dropped M2.7 -- a 229-billion-parameter Mixture-of-Experts model that did something no major model has done before at this scale: it participated in its own training process. Not as a tool. Not as an evaluator on the side. As an active participant in the loop that made it better.
Two days earlier, researchers from UBC, NYU, University of Edinburgh, and Meta's Superintelligence Labs published the Darwin-Godel HyperAgent v3 paper -- a system that literally rewrites its own source code to become a better coding agent.
These aren't incremental improvements. They represent the beginning of a paradigm where AI models don't just learn from data -- they learn from themselves. And the implications for everyone building AI agents are enormous.
The Traditional Training Paradigm (And Why It's Hitting a Wall)
Traditional large language model training follows a well-worn pipeline:
Pre-training: Feed the model trillions of tokens. Hundreds of millions of dollars in compute. The model learns language patterns, factual knowledge, and reasoning.
Fine-tuning: Train further on curated, task-specific data. Models learn to follow instructions and behave like assistants.
RLHF/RLAIF: Human annotators rank outputs, training a reward model that guides improvement.
Deployment: Ship it. The model is frozen. It never learns from the millions of interactions it handles in production.
The fundamental bottleneck: In traditional training, every improvement requires human intervention. The model itself has zero agency in the process.
This pipeline produced GPT-5.2, Claude Opus 4.6, Gemini 3 Pro -- all masterpieces. But three fundamental limitations remain:
Diminishing returns on data. We've effectively run out of high-quality internet text.
Linear improvement curves. Each training run produces a fixed improvement.
No compound learning. The model in production never improves from its interactions.
Self-evolving systems break all three constraints.
How MiniMax M2.7 Trains Itself
The Numbers
- 229 billion total parameters (Mixture-of-Experts architecture)
- 10 billion active parameters per forward pass
- Self-participatory training -- the model was involved in its own improvement loop
The MoE architecture routes each input to specialized "expert" networks -- giving you the knowledge capacity of a 229B model with the inference cost of a 10B model.
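The routing idea fits in a few lines. Everything below -- expert count, top-k value, dimensions -- is an illustrative toy, not MiniMax's actual configuration:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # illustrative; production MoE models use far more
TOP_K = 2         # experts actually run per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_embedding, router_weights):
    """Score every expert, but only run the top-k.

    This is why a model with huge total parameters can run cheaply:
    most experts stay idle for any given token, so active parameters
    (the 10B figure) are a small slice of total parameters (229B).
    """
    scores = [sum(w * x for w, x in zip(row, token_embedding))
              for row in router_weights]
    probs = softmax(scores)
    # Keep only the k highest-probability experts for this token.
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # Renormalize so the chosen experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Toy example: a 4-dim token embedding and a random router matrix.
token = [random.gauss(0, 1) for _ in range(4)]
router = [[random.gauss(0, 1) for _ in range(4)] for _ in range(NUM_EXPERTS)]
chosen = route(token, router)
print(chosen)  # e.g. [(expert_id, weight), (expert_id, weight)]
```

The routing weights are then used to blend the outputs of just those two experts, which is where the 229B-capacity / 10B-cost tradeoff comes from.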
The Self-Evolution Architecture
MiniMax describes a training process built around four interconnected components:
1. Hierarchical Skills -- The model organizes capabilities into a hierarchical skill tree.
2. Persistent Memory -- During training, the model maintains memory across tasks. Solutions and lessons get stored and retrieved.
3. Guardrails and Evaluation -- Automated evaluation measures whether self-modifications improve performance.
4. The Iterative Loop -- Run tasks, evaluate, learn from results, add strategies to memory, re-enter training, become more capable, repeat.
M2.7 Self-Evolution Loop: Run -> Evaluate -> Learn -> Evolve -> Repeat. Each cycle starts from a higher baseline. Result: Compound improvement.
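The loop above can be sketched in code. The task, scoring, and strategy names here are placeholders I've invented for illustration -- MiniMax hasn't published its training internals -- but the compounding structure is the point:

```python
class SelfEvolvingTrainer:
    """Toy sketch of a run -> evaluate -> learn -> evolve loop.

    All names and mechanics are illustrative stand-ins,
    not MiniMax's actual training code.
    """

    def __init__(self, baseline_score):
        self.score = baseline_score
        self.memory = []  # persistent store of strategies that worked

    def run_and_evaluate(self, strategy_boost):
        # Stand-in for running tasks with a strategy and scoring the outcome.
        return self.score + strategy_boost

    def evolve(self, candidate_strategies, cycles=3):
        for _ in range(cycles):
            best = None
            for name, boost in candidate_strategies:
                result = self.run_and_evaluate(boost)
                # Guardrail: only keep strategies that beat the current baseline.
                if result > self.score and (best is None or result > best[1]):
                    best = (name, result)
            if best:
                # Store the lesson and raise the baseline, so the next
                # cycle starts from a higher floor -- compound improvement.
                self.memory.append(best[0])
                self.score = best[1]
        return self.score

trainer = SelfEvolvingTrainer(baseline_score=50.0)
final = trainer.evolve([("retry-with-tests", 2.0), ("decompose-task", 3.5)])
print(trainer.memory, final)
```

Note how the gain per cycle is applied to an ever-higher baseline: that compounding, rather than any single cycle's improvement, is what separates this from one-shot fine-tuning.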
What M2.7 Can Actually Do
- Complex agent orchestration with dynamic tool search
- Coding: log analysis, bug hunting, refactoring, security auditing, ML, Android dev
- Professional work: Excel automation, PowerPoint creation, document generation
These capabilities were partially self-discovered during training. The model learned to code better because it practiced, evaluated its own output, and integrated the lessons.
Darwin-Godel HyperAgent v3: Self-Rewriting Code
If MiniMax M2.7 represents self-evolution at the model level, Meta's Darwin-Godel HyperAgent v3 represents it at the code level.
The paper, published March 23 by UBC, Vector Institute, NYU, University of Edinburgh, and Meta's Superintelligence Labs, builds on the Darwin-Godel Machine (DGM) framework.
How It Works
- Self-Modification: The agent examines and modifies its own source code
- Benchmark Evaluation: Each modified version is tested
- Selection Pressure: Better versions enter the archive; worse ones are pruned
- Evolutionary Trees: Branching trees of agent variants
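In spirit, this is a simple evolutionary archive. The "agent" and "benchmark" below are stand-ins -- the real system mutates actual Python source and scores variants on coding benchmarks -- but the branch/score/prune dynamic is the same:

```python
import random

random.seed(42)

def benchmark(agent):
    # Stand-in for running an agent variant against a coding benchmark.
    # Here an "agent" is just a list of numeric "capabilities".
    return sum(agent)

def mutate(agent):
    # Stand-in for the agent rewriting part of its own source:
    # some edits help, some hurt.
    child = list(agent)
    i = random.randrange(len(child))
    child[i] += random.uniform(-0.5, 1.0)
    return child

def evolve(seed_agent, generations=20, archive_size=5):
    archive = [(benchmark(seed_agent), seed_agent)]
    for _ in range(generations):
        # Branch from any surviving variant -- this is what produces
        # the evolutionary tree rather than a single lineage.
        _, parent = random.choice(archive)
        child = mutate(parent)
        archive.append((benchmark(child), child))
        # Selection pressure: prune the worst variants.
        archive.sort(key=lambda pair: pair[0], reverse=True)
        archive = archive[:archive_size]
    return archive

archive = evolve([1.0, 1.0, 1.0])
best_score, best_agent = archive[0]
print(best_score, best_agent)
```

Keeping an archive of several variants, rather than just the single best one, matters: a currently-mediocre branch can carry a modification that pays off generations later.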
What Makes V3 Different
V3 extends self-modification to arbitrary domains. The system autonomously discovers capabilities researchers never engineered:
- Memory tools for tracking context
- Updated evaluation criteria as capabilities expanded
- Persona modifications for different task types
- Multi-stage processing pipelines
Two Paths to Self-Evolution:
MiniMax M2.7: Self-evolution at the weight level.
Darwin-Godel HyperAgent: Self-evolution at the code level.
Both achieve systems that get better at getting better.
The Convergence
Within a single week:
- MiniMax M2.7 -- self-evolution during model training
- Darwin-Godel HyperAgent v3 -- self-evolution through code rewriting
- @omarsar0 flagging a new paper cracking the "plateau problem"
- OpenAI's Model Spec revealing deliberative alignment in reasoning models
The broader AI research community is noticing: "It's still early innings for RL. The ceiling for open models keeps moving up."
As Chamath noted, the competitive dynamics are "much more nuanced than what appears on the surface."
What Self-Evolution Means for Agent Builders
1. The Fine-Tuning Paradigm Gets Disrupted
Deploy a model and it gets better through use. The line between deployment and training blurs.
2. Agent Orchestration Gets More Complex (And Powerful)
A self-evolving agent discovers tools, evaluates their utility, and integrates them without human intervention.
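A minimal sketch of that discover-evaluate-integrate cycle, with a made-up utility function standing in for a real trial-task evaluation (tool names and scores are hypothetical):

```python
def integrate_useful_tools(agent_tools, candidate_tools, trial_score):
    """Sketch of autonomous tool integration: try each candidate tool
    on a trial task and keep only the ones that beat the current baseline.

    `trial_score` is a stand-in for running a real evaluation suite.
    """
    baseline = trial_score(agent_tools)
    for tool in candidate_tools:
        if trial_score(agent_tools + [tool]) > baseline:
            agent_tools.append(tool)
            baseline = trial_score(agent_tools)  # new floor after integration
    return agent_tools

# Toy utility function: two tools help, one contributes nothing.
utilities = {"grep_logs": 2.0, "run_tests": 3.0, "magic_8_ball": 0.0}
score = lambda tools: sum(utilities[t] for t in tools)

kept = integrate_useful_tools([], ["grep_logs", "magic_8_ball", "run_tests"], score)
print(kept)  # -> ['grep_logs', 'run_tests']
```

The useless tool is rejected not by a human reviewer but by the measured utility gap -- which is exactly why the quality of `trial_score` matters so much.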
3. Evaluation Becomes the Bottleneck
If evaluation is wrong, the model optimizes for the wrong thing -- with compound efficiency.
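One common defense, sketched below under my own assumptions (this is a generic guardrail pattern, not a documented M2.7 mechanism): never accept a self-modification on the optimized metric alone -- require a held-out evaluation to agree.

```python
def accept_modification(train_before, train_after,
                        holdout_before, holdout_after,
                        min_gain=0.01):
    """Guardrail sketch: keep a self-modification only if it improves
    the optimized metric AND doesn't regress a held-out evaluation.

    The held-out check is the defense against reward hacking: if the
    model only has to beat the metric it optimizes, Goodhart's Law
    says it will eventually game that metric instead of improving.
    """
    improved_train = (train_after - train_before) >= min_gain
    no_holdout_regression = holdout_after >= holdout_before
    return improved_train and no_holdout_regression

# Classic reward-hacking signature: big gain on the optimized metric,
# regression on held-out performance. Rejected.
print(accept_modification(0.70, 0.85, 0.70, 0.60))  # False

# Modest gain that also holds up out-of-distribution. Accepted.
print(accept_modification(0.70, 0.75, 0.70, 0.72))  # True
```

Even this only pushes the problem back one level: the held-out set becomes the thing that has to be right, which is why evaluation design, not compute, becomes the bottleneck.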
4. Memory Architecture Becomes First-Class
Memory is the critical infrastructure for self-evolving agents. This connects to the broader agent memory challenge.
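At its simplest, that infrastructure is a lesson store that survives across sessions. This is a deliberately minimal sketch (flat JSON file, exact-match keys); production agent memory would add vector search and relevance ranking:

```python
import json
import os
import tempfile
from pathlib import Path

class AgentMemory:
    """Minimal persistent memory: lessons keyed by task type, stored
    as JSON so they survive process restarts. A hypothetical sketch,
    not any particular framework's memory API.
    """

    def __init__(self, path):
        self.path = Path(path)
        self.lessons = json.loads(self.path.read_text()) if self.path.exists() else {}

    def record(self, task_type, lesson):
        # Deduplicate, then persist immediately so nothing is lost on crash.
        self.lessons.setdefault(task_type, [])
        if lesson not in self.lessons[task_type]:
            self.lessons[task_type].append(lesson)
        self.path.write_text(json.dumps(self.lessons, indent=2))

    def recall(self, task_type):
        return self.lessons.get(task_type, [])

path = os.path.join(tempfile.gettempdir(), "demo_agent_memory.json")
if os.path.exists(path):
    os.remove(path)

mem = AgentMemory(path)
mem.record("refactoring", "run the test suite before and after each change")
# A fresh instance reads the same file -- the lesson survived the "restart".
print(AgentMemory(path).recall("refactoring"))
```

The key property is the round-trip: what one task run records, a later run can recall, which is what lets improvement compound instead of resetting every session.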
The Open-Source Wildcard
MiniMax announced open-source weights dropping in approximately 2 weeks.
Why Open Weights for a Self-Evolving Model Matter: The fine-tuning community won't just build on M2.7's capabilities -- they'll build on its self-improvement capabilities.
The implications:
- Startups build domain-specific self-improving agents without frontier lab budgets
- Researchers study self-evolution with real weights
- The competitive landscape shifts as self-evolution becomes commoditized
The Risks
Reward hacking at scale. Goodhart's Law with compound interest.
Capability drift. Optimizing for frequent tasks while degrading on rare ones.
Alignment compounding. Subtle misalignment deepening with each cycle.
Verification complexity. How do you audit a self-modified model?
The Bottom Line
Self-evolving AI is no longer theoretical. MiniMax M2.7 proves it at the weight level. Darwin-Godel HyperAgent v3 proves it at the code level. In two weeks, the open-source community gets to prove it at the ecosystem level.
For agent builders: the models are about to start improving themselves faster than you can improve your frameworks around them. Invest in flexible orchestration, robust evaluation, and functional memory architecture.
The era of training a model, shipping it, and moving on is ending. The era of models that train themselves has begun.
Follow AgentConn for daily analysis of the tools, frameworks, and research shaping the future of autonomous AI.