My AI Dev Team Doesn’t Sleep: Scaling a Self-Evolving Trading System to a 9% Weekly Return

Crystal Zhang — Tue, 21 Apr 2026 15:51:14 +0000

It started as a classic "developer's trap": a side project meant to save time that ended up consuming every waking hour. I wanted a bot to execute trades so I wouldn't have to watch the charts.

The logic was simple. The implementation, however, was a beast. Between scraping unstructured news feeds, managing low-latency data pipelines, and hunting edge-case bugs that only appeared at market open, my "automated" system had become a demanding second job.

I didn't need better scripts. I needed an engineering organization. So, I built one—out of AI agents.

From "Script" to "Self-Evolving System"

Leveraging 20 years in software engineering, I pivoted from writing code to designing an Autonomous Engineering Org. This isn't a collection of LLM prompts; it’s a recursive, 24/7 CI/CD loop with specialized agents handling triage, dev, QA, and code review.

Since deploying this "AI Team," the system has transitioned from a static bot into a self-evolving architecture. The results were immediate:

Engineering Velocity: The system now handles its own maintenance, pushing 30 to 50 PRs daily.
Bug Resolution: Mean Time to Resolution (MTTR) dropped from days to hours.
Performance: A nearly 9% return in a single week during a period of heavy market volatility.

🛠️ The Architecture of the "Squad"

The "team" is structured like a high-performance dev shop, where each agent has a specific, isolated domain.

1. The Sentinels (Observability & QA)
In a production environment, QA doesn't wait for user-reported bugs. Neither do my agents. My QA and Ops agents run continuous telemetry and synthetic tests, catching data pipeline anomalies and API degradation before the market's opening bell. Meanwhile, the UI Review Agent operates as a visual Product Manager. Utilizing headless browsers and DOM analysis, it catches UX regressions I had walked past for weeks. They don't just throw errors; they file highly structured, data-rich bug reports with reproducible steps.

2. The Filter (Heuristic Triage)
Here's a lesson from real-world engineering management: not every bug is worth the compute cycle. Every day, the Triage Agent parses the backlog. It filters out transient third-party API timeouts and utilizes semantic clustering to group duplicate stack traces together. This heuristic filtering is critical—it’s the difference between an AI team stuck in an infinite loop of noise and one that is actually shipping value.
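To make the triage step concrete, here is a minimal sketch of one such pass. It is not the production agent: the `BugReport` shape, the transient-error patterns, and the 0.8 threshold are all assumptions for illustration, and plain token-overlap (Jaccard) similarity stands in for real semantic clustering, which would more likely use embeddings.

```python
import re
from dataclasses import dataclass

# Hypothetical markers for transient third-party failures; tune for real logs.
TRANSIENT_PATTERNS = (
    re.compile(r"timed? ?out", re.IGNORECASE),
    re.compile(r"\b(502|503|504)\b"),
)


@dataclass
class BugReport:
    report_id: str
    message: str
    stack_trace: str


def is_transient(report: BugReport) -> bool:
    """True when the error message looks like a transient upstream failure."""
    return any(p.search(report.message) for p in TRANSIENT_PATTERNS)


def trace_tokens(stack_trace: str) -> frozenset:
    """Reduce a trace to identifier tokens, dropping line numbers and addresses."""
    without_hex = re.sub(r"0x[0-9a-fA-F]+", "", stack_trace)
    return frozenset(re.findall(r"[A-Za-z_][A-Za-z_0-9.]*", without_hex))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of tokens the two sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def triage(reports, threshold: float = 0.8):
    """Drop transient reports, then greedily cluster the rest by trace similarity."""
    clusters = []    # each cluster is a list of BugReport
    signatures = []  # token set of the first report in each cluster
    for report in reports:
        if is_transient(report):
            continue
        tokens = trace_tokens(report.stack_trace)
        for i, signature in enumerate(signatures):
            if jaccard(tokens, signature) >= threshold:
                clusters[i].append(report)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([report])
            signatures.append(tokens)
    return clusters
```

Two reports that differ only in line number land in the same cluster, so the developer agent sees one ticket instead of two. The timeout never reaches it at all.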

3. The Builders (Autonomous CI/CD)
This is where the magic happens. The Developer Agent picks up prioritized tasks and writes patches. However, it’s governed by a Code Review Agent that acts as a strict quality gate. It checks for:

Algorithmic logic flaws
Race conditions in high-frequency execution
Deterministic state management

If the reviewer finds an issue, the PR is bounced back with inline comments. Each round of this loop completes in minutes, not days.
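The bounce-back loop can be sketched as a bounded retry between two callables. Everything here is hypothetical: the `Review` type, the function signatures, and the stub agents stand in for real LLM calls, and the cap of three rounds is an assumed safeguard against an endless back-and-forth.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    approved: bool
    comments: List[str] = field(default_factory=list)


Developer = Callable[[str, List[str]], str]  # (task, reviewer feedback) -> patch
Reviewer = Callable[[str], Review]           # patch -> review verdict


def review_loop(
    task: str, develop: Developer, review: Reviewer, max_rounds: int = 3
) -> Tuple[Optional[str], int]:
    """Alternate developer and reviewer until approval or the round cap.

    Returns (approved patch, rounds used), or (None, max_rounds) on give-up.
    """
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        patch = develop(task, feedback)
        verdict = review(patch)
        if verdict.approved:
            return patch, round_number
        feedback = verdict.comments  # bounce back with the inline comments
    return None, max_rounds


# Stub agents for illustration only; real agents would call a model here.
def stub_developer(task: str, feedback: List[str]) -> str:
    """Pretend developer: adds a lock only after the reviewer asks for one."""
    if any("race" in comment for comment in feedback):
        return f"with lock: apply({task!r})"
    return f"apply({task!r})"


def stub_reviewer(patch: str) -> Review:
    """Pretend reviewer: rejects patches that touch shared state without a lock."""
    if "lock" not in patch:
        return Review(False, ["possible race condition on shared order state"])
    return Review(True)
```

With the stubs, `review_loop("fix order sizing", stub_developer, stub_reviewer)` is rejected once and approved on the second round. A task that never converges returns `None`, which a real system would presumably escalate to a human.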

🧠 The Secret Weapon: Model Diversity (Macro-MoE)

A key architectural breakthrough was using different foundation models for different nodes.

Just like a human team benefits from cognitive diversity, an agentic system thrives when you mix "deep-reasoning" models for code review with "classification-optimized" models for triage. A logic flaw one model misses, another catches. In effect, it is a Mixture of Experts (MoE) applied at the system level rather than inside a single model, and it reduces systematic blind spots.
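One way to keep that diversity from eroding over time is to treat model assignment as checked configuration. The sketch below is an assumption about how this might look: the role names, the `family-*` labels, and the two rules are placeholders, not real model names or the system's actual routing table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelSpec:
    family: str  # placeholder vendor/family label, not a real model name
    tier: str    # "reasoning" or "classification"


# Hypothetical routing table: one entry per agent role.
ROUTING: Dict[str, ModelSpec] = {
    "triage": ModelSpec("family-a", "classification"),
    "developer": ModelSpec("family-b", "reasoning"),
    "code_review": ModelSpec("family-c", "reasoning"),
}


def validate_routing(routing: Dict[str, ModelSpec]) -> List[str]:
    """Return a list of rule violations; an empty list means the table is OK."""
    problems = []
    # The author and the reviewer of a patch should not share blind spots.
    if routing["developer"].family == routing["code_review"].family:
        problems.append("developer and code_review use the same model family")
    # Code review is the node the article reserves for deep-reasoning models.
    if routing["code_review"].tier != "reasoning":
        problems.append("code_review must use a reasoning-tier model")
    return problems
```

Running `validate_routing(ROUTING)` at startup would catch a configuration change that quietly points the developer and the reviewer at the same model.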

📈 What the Team Built

This AI dev team didn't just maintain legacy scripts. They actively shipped features, transforming a basic trading script into a comprehensive, self-hosted market intelligence architecture. Here is what they autonomously deployed:

The Automated Execution Engine:

Multi-Factor Confluence: Algorithmic execution requiring multiple independent quantitative models to agree before triggering an API order.
Dynamic Risk Parameterization: Automated position sizing and ATR-based (Average True Range) trailing stops that adjust dynamically to real-time volatility spikes.
Walk-Forward Optimization: A nightly engine that backtests strategy variants, optimizing indicator weights for the current market regime.
Event-Driven Shadow Sandboxes: Virtual environments for safely testing new strategy algorithms via paper trading before live deployment.
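Two of these ideas, confluence gating and ATR-based trailing stops, are compact enough to sketch. This is an illustration under stated assumptions, not the engine's code: ATR is computed as a simple average of true ranges (Wilder's smoothing is a common alternative), the stop is for a long position and only ratchets upward, and the signal encoding (+1 long, -1 short, 0 neutral), the 3x multiplier, and the "two models agree, none disagree" rule are all hypothetical.

```python
from typing import Dict, Optional, Sequence


def true_ranges(highs: Sequence[float], lows: Sequence[float],
                closes: Sequence[float]) -> list:
    """True range per bar, starting from the second bar (needs a prior close)."""
    ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        ranges.append(max(highs[i] - lows[i],
                          abs(highs[i] - prev_close),
                          abs(lows[i] - prev_close)))
    return ranges


def atr(highs: Sequence[float], lows: Sequence[float],
        closes: Sequence[float], period: int = 14) -> float:
    """Average True Range as a simple mean of the last `period` true ranges."""
    ranges = true_ranges(highs, lows, closes)
    if len(ranges) < period:
        raise ValueError("not enough bars for the requested ATR period")
    return sum(ranges[-period:]) / period


def trail_long_stop(prev_stop: Optional[float], close: float,
                    atr_value: float, multiplier: float = 3.0) -> float:
    """Trailing stop for a long position: follows price up, never moves down."""
    candidate = close - multiplier * atr_value
    return candidate if prev_stop is None else max(prev_stop, candidate)


def confluence(signals: Dict[str, int], min_agree: int = 2) -> int:
    """Return +1 or -1 only when enough models agree and none disagree."""
    longs = sum(1 for s in signals.values() if s > 0)
    shorts = sum(1 for s in signals.values() if s < 0)
    if longs >= min_agree and shorts == 0:
        return 1
    if shorts >= min_agree and longs == 0:
        return -1
    return 0  # no trade: models are split or too few have an opinion
```

Because the stop distance scales with ATR, a volatility spike changes where new stops are placed, while the `max` keeps an existing stop from loosening.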

The Market Intelligence Platform:

Real-time telemetry dashboards aggregating global macroeconomic data and mapping sector rotation flows (risk-on/risk-off detection).
NLP-generated market briefings, automatically localized and distributed across multiple webhook endpoints.
An algorithmic stock screener utilizing AI-weighted relative strength scoring for trend analysis.
A low-latency regulatory monitor that parses SEC EDGAR filings the millisecond they drop.

📊 The Results: Proof in the P&L

During the weeks of the Iran war, while broader benchmarks dropped over 10% and the VIX spiked, the trading bot:

Automatically detected the volatility regime shift.
Dynamically deleveraged, tightening stop losses to protect capital.
Identified short-selling setups amidst the structural breakdown.
Delivered a nearly 9% weekly return against a backdrop of market-wide panic.

But the real alpha is the engineering velocity. When I started this as a side project, I quickly became the system's bottleneck—manually grepping logs, fixing bugs on weekends, and pushing hotfixes at midnight. Now:

Mean Time to Detect (MTTD): < 15 minutes.
Triage happens asynchronously with zero human input.
Mean Time to Resolution (MTTR): Code patches ship within 24-48 hours.
Every single commit passes deterministic CI/CD quality gates before merging. I now review merged PRs over my morning coffee—not as a gatekeeper, but as a spectator.
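The detect, triage, patch, review, and gate sequence above maps naturally onto an explicit state machine. The sketch below is one possible shape; the state names and the transition table are assumptions, not the system's actual ticket model.

```python
from enum import Enum


class State(Enum):
    DETECTED = "detected"
    TRIAGED = "triaged"
    IN_DEVELOPMENT = "in_development"
    IN_REVIEW = "in_review"
    IN_CI = "in_ci"
    MERGED = "merged"
    DISCARDED = "discarded"


# Hypothetical transition table: anything not listed here is rejected.
ALLOWED = {
    State.DETECTED: {State.TRIAGED, State.DISCARDED},   # triage may drop noise
    State.TRIAGED: {State.IN_DEVELOPMENT},
    State.IN_DEVELOPMENT: {State.IN_REVIEW},
    State.IN_REVIEW: {State.IN_DEVELOPMENT, State.IN_CI},  # bounce or pass
    State.IN_CI: {State.IN_DEVELOPMENT, State.MERGED},     # CI fail or merge
    State.MERGED: set(),
    State.DISCARDED: set(),
}


class Ticket:
    def __init__(self, ticket_id: str) -> None:
        self.ticket_id = ticket_id
        self.state = State.DETECTED
        self.history = [State.DETECTED]

    def advance(self, new_state: State) -> None:
        """Move to new_state, or raise if the transition is not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.ticket_id}: {self.state.value} -> {new_state.value} "
                "is not an allowed transition"
            )
        self.state = new_state
        self.history.append(new_state)
```

With a table like this, no agent can merge a patch that skipped review or CI, however confidently it asks: `Ticket("T-1").advance(State.MERGED)` raises a `ValueError`.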

🧠 6 Lessons for the AI Era of Software Engineering

Deterministic Processes > Stochastic Prompts. The quality of my system improved dramatically not when I wrote "better prompts," but when I built better state machines. Structured JSON outputs, rigid triage gates, and isolated review loops matter just as much when your developers are LLMs.
Containerized Isolation is Non-Negotiable. Each coding agent operates in its own ephemeral Docker container. Production is sacred. This is standard branch isolation on steroids, sharply limiting the blast radius if a "developer" hallucinates a breaking change.
Implement Circuit Breakers. Resilience engineering isn't optional. Let agents fail gracefully. Every agent has strict market-hour awareness (deployments are locked during active trading sessions) and API timeout management.
Triage is the Highest-Leverage Node. A developer agent executing the wrong ticket flawlessly is worse than no agent at all. The triage layer—filtering, deduplicating, prioritizing—is what turns raw logs into actionable engineering signals.
Compute-to-Task Optimization. Model assignment is an architectural decision, not just a cost-saving measure. Simple log classification doesn't need frontier reasoning models; algorithmic code review absolutely does.
From Operator to Systems Architect. I spend almost zero time writing boilerplate for this system anymore. Instead, I design CI/CD pipelines, review architectural output, and tune the meta-parameters. This is a fundamentally more leveraged way to build software.
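The circuit-breaker lesson, including the market-hours deployment lock, can be illustrated with a short sketch. The assumptions are loud ones: the session is taken to be 09:30 to 16:00 on weekdays in exchange-local time, holidays are ignored, the caller is expected to pass a datetime already converted to exchange-local time, and the failure threshold and cooldown are placeholder values.

```python
import time
from datetime import datetime, time as dtime
from typing import Callable, Optional

# Assumed regular session in exchange-local time; holidays are not handled.
SESSION_OPEN = dtime(9, 30)
SESSION_CLOSE = dtime(16, 0)


def deploy_allowed(exchange_local: datetime) -> bool:
    """Deployments are locked on weekdays while the assumed session is open."""
    if exchange_local.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (SESSION_OPEN <= exchange_local.time() < SESSION_CLOSE)


class CircuitBreaker:
    """Stops calling a failing dependency until a cooldown has passed."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_failures = max_failures
        self._cooldown_s = cooldown_s
        self._clock = clock  # injectable so the breaker can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a call may proceed (closed, or cooldown has elapsed)."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._cooldown_s

    def record_success(self) -> None:
        """Reset the breaker after a successful call."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the breaker once the threshold is hit."""
        self._failures += 1
        if self._failures >= self._max_failures:
            self._opened_at = self._clock()
```

An agent would check `breaker.allow()` before each external API call and `deploy_allowed(now_in_exchange_time)` before shipping. A failure during the post-cooldown trial call reopens the breaker immediately.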

🔮 What's Next

The architecture is still evolving.

Currently, I'm experimenting with agentic self-reflection loops and RAG-based cross-agent memory.

The broader takeaway: The future of software engineering isn't AI replacing developers. It's developers becoming Engineering Managers for AI teams. The skills that matter aren't changing—system architecture, CI/CD design, risk management. The scale at which one single human can apply those skills is what's compounding.

I'm building this entirely in the open. If you're interested in the tech stack, the Docker setups, or the technical guts of any specific agent, drop a comment below and I'll do a deep dive next! And if you are brave enough to test-drive a trading platform entirely managed by an AI dev team, let me know—I might just let you in. 😉