Last week, the Qoder Quest team accomplished a complex 26-hour task using Quest 1.0: refactoring its own long-running task execution logic. This wasn't a simple feature iteration, as it involved optimizing interaction flows, managing mid-layer state, adjusting the Agent Loop logic, and validating long-running task execution capabilities.
From requirement definition to merging code into the main branch, the Qoder Quest team only did three things: described the requirements, reviewed the final code, and verified the experimental results.
This is the definition of autonomous programming: AI doesn't just assist or pair. It autonomously completes tasks.
Tokens Produce Deliverables, Not Just Code
Copilot can autocomplete code, but you still confirm it line by line. Cursor or Claude Code can refactor logic, but debugging and handling errors are still your job. These tools improve efficiency, but humans remain the primary executor.
The problem Quest solves is this: Tokens must produce deliverable results. If AI writes code and a human still needs to debug, test, and backstop, the value of those tokens is heavily discounted. Autonomous programming is only achieved when AI can consistently produce complete, runnable, deliverable results.
Agent Effectiveness = Model Capability × Architecture
From engineering practice, we've distilled a formula:
Agent Effectiveness = Model Capability × Agent Architecture (Context + Tools + Agent Loop)
Model capability is the foundation, but the same model performs vastly differently under different architectures. Quest optimizes the architecture across three dimensions: context management, tool selection, and the Agent Loop, to fully unleash the model's potential.
Context Management: Agentic, Not Mechanical
As tasks progress, conversations balloon. Keeping everything drowns the model; mechanical truncation loses critical information. Quest employs "Agentic Context Management": letting the model autonomously decide when to compress and summarize.
Model-Driven Compression
In long-running tasks, Quest lets the model summarize completed work at appropriate moments. This isn't "keep the last N conversation turns"; it's letting the model understand which information matters for subsequent tasks and what can be compressed.
Compression triggers based on multiple factors:
Conversation rounds reaching a threshold
Context length approaching limits
Task phase transitions (e.g., from exploring to implementation)
Model detection of context redundancy
The model makes autonomous decisions based on current task state, rather than mechanically following fixed rules.
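As a rough illustration, the candidate signals might be checked like this; the thresholds, field names, and the final deference to the model's own judgment are all assumptions, not Quest internals:

```python
from dataclasses import dataclass

@dataclass
class SessionState:
    turns: int                    # completed conversation rounds
    context_tokens: int           # current context size
    context_limit: int            # model's context window
    phase_changed: bool           # e.g. exploration -> implementation
    model_flags_redundancy: bool  # the model itself reports compressible content

def compression_candidate(s: SessionState, max_turns: int = 40, usage: float = 0.8) -> bool:
    """Hypothetical trigger check: any one signal nominates this moment for
    compression; the model still decides what to keep and what to summarize."""
    return (
        s.turns >= max_turns
        or s.context_tokens >= int(s.context_limit * usage)
        or s.phase_changed
        or s.model_flags_redundancy
    )
```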
Dynamic Reminder Mechanism
The traditional approach hardcodes all considerations into the system prompt. But this bloats the prompt, scatters model attention, and tanks cache hit rates.
Take language preference as an example:
Traditional approach: System prompt hardcodes "Reply in Japanese." Every time a user switches languages, the entire prompt cache invalidates, multiplying costs.
Quest approach: Dynamically inject the context that needs attention through the Reminder mechanism. Language preferences, project specs, and temporary constraints are all added to the conversation as needed (sketched after the list below). This ensures timely information delivery while avoiding infinite system prompt bloat.
Benefits:
Improved cache hit rates, reduced inference costs
Lean system prompts, enhanced model attention
Flexible adaptation to different scenario requirements
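A minimal sketch of the mechanism, assuming a chat-completions-style message list (`build_messages` and the `<reminder>` wrapper are hypothetical names, not Quest's API):

```python
SYSTEM_PROMPT = "You are a coding agent..."  # static and byte-identical across turns,
                                             # so its prefix cache stays valid

def build_messages(history: list[dict], reminders: list[str]) -> list[dict]:
    """Inject volatile constraints (language preference, project specs,
    temporary rules) as a late message instead of editing the system prompt."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *history]
    if reminders:
        text = "\n".join(f"<reminder>{r}</reminder>" for r in reminders)
        messages.append({"role": "system", "content": text})
    return messages

# The user switches languages mid-session; the cache-friendly prompt is untouched:
msgs = build_messages(history=[], reminders=["Reply in Japanese."])
```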
Tool Selection: Why Bash is the Ultimate Partner
If we could only keep one tool, it would be Bash. This decision may seem counterintuitive. Most agents on the market offer rich specialized tools: file I/O, code search, Git operations, etc. But increasing tool count raises model selection complexity and error probability.
Three Advantages of Bash
Comprehensive. Bash handles virtually all system-level operations: file management, process control, network requests, text processing, Git operations. One tool covers most scenarios—the model doesn't need to choose among dozens.
Programmable and Composable. Pipelines, redirects, and scripting mechanisms let simple commands compose into complex workflows. This aligns perfectly with Agent task decomposition: break large tasks into small steps, complete each with one or a few commands.
Native Model Familiarity. LLMs have seen vast amounts of Unix commands and shell scripts during pre-training. When problems arise, models can often find solutions themselves without detailed prompt instructions.
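To make the single-tool idea concrete, here is a sketch of what the one exposed tool might look like; the function name, timeout, and truncation limit are illustrative choices, not Quest's implementation:

```python
import subprocess

def run_bash(command: str, timeout_s: int = 120, max_output: int = 10_000) -> str:
    """The one tool: run a shell command and return its combined output.
    File I/O, search, Git, network requests: all expressible as commands."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return (result.stdout + result.stderr)[:max_output]  # don't flood the context

# One composed command replaces three specialized tools
# (code search + preview + Git history):
print(run_bash("grep -rn 'AgentLoop' src/ | head -5 && git log --oneline -3"))
```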
Less is More
Quest still maintains a few fixed tools, mainly for security isolation and IDE collaboration. But the principle remains: if Bash can solve it, don't build a new tool.
Every additional tool increases the model's selection burden and error potential. A lean toolset actually makes the Agent more stable and predictable. In repeated experiments, after removing redundant specialized tools, task completion rates stayed at the same level while context token consumption dropped by 12%.
Agent Loop: Spec -> Coding -> Verify
A Coding Agent for autonomous programming needs a complete closed loop: gather context -> formulate plan -> execute coding -> verify results -> iterate and optimize.
Look at coding agents on the market and the things users say most often are "just run it...", "make it work", "help me fix this error." This exposes a critical weakness: those agents cut corners on verification. AI writes code, humans test it; that's not autonomous programming.
Spec-Driven Development Flow
Spec Phase: Clarify requirements before starting, define acceptance criteria. For complex tasks, Quest generates detailed technical specifications, ensuring both parties agree on the definition of "done."
Spec elements include (a sketched example follows the list):
Feature description: What functionality to implement
Acceptance criteria: How to judge completion
Technical constraints: Which tech stacks to use, which specifications to follow
Testing requirements: Which tests must pass
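For a small task, a Spec covering those four elements might look like this; the structure is our illustrative guess, not Quest's actual format:

```python
spec = {
    "feature": "Add retry with exponential backoff to the HTTP client",
    "acceptance_criteria": [
        "Transient 5xx errors are retried up to 3 times",
        "Retry delay doubles on each attempt, starting at 500 ms",
    ],
    "technical_constraints": [
        "Python 3.11, no new third-party dependencies",
        "Follow the project's existing logging conventions",
    ],
    "testing_requirements": [
        "pytest tests/test_http_client.py must pass",
        "New behavior is covered by unit tests",
    ],
}
```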
Coding Phase: Implement functionality according to Spec. Quest proceeds autonomously in this phase, without continuous user supervision.
Verify Phase: Automatically run tests, verify implementation meets Spec. Verification types include syntax checks, unit tests, integration tests, etc. If criteria aren't met, automatically enter the next iteration rather than throwing the problem back to the user.
Through the Hook mechanism, these three phases can be flexibly extended and combined. For example, integrate custom testing frameworks or lint rules in the Verify phase, ensuring every delivery meets team engineering standards.
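Put together, the closed loop reads roughly like the skeleton below; every name here (`Report`, the phase callables, the hook list) is a hypothetical stand-in for Quest's internals:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Report:
    passed: bool
    feedback: list[str] = field(default_factory=list)

def quest_loop(
    task: str,
    generate_spec: Callable[[str], str],
    write_code: Callable[[str], str],
    run_checks: Callable[[str, str], Report],
    verify_hooks: list[Callable[[str, str], Report]],
    max_iterations: int = 10,
) -> str:
    """Spec -> Coding -> Verify, iterating on failure instead of
    handing the problem back to the user."""
    spec = generate_spec(task)                   # Spec phase: acceptance criteria
    for _ in range(max_iterations):
        code = write_code(spec)                  # Coding phase: autonomous
        report = run_checks(code, spec)          # Verify phase: tests, lint
        for hook in verify_hooks:                # Hook extension point
            extra = hook(code, spec)
            report.passed = report.passed and extra.passed
            report.feedback += extra.feedback
        if report.passed:
            return code                          # a deliverable, not a draft
        spec += "\n# Verify feedback:\n" + "\n".join(report.feedback)
    raise RuntimeError(f"Spec unmet after {max_iterations} iterations")
```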
Combating Model "Regress" Tendency
Most current models are trained for ChatBot scenarios. Facing long contexts or complex tasks, they tend to "regress", giving vague answers or asking for more information to delay execution.
Quest's architecture helps models overcome this tendency: injecting necessary context and instructions at appropriate moments, pushing models to complete the full task chain rather than giving up midway or dumping problems back on users.
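One plausible, purely illustrative shape for that push: detect a stalling reply and answer it with a reminder rather than surfacing the question to the user. The stall markers below are guesses, not Quest's heuristics:

```python
STALL_MARKERS = ("could you clarify", "please provide", "let me know if")

def anti_regress_reminder(model_reply: str) -> str | None:
    """If the model asks for information it could discover itself,
    inject a continuation nudge instead of pausing the task."""
    if any(marker in model_reply.lower() for marker in STALL_MARKERS):
        return ("You have Bash access. Gather the missing information "
                "yourself and continue the task to completion.")
    return None
```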
Auto-Adapt to Complexity, Not Feature Bloat
Quest doesn't just handle code completion. It manages complete engineering tasks. These tasks may involve multiple modules, multiple tech stacks, and require long-running sustained progress.
The design principle: automatically adapt strategy based on task complexity. Users don't need to care about how scheduling works behind the scenes.
Dynamic Skills Loading
When tasks involve specific frameworks or tools, Quest dynamically loads corresponding Skills. Skills encapsulate validated engineering practices, such as:
TypeScript configuration best practices
React state management patterns
Common database indexing pitfalls
API design specifications
This isn't making the model reason from scratch every time—it's directly reusing accumulated experience.
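A toy version of the loading step, with made-up skill names, triggers, and guidance text standing in for real Skills:

```python
# Hypothetical skill registry: each skill declares trigger keywords and
# carries distilled guidance injected into context when relevant.
SKILLS = {
    "typescript-config": {
        "triggers": ["tsconfig", "typescript"],
        "guidance": "Enable strict mode; keep compiler options minimal...",
    },
    "react-state": {
        "triggers": ["usestate", "redux", "zustand"],
        "guidance": "Lift state only as high as needed; derive, don't duplicate...",
    },
}

def load_skills(task_description: str) -> list[str]:
    """Return guidance blocks whose triggers appear in the task, so the
    model reuses vetted practice instead of re-deriving it each time."""
    text = task_description.lower()
    return [
        skill["guidance"]
        for skill in SKILLS.values()
        if any(trigger in text for trigger in skill["triggers"])
    ]
```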
Teams can also encapsulate engineering specs into Skills, making Quest work the team's way. Examples:
Code style guides
Git commit conventions
Test coverage requirements
Security review checklists
Intelligent Model Routing
When a single model's capabilities don't cover task requirements, Quest automatically orchestrates multiple models to collaborate. Some models excel at reasoning, others at writing, others at handling long contexts.
Intelligent routing selects the most suitable model based on subtask characteristics. To users, it's always just one Quest.
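A stripped-down sketch of such a router; the model names and routing criteria are placeholders, since Quest doesn't disclose its actual lineup:

```python
def route_model(subtask: dict) -> str:
    """Pick a model per subtask; the user never sees this choice."""
    if subtask.get("context_tokens", 0) > 100_000:
        return "long-context-model"   # large codebases, long histories
    if subtask.get("kind") == "plan":
        return "reasoning-model"      # decomposition and planning
    if subtask.get("kind") == "code":
        return "coding-model"         # bulk implementation
    return "default-model"

route_model({"kind": "plan"})  # -> "reasoning-model"
```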
Multi-Agent Architecture
When tasks are complex enough to require parallel progress and modular handling, Quest launches a multi-agent architecture: the main Agent handles planning and coordination, subagents execute specific tasks, and companion Agents supervise. But we use this capability with restraint. Multi-agent isn't a silver bullet: context transfer between agents is lossy, and good task decomposition is hard. We only enable it when truly necessary.
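In skeletal form (the agent callables and the fan-out strategy are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_multi_agent(
    plan: list[str],
    run_subagent: Callable[[str], str],
    supervise: Callable[[str, str], str],
) -> list[str]:
    """Main agent fans subtasks out to subagents in parallel; a companion
    supervisor reviews each result. Only worth invoking when the task
    genuinely decomposes, since every handoff loses context."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_subagent, plan))
    return [supervise(subtask, result) for subtask, result in zip(plan, results)]
```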
Designed for Future Models
From day one, Quest has been designed for SOTA models. The architecture isn't patched around yesterday's models; it ensures that as underlying model capabilities improve, the Agent's capabilities rise with the tide.
This is why Quest doesn't provide a model selector. Users don't need to agonize over choosing between different models. The system handles this decision automatically. Users just describe the task; Quest orchestrates the most suitable capabilities to complete it.
In other words, Quest isn't just an Agent adapted to today's models. It's an Agent prepared for models six months from now.
Why We Don't Expose the File Editing Process
Quest has no file tree and doesn't support users directly modifying files. This is a counterintuitive product decision.
Many Coding Agents display every file modification in real-time, allowing users to intervene and edit at any moment. Quest chooses not to do this for three reasons:
Don't interrupt the Agent's execution flow. User intervention breaks coherent task execution and easily introduces inconsistencies.
Shift users from "watching code" to "focusing on the problem itself." Since the goal is autonomous programming, users should focus their attention on requirement definition and result review.
Align with where autonomous programming is heading. In the future, users will care about "is the task done," not "what changed in this line of code." Quest's interface is designed around final deliverables, not the execution process.
Self-Evolution: Stronger with Use
One of Quest's technical breakthroughs is its capacity for autonomous evolution. It can deeply analyze a project's code structure, architectural evolution, and team conventions, internalizing this information as "project understanding."
Specific manifestations:
Understand project module division and dependency relationships
Recognize code style and naming conventions
Learn project-specific architectural patterns
Master team engineering practices
Facing unfamiliar APIs or new frameworks, Quest conducts self-learning through exploration and practice: reading documentation, attempting calls, analyzing errors, adjusting approaches. The longer it's used, the deeper its project understanding and the better its performance.
The Skills system further extends this capability. Teams can encapsulate engineering specs and common patterns into Skills, letting Quest continuously acquire new skills. Quest doesn't just execute tasks; it learns continuously during execution.
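The explore-and-practice loop, reduced to a sketch (the variant list and memory dict are illustrative stand-ins for whatever Quest persists internally):

```python
def self_learn(call_variants: list[str], try_call, memory: dict) -> str | None:
    """Attempt an unfamiliar API call, analyze failures, adjust, and
    persist the outcome so future tasks start from accumulated
    project understanding."""
    for call in call_variants:               # e.g. drafted from the docs
        try:
            try_call(call)                   # attempting calls
            memory[call] = "works"           # internalize the working pattern
            return call
        except Exception as err:
            memory[call] = f"failed: {err}"  # analyzing errors, adjusting
    return None                              # nothing worked; re-read the docs
```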
We Rebuild Quest with Quest
The Quest team is a power user of Quest itself. The "using Quest to refactor Quest" mentioned at the article's opening isn't a packaged case study; it's a true reflection of daily work.
During invite-only testing, users have used Quest to handle the build, verification, and validation of 800,000 images, and to create prototypes and design drafts. Quest is changing how we work.
On engineering architecture, we maintain sufficient fault tolerance and generalization capability. A common temptation is to sacrifice engineering soundness for product polish, turning the Agent into a rigid Workflow. Quest's choice: the product presentation starts from the user's perspective, but the engineering practice firmly adopts an Agentic architecture. This avoids capping model capability and keeps the system prepared for future model upgrades.
From Pairing to Autonomous Programming
AI programming has gone through three stages: code completion, pair programming, autonomous programming. Quest is exploring the possibilities of the third stage.
When developers' role shifts from "code co-writer" to "intent definer," the software development paradigm will undergo fundamental change. Developers will be liberated from tedious coding details, focusing on higher-level problem definition and architectural design.
This is the future Quest is building: a self-evolving autonomous agent.