Building Scalable AI Agent Systems: Three Evolutions

I. December 2025

We needed to add a new feature to atypica.AI: group discussions (discussionChat).

This should've been simple. We already had interviewChat—one-on-one conversations where users deeply engage with AI-simulated personas. Group discussion was just scaling from 1-to-1 to 1-to-many: 3-8 personas engaging simultaneously, so users could watch perspectives collide and insights emerge.

In theory, we just needed to:

  1. Reuse the interview logic
  2. Adjust prompts to simulate group dynamics
  3. Tweak the UI to show multiple speakers

The reality: We had to modify 12 files.

prisma/schema.prisma          # New Discussion table
src/ai/tools/discussionChat.ts  # New tool
src/ai/tools/saveDiscussion.ts   # Save tool
src/app/(study)/agents/studyAgent.ts     # Add tool to agent
src/app/(study)/agents/fastInsightAgent.ts  # Add again
src/app/(study)/agents/productRnDAgent.ts   # And again
... 6 more files

Worse, we discovered this:

// studyAgentRequest.ts (493 lines)
export async function studyAgentRequest(context) {
  const result = await streamText({
    model: llm("claude-sonnet-4"),
    system: studySystem(),
    messages,
    tools: {
      webSearch,
      interview,
      scoutTask,
      saveAnalyst,
      generateReport
      // ... 15 tools
    },
    onStepFinish: async (step) => {
      // Save messages
      // Track tokens
      // Send notifications
      // ... 120 lines of logic
    }
  });
}

// fastInsightAgentRequest.ts (416 lines)
// 95% identical code

// productRnDAgentRequest.ts (302 lines)
// 95% identical code

Three nearly identical agent wrappers.
Every new feature required copy-pasting across all three.
Every bug fix meant changing it three times.

In that moment, we realized something was fundamentally wrong.

The problem wasn't that our code lacked elegance.
It wasn't that we lacked abstraction.
It was that we were building AI Agent systems with traditional software engineering thinking.
This article chronicles how we escaped this trap—through three architectural evolutions, rethinking how AI Agents should be built from first principles.


II. Rethinking: What is an AI Agent?

Before refactoring, we stopped to ask a fundamental question:

What's the essential difference between AI Agents and traditional software?

The World of Traditional Software

Traditional software is built on state machines:

class ResearchSession {
  state: 'IDLE' | 'PLANNING' | 'RESEARCHING' | 'REPORTING';
  data: {
    interviews: Interview[];
    findings: Finding[];
    reports: Report[];
  };

  transition(event: Event) {
    switch (this.state) {
      case 'IDLE':
        if (event.type === 'START') this.state = 'PLANNING';
        break;
      case 'PLANNING':
        if (event.type === 'PLAN_COMPLETE') this.state = 'RESEARCHING';
        break;
      // ... more state transitions
    }
  }
}

This model's core assumptions:

  • State is explicit: I know exactly where I am
  • Transitions are deterministic: Given state + event, next state is unique
  • Control is precise: if-else covers all paths

This works beautifully for traditional software. But for AI Agents?

The World of AI Agents

LLMs don't work this way:

const messages = [
  { role: 'user', content: "Want to understand young people's coffee preferences" },
  { role: 'assistant', content: 'I can help you conduct user research...' },
  { role: 'assistant', toolCalls: [{ name: 'scoutTask', args: {...} }] },
  { role: 'tool', content: 'Observed 5 user segments...' },
  { role: 'assistant', content: 'Based on observations, I suggest interviewing 18-25 coffee enthusiasts...' },
  { role: 'assistant', toolCalls: [{ name: 'interviewChat', args: {...} }] },
  // ...
];

Where's the "state" here?

  • Not in a state field
  • But in the entire conversation history

The AI infers from conversation history:

  • What research does the user want?
  • How far have we progressed?
  • What should happen next?

This is a completely different paradigm.
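
To make "conversation as state" concrete, here is a minimal sketch (not atypica's actual code) that asks the model to infer the current stage directly from the message history, using the AI SDK's generateObject with a zod schema; the llm() helper and the stage names are assumptions borrowed from this post.

import { generateObject } from "ai";
import { z } from "zod";

// Sketch: instead of reading a `state` field, ask the model to infer the
// current stage from the conversation history itself.
async function inferResearchStage(messages) {
  const { object } = await generateObject({
    model: llm("claude-sonnet-4"),  // llm() helper as used elsewhere in this post
    schema: z.object({
      stage: z.enum(["planning", "researching", "reporting"]),
      nextAction: z.string(),
    }),
    messages: [
      ...messages,
      { role: "user", content: "Summarize where this research currently stands." },
    ],
  });
  return object;  // e.g. { stage: "researching", nextAction: "run two more interviews" }
}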

Three Core Insights

From this observation, we derived three insights that shaped our architectural evolution.

Insight 1: Conversation as State

Traditional approach: Maintain explicit state

// ❌ Traditional: Explicit state management
interface ResearchState {
  stage: 'planning' | 'researching' | 'reporting';
  completedInterviews: number;
  pendingTasks: Task[];
}

// Need synchronization: state and conversation history can diverge

AI-native approach: Infer state from conversation

// ✅ AI-native: Conversation is state
const messages = [...conversationHistory];

// AI infers state from history, no explicit sync needed
const result = await streamText({
  messages,
  // AI knows what to do
});

Why is conversation superior to state machines?

  1. Natural alignment: LLMs work on message history natively
  2. Strong fault tolerance: A state machine is hard to recover after an error; a conversation can simply be "rewound" and replayed (see the sketch below)
  3. Easy extension: Adding new capabilities doesn't require modifying state graphs
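
A rough sketch of what rewind-and-replay can look like, assuming messages are persisted in order; the truncation point and helper names here are illustrative, not atypica's API:

import { streamText } from "ai";

// Recover from a failed step by rewinding the conversation, not by repairing a state machine
async function retryFromLastUserMessage(allMessages, tools) {
  // Drop everything after the last user turn (including the failed tool call)
  const lastUserIndex = allMessages.map((m) => m.role).lastIndexOf("user");
  const rewound = allMessages.slice(0, lastUserIndex + 1);

  // Replay: the model re-infers state from the truncated history and tries again
  return streamText({
    model: llm("claude-sonnet-4"),
    messages: rewound,
    tools,
  });
}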

Insight 2: Reasoning-Execution Separation

How humans make decisions:

  1. Understand intent: "What am I trying to achieve?" → Clarify goals
  2. Choose method: "How do I do it?" → Execution steps

AI Agents should follow the same pattern:

// Plan Mode: Understanding intent
"User says: want to understand young people's coffee preferences"
   Analyze: needs qualitative research
   Decide: use group discussion method
   Output: complete research plan

// Study Agent: Executing plan
"Received research plan"
   Call discussionChat
   Analyze discussion results
   Generate insights report

Why separate?

  • Reasoning needs deep thinking (use Claude Sonnet 4)
  • Execution needs fast response (can use smaller models)
  • Separation of concerns, single responsibility
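
The separation also lets each side use a model suited to its job. A minimal sketch, where the planning/execution split and the smaller model name are assumptions for illustration:

import { generateText, streamText } from "ai";

// Reasoning: a slower, deeper model produces the plan
const plan = await generateText({
  model: llm("claude-sonnet-4"),
  system: "You are a research planner. Output a complete research plan.",
  messages,
});

// Execution: the plan is just another message; a faster model carries it out
const execution = streamText({
  model: llm("claude-haiku-4-5"),  // assumed smaller/faster model
  messages: [...messages, { role: "assistant", content: plan.text }],
  tools: { discussionChat, generateReport },  // tools from this post
});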

Insight 3: Simple Over Precise

Facing the "AI forgetfulness" problem, we could:

Option A: Vector DB + Semantic Search

// Precise matching of relevant memories
const query_embedding = await embed(user_message);
const relevant_memories = await vectorDB.search(query_embedding, { topK: 5 });
  • ✅ Precise retrieval
  • ❌ Requires embedding, indexing, complex queries
  • ❌ High maintenance cost

Option B: Markdown Files + Full Loading

// Simple and transparent
const memory = await readFile(`memories/${userId}.md`);
const messages = [
  { role: 'user', content: `<UserMemory>\n${memory}\n</UserMemory>` },
  ...conversationMessages
];
  • ✅ Simple, transparent, user-editable
  • ✅ Leverages large context windows (Claude 200K tokens)
  • ✅ Easier to debug and understand

We chose Option B.

Why?

  1. Context windows changed the game: user memory is typically under 10K, so full loading is perfectly viable
  2. Simple solutions are more reliable: No embedding inconsistency, no retrieval failures
  3. User control: Memory is transparent, users can view and edit

Four Design Principles

From these three insights, we distilled the core principles of our architecture:

1. Messages as Source of Truth

  • All important information lives in messages
  • Database only stores derived state (like reports, study logs)
  • Similar to Event Sourcing: messages are the event log

2. Configuration over Code

  • Use configuration to express differences
  • Use code to express commonalities
  • Avoid over-abstraction

3. AI as State Manager

  • Let AI manage state transitions
  • Don't hand-write complex state machines
  • Adapt to LLM's capability boundaries

4. Simple, Transparent, Controllable

  • Simple beats complex
  • Transparent beats black box
  • User control beats AI automation

III. Step 1: Message-Driven Architecture

v2.2.0 - 2025-12-27

Problem: Dual Source of Truth

Initially, research data was scattered across three places:

// Place 1: analyst table
const analyst = await prisma.analyst.findUnique({
  where: { id }
});
console.log(analyst.studySummary);  // "Research summary..."

// Place 2: interviews table
const interviews = await prisma.interview.findMany({
  where: { analystId: id }
});
console.log(interviews.map(i => i.conclusion));  // ["Interview 1 conclusion", "Interview 2 conclusion"]

// Place 3: messages table
const messages = await prisma.chatMessage.findMany({
  where: { userChatId }
});
// webSearch results are here

Generating reports required stitching from three places:

async function generateReport(analystId) {
  const analyst = await prisma.analyst.findUnique({
    where: { id: analystId },
    include: { interviews: true }  // JOIN!
  });

  const messages = await prisma.chatMessage.findMany({
    where: { userChatId: analyst.studyUserChatId }
  });

  // Stitch data together
  const reportData = {
    summary: analyst.studySummary,              // from analyst table
    interviewInsights: analyst.interviews.map(...),  // from interviews table
    webResearch: extractFromMessages(messages)    // from messages table
  };
}

Problems:

  1. Data inconsistency: interviews.conclusion and the interview content in messages could diverge
  2. Partial failures: when a tool call fails, data is half-saved and the full context is hard to trace
  3. Hard to extend: adding discussionChat requires a new table, a new tool, and new queries

Even worse, tool outputs were inconsistent:

// interviewChat: content in DB, returns reference
{
  toolName: 'interviewChat',
  output: { interviewId: 123 }  // Need another DB query
}

// scoutTaskChat: content in return value
{
  toolName: 'scoutTaskChat',
  output: {
    plainText: "Observation results...",  // Content directly returned
    insights: [...]
  }
}

Agents couldn't handle this uniformly, leading to complex code.

Solution: Messages as Single Source

Core idea: All research content flows into the message stream. Database only stores derived state.

// ✅ New architecture: Unified output format
interface ResearchToolResult {
  plainText: string;  // Human-readable summary, required
  [key: string]: any; // Optional structured data
}

// interviewChat also returns plainText
{
  toolName: 'interviewChat',
  output: {
    plainText: "Interview summary: User Zhang San mentioned...",  // ← Full content here
    interviewId: 123  // Optional: DB reference
  }
}

Key changes:

  1. Removed 5 specialized save tools

    • Deleted: saveInterview, saveDiscussion, saveScoutTask, ...
    • Reason: Agents output directly to messages, no explicit save needed
  2. Unified tool output format

    • All research tools return plainText
    • Agents can uniformly process all tool results
  3. Generate studyLog on demand

   // Don't pre-save, generate when needed
   if (!analyst.studyLog) {
     const messages = await loadMessages(studyUserChatId);
     const studyLog = await generateStudyLog(messages);  // ← Generate from messages
     await prisma.analyst.update({
       where: { id },
       data: { studyLog }
     });
   }
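
With the unified format, generateReport no longer stitches tables together; it just gathers plainText from tool results already in the message stream. A minimal sketch that operates on already-extracted outputs (how they are pulled out of the SDK's message objects is omitted):

// ResearchToolResult is the unified interface defined above (plainText required)
function buildReportContext(toolOutputs: ResearchToolResult[]): string {
  // Assemble report context by concatenating tool outputs in conversation order
  return toolOutputs.map((o) => o.plainText).join("\n\n");
}

// e.g. const reportContext = buildReportContext(extractToolOutputs(messages));
// extractToolOutputs: hypothetical helper that pulls tool results out of the message history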

Why This Design?

Reasoning from first principles:

  1. Conversation as context

    • LLMs need complete context to generate reports
    • Message history is naturally the most complete, most natural context
    • Avoids complexity of "reconstructing context from DB"
  2. LLMs excel at extraction

    • Generating structured content (studyLog) from conversations is LLM's strength
    • More flexible and reliable than hand-written parsing logic
  3. Echoes of Event Sourcing

    • Message sequence = event log
    • studyLog, report = derived state
    • Can be replayed and regenerated anytime

Comparison with other approaches:

| Approach | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| Messages as source | Data consistent, easy to extend | Requires extra LLM call to generate studyLog | ✅ Our choice |
| Traditional state management | Precise control | Complex state sync, hard to trace | Doesn't suit LLM non-determinism |
| Remove DB entirely | Extremely simple | Frontend queries difficult, history hard to manage | Need structured display |
| Event Sourcing | Complete history, replayable | High engineering complexity | Over-engineered for current scale |

Impact

Code simplification:

Deleted files:
- src/ai/tools/saveInterview.ts
- src/ai/tools/saveDiscussion.ts
- src/ai/tools/saveScoutTask.ts
- src/ai/tools/savePersona.ts
- src/ai/tools/saveWebSearch.ts

Simplified files (28):
- Agent configs no longer need save tools
- generateReport doesn't need multi-table JOINs

Development efficiency:

Before:

Adding discussionChat:
1. Create Discussion table
2. Write discussionChat tool
3. Write saveDiscussion tool
4. Add both tools to 3 agents
5. Write discussion query logic
6. Modify generateReport query

Total: 12 files, 2-3 days

After:

Adding discussionChat:
1. Write discussionChat tool (returns plainText)
2. Add tool to agent config
3. generateReport auto-supports (reads from messages)

Total: 3 files, 2-3 hours
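
Under this convention, a new research tool is mostly its own logic plus a plainText summary in its return value. A rough sketch using the AI SDK's tool() helper (option names follow AI SDK v5; runDiscussion and the schema are hypothetical):

import { tool } from "ai";
import { z } from "zod";

export const discussionChat = tool({
  description: "Run a group discussion with 3-8 simulated personas",
  inputSchema: z.object({
    topic: z.string(),
    personaIds: z.array(z.string()).min(3).max(8),
  }),
  execute: async ({ topic, personaIds }) => {
    const transcript = await runDiscussion(topic, personaIds); // hypothetical helper

    // plainText is the only required field: the content flows into messages,
    // so no saveDiscussion tool and no Discussion table are needed
    return {
      plainText: `Discussion summary on "${topic}":\n${transcript.summary}`,
      participantCount: personaIds.length,
    };
  },
});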

Cost trade-offs:

Benefits:

  • Simplified architecture: deleted 5 tools, simplified 28 files
  • Data consistency: full context traceable even on failures
  • Easy extension: adding new research methods goes from 12 steps → 3 steps

Costs:

  • studyLog generation requires extra LLM call (~2K tokens, ~$0.002)
  • Slightly higher token consumption for long conversations

Mitigation:

  • Prompt cache reduces repeated token cost by 90%
  • Architectural benefits far outweigh costs

IV. Step 2: Intent Clarification + Unified Execution

v2.3.0 - 2026-01-06

Problem 1: Vague Requirements → Inefficient Dialogue

After implementing message-driven architecture, adding features became simpler. But user experience wasn't good enough.

When creating research, users often say:

"Want to understand young people's coffee preferences"

This isn't specific enough:

  • Which young people? 18-22 college students? Or 23-28 young professionals?
  • What method? In-depth interviews? Group discussions? Or social media observation?
  • What output? User personas? Market insights? Or product recommendations?

Traditional approach: AI asks multiple questions

AI: "Which age group do you want to research?"
User: "18-25 I guess"
AI: "What method? Interviews or surveys?"
User: "Interviews"
AI: "How many people?"
User: "Around 10"

Problems:

  • Requires 3-5 conversation rounds
  • Poor user experience (feels like filling forms)
  • AI can't proactively suggest best approaches

Problem 2: 95% Duplicate Code

While adding features became simpler, we discovered a bigger technical debt:

$ wc -l src/app/(study)/agents/*AgentRequest.ts
493 studyAgentRequest.ts
416 fastInsightAgentRequest.ts
302 productRnDAgentRequest.ts

Three nearly identical agent wrappers, totaling 1,211 lines.

Code duplication mainly in:

  • Message loading and processing (~80 lines each)
  • File attachment handling (~60 lines each)
  • MCP integration (~40 lines each)
  • Token tracking (~50 lines each)
  • Notification sending (~30 lines each)

Every new feature (like webhook integration) required changing all three places.

Solution: Plan Mode + baseAgentRequest

Our solution has two parts:

Part 1: Plan Mode (Intent Clarification Layer)

A separate agent dedicated to intent clarification:

// src/app/(study)/agents/configs/planModeAgentConfig.ts

export async function createPlanModeAgentConfig() {
  return {
    model: "claude-sonnet-4-5",
    systemPrompt: planModeSystem({ locale }),
    tools: {
      requestInteraction,  // Interact with user
      makeStudyPlan,       // Display complete plan, one-click confirm
    },
    maxSteps: 5,  // Max 5 steps to complete clarification
  };
}

Workflow:

sequenceDiagram
    participant User
    participant PlanMode as Plan Mode Agent
    participant StudyAgent as Study Agent

    User->>PlanMode: "Want to understand young people's coffee preferences"

    PlanMode->>PlanMode: Analyze requirements
    Note over PlanMode: - Target: 18-25 years old<br/>- Research type: qualitative insights<br/>- Best method: group discussion

    PlanMode->>User: Display complete plan
    Note over PlanMode,User: 【Research Plan】<br/>Goal: Understand 18-25 coffee preferences<br/>Method: Group discussion (5-8 people)<br/>Duration: ~40 minutes<br/>Output: Consumer insights report<br/><br/>[Confirm Start] [Modify Plan]

    User->>PlanMode: [Confirm Start]

    PlanMode->>StudyAgent: Intent recorded in messages
    Note over StudyAgent: Read intent from conversation history<br/>Execute research plan

    StudyAgent->>User: Start research execution

Key design:

  • Plan Mode's decisions are recorded in messages
  • Study Agent infers intent from messages, no explicit passing needed
  • Avoids complexity of context passing
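
Concretely, "recorded in messages" can be as simple as the makeStudyPlan tool echoing the confirmed plan back as its result, so it lands in the conversation history that the Study Agent later reads. A hypothetical sketch, not the actual tool:

import { tool } from "ai";
import { z } from "zod";

export const makeStudyPlan = tool({
  description: "Present the complete research plan for one-click confirmation",
  inputSchema: z.object({
    goal: z.string(),
    method: z.enum(["interview", "discussion", "scout"]),
    participants: z.number(),
    expectedOutput: z.string(),
  }),
  // The returned plan becomes part of the message stream; the Study Agent
  // infers intent from this history instead of receiving an explicit handoff object
  execute: async (plan) => ({
    plainText: `Research plan confirmed:\n${JSON.stringify(plan, null, 2)}`,
  }),
});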

Part 2: baseAgentRequest (Unified Executor)

Merge three duplicate agent wrappers into one generic executor:

// src/app/(study)/agents/baseAgentRequest.ts (577 lines)

interface AgentRequestConfig<TOOLS extends ToolSet> {
  model: LLMModelName;
  systemPrompt: string;
  tools: TOOLS;
  maxSteps?: number;

  specialHandlers?: {
    // Dynamically control which tools are available
    customPrepareStep?: (options: { messages: ModelMessage[] }) => {
      messages: ModelMessage[];
      activeTools?: (keyof TOOLS)[];
    };

    // Custom post-processing logic
    customOnStepFinish?: (step: StepResult<TOOLS>, context: BaseAgentContext) => Promise<void>;
  };
}

async function executeBaseAgentRequest<TOOLS>(
  baseContext: BaseAgentContext,
  config: AgentRequestConfig<TOOLS>,
  streamWriter: UIMessageStreamWriter
) {
  // Phase 1: Initialization
  // Phase 2: Prepare Messages
  // Phase 3: Universal Attachment Processing
  // Phase 4: Universal MCP and Team System Prompt
  // Phase 5: Load Memory and Inject into Context
  // Phase 6: Main Streaming Loop
  // Phase 7: Universal Notifications
}

Agent routing:

// src/app/(study)/api/chat/route.ts

if (!analyst.kind) {
  // Plan Mode - intent clarification
  const config = await createPlanModeAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);

} else if (analyst.kind === AnalystKind.productRnD) {
  // Product R&D Agent
  const config = await createProductRnDAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);

} else {
  // Study Agent (comprehensive research, fast insights, testing, creative, etc.)
  const config = await createStudyAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);
}

Each agent only needs to define configuration:

// src/app/(study)/agents/configs/studyAgentConfig.ts

export async function createStudyAgentConfig(params) {
  return {
    model: "claude-sonnet-4",
    systemPrompt: studySystem({ locale }),
    tools: buildStudyTools(params),  // ← Tools this agent needs

    specialHandlers: {
      // Custom tool control
      customPrepareStep: async ({ messages }) => {
        const toolUseCount = calculateToolUsage(messages);
        let activeTools = undefined;

        // After report generation, restrict available tools
        if ((toolUseCount[ToolName.generateReport] ?? 0) > 0) {
          activeTools = [
            ToolName.generateReport,
            ToolName.reasoningThinking,
            ToolName.toolCallError,
          ];
        }

        return { messages, activeTools };
      },

      // Custom post-processing
      customOnStepFinish: async (step) => {
        // After saving research intent, auto-generate title
        const saveAnalystTool = findTool(step, ToolName.saveAnalyst);
        if (saveAnalystTool) {
          await generateChatTitle(studyUserChatId);
        }
      },
    },
  };
}
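
calculateToolUsage isn't shown above; a minimal sketch of such a helper, counting tool calls in the conversation history (message part shapes simplified):

// Count how many times each tool has been called so far in this conversation
function calculateToolUsage(messages) {
  const counts: Record<string, number> = {};
  for (const message of messages) {
    if (message.role !== "assistant" || !Array.isArray(message.content)) continue;
    for (const part of message.content) {
      if (part.type === "tool-call") {
        counts[part.toolName] = (counts[part.toolName] ?? 0) + 1;
      }
    }
  }
  return counts;
}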

Why This Design?

Reasoning-execution separation rationale:

  1. Matches cognitive model

    • Human decision-making: first figure out "what to do", then consider "how to do it"
    • System 1 (intuition) vs System 2 (reasoning)
    • Plan Mode = System 2, Study Agent = System 1
  2. Single responsibility

    • Plan Mode: focuses on intent understanding, doesn't need to know execution details
    • Study Agent: focuses on research execution, doesn't need to handle clarification
    • Each is simpler and easier to maintain
  3. Messages as protocol

    • Plan Mode's decisions → messages
    • Study Agent reads intent from messages
    • Loosely coupled without losing context

Unified executor rationale:

  1. Extract, Don't Rebuild

    • Extract common patterns from three similar implementations
    • Not designing abstraction layer from scratch
  2. Configuration over Inheritance

    • Agent differences expressed through configuration
    • No inheritance or polymorphism
  3. Plugin-based Lifecycle

    • customPrepareStep: dynamic tool control
    • customOnStepFinish: custom post-processing
    • Preserve extension points, don't hard-code all logic
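
Inside the executor, those hooks wrap the universal logic rather than replacing it. Roughly (saveStepMessages and trackTokenUsage are placeholders, and the prepareStep option name varies across AI SDK versions):

const result = streamText({
  model: llm(config.model),
  system: config.systemPrompt,
  messages: modelMessages,
  tools: config.tools,

  // Universal bookkeeping first, then the agent-specific extension point
  onStepFinish: async (step) => {
    await saveStepMessages(step);   // placeholder: persist new messages
    await trackTokenUsage(step);    // placeholder: token accounting
    await config.specialHandlers?.customOnStepFinish?.(step, baseContext);
  },

  // Per-step tool control delegated to the config; no override otherwise
  prepareStep: async (options) =>
    config.specialHandlers?.customPrepareStep?.(options) ?? {},
});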

Comparison with other approaches:

| Approach | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| Plan Mode + baseAgentRequest | Remove duplicate code, separate reasoning-execution | One more abstraction layer | ✅ Our choice |
| Continue copy-pasting | Simple and direct | Tech debt accumulates, hard to maintain | Unsustainable long-term |
| Fully generic agent | Least code | Sacrifices specialization and control | Can't handle business differences |
| Microservices split | Independent deployment | Over-engineered, adds ops complexity | Unnecessary at current scale |

Impact

Code complexity:

Deleted:
- studyAgentRequest.ts (493 lines)
- fastInsightAgentRequest.ts (416 lines)
- productRnDAgentRequest.ts (302 lines)
Total: -1,211 lines

Added:
+ baseAgentRequest.ts (577 lines)
+ planModeAgentConfig.ts (120 lines)
+ studyAgentConfig.ts (180 lines)
+ productRnDAgentConfig.ts (80 lines)
Total: +957 lines

Net reduction: -254 lines

But more importantly:

  • Cyclomatic Complexity: 12.3 → 6.7 (45% reduction)
  • Code duplication: 95% → 0%

Development efficiency:

Before:

Adding MCP integration:
1. Modify studyAgentRequest.ts
2. Modify fastInsightAgentRequest.ts
3. Modify productRnDAgentRequest.ts
4. Test three agents

Time: 2-3 days

After:

Adding MCP integration:
1. Modify baseAgentRequest.ts
2. All agents automatically gain new capability

Time: 2-3 hours

User experience:

Before:

User: "Want to understand young people's coffee preferences"
AI: "Which age group do you want to research?"
User: "18-25"
AI: "What method do you want to use?"
User: "Interviews I guess"
AI: "How many people?"
... (3-5 conversation rounds)

After:

User: "Want to understand young people's coffee preferences"
AI displays complete plan:
┌─────────────────────────────────────┐
│ 【Research Plan】                   │
│ Goal: Understand 18-25 coffee prefs │
│ Method: Group discussion (5-8 ppl)  │
│ Duration: ~40 minutes               │
│ Output: Consumer insights report    │
│                                     │
│ [Confirm Start] [Modify Plan]       │
└─────────────────────────────────────┘

Intent clarification: 3-5 conversation rounds → 1 confirmation


V. Step 3: Persistent Memory

v2.3.0 - 2026-01-08

Problem: AI "Amnesia"

With intent clarification and unified architecture, the research workflow was smooth. But long-term users reported a problem:

"Why does the AI ask me what industry I'm in every single time?"

The AI doesn't remember users. Every conversation feels like the first meeting:

  • "What industry are you in?"
  • "Which dimensions do you care about?"
  • "What's your research goal?"

Users feel the AI is "forgetful" and the experience lacks personalization.

Root cause:

LLMs are stateless. Each conversation:

const result = await streamText({
  messages: currentConversation,  // ← Only current conversation
  // No context from historical conversations
});

Although we have historical conversations in the DB:

  1. Cross-conversation info lost: Each research is an independent session
  2. Important info buried: Key information in long conversations is hard to extract
  3. No persistent memory: No long-term memory of "who the user is"

Solution: Two-Tier Memory Architecture

We need a persistent memory system. But how to design it?

Inspired by Anthropic's CLAUDE.md approach:

  • Simple Markdown files
  • User-viewable and editable
  • Fully loaded into context

We adopted a similar approach but added automatic update mechanisms.

Data Model

model Memory {
  id      Int  @id @default(autoincrement())
  userId  Int? // User-level memory
  teamId  Int? // Team-level memory
  version Int  // Version management

  // Two-tier architecture
  core    String @default("") @db.Text  // Core memory (Markdown)
  working Json   @default("[]")         // Working memory (JSON, to be consolidated)

  changeNotes String @db.Text  // Update notes

  @@unique([userId, version])
  @@index([userId, version(sort: Desc)])
}

Two-tier architecture:

  1. Core Memory (core)

    • Markdown format, human-readable
    • Long-term stable user information
    • Example:
     # User Information
     - Industry: Consumer goods product manager
     - Focus: Young consumer preferences, emerging trends
    
     # Research Style
     - Prefers qualitative research (interviews, discussions)
     - Values authentic user voices over statistics
    
  2. Working Memory (working)

    • JSON format, structured
    • New information to be consolidated
    • Example:
     [
       { "info": "User recently focused on coffee market", "source": "chat_123" },
       { "info": "Prefers group discussion method", "source": "chat_124" }
     ]
    

Automatic Update Mechanism

Two-stage update:

// src/app/(memory)/actions.ts

async function updateMemory({ userId, conversationContext }) {
  let memory = await loadLatestMemory(userId);

  // Step 1: Reorganize when threshold exceeded (Claude Sonnet 4.5)
  if (memory.core.length > 8000 || memory.working.length > 20) {
    memory = await reorganizeMemory(memory, conversationContext);
  }

  // Step 2: Extract new information (Claude Haiku 4.5)
  const newInfo = await extractMemoryUpdate(memory.core, conversationContext);

  if (newInfo) {
    // Step 3: Insert new information at specified location
    await insertMemoryInfo(memory, newInfo);
  }
}
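
loadLatestMemory isn't shown; given the schema above, it is a single indexed query for the newest version. A sketch assuming the Prisma client:

// Matches @@index([userId, version(sort: Desc)]) from the schema above
async function loadLatestMemory(userId: number) {
  return prisma.memory.findFirst({
    where: { userId },
    orderBy: { version: "desc" },
  });
}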

Memory Update Agent (Haiku 4.5):

  • Extract new user information from conversations
  • Low cost (~$0.001 per run)
  • Runs in background after each conversation
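
A minimal sketch of what that extraction step could look like, asking the small model whether the conversation contains anything durable worth remembering (prompt wording and the return shape are assumptions):

import { generateObject } from "ai";
import { z } from "zod";

async function extractMemoryUpdate(coreMemory: string, conversation: string) {
  const { object } = await generateObject({
    model: llm("claude-haiku-4-5"),
    schema: z.object({
      hasNewInfo: z.boolean(),
      info: z.string().optional(),  // e.g. "User recently focused on coffee market"
    }),
    prompt:
      `Existing memory:\n${coreMemory}\n\n` +
      `Conversation:\n${conversation}\n\n` +
      `Extract any durable new fact about the user that is not already in memory.`,
  });
  return object.hasNewInfo ? object.info : null;
}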

Memory Reorganize Agent (Sonnet 4.5):

  • Consolidate working memory into core memory
  • Remove redundancy, merge similar information
  • Slightly higher cost (~$0.02 per run), but infrequently triggered

Integration into Conversation Flow

// src/app/(study)/agents/baseAgentRequest.ts

// Phase 5: Load Memory
const memory = await loadUserMemory(userId);

if (memory?.core) {
  // Inject at conversation start
  modelMessages = [
    {
      role: 'user',
      content: `<UserMemory>\n${memory.core}\n</UserMemory>`
    },
    ...modelMessages
  ];
}

// Phase 6: Streaming
const result = await streamText({
  messages: modelMessages,  // ← Includes user memory
  // ...
});

// Phase 7: Non-blocking memory update
waitUntil(
  updateMemory({ userId, conversationContext: messages })
);

Why This Design?

Why Markdown over Vector DB?

  1. Context window is large enough

    • Claude 3.5 Sonnet: 200K tokens
    • User memory typically < 10K characters (~3K tokens)
    • Full loading is simpler and more accurate than retrieval
  2. Simple and transparent

    • Markdown is user-readable and editable
    • No embeddings, no vector search, no complex indexing
    • Aligns with Anthropic's philosophy: user control
  3. Avoid premature optimization

    • Don't need real-time retrieval (low conversation frequency)
    • Don't need precise matching (full text provides enough context)
    • Start with simple solution, optimize when necessary

Comparison with mainstream approaches:

| Approach | Storage | Control | Retrieval | atypica choice rationale |
| --- | --- | --- | --- | --- |
| Anthropic (CLAUDE.md) | File-based | User-driven | Full loading | ✅ Simple, transparent, effective with large context |
| OpenAI | Vector DB (speculated) | AI + user confirmation | Semantic retrieval | ❌ Black box, weak user control |
| Mem0 | Vector + Graph + KV | AI-driven | Hybrid retrieval | ❌ Over-engineered, high maintenance cost |
| MemGPT | OS-inspired tiered | AI self-managed | Tiered retrieval | ❌ Conceptually complex, utility unproven |

We chose Anthropic's simple approach because:

  1. Fits current scale (personal assistant, not enterprise knowledge base)
  2. User controllable (transparent, editable)
  3. As context windows grow, this approach becomes better

Impact

User experience:

Before:

First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods"
AI: "What dimensions do you care about?"
...

Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "What industry are you in?"  # ← Asks again

After:

First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods product manager"
# AI remembers

Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "Based on your background as a consumer goods PM, I suggest..."  # ← Remembers!

System cost:

Memory Update (per conversation):
- Model: Claude Haiku 4.5
- Tokens: ~5K
- Cost: ~$0.001

Memory Reorganize (every 20 conversations):
- Model: Claude Sonnet 4.5
- Tokens: ~15K
- Cost: ~$0.02

Average cost: ~$0.002/conversation

Response time:

Memory loading: +50ms (non-blocking)
Memory update: background, doesn't affect response

Low cost, fast response, completely acceptable.


VI. Architecture Comparison: Our Unique Choices

Now let's step back and see how atypica's architecture differs from mainstream AI Agent frameworks.

State Management: Messages vs Memory Classes

| atypica | LangChain | Core Difference |
| --- | --- | --- |
| Messages as source | ConversationBufferMemory | We believe conversation history is the best state |
| Generate studyLog on demand | Pre-compute summary | Avoid sync issues, traceable on failures |
| DB stores derived state | DB stores core state | Similar to Event Sourcing |

Why different?

LangChain's design is influenced by traditional software, believing "state should be explicitly stored and managed."

We believe, for LLMs:

  • Conversation history = complete state
  • Derived state (studyLog) can be regenerated
  • Simpler, more fault-tolerant

Agent Architecture: Configuration vs Graph

| atypica | LangGraph | Core Difference |
| --- | --- | --- |
| Configuration-driven | Graph-driven | We use configuration to express differences, code for commonalities |
| Single executor | Node orchestration | Avoid over-abstraction, good enough is enough |
| Messages as protocol | Explicit node communication | Loosely coupled without losing context |

Why different?

LangGraph pursues generality, using graph orchestration to express arbitrarily complex flows.

We believe, for our scenarios:

  • Configuration-driven is simpler: 99% of needs can be met with configuration
  • Single executor is sufficient: Don't need graph orchestration's flexibility
  • Simpler is more reliable: Fewer abstraction layers, easier to debug

Memory System: Markdown vs Vector DB

| atypica | Mem0 | Core Difference |
| --- | --- | --- |
| Markdown files | Vector + Graph + KV | We choose simple and transparent over precise and complex |
| Full loading | Semantic retrieval | When the context window is large enough, full text is better |
| User-editable | AI black box | User trust comes from transparency |

Why different?

Mem0 pursues precise retrieval, using multiple databases in hybrid.

We believe, for personal assistants:

  • Simple solution is enough: User memory typically < 10K
  • Transparent beats precise: Users can view and edit memory
  • Gets better as context grows: At 1M tokens in the future, this approach will crush Vector DB

Core Philosophy Differences

atypica's choices:

  • Simple, transparent, controllable
  • Adapt to LLM characteristics (large context, non-determinism)
  • Start from real pain points, not pursuing architectural perfection

Mainstream frameworks' choices:

  • Precise, complex, automatic
  • Port traditional software engineering patterns
  • Pursue generality and flexibility

Who's right or wrong?

Neither is wrong. It's just:

  • Our scenario (personal research assistant) suits simple approaches better
  • As context windows grow, simple approaches become better
  • User trust comes from transparency, not AI magic

VII. Quantitative Impact

Specific impact from three evolutions:

Code Complexity

Duplicate code:
Before: 1,211 lines (three agent wrappers)
After: 0 lines
Reduction: 100%

Total lines of code:
Before: 1,211 lines (duplicates) + others
After: 577 lines (base) + 380 lines (configs) = 957 lines
Net reduction: 254 lines (21%)

Cyclomatic Complexity (code complexity metric):
Before: avg 12.3
After: avg 6.7
Reduction: 45%

Development Efficiency

| Task | Before | After | Improvement |
| --- | --- | --- | --- |
| Add new research method | 12 files, 2-3 days | 3 files, 2-3 hours | 10x |
| Add new capability (MCP) | Modify 3 places, 1 day | Modify 1 place, 2 hours | 4x |
| Fix bug | Change 3 agents | Change 1 base | 3x |

System Performance

Token consumption (with prompt cache):
- studyLog generation: ~2K tokens (~$0.002)
- Memory update: ~5K tokens (~$0.005)
- Average per conversation: +$0.007

Response time:
- Memory loading: +50ms (non-blocking)
- Plan Mode: +2s (one-time)
- studyLog generation: background, doesn't affect response

Cost and performance impact negligible.

User Experience

Intent clarification:
Before: average 3.2 conversation rounds
After: 1 plan display + 1 confirmation
Improvement: 3x efficiency

AI "memory":
Before: repetitive questions every conversation
After: auto-load user preferences
Improvement: personalized experience

Research startup time:
Before: ~5 minutes (multiple rounds of clarification)
After: ~1 minute (one-click confirm)
Improvement: 5x efficiency

VIII. Lessons Learned

What did we learn from three evolutions?

What We Did Right

1. Incremental refactoring, not big bang

We didn't rewrite the entire system at once. Three evolutions, each step:

  • Delivers value independently
  • Maintains backward compatibility (keeping analyst.studySummary field)
  • Can be rolled back

This let us quickly validate ideas and reduce risk.

2. Start from real pain points

Don't pursue architectural perfection, instead:

  • Message-driven: because adding discussionChat was too complex
  • Unified execution: because duplicate code was too much
  • Persistent memory: because users reported AI forgetfulness

Let problems drive the design, not the other way around.

3. Embrace LLM characteristics

Don't treat LLMs as traditional software:

  • Don't hand-write state machines, let AI infer state from conversations
  • Leverage large context windows, rather than pursuing precise retrieval
  • Let AI generate studyLog, rather than hand-writing parsers

Adapt to LLM's capability boundaries, rather than fighting them.

Costs We Paid

1. Learning curve for abstraction layer

baseAgentRequest requires understanding to modify:

  • The 7 phases of the execution flow
  • Timing of customPrepareStep and customOnStepFinish
  • Generic constraints and type inference

But: clear interfaces and documentation lowered the barrier.

2. Cost of on-demand generation

studyLog generation requires an LLM call (~$0.002 per call).

But:

  • Prompt cache reduces cost by 90%
  • Architectural benefits >> small cost
  • Acceptable

3. Limitations of simple solutions

Markdown memory isn't suitable for:

  • Large-scale knowledge bases (> 100K tokens)
  • Complex relational queries
  • Multi-dimensional retrieval

But:

  • Good enough for personal assistant scenarios
  • Can upgrade to Vector DB in the future
  • Solve 80% of problems first

Unexpected Benefits

1. Confidence from type safety

// Fully type-safe tool handling
const tool = step.toolResults.find(
  t => !t.dynamic && t.toolName === ToolName.generateReport
) as StaticToolResult<Pick<StudyToolSet, ToolName.generateReport>>;

if (tool?.output) {
  const token = tool.output.reportToken;  // ← TypeScript knows this field exists
}

During refactoring, the compiler catches 99% of issues.

2. Flexibility of configuration-driven

Adding webhook integration only requires:

// baseAgentRequest.ts
if (webhookUrl) {
  await sendWebhook(webhookUrl, step);
}

All agents automatically gain new capability, no config changes needed.

3. Power of messages as protocol

Plan Mode and Study Agent communicate through messages:

  • Decoupled: can be modified independently
  • Without losing context: complete decision process in messages
  • Traceable: can replay when problems occur

This was an unexpected benefit.


IX. Future Directions

Three evolutions brought atypica closer to general-purpose agents. But there's more to do.

Short-term (3-6 months)

1. Skills Library

  • Further modularize tools
  • Users can compose their own agents
  • Like GPTs, but more flexible

2. Multi-Agent Collaboration

  • Not just serial execution
  • Parallel research, cross-validation
  • Like AutoGPT, but more controllable

Long-term (1-2 years)

3. Evolve toward GEA

  • GEA = General Execution Architecture
  • Not just research agents, but a universal AI Agent execution framework
  • Can run any type of agent

4. Self-Improving Agents

  • Agents learn from past executions
  • Continuously optimize prompts and strategies
  • Get smarter with use

Unchanging Principles

No matter how we evolve, we stick to:

  • Simple beats complex
  • Transparent beats black box
  • User control beats AI automation

X. Conclusion

Building AI Agent systems is not a simple extension of traditional software engineering.

We need to rethink:

  • What is state? (Conversation history)
  • What is an interface? (Message protocol)
  • What is control flow? (AI reasoning)

atypica's three evolutions are essentially three cognitive upgrades:

  1. From database thinking → data flow thinking

    • Don't maintain explicit state, infer state from messages
  2. From code reuse → configuration-driven

    • Don't pursue perfect abstraction, use configuration to express differences
  3. From stateless → memory-enhanced

    • Don't rely on precise retrieval, use simple and transparent methods

These choices may not be the most "advanced."

But they are:

  • Simple: easy to understand, easy to debug
  • Transparent: users know what AI is doing
  • Controllable: users can intervene and adjust
  • Good enough: solve 80% of problems

And this, perhaps, is the key to building reliable AI systems.

https://atypica.ai/blog/towards-general-agent
