I. December 2025
We needed to add a new feature to atypica.AI: group discussions (discussionChat).
This should've been simple. We already had interviewChat—one-on-one conversations where users engage deeply with AI-simulated personas. Group discussion was just scaling from 1-to-1 to 1-to-many: 3-8 personas engaging simultaneously, so users can watch perspectives collide and insights emerge.
In theory, we just needed to:
- Reuse the interview logic
- Adjust prompts to simulate group dynamics
- Tweak the UI to show multiple speakers
The reality: We had to modify 12 files.
prisma/schema.prisma # New Discussion table
src/ai/tools/discussionChat.ts # New tool
src/ai/tools/saveDiscussion.ts # Save tool
src/app/(study)/agents/studyAgent.ts # Add tool to agent
src/app/(study)/agents/fastInsightAgent.ts # Add again
src/app/(study)/agents/productRnDAgent.ts # And again
... 6 more files
Worse, we discovered this:
// studyAgentRequest.ts (493 lines)
export async function studyAgentRequest(context) {
const result = await streamText({
model: llm("claude-sonnet-4"),
system: studySystem(),
messages,
tools: {
webSearch,
interview,
scoutTask,
saveAnalyst,
generateReport
// ... 15 tools
},
onStepFinish: async (step) => {
// Save messages
// Track tokens
// Send notifications
// ... 120 lines of logic
}
});
}
// fastInsightAgentRequest.ts (416 lines)
// 95% identical code
// productRnDAgentRequest.ts (302 lines)
// 95% identical code
Three nearly identical agent wrappers.
Every new feature required copy-pasting across all three.
Every bug fix meant changing it three times.
In that moment, we realized something was fundamentally wrong.
The problem wasn't a lack of elegant code.
It wasn't a lack of abstraction.
It was that we were building AI Agent systems with traditional software engineering thinking.
This article chronicles how we escaped this trap—through three architectural evolutions, rethinking how AI Agents should be built from first principles.
II. Rethinking: What is an AI Agent?
Before refactoring, we stopped to ask a fundamental question:
What's the essential difference between AI Agents and traditional software?
The World of Traditional Software
Traditional software is built on state machines:
class ResearchSession {
state: 'IDLE' | 'PLANNING' | 'RESEARCHING' | 'REPORTING';
data: {
interviews: Interview[];
findings: Finding[];
reports: Report[];
};
transition(event: Event) {
switch (this.state) {
case 'IDLE':
if (event.type === 'START') this.state = 'PLANNING';
break;
case 'PLANNING':
if (event.type === 'PLAN_COMPLETE') this.state = 'RESEARCHING';
break;
// ... more state transitions
}
}
}
This model's core assumptions:
- State is explicit: I know exactly where I am
- Transitions are deterministic: Given state + event, next state is unique
- Control is precise: if-else covers all paths
This works beautifully for traditional software. But for AI Agents?
The World of AI Agents
LLMs don't work this way:
const messages = [
{ role: 'user', content: "Want to understand young people's coffee preferences" },
{ role: 'assistant', content: 'I can help you conduct user research...' },
{ role: 'assistant', toolCalls: [{ name: 'scoutTask', args: {...} }] },
{ role: 'tool', content: 'Observed 5 user segments...' },
{ role: 'assistant', content: 'Based on observations, I suggest interviewing 18-25 coffee enthusiasts...' },
{ role: 'assistant', toolCalls: [{ name: 'interviewChat', args: {...} }] },
// ...
];
Where's the "state" here?
- Not in a `state` field
- But in the entire conversation history
The AI infers from conversation history:
- What research does the user want?
- How far have we progressed?
- What should happen next?
This is a completely different paradigm.
Three Core Insights
From this observation, we derived three insights that shaped our architectural evolution.
Insight 1: Conversation as State
Traditional approach: Maintain explicit state
// ❌ Traditional: Explicit state management
interface ResearchState {
stage: 'planning' | 'researching' | 'reporting';
completedInterviews: number;
pendingTasks: Task[];
}
// Need synchronization: state and conversation history can diverge
AI-native approach: Infer state from conversation
// ✅ AI-native: Conversation is state
const messages = [...conversationHistory];
// AI infers state from history, no explicit sync needed
const result = await streamText({
messages,
// AI knows what to do
});
Why is conversation superior to state machines?
- Natural alignment: LLMs work on message history natively
- Strong fault tolerance: State machines are hard to recover from errors; conversations can be "rewound" and replayed
- Easy extension: Adding new capabilities doesn't require modifying state graphs
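To make "conversation as state" concrete, here is an illustrative sketch (ours, not atypica's actual code): research progress is derived on demand from the message history instead of being stored in a status column. The message shape mirrors the example above.
// Illustrative sketch: derive progress from the message history on demand.
// No status field, no synchronization — the conversation is the state.
type AgentMessage =
  | { role: "user" | "assistant" | "tool"; content: string }
  | { role: "assistant"; toolCalls: { name: string; args: unknown }[] };
function deriveProgress(messages: AgentMessage[]) {
  const toolCalls: string[] = [];
  for (const m of messages) {
    if ("toolCalls" in m) {
      for (const call of m.toolCalls) toolCalls.push(call.name);
    }
  }
  return {
    hasScouted: toolCalls.includes("scoutTask"),
    interviewsStarted: toolCalls.filter((name) => name === "interviewChat").length,
    reportGenerated: toolCalls.includes("generateReport"),
  };
}
Because the derivation is a pure function over messages, "rewinding" is just truncating the array and replaying.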
Insight 2: Reasoning-Execution Separation
How humans make decisions:
- Understand intent: "What am I trying to achieve?" → Clarify goals
- Choose method: "How do I do it?" → Execution steps
AI Agents should follow the same pattern:
// Plan Mode: Understanding intent
"User says: want to understand young people's coffee preferences"
→ Analyze: needs qualitative research
→ Decide: use group discussion method
→ Output: complete research plan
// Study Agent: Executing plan
"Received research plan"
→ Call discussionChat
→ Analyze discussion results
→ Generate insights report
Why separate?
- Reasoning needs deep thinking (use Claude Sonnet 4)
- Execution needs fast response (can use smaller models)
- Separation of concerns, single responsibility
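As an illustration of the split (not atypica's actual code — the per-phase model choices are our assumption, and llm() is the model helper used elsewhere in this post), the reasoning call and the execution call can simply use different models, with the plan handed over as ordinary messages:
import { generateText, streamText } from "ai";
// Phase 1 — reasoning: a stronger model turns a vague request into an explicit plan.
// Phase 2 — execution: a cheaper, faster model runs the plan step by step.
async function planThenExecute(userRequest: string) {
  const { text: plan } = await generateText({
    model: llm("claude-sonnet-4-5"), // deep thinking
    system: "You are a research planner. Produce a concrete, confirmable research plan.",
    prompt: userRequest,
  });
  return streamText({
    model: llm("claude-haiku-4-5"), // fast, cheap execution (illustrative choice)
    messages: [
      { role: "user", content: userRequest },
      { role: "assistant", content: plan },
      { role: "user", content: "Execute this plan step by step." },
    ],
    // tools: { interviewChat, discussionChat, scoutTask, ... }
  });
}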
Insight 3: Simple Over Precise
Facing the "AI forgetfulness" problem, we could:
Option A: Vector DB + Semantic Search
// Precise matching of relevant memories
const queryEmbedding = await embed(userMessage);
const relevantMemories = await vectorDB.search(queryEmbedding, { topK: 5 });
- ✅ Precise retrieval
- ❌ Requires embedding, indexing, complex queries
- ❌ High maintenance cost
Option B: Markdown Files + Full Loading
// Simple and transparent
const memory = await readFile(`memories/${userId}.md`);
const messages = [
{ role: 'user', content: `<UserMemory>\n${memory}\n</UserMemory>` },
...conversationMessages
];
- ✅ Simple, transparent, user-editable
- ✅ Leverages large context windows (Claude 200K tokens)
- ✅ Easier to debug and understand
We chose Option B.
Why?
- Context windows changed the game: user memory is typically < 10K characters, so full loading is perfectly viable
- Simple solutions are more reliable: No embedding inconsistency, no retrieval failures
- User control: Memory is transparent, users can view and edit
Four Design Principles
From these three insights, we distilled the core principles of our architecture:
1. Messages as Source of Truth
- All important information lives in messages
- Database only stores derived state (like reports, study logs)
- Similar to Event Sourcing: messages are the event log
2. Configuration over Code
- Use configuration to express differences
- Use code to express commonalities
- Avoid over-abstraction
3. AI as State Manager
- Let AI manage state transitions
- Don't hand-write complex state machines
- Adapt to LLM's capability boundaries
4. Simple, Transparent, Controllable
- Simple beats complex
- Transparent beats black box
- User control beats AI automation
III. Step 1: Message-Driven Architecture
v2.2.0 - 2025-12-27
Problem: Dual Source of Truth
Initially, research data was scattered across three places:
// Place 1: analyst table
const analyst = await prisma.analyst.findUnique({
where: { id }
});
console.log(analyst.studySummary); // "Research summary..."
// Place 2: interviews table
const interviews = await prisma.interview.findMany({
where: { analystId: id }
});
console.log(interviews.map(i => i.conclusion)); // ["Interview 1 conclusion", "Interview 2 conclusion"]
// Place 3: messages table
const messages = await prisma.chatMessage.findMany({
where: { userChatId }
});
// webSearch results are here
Generating reports required stitching from three places:
async function generateReport(analystId) {
const analyst = await prisma.analyst.findUnique({
where: { id: analystId },
include: { interviews: true } // JOIN!
});
const messages = await prisma.chatMessage.findMany({
where: { userChatId: analyst.studyUserChatId }
});
// Stitch data together
const reportData = {
summary: analyst.studySummary, // from analyst table
interviewInsights: analyst.interviews.map(...), // from interviews table
webResearch: extractFromMessages(messages) // from messages table
};
}
Problems:
- Data inconsistency: `interviews.conclusion` and the interview content in messages could diverge
- Partial failures: when tool calls fail, data is half-saved and the full context is hard to trace
- Hard to extend: adding `discussionChat` requires a new table, a new tool, and new queries
Even worse, tool outputs were inconsistent:
// interviewChat: content in DB, returns reference
{
toolName: 'interviewChat',
output: { interviewId: 123 } // Need another DB query
}
// scoutTaskChat: content in return value
{
toolName: 'scoutTaskChat',
output: {
plainText: "Observation results...", // Content directly returned
insights: [...]
}
}
Agents couldn't handle this uniformly, leading to complex code.
Solution: Messages as Single Source
Core idea: All research content flows into the message stream. Database only stores derived state.
// ✅ New architecture: Unified output format
interface ResearchToolResult {
plainText: string; // Human-readable summary, required
[key: string]: any; // Optional structured data
}
// interviewChat also returns plainText
{
toolName: 'interviewChat',
output: {
plainText: "Interview summary: User Zhang San mentioned...", // ← Full content here
interviewId: 123 // Optional: DB reference
}
}
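To show what the contract means in practice, here is a hedged sketch of a new research tool — not the actual discussionChat implementation (the parameter shape and the simulateDiscussion stub are invented). The only hard requirement is returning a plainText summary that flows into the message stream:
// Sketch of a tool that satisfies ResearchToolResult (names are illustrative)
import { z } from "zod";
const discussionParams = z.object({
  topic: z.string(),
  personaIds: z.array(z.string()).min(3).max(8), // 3-8 personas, per the feature spec
});
async function runDiscussionChat(
  args: z.infer<typeof discussionParams>
): Promise<ResearchToolResult> {
  // The real tool runs a multi-persona LLM discussion here; a stub keeps the sketch self-contained
  const summary = await simulateDiscussion(args.topic, args.personaIds);
  return {
    plainText: `Group discussion on "${args.topic}":\n${summary}`, // required: readable summary
    participantCount: args.personaIds.length,                      // optional structured extras
  };
}
async function simulateDiscussion(topic: string, personaIds: string[]): Promise<string> {
  return `${personaIds.length} personas discussed "${topic}" and converged on three themes...`;
}
Because agents only consume plainText, nothing else in the pipeline needs to know this tool exists.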
Key changes:
1. Removed 5 specialized save tools
- Deleted: `saveInterview`, `saveDiscussion`, `saveScoutTask`, ...
- Reason: agents output directly to messages, no explicit save needed
2. Unified tool output format
- All research tools return `plainText`
- Agents can uniformly process all tool results
3. Generate studyLog on demand
// Don't pre-save, generate when needed
if (!analyst.studyLog) {
const messages = await loadMessages(studyUserChatId);
const studyLog = await generateStudyLog(messages); // ← Generate from messages
await prisma.analyst.update({
where: { id },
data: { studyLog }
});
}
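generateStudyLog itself stays thin, since the conversation already contains everything. A minimal sketch (the prompt wording and the generateText call are ours, not the actual implementation; llm() is the same model helper used in the agent wrappers):
// Sketch: derive the studyLog from message history with a single extraction call
import { generateText } from "ai";
async function generateStudyLog(messages: any[]) {
  const { text } = await generateText({
    model: llm("claude-sonnet-4"),
    system:
      "Summarize this research conversation into a structured study log: " +
      "goals, methods used, key findings, and open questions.",
    messages, // the same history loaded via loadMessages() above
  });
  return text;
}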
Why This Design?
Reasoning from first principles:
1. Conversation as context
- LLMs need complete context to generate reports
- Message history is naturally the most complete, most natural context
- Avoids the complexity of "reconstructing context from the DB"
2. LLMs excel at extraction
- Generating structured content (studyLog) from conversations is an LLM's strength
- More flexible and reliable than hand-written parsing logic
3. Echoes of Event Sourcing
- Message sequence = event log
- studyLog, report = derived state
- Can be replayed and regenerated anytime
Comparison with other approaches:
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Messages as source | Data consistent, easy to extend | Requires extra LLM call to generate studyLog | ✅ Our choice |
| Traditional state management | Precise control | Complex state sync, hard to trace | Doesn't suit LLM non-determinism |
| Remove DB entirely | Extremely simple | Frontend queries difficult, history hard to manage | Need structured display |
| Event Sourcing | Complete history, replayable | High engineering complexity | Over-engineered for current scale |
Impact
Code simplification:
Deleted files:
- src/ai/tools/saveInterview.ts
- src/ai/tools/saveDiscussion.ts
- src/ai/tools/saveScoutTask.ts
- src/ai/tools/savePersona.ts
- src/ai/tools/saveWebSearch.ts
Simplified files (28):
- Agent configs no longer need save tools
- generateReport doesn't need multi-table JOINs
Development efficiency:
Before:
Adding discussionChat:
1. Create Discussion table
2. Write discussionChat tool
3. Write saveDiscussion tool
4. Add both tools to 3 agents
5. Write discussion query logic
6. Modify generateReport query
Total: 12 files, 2-3 days
After:
Adding discussionChat:
1. Write discussionChat tool (returns plainText)
2. Add tool to agent config
3. generateReport auto-supports (reads from messages)
Total: 3 files, 2-3 hours
Cost trade-offs:
✅ Benefits:
- Simplified architecture: deleted 5 tools, simplified 28 files
- Data consistency: full context traceable even on failures
- Easy extension: adding new research methods goes from 12 steps → 3 steps
❌ Costs:
- studyLog generation requires extra LLM call (~2K tokens, ~$0.002)
- Slightly higher token consumption for long conversations
✅ Mitigation:
- Prompt cache reduces repeated token cost by 90%
- Architectural benefits far outweigh costs
IV. Step 2: Intent Clarification + Unified Execution
v2.3.0 - 2026-01-06
Problem 1: Vague Requirements → Inefficient Dialogue
After implementing message-driven architecture, adding features became simpler. But user experience wasn't good enough.
When creating research, users often say:
"Want to understand young people's coffee preferences"
This isn't specific enough:
- Which young people? 18-22 college students? Or 23-28 young professionals?
- What method? In-depth interviews? Group discussions? Or social media observation?
- What output? User personas? Market insights? Or product recommendations?
Traditional approach: AI asks multiple questions
AI: "Which age group do you want to research?"
User: "18-25 I guess"
AI: "What method? Interviews or surveys?"
User: "Interviews"
AI: "How many people?"
User: "Around 10"
Problems:
- Requires 3-5 conversation rounds
- Poor user experience (feels like filling forms)
- AI can't proactively suggest best approaches
Problem 2: 95% Duplicate Code
While adding features had become simpler, we uncovered a bigger piece of technical debt:
$ wc -l src/app/(study)/agents/*AgentRequest.ts
493 studyAgentRequest.ts
416 fastInsightAgentRequest.ts
302 productRnDAgentRequest.ts
Three nearly identical agent wrappers, totaling 1,211 lines.
Code duplication mainly in:
- Message loading and processing (~80 lines each)
- File attachment handling (~60 lines each)
- MCP integration (~40 lines each)
- Token tracking (~50 lines each)
- Notification sending (~30 lines each)
Every new feature (like webhook integration) required changing all three places.
Solution: Plan Mode + baseAgentRequest
Our solution has two parts:
Part 1: Plan Mode (Intent Clarification Layer)
A separate agent dedicated to intent clarification:
// src/app/(study)/agents/configs/planModeAgentConfig.ts
export async function createPlanModeAgentConfig() {
return {
model: "claude-sonnet-4-5",
systemPrompt: planModeSystem({ locale }),
tools: {
requestInteraction, // Interact with user
makeStudyPlan, // Display complete plan, one-click confirm
},
maxSteps: 5, // Max 5 steps to complete clarification
};
}
Workflow:
sequenceDiagram
participant User
participant PlanMode as Plan Mode Agent
participant StudyAgent as Study Agent
User->>PlanMode: "Want to understand young people's coffee preferences"
PlanMode->>PlanMode: Analyze requirements
Note over PlanMode: - Target: 18-25 years old<br/>- Research type: qualitative insights<br/>- Best method: group discussion
PlanMode->>User: Display complete plan
Note over PlanMode,User: 【Research Plan】<br/>Goal: Understand 18-25 coffee preferences<br/>Method: Group discussion (5-8 people)<br/>Duration: ~40 minutes<br/>Output: Consumer insights report<br/><br/>[Confirm Start] [Modify Plan]
User->>PlanMode: [Confirm Start]
PlanMode->>StudyAgent: Intent recorded in messages
Note over StudyAgent: Read intent from conversation history<br/>Execute research plan
StudyAgent->>User: Start research execution
Key design:
- Plan Mode's decisions are recorded in messages
- Study Agent infers intent from messages, no explicit passing needed
- Avoids complexity of context passing
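Concretely, the "protocol" is nothing more than a tool result sitting in the shared chat history. The field names below are illustrative, not the exact makeStudyPlan output:
// What Plan Mode leaves behind in the message stream (illustrative shape)
const sharedHistory = [
  // ... earlier user / Plan Mode turns ...
  {
    role: "assistant",
    toolCalls: [{ name: "makeStudyPlan", args: { method: "discussion", personas: 6 } }],
  },
  {
    role: "tool",
    content:
      "Research plan confirmed:\n" +
      "- Goal: understand 18-25 coffee preferences\n" +
      "- Method: group discussion (5-8 personas)\n" +
      "- Output: consumer insights report",
  },
];
// The Study Agent is started on the same chat and reads sharedHistory as
// ordinary context — there is no separate hand-off object or shared state store.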
Part 2: baseAgentRequest (Unified Executor)
Merge three duplicate agent wrappers into one generic executor:
// src/app/(study)/agents/baseAgentRequest.ts (577 lines)
interface AgentRequestConfig<TOOLS extends ToolSet> {
model: LLMModelName;
systemPrompt: string;
tools: TOOLS;
maxSteps?: number;
specialHandlers?: {
// Dynamically control which tools are available
customPrepareStep?: (options) => {
messages,
activeTools?: (keyof TOOLS)[]
};
// Custom post-processing logic
customOnStepFinish?: (step, context) => Promise<void>;
};
}
async function executeBaseAgentRequest<TOOLS>(
baseContext: BaseAgentContext,
config: AgentRequestConfig<TOOLS>,
streamWriter: UIMessageStreamWriter
) {
// Phase 1: Initialization
// Phase 2: Prepare Messages
// Phase 3: Universal Attachment Processing
// Phase 4: Universal MCP and Team System Prompt
// Phase 5: Load Memory and Inject into Context
// Phase 6: Main Streaming Loop
// Phase 7: Universal Notifications
}
Agent routing:
// src/app/(study)/api/chat/route.ts
if (!analyst.kind) {
// Plan Mode - intent clarification
const config = await createPlanModeAgentConfig(agentContext);
await executeBaseAgentRequest(agentContext, config, streamWriter);
} else if (analyst.kind === AnalystKind.productRnD) {
// Product R&D Agent
const config = await createProductRnDAgentConfig(agentContext);
await executeBaseAgentRequest(agentContext, config, streamWriter);
} else {
// Study Agent (comprehensive research, fast insights, testing, creative, etc.)
const config = await createStudyAgentConfig(agentContext);
await executeBaseAgentRequest(agentContext, config, streamWriter);
}
Each agent only needs to define configuration:
// src/app/(study)/agents/configs/studyAgentConfig.ts
export async function createStudyAgentConfig(params) {
return {
model: "claude-sonnet-4",
systemPrompt: studySystem({ locale }),
tools: buildStudyTools(params), // ← Tools this agent needs
specialHandlers: {
// Custom tool control
customPrepareStep: async ({ messages }) => {
const toolUseCount = calculateToolUsage(messages);
let activeTools = undefined;
// After report generation, restrict available tools
if ((toolUseCount[ToolName.generateReport] ?? 0) > 0) {
activeTools = [
ToolName.generateReport,
ToolName.reasoningThinking,
ToolName.toolCallError,
];
}
return { messages, activeTools };
},
// Custom post-processing
customOnStepFinish: async (step) => {
// After saving research intent, auto-generate title
const saveAnalystTool = findTool(step, ToolName.saveAnalyst);
if (saveAnalystTool) {
await generateChatTitle(studyUserChatId);
}
},
},
};
}
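The payoff: a new agent is just another config module. A hypothetical example (this agent does not exist in atypica; the prompt builder and tool selection are invented) to show the shape:
// Hypothetical: adding a brand-new agent means adding one config, not a new wrapper
export async function createCreativeBriefAgentConfig(params) {
  return {
    model: "claude-sonnet-4",
    systemPrompt: creativeBriefSystem({ locale: params.locale }), // assumed prompt builder
    tools: {
      webSearch,        // reuse the existing research tools
      discussionChat,
      generateReport,
    },
    maxSteps: 20,
    // No specialHandlers: the defaults in baseAgentRequest are enough
  };
}
// route.ts then only needs one more branch:
//   const config = await createCreativeBriefAgentConfig(agentContext);
//   await executeBaseAgentRequest(agentContext, config, streamWriter);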
Why This Design?
Reasoning-execution separation rationale:
1. Matches the human cognitive model
- Human decision-making: first figure out "what to do", then consider "how to do it"
- System 1 (intuition) vs System 2 (reasoning)
- Plan Mode = System 2, Study Agent = System 1
2. Single responsibility
- Plan Mode focuses on intent understanding; it doesn't need to know execution details
- Study Agent focuses on research execution; it doesn't need to handle clarification
- Each is simpler and easier to maintain
3. Messages as protocol
- Plan Mode's decisions → messages
- Study Agent reads intent from messages
- Loosely coupled without losing context
Unified executor rationale:
1. Extract, don't rebuild
- Extract common patterns from three similar implementations
- Don't design an abstraction layer from scratch
2. Configuration over inheritance
- Agent differences are expressed through configuration
- No inheritance or polymorphism
3. Plugin-based lifecycle
- `customPrepareStep`: dynamic tool control
- `customOnStepFinish`: custom post-processing
- Preserve extension points; don't hard-code all the logic
Comparison with other approaches:
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Plan Mode + baseAgentRequest | Remove duplicate code, separate reasoning-execution | One more abstraction layer | ✅ Our choice |
| Continue copy-pasting | Simple and direct | Tech debt accumulates, hard to maintain | Unsustainable long-term |
| Fully generic agent | Least code | Sacrifices specialization and control | Can't handle business differences |
| Microservices split | Independent deployment | Over-engineered, adds ops complexity | Unnecessary at current scale |
Impact
Code complexity:
Deleted:
- studyAgentRequest.ts (493 lines)
- fastInsightAgentRequest.ts (416 lines)
- productRnDAgentRequest.ts (302 lines)
Total: -1,211 lines
Added:
+ baseAgentRequest.ts (577 lines)
+ planModeAgentConfig.ts (120 lines)
+ studyAgentConfig.ts (180 lines)
+ productRnDAgentConfig.ts (80 lines)
Total: +957 lines
Net reduction: -254 lines
But more importantly:
- Cyclomatic Complexity: 12.3 → 6.7 (45% reduction)
- Code duplication: 95% → 0%
Development efficiency:
Before:
Adding MCP integration:
1. Modify studyAgentRequest.ts
2. Modify fastInsightAgentRequest.ts
3. Modify productRnDAgentRequest.ts
4. Test three agents
Time: 2-3 days
After:
Adding MCP integration:
1. Modify baseAgentRequest.ts
2. All agents automatically gain new capability
Time: 2-3 hours
User experience:
Before:
User: "Want to understand young people's coffee preferences"
AI: "Which age group do you want to research?"
User: "18-25"
AI: "What method do you want to use?"
User: "Interviews I guess"
AI: "How many people?"
... (3-5 conversation rounds)
After:
User: "Want to understand young people's coffee preferences"
AI displays complete plan:
┌─────────────────────────────────────┐
│ 【Research Plan】 │
│ Goal: Understand 18-25 coffee prefs │
│ Method: Group discussion (5-8 ppl) │
│ Duration: ~40 minutes │
│ Output: Consumer insights report │
│ │
│ [Confirm Start] [Modify Plan] │
└─────────────────────────────────────┘
Intent clarification: 3-5 conversation rounds → 1 confirmation
V. Step 3: Persistent Memory
v2.3.0 - 2026-01-08
Problem: AI "Amnesia"
With intent clarification and unified architecture, the research workflow was smooth. But long-term users reported a problem:
"Why does the AI ask me what industry I'm in every single time?"
The AI doesn't remember users. Every conversation feels like the first meeting:
- "What industry are you in?"
- "Which dimensions do you care about?"
- "What's your research goal?"
Users feel the AI is "forgetful," and the experience lacks personalization.
Root cause:
LLMs are stateless. Each conversation:
const result = await streamText({
messages: currentConversation, // ← Only current conversation
// No context from historical conversations
});
Although we have historical conversations in the DB:
- Cross-conversation info lost: Each research is an independent session
- Important info buried: Key information in long conversations is hard to extract
- No persistent memory: No long-term memory of "who the user is"
Solution: Two-Tier Memory Architecture
We need a persistent memory system. But how to design it?
Inspired by Anthropic's CLAUDE.md approach:
- Simple Markdown files
- User-viewable and editable
- Fully loaded into context
We adopted a similar approach but added automatic update mechanisms.
Data Model
model Memory {
id Int @id @default(autoincrement())
userId Int? // User-level memory
teamId Int? // Team-level memory
version Int // Version management
// Two-tier architecture
core String @default("") @db.Text // Core memory (Markdown)
working Json @default("[]") // Working memory (JSON, to be consolidated)
changeNotes String @db.Text // Update notes
@@unique([userId, version])
@@index([userId, version(sort: Desc)])
}
Two-tier architecture:
1. Core Memory (`core`)
- Markdown format, human-readable
- Long-term stable user information
- Example:
# User Information
- Industry: Consumer goods product manager
- Focus: Young consumer preferences, emerging trends
# Research Style
- Prefers qualitative research (interviews, discussions)
- Values authentic user voices over statistics
2. Working Memory (`working`)
- JSON format, structured
- New information to be consolidated
- Example:
[
  { "info": "User recently focused on coffee market", "source": "chat_123" },
  { "info": "Prefers group discussion method", "source": "chat_124" }
]
Automatic Update Mechanism
Two-stage update:
// src/app/(memory)/actions.ts
async function updateMemory({ userId, conversationContext }) {
let memory = await loadLatestMemory(userId);
// Step 1: Reorganize when threshold exceeded (Claude Sonnet 4.5)
if (memory.core.length > 8000 || memory.working.length > 20) {
memory = await reorganizeMemory(memory, conversationContext);
}
// Step 2: Extract new information (Claude Haiku 4.5)
const newInfo = await extractMemoryUpdate(memory.core, conversationContext);
if (newInfo) {
// Step 3: Insert new information at specified location
await insertMemoryInfo(memory, newInfo);
}
}
Memory Update Agent (Haiku 4.5):
- Extract new user information from conversations
- Low cost (~$0.001/time)
- Runs in background after each conversation
Memory Reorganize Agent (Sonnet 4.5):
- Consolidate working memory into core memory
- Remove redundancy, merge similar information
- Slightly higher cost (~$0.02/time), but infrequently triggered
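A sketch of the extraction side (the prompt wording, the NONE convention, and serializing the conversation to a string are our simplifications, not the actual extractMemoryUpdate; llm() is the model helper used earlier):
// Sketch: cheap background extraction of durable user facts after a conversation
import { generateText } from "ai";
async function extractMemoryUpdate(coreMemory: string, conversationText: string) {
  const { text } = await generateText({
    model: llm("claude-haiku-4-5"), // cheap model, runs after every conversation
    system:
      "You maintain a user memory file. Given the existing memory and a new conversation, " +
      "list any durable new facts about the user as short bullet points. " +
      "Reply with NONE if nothing is worth remembering.",
    prompt: `<CoreMemory>\n${coreMemory}\n</CoreMemory>\n\n<Conversation>\n${conversationText}\n</Conversation>`,
  });
  return text.trim() === "NONE" ? null : text;
}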
Integration into Conversation Flow
// src/app/(study)/agents/baseAgentRequest.ts
// Phase 5: Load Memory
const memory = await loadUserMemory(userId);
if (memory?.core) {
// Inject at conversation start
modelMessages = [
{
role: 'user',
content: `<UserMemory>\n${memory.core}\n</UserMemory>`
},
...modelMessages
];
}
// Phase 6: Streaming
const result = await streamText({
messages: modelMessages, // ← Includes user memory
// ...
});
// Phase 7: Non-blocking memory update
waitUntil(
updateMemory({ userId, conversationContext: messages })
);
Why This Design?
Why Markdown over Vector DB?
1. The context window is large enough
- Claude 3.5 Sonnet: 200K tokens
- User memory is typically < 10K characters (~3K tokens)
- Full loading is simpler and more accurate than retrieval
2. Simple and transparent
- Markdown is user-readable and editable
- No embeddings, no vector search, no complex indexing
- Aligns with Anthropic's philosophy: user control
3. Avoid premature optimization
- No need for real-time retrieval (conversation frequency is low)
- No need for precise matching (the full text provides enough context)
- Start with the simple solution, optimize when necessary
Comparison with mainstream approaches:
| Approach | Storage | Control | Retrieval | atypica choice rationale |
|---|---|---|---|---|
| Anthropic (CLAUDE.md) | File-based | User-driven | Full loading | ✅ Simple, transparent, effective with large context |
| OpenAI | Vector DB (speculated) | AI + user confirmation | Semantic retrieval | ❌ Black box, weak user control |
| Mem0 | Vector + Graph + KV | AI-driven | Hybrid retrieval | ❌ Over-engineered, high maintenance cost |
| MemGPT | OS-inspired tiered | AI self-managed | Tiered retrieval | ❌ Conceptually complex, utility unproven |
We chose Anthropic's simple approach because:
- Fits current scale (personal assistant, not enterprise knowledge base)
- User controllable (transparent, editable)
- As context windows grow, this approach becomes better
Impact
User experience:
Before:
First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods"
AI: "What dimensions do you care about?"
...
Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "What industry are you in?" # ← Asks again
After:
First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods product manager"
# AI remembers
Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "Based on your background as a consumer goods PM, I suggest..." # ← Remembers!
System cost:
Memory Update (per conversation):
- Model: Claude Haiku 4.5
- Tokens: ~5K
- Cost: ~$0.001
Memory Reorganize (every 20 conversations):
- Model: Claude Sonnet 4.5
- Tokens: ~15K
- Cost: ~$0.02
Average cost: ~$0.002/conversation
Response time:
Memory loading: +50ms (non-blocking)
Memory update: background, doesn't affect response
Low cost, fast response, completely acceptable.
VI. Architecture Comparison: Our Unique Choices
Now let's step back and see how atypica's architecture differs from mainstream AI Agent frameworks.
State Management: Messages vs Memory Classes
| atypica | LangChain | Core Difference |
|---|---|---|
| Messages as source | ConversationBufferMemory | We believe conversation history is the best state |
| Generate studyLog on demand | Pre-compute summary | Avoid sync issues, traceable on failures |
| DB stores derived state | DB stores core state | Similar to Event Sourcing |
Why different?
LangChain's design is influenced by traditional software, believing "state should be explicitly stored and managed."
We believe, for LLMs:
- Conversation history = complete state
- Derived state (studyLog) can be regenerated
- Simpler, more fault-tolerant
Agent Architecture: Configuration vs Graph
| atypica | LangGraph | Core Difference |
|---|---|---|
| Configuration-driven | Graph-driven | We use configuration to express differences, code for commonalities |
| Single executor | Node orchestration | Avoid over-abstraction, good enough is enough |
| Messages as protocol | Explicit node communication | Loosely coupled without losing context |
Why different?
LangGraph pursues generality, using graph orchestration to express arbitrarily complex flows.
We believe, for our scenarios:
- Configuration-driven is simpler: 99% of needs can be met with configuration
- Single executor is sufficient: Don't need graph orchestration's flexibility
- Simpler is more reliable: Fewer abstraction layers, easier to debug
Memory System: Markdown vs Vector DB
| atypica | Mem0 | Core Difference |
|---|---|---|
| Markdown files | Vector + Graph + KV | We choose simple and transparent over precise and complex |
| Full loading | Semantic retrieval | When context window is large enough, full text is better |
| User-editable | AI black box | User trust comes from transparency |
Why different?
Mem0 pursues precise retrieval, using multiple databases in hybrid.
We believe, for personal assistants:
- Simple solution is enough: User memory typically < 10K
- Transparent beats precise: Users can view and edit memory
- Gets better as context grows: with future 1M-token windows, full loading only becomes more attractive than a Vector DB
Core Philosophy Differences
atypica's choices:
- Simple, transparent, controllable
- Adapt to LLM characteristics (large context, non-determinism)
- Start from real pain points, not pursuing architectural perfection
Mainstream frameworks' choices:
- Precise, complex, automatic
- Port traditional software engineering patterns
- Pursue generality and flexibility
Who's right or wrong?
Neither is wrong. It's just:
- Our scenario (personal research assistant) suits simple approaches better
- As context windows grow, simple approaches become better
- User trust comes from transparency, not AI magic
VII. Quantitative Impact
Specific impact from three evolutions:
Code Complexity
Duplicate code:
Before: 1,211 lines (three agent wrappers)
After: 0 lines
Reduction: 100%
Total lines of code:
Before: 1,211 lines (duplicates) + others
After: 577 lines (base) + 380 lines (configs) = 957 lines
Net reduction: 254 lines (21%)
Cyclomatic Complexity (code complexity metric):
Before: avg 12.3
After: avg 6.7
Reduction: 45%
Development Efficiency
| Task | Before | After | Improvement |
|---|---|---|---|
| Add new research method | 12 files, 2-3 days | 3 files, 2-3 hours | 10x |
| Add new capability (MCP) | Modify 3 places, 1 day | Modify 1 place, 2 hours | 4x |
| Fix bug | Change 3 agents | Change 1 base | 3x |
System Performance
Token consumption (with prompt cache):
- studyLog generation: ~2K tokens (~$0.002)
- Memory update: ~5K tokens (~$0.005)
- Average per conversation: +$0.007
Response time:
- Memory loading: +50ms (non-blocking)
- Plan Mode: +2s (one-time)
- studyLog generation: background, doesn't affect response
Cost and performance impact negligible.
User Experience
Intent clarification:
Before: average 3.2 conversation rounds
After: 1 plan display + 1 confirmation
Improvement: 3x efficiency
AI "memory":
Before: repetitive questions every conversation
After: auto-load user preferences
Improvement: personalized experience
Research startup time:
Before: ~5 minutes (multiple rounds of clarification)
After: ~1 minute (one-click confirm)
Improvement: 5x efficiency
VIII. Lessons Learned
What did we learn from three evolutions?
What We Did Right
1. Incremental refactoring, not big bang
We didn't rewrite the entire system at once. Three evolutions, each step:
- Delivers value independently
- Maintains backward compatibility (keeping the `analyst.studySummary` field)
- Can be rolled back
This let us quickly validate ideas and reduce risk.
2. Start from real pain points
Don't pursue architectural perfection, instead:
- Message-driven: because adding `discussionChat` was too complex
- Unified execution: because the duplicate code was too much
- Persistent memory: because users reported AI forgetfulness
Let problems drive the design, not the other way around.
3. Embrace LLM characteristics
Don't treat LLMs as traditional software:
- Don't hand-write state machines, let AI infer state from conversations
- Leverage large context windows, rather than pursuing precise retrieval
- Let AI generate studyLog, rather than hand-writing parsers
Adapt to LLM's capability boundaries, rather than fighting them.
Costs We Paid
1. Learning curve for abstraction layer
baseAgentRequest requires some understanding before you can modify it:
- The 7 phases of the execution flow
- The timing of `customPrepareStep` and `customOnStepFinish`
- Generic constraints and type inference
But: clear interfaces and documentation lowered the barrier.
2. Cost of on-demand generation
studyLog generation requires LLM call (~$0.002/time).
But:
- Prompt cache reduces cost by 90%
- Architectural benefits >> small cost
- Acceptable
3. Limitations of simple solutions
Markdown memory isn't suitable for:
- Large-scale knowledge bases (> 100K tokens)
- Complex relational queries
- Multi-dimensional retrieval
But:
- Good enough for personal assistant scenarios
- Can upgrade to Vector DB in the future
- Solve 80% of problems first
Unexpected Benefits
1. Confidence from type safety
// Fully type-safe tool handling
const tool = step.toolResults.find(
t => !t.dynamic && t.toolName === ToolName.generateReport
) as StaticToolResult<Pick<StudyToolSet, ToolName.generateReport>>;
if (tool?.output) {
const token = tool.output.reportToken; // ← TypeScript knows this field exists
}
During the refactoring, the compiler caught 99% of the issues.
2. Flexibility of configuration-driven
Adding webhook integration only requires:
// baseAgentRequest.ts
if (webhookUrl) {
await sendWebhook(webhookUrl, step);
}
All agents automatically gain the new capability; no config changes are needed.
3. Power of messages as protocol
Plan Mode and Study Agent communicate through messages:
- Decoupled: can be modified independently
- Without losing context: complete decision process in messages
- Traceable: can replay when problems occur
This was an unexpected benefit.
IX. Future Directions
Three evolutions brought atypica closer to general-purpose agents. But there's more to do.
Short-term (3-6 months)
1. Skills Library
- Further modularize tools
- Users can compose their own agents
- Like GPTs, but more flexible
2. Multi-Agent Collaboration
- Not just serial execution
- Parallel research, cross-validation
- Like AutoGPT, but more controllable
Long-term (1-2 years)
3. Evolve toward GEA
- GEA = General Execution Architecture
- Not just research agents, but a universal AI Agent execution framework
- Can run any type of agent
4. Self-Improving Agents
- Agents learn from past executions
- Continuously optimize prompts and strategies
- Get smarter with use
Unchanging Principles
No matter how we evolve, we stick to:
- Simple beats complex
- Transparent beats black box
- User control beats AI automation
X. Conclusion
Building AI Agent systems is not a simple extension of traditional software engineering.
We need to rethink:
- What is state? (Conversation history)
- What is an interface? (Message protocol)
- What is control flow? (AI reasoning)
atypica's three evolutions are essentially three cognitive upgrades:
1. From database thinking → data-flow thinking
- Don't maintain explicit state; infer state from messages
2. From code reuse → configuration-driven
- Don't pursue perfect abstraction; use configuration to express differences
3. From stateless → memory-enhanced
- Don't rely on precise retrieval; use simple, transparent methods
These choices may not be the most "advanced."
But they are:
- Simple: easy to understand, easy to debug
- Transparent: users know what AI is doing
- Controllable: users can intervene and adjust
- Good enough: solve 80% of problems
And this, perhaps, is the key to building reliable AI systems.