How Pause's Guardian learns from every interaction — and proves it.
The Problem
"Self-improving AI" is one of the most overclaimed phrases in tech demos. Most projects mean one of two things: batch fine-tuning between releases, or A/B testing between user cohorts. Neither is real-time, per-user, per-interaction learning.
Pause needed something different. The Guardian AI intercepts impulse purchases with personalized strategies — but a strategy that works for late-night electronics doesn't work for morning coffee runs. A generic "Do you really need this?" prompt gets overridden immediately. The AI had to learn what works for each user and adapt on every interaction.
We integrated the ACE (Agentic Context Engine) framework to build a Skillbook that grows from 4 seed strategies into a personalized knowledge base — with every learning step traceable in Opik.
Architecture Overview
[Architecture diagram: mermaid.live]
Two feedback loops, one Skillbook. Immediate outcomes (accepted/overridden) and retrospective satisfaction ("Was it worth it?") both feed into the same ACE pipeline, giving the Guardian two temporal perspectives on every decision.
The Skillbook: A Living Knowledge Base
The Skillbook is a persistent JSON registry of learned strategies. Each skill has an ID, content description, and effectiveness scores:
```typescript
interface Skill {
  id: string;
  section: string;
  content: string;
  helpful: number;  // Times this skill led to acceptance
  harmful: number;  // Times this skill led to override
  neutral: number;  // Times outcome was ambiguous
  status: "active" | "invalid";
}
```
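For a sense of what a learned skill looks like in practice, here is an illustrative instance built around the `time-cost-framing` strategy discussed later in this post (the counts and content are assumptions, not values from a real Skillbook):

```typescript
// Hypothetical learned skill; field values are illustrative
const timeCostFraming: Skill = {
  id: "time-cost-framing",
  section: "intervention_strategies",
  content: "Reframe the price as hours of the user's own work before suggesting a wait.",
  helpful: 12, // led to acceptance 12 times
  harmful: 2,  // led to an override twice
  neutral: 3,
  status: "active",
};
```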
Storage: The Skillbook is stored as a JSONB column in Neon PostgreSQL, serialized via Skillbook.toDict() and loaded via Skillbook.fromDict(). No normalized skills table — the entire Skillbook is a single atomic document, simplifying concurrent updates. Two deserialization methods are used: Skillbook.loads() parses a JSON string (used when converting JSONB to string first), while Skillbook.fromDict() accepts a JS object directly (used when Drizzle returns JSONB as a parsed object).
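For reference, a minimal sketch of what that table could look like in Drizzle (the table and column names here are assumptions, not the actual schema in `packages/db/src/schema/`):

```typescript
import { integer, jsonb, pgTable, text } from "drizzle-orm/pg-core";

// Hypothetical shape of the skillbook table; names are illustrative
export const skillbookTable = pgTable("skillbooks", {
  userId: text("user_id").primaryKey(),
  skills: jsonb("skills").notNull(), // entire Skillbook as one atomic document
  version: integer("version").notNull().default(1), // optimistic-locking counter
});
```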
Injection: Before every Guardian interaction, the Skillbook is loaded and formatted for the LLM:
```typescript
// apps/web/src/lib/server/ace.ts
export async function loadUserSkillbook(userId: string): Promise<string> {
  const result = await withTimeout(
    db.select({ skills: skillbookTable.skills })
      .from(skillbookTable)
      .where(eq(skillbookTable.userId, userId))
      .limit(1),
    DB_TIMEOUT_MS
  );

  let instance: Skillbook;
  if (result[0]) {
    instance = Skillbook.loads(JSON.stringify(result[0].skills));
  } else {
    instance = new Skillbook();
  }

  const context = wrapSkillbookContext(instance);

  // Guard against prompt bloat
  if (context.length > MAX_CONTEXT_CHARS) {
    return `${context.substring(0, MAX_CONTEXT_CHARS)}\n\n[Skillbook truncated]`;
  }
  return context;
}
```
The wrapSkillbookContext() function formats skills into a structured text block that the LLM can reference when selecting strategies. Skills with high helpful scores rise to the top. Skills marked harmful are deprioritized.
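The exact formatting lives in the vendored ACE package; the injected block looks something like this (an illustrative shape, not the literal output):

```text
## SKILLBOOK (learned strategies)
[intervention_strategies]
- (time-cost-framing)      helpful=12 harmful=2: Reframe the price as hours of work...
- (impulse-check-generic)  helpful=3  harmful=9: Ask "Do you really need this?"...
```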
The Adapter Layer: All ACE Flows Through One Module
Every ACE import in the application flows through a single adapter:
apps/web/src/lib/server/ace.ts
This module re-exports Skillbook, Reflector, SkillManager, VercelAIClient, and wrapSkillbookContext from the vendored @pause/ace package. It also provides two loading functions:
| Function | Returns | Used By |
|---|---|---|
| `loadUserSkillbook(userId)` | Formatted context string | Guardian route (prompt injection) |
| `loadUserSkillbookInstance(userId)` | Raw `Skillbook` instance + version | Learning pipeline (mutation) |
Why a single adapter? The ACE package is a vendored fork with Turbopack-specific modifications. Centralizing imports means upstream changes (import paths, API signatures) only need fixing in one file.
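In simplified form, the adapter is mostly re-exports plus the two loaders (a sketch, not the full module):

```typescript
// apps/web/src/lib/server/ace.ts (simplified sketch)
export {
  Skillbook,
  Reflector,
  SkillManager,
  VercelAIClient,
  wrapSkillbookContext,
} from "@pause/ace";

// Plus the two loaders described above:
// loadUserSkillbook(userId)         → formatted context string
// loadUserSkillbookInstance(userId) → { skillbook, version }
```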
The Learning Pipeline: Three Stages
When a user makes a decision (accept, override, wait), the learning pipeline runs asynchronously via Next.js after():
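At the route level, the handoff looks roughly like this (a sketch assuming Next.js's `after()` from `next/server`; the handler shape is illustrative, the real one lives in `apps/web/src/app/api/ai/feedback/route.ts`):

```typescript
// Sketch of the feedback route wiring; learning runs after the response is sent
import { after } from "next/server";

export async function POST(req: Request) {
  const params = await req.json();

  // Respond immediately; the learning pipeline runs once the response is flushed
  after(async () => {
    const result = await runReflection(params);
    if (result) await runSkillUpdate(result);
  });

  return Response.json({ ok: true });
}
```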
Stage 1: Reflection
The Reflector analyzes why the interaction succeeded or failed:
```typescript
export async function runReflection(params: {
  interactionId: string;
  userId: string;
  question: string;
  generatorAnswer: string;
  outcome: string;
}): Promise<LearningPipelineResult | null> {
  const { skillbook, version } = await loadUserSkillbookInstance(params.userId);
  const reflector = getReflector();

  const reflectionOutput = await Promise.race([
    reflector.reflect({
      question: params.question,
      generatorAnswer: params.generatorAnswer,
      feedback: feedbackSignalMap[params.outcome],
      skillbook,
    }),
    new Promise<never>((_, reject) => {
      setTimeout(() => reject(new Error("Reflection timed out")), 10_000);
    }),
  ]);

  return {
    reflectionOutput,
    interactionId: params.interactionId,
    userId: params.userId,
    skillbook,
    skillbookVersion: version,
  };
}
```
The Reflector produces a ReflectorOutput:
```typescript
interface ReflectorOutput {
  analysis: string;             // What happened and why
  helpful_skill_ids: string[];  // Skills that contributed to success
  harmful_skill_ids: string[];  // Skills that contributed to failure
  new_learnings: Array<{
    section: string;
    content: string;
    atomicity_score: number;
  }>;
}
```
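For the late-night override in the counterfactual section below, a reflection might look like this (illustrative values, not real pipeline output):

```typescript
// Illustrative ReflectorOutput; values are assumptions
const exampleReflection: ReflectorOutput = {
  analysis:
    "User overrode a generic prompt on a late-night electronics purchase; " +
    "the framing read as a lecture rather than a concrete trade-off.",
  helpful_skill_ids: [],
  harmful_skill_ids: ["impulse-check-generic"],
  new_learnings: [
    {
      section: "intervention_strategies",
      content: "Reframe price as hours of work for late-night electronics triggers.",
      atomicity_score: 0.9,
    },
  ],
};
```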
Outcome mapping converts database enum values to natural-language feedback for the Reflector:
```typescript
const feedbackSignalMap = {
  accepted:   "correct — user accepted the Guardian's suggestion",
  overridden: "incorrect — user overrode the Guardian's suggestion",
  wait:       "correct — user chose to wait as suggested",
  abandoned:  "neutral — user abandoned without deciding",
  // ...
};
```
Stage 2: Skill Curation
The SkillManager converts reflections into concrete Skillbook operations:
```typescript
export async function runSkillUpdate(
  result: LearningPipelineResult
): Promise<UpdateBatch | null> {
  const skillManager = getSkillManager();
  const updateBatch = await skillManager.curate({
    reflectionAnalysis: result.reflectionOutput.analysis,
    skillbook: result.skillbook,
  });

  // Apply to in-memory Skillbook
  result.skillbook.applyUpdate(updateBatch);

  // Persist with optimistic locking
  await persistSkillbookUpdate(result.userId, result.skillbook, result.skillbookVersion);

  return updateBatch;
}
```
An UpdateBatch contains typed operations:
```typescript
interface UpdateBatch {
  reasoning: string; // Why these changes were made
  operations: Array<{
    type: "ADD" | "UPDATE" | "TAG" | "REMOVE";
    section: string;
    content?: string;
    skill_id?: string;
  }>;
}
```
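For the "user felt lectured" override described later, a curated batch could look like this (illustrative values; the harmful designation itself comes from the reflection stage):

```typescript
// Illustrative UpdateBatch; contents are assumptions
const exampleBatch: UpdateBatch = {
  reasoning: "Generic prompt read as a lecture; add a concrete time-cost framing.",
  operations: [
    { type: "TAG", section: "intervention_strategies", skill_id: "impulse-check-generic" },
    {
      type: "ADD",
      section: "intervention_strategies",
      content: "Translate the price into hours of work at the user's hourly rate.",
    },
  ],
};
```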
Stage 3: Persistence with Optimistic Locking
The Skillbook is persisted using optimistic locking to handle concurrent interactions:
```typescript
async function persistSkillbookUpdate(
  userId: string,
  updatedSkillbook: Skillbook,
  expectedVersion: number
): Promise<boolean> {
  const result = await db
    .update(skillbookTable)
    .set({
      skills: updatedSkillbook.toDict(),
      version: sql`${skillbookTable.version} + 1`,
    })
    .where(
      and(
        eq(skillbookTable.userId, userId),
        eq(skillbookTable.version, expectedVersion)
      )
    );
  return result.rowCount > 0;
}
```
If a version conflict occurs (another interaction updated the Skillbook while this one was reflecting), the pipeline reloads the fresh Skillbook, re-applies the same UpdateBatch, and retries — up to 3 attempts.
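In sketch form, that retry loop looks roughly like this (a minimal sketch using the loaders shown earlier; the helper name and `MAX_RETRIES` constant are illustrative):

```typescript
// Sketch of retry-with-reload; names beyond those shown above are assumptions
const MAX_RETRIES = 3;

async function persistWithRetry(userId: string, batch: UpdateBatch): Promise<boolean> {
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    // Reload the freshest Skillbook and its version on every attempt
    const { skillbook, version } = await loadUserSkillbookInstance(userId);
    skillbook.applyUpdate(batch); // re-apply the same UpdateBatch

    if (await persistSkillbookUpdate(userId, skillbook, version)) {
      return true; // version matched; write succeeded
    }
    // Version conflict: another interaction won the race; loop and retry
  }
  return false; // give up after MAX_RETRIES attempts
}
```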
The Dual Feedback Loop
Immediate Feedback (Seconds)
```text
User overrides → feedbackSignalMap["overridden"] → Reflector →
  "Strategy 'impulse-check-generic' was too generic, user felt lectured" →
  SkillManager → TAG harmful, ADD "time-cost-framing" strategy
```
Retrospective Feedback (Hours/Days)
Ghost Cards resurface past purchases for reflection:
"Still happy with those headphones?" → "Worth it" →
satisfactionToFeedbackSignal("worth_it", "accepted") →
"positive_reinforcement — user confirms good decision" →
Reflector → TAG helpful on the strategy that was used
Why both loops matter: Immediate feedback captures decision quality — did the user listen? Retrospective feedback captures outcome quality — was the decision actually good? A user might override the Guardian (immediate negative signal) but later regret the purchase (retrospective negative signal), confirming the Guardian was right. Or they might accept (immediate positive) but regret (retrospective negative), revealing the strategy was persuasive but unhelpful.
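The mapping from retrospective satisfaction back into a Reflector signal could look roughly like this (an illustrative sketch; the real logic lives in `apps/web/src/lib/server/ghost-cards.ts`, and every signal string here except `positive_reinforcement` is invented for the example):

```typescript
// Illustrative sketch; signal strings other than "positive_reinforcement" are assumptions
function satisfactionToFeedbackSignal(
  satisfaction: "worth_it" | "regret",
  originalOutcome: "accepted" | "overridden"
): string {
  if (satisfaction === "worth_it" && originalOutcome === "accepted") {
    return "positive_reinforcement — user confirms good decision";
  }
  if (satisfaction === "regret" && originalOutcome === "overridden") {
    return "incorrect_override — user regrets ignoring the Guardian";
  }
  if (satisfaction === "regret" && originalOutcome === "accepted") {
    return "persuasive_but_unhelpful — user followed the strategy yet regrets the outcome";
  }
  return "neutral_retrospective — no strong signal";
}
```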
The Counterfactual Proof
Rising acceptance rates alone could be data ordering artifacts. The real proof of learning is the counterfactual:
```text
Interaction A (Day 1):
  Trigger:  Late-night electronics, $80
  Strategy: "impulse-check-generic" → "Do you really need this?"
  Result:   Override (user annoyed)

  ... learning cycle runs ...

Interaction B (Day 2):
  Trigger:  Late-night electronics, $60 (same pattern)
  Strategy: "time-cost-framing" → "That's 4 hours of your hourly rate"
  Result:   Wait 24h (user paused)
```
Same trigger type. Different strategy. Different outcome. This is visible in Opik traces — filter by guardian:therapist:override to see Interaction A, then guardian:therapist:wait to see Interaction B. The learning:skillbook_update trace between them shows the pivot.
Vendored ACE Package
The ACE framework is vendored from kayba-ai/ace-ts@f790a4a at packages/ace/ with three modifications for Pause's build toolchain:
| Change | Why |
|---|---|
| Stripped `.js` extensions from all imports | Turbopack can't resolve `.js` → `.ts` in workspace packages |
| Stubbed `sentence-transformers` dynamic import | Python-only dependency; Turbopack statically resolves `import()` even behind guards |
| Migrated to Zod v4 (`z.record(z.string(), z.any())`) | Zod v4 requires explicit key type in `z.record()` |
The vendored package passes full Biome linting (with targeted overrides for upstream patterns) and type-checks under verbatimModuleSyntax: true.
ACE + Opik: The Observable Learning Loop
Every ACE operation generates an Opik trace (see the Opik Deep Dive):
| ACE Stage | Opik Trace | What's Captured |
|---|---|---|
| Reflection | `learning:reflection` | Analysis text, helpful/harmful skill IDs, new learnings count |
| Skill curation | `learning:skillbook_update` | Operation count, before/after skill count, delta, reasoning, full operations list |
| Satisfaction feedback | `learning:satisfaction_feedback` | Original outcome, mapped signal, reflection analysis |
The Skillbook snapshot is included in update traces, so you can reconstruct the Skillbook's state at any point in time by walking the trace history.
The ACE package also has its own Opik integration layer (packages/ace/src/observability/) with:
- `OpikIntegration` class for framework-level tracing
- `maybeTrack()` decorator for conditional method tracing
- Graceful degradation when Opik is unavailable
What We Learned
1. Optimistic locking is essential for Skillbook persistence
Multiple interactions can trigger learning simultaneously. Without version-based locking, later writes silently overwrite earlier learnings. The retry-with-reload pattern ensures no learning is lost.
2. Two temporal feedback loops are more powerful than one
Immediate feedback tells you what the user decided. Retrospective feedback tells you if it was the right decision. The Ghost Card loop catches cases where the Guardian's strategy was persuasive but ultimately unhelpful — a signal you'd never get from click-through rates alone.
3. The adapter layer saved us repeatedly
When the upstream ACE API changed (Zod v4, import paths), fixes were isolated to one file. When Turbopack broke on .js extensions, the fix was in the vendored package — the adapter layer shielded the application code entirely.
4. Seed scripts prove learning more convincingly than live demos
The demo-rookie → demo-pro seed script transition creates a controlled before/after that demonstrates learning in seconds. Without seeds, you'd need 10+ live interactions — impossible in a 5-minute demo. The seed scripts use Skillbook class methods (Skillbook.dumps(), Skillbook.fromDict()) for type-safe serialization.
5. Strategy selection is LLM-driven, not hardcoded
The system prompt includes the Skillbook context with effectiveness scores. The LLM chooses which strategy to use based on skill ratings — we don't hardcode "if score > X, use strategy Y." This means the learning is genuinely emergent, not just a threshold lookup.
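Concretely, the wiring is just context injection (a sketch using the adapter's `loadUserSkillbook`; the prompt wording is illustrative):

```typescript
// Sketch of system-prompt assembly; prompt wording is illustrative
const skillbookContext = await loadUserSkillbook(userId);

const systemPrompt = [
  "You are the Guardian. Pick ONE intervention strategy from the skillbook below.",
  "Prefer skills with high helpful counts; avoid skills tagged harmful.",
  skillbookContext,
].join("\n\n");
```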
The Learning Arc in Numbers
| State | Skills | Dominant Strategy | Acceptance Rate |
|---|---|---|---|
| Rookie (seed) | 4 generic | `impulse-check-generic` | ~30% |
| After 5 interactions | 6-7 mixed | Transitioning | ~50% |
| Pro (seed) | 8-10 learned | `time-cost-framing` | ~70%+ |
The Skillbook grows from generic seeds into personalized strategies through real LLM-powered reflection — not hardcoded progression.
File Reference
| File | Role |
|---|---|
| `packages/ace/` | Vendored ACE framework (Skillbook, Reflector, SkillManager, Agent) |
| `packages/ace/src/observability/` | ACE's own Opik integration layer |
| `apps/web/src/lib/server/ace.ts` | Adapter layer — all ACE imports flow through here |
| `apps/web/src/lib/server/learning.ts` | Learning pipeline: reflection, skill curation, persistence |
| `apps/web/src/lib/server/ghost-cards.ts` | Ghost Card satisfaction feedback signals |
| `apps/web/src/app/api/ai/feedback/route.ts` | Feedback collection + async learning trigger |
| `apps/web/src/app/api/ai/ghost-cards/[id]/route.ts` | Retrospective satisfaction feedback |
| `packages/db/src/schema/` | Skillbook table with JSONB column + version for optimistic locking |
Built with Next.js 16 + ACE Framework + Opik + Vercel AI SDK v6 + Neon PostgreSQL
