DEV Community

Armel BOBDA


Deep Dive: Self-Improving AI with ACE Skillbooks

How Pause's Guardian learns from every interaction — and proves it.


The Problem

"Self-improving AI" is one of the most overclaimed phrases in tech demos. Most projects mean one of two things: batch fine-tuning between releases, or A/B testing between user cohorts. Neither is real-time, per-user, per-interaction learning.

Pause needed something different. The Guardian AI intercepts impulse purchases with personalized strategies — but a strategy that works for late-night electronics doesn't work for morning coffee runs. A generic "Do you really need this?" prompt gets overridden immediately. The AI had to learn what works for each user and adapt on every interaction.

We integrated the ACE (Agentic Context Engine) framework to build a Skillbook that grows from 4 seed strategies into a personalized knowledge base — with every learning step traceable in Opik.


Architecture Overview

[Architecture diagram: interactive version on mermaid.live]

Two feedback loops, one Skillbook. Immediate outcomes (accepted/overridden) and retrospective satisfaction ("Was it worth it?") both feed into the same ACE pipeline, giving the Guardian two temporal perspectives on every decision.


The Skillbook: A Living Knowledge Base

The Skillbook is a persistent JSON registry of learned strategies. Each skill has an ID, content description, and effectiveness scores:

interface Skill {
  id: string;
  section: string;
  content: string;
  helpful: number;   // Times this skill led to acceptance
  harmful: number;   // Times this skill led to override
  neutral: number;   // Times outcome was ambiguous
  status: "active" | "invalid";
}
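These counters are what makes a skill rankable. As a minimal sketch of how they could drive ranking (the net-score rule here is our assumption for illustration, not ACE's actual formula):

```typescript
interface Skill {
  id: string;
  section: string;
  content: string;
  helpful: number;
  harmful: number;
  neutral: number;
  status: "active" | "invalid";
}

// Hypothetical ranking: net score (helpful minus harmful), active skills only.
function rankSkills(skills: Skill[]): Skill[] {
  return skills
    .filter((s) => s.status === "active")
    .sort((a, b) => (b.helpful - b.harmful) - (a.helpful - a.harmful));
}
```

Invalid skills drop out entirely rather than being down-ranked, which keeps retired strategies from ever reaching the prompt.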

Storage: The Skillbook is stored as a JSONB column in Neon PostgreSQL, serialized via Skillbook.toDict() and loaded via Skillbook.fromDict(). No normalized skills table — the entire Skillbook is a single atomic document, simplifying concurrent updates. Two deserialization methods are used: Skillbook.loads() parses a JSON string (used when converting JSONB to string first), while Skillbook.fromDict() accepts a JS object directly (used when Drizzle returns JSONB as a parsed object).

Injection: Before every Guardian interaction, the Skillbook is loaded and formatted for the LLM:

// apps/web/src/lib/server/ace.ts
export async function loadUserSkillbook(userId: string): Promise<string> {
  const result = await withTimeout(
    db.select({ skills: skillbookTable.skills })
      .from(skillbookTable)
      .where(eq(skillbookTable.userId, userId))
      .limit(1),
    DB_TIMEOUT_MS
  );

  let instance: Skillbook;
  if (result[0]) {
    instance = Skillbook.loads(JSON.stringify(result[0].skills));
  } else {
    instance = new Skillbook();
  }

  const context = wrapSkillbookContext(instance);

  // Guard against prompt bloat
  if (context.length > MAX_CONTEXT_CHARS) {
    return `${context.substring(0, MAX_CONTEXT_CHARS)}\n\n[Skillbook truncated]`;
  }
  return context;
}

The wrapSkillbookContext() function formats skills into a structured text block that the LLM can reference when selecting strategies. Skills with high helpful scores rise to the top. Skills marked harmful are deprioritized.
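A plausible shape for that formatting step, with `formatSkillbookContext` as a hypothetical stand-in for the real `wrapSkillbookContext` (the exact layout and header text are assumptions):

```typescript
interface Skill {
  id: string;
  section: string;
  content: string;
  helpful: number;
  harmful: number;
}

// Sketch: sort by net score so high-performing skills appear first,
// and surface the raw counts so the LLM can weigh each strategy.
function formatSkillbookContext(skills: Skill[]): string {
  const ranked = [...skills].sort(
    (a, b) => (b.helpful - b.harmful) - (a.helpful - a.harmful),
  );
  const lines = ranked.map(
    (s) => `[${s.id}] (${s.section}) +${s.helpful}/-${s.harmful}: ${s.content}`,
  );
  return ["## Learned Strategies", ...lines].join("\n");
}
```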


The Adapter Layer: All ACE Flows Through One Module

Every ACE import in the application flows through a single adapter:

apps/web/src/lib/server/ace.ts

This module re-exports Skillbook, Reflector, SkillManager, VercelAIClient, and wrapSkillbookContext from the vendored @pause/ace package. It also provides two loading functions:

| Function | Returns | Used By |
| --- | --- | --- |
| `loadUserSkillbook(userId)` | Formatted context string | Guardian route (prompt injection) |
| `loadUserSkillbookInstance(userId)` | Raw `Skillbook` instance + version | Learning pipeline (mutation) |

Why a single adapter? The ACE package is a vendored fork with Turbopack-specific modifications. Centralizing imports means upstream changes (import paths, API signatures) only need fixing in one file.


The Learning Pipeline: Three Stages

When a user makes a decision (accept, override, wait), the learning pipeline runs asynchronously via Next.js after():

Stage 1: Reflection

The Reflector analyzes why the interaction succeeded or failed:

export async function runReflection(params: {
  interactionId: string;
  userId: string;
  question: string;
  generatorAnswer: string;
  outcome: string;
}): Promise<LearningPipelineResult | null> {
  const { interactionId, userId, question, generatorAnswer, outcome } = params;
  const { skillbook, version } = await loadUserSkillbookInstance(userId);
  const reflector = getReflector();

  const reflectionOutput = await Promise.race([
    reflector.reflect({
      question,
      generatorAnswer,
      feedback: feedbackSignalMap[outcome],
      skillbook,
    }),
    new Promise<never>((_, reject) => {
      setTimeout(() => reject(new Error("Reflection timed out")), 10_000);
    }),
  ]);

  return { reflectionOutput, interactionId, userId, skillbook, skillbookVersion: version };
}

The Reflector produces a ReflectorOutput:

interface ReflectorOutput {
  analysis: string;                    // What happened and why
  helpful_skill_ids: string[];         // Skills that contributed to success
  harmful_skill_ids: string[];         // Skills that contributed to failure
  new_learnings: Array<{
    section: string;
    content: string;
    atomicity_score: number;
  }>;
}

Outcome mapping converts database enum values to natural-language feedback for the Reflector:

const feedbackSignalMap = {
  accepted: "correct — user accepted the Guardian's suggestion",
  overridden: "incorrect — user overrode the Guardian's suggestion",
  wait: "correct — user chose to wait as suggested",
  abandoned: "neutral — user abandoned without deciding",
  // ...
};

Stage 2: Skill Curation

The SkillManager converts reflections into concrete Skillbook operations:

export async function runSkillUpdate(
  result: LearningPipelineResult
): Promise<UpdateBatch | null> {
  const skillManager = getSkillManager();

  const updateBatch = await skillManager.curate({
    reflectionAnalysis: result.reflectionOutput.analysis,
    skillbook: result.skillbook,
  });

  // Apply to in-memory Skillbook
  result.skillbook.applyUpdate(updateBatch);

  // Persist with optimistic locking
  await persistSkillbookUpdate(result.userId, result.skillbook, result.skillbookVersion);

  return updateBatch;
}

An UpdateBatch contains typed operations:

interface UpdateBatch {
  reasoning: string;       // Why these changes were made
  operations: Array<{
    type: "ADD" | "UPDATE" | "TAG" | "REMOVE";
    section: string;
    content?: string;
    skill_id?: string;
  }>;
}
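To make the operation types concrete, here is a sketch of how such a batch might be applied to an in-memory skill map. The `tag` field on TAG operations and the ID scheme are our assumptions; the real `Skillbook.applyUpdate()` in the vendored package may differ:

```typescript
interface Skill {
  id: string;
  section: string;
  content: string;
  helpful: number;
  harmful: number;
  neutral: number;
  status: "active" | "invalid";
}

interface Operation {
  type: "ADD" | "UPDATE" | "TAG" | "REMOVE";
  section: string;
  content?: string;
  skill_id?: string;
  tag?: "helpful" | "harmful" | "neutral"; // assumed field carried by TAG ops
}

interface UpdateBatch {
  reasoning: string;
  operations: Operation[];
}

// Sketch: apply each operation in order; REMOVE marks invalid rather than
// deleting, so effectiveness history is preserved.
function applyUpdate(skills: Map<string, Skill>, batch: UpdateBatch): void {
  for (const op of batch.operations) {
    const existing = op.skill_id ? skills.get(op.skill_id) : undefined;
    switch (op.type) {
      case "ADD": {
        const id = `skill-${skills.size + 1}`; // hypothetical ID scheme
        skills.set(id, {
          id,
          section: op.section,
          content: op.content ?? "",
          helpful: 0,
          harmful: 0,
          neutral: 0,
          status: "active",
        });
        break;
      }
      case "UPDATE":
        if (existing && op.content) existing.content = op.content;
        break;
      case "TAG":
        if (existing && op.tag) existing[op.tag] += 1;
        break;
      case "REMOVE":
        if (existing) existing.status = "invalid";
        break;
    }
  }
}
```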

Stage 3: Persistence with Optimistic Locking

The Skillbook is persisted using optimistic locking to handle concurrent interactions:

async function persistSkillbookUpdate(
  userId: string,
  updatedSkillbook: Skillbook,
  expectedVersion: number
): Promise<boolean> {
  const result = await db
    .update(skillbookTable)
    .set({
      skills: updatedSkillbook.toDict(),
      version: sql`${skillbookTable.version} + 1`,
    })
    .where(
      and(
        eq(skillbookTable.userId, userId),
        eq(skillbookTable.version, expectedVersion)
      )
    );

  return (result.rowCount ?? 0) > 0;
}

If a version conflict occurs (another interaction updated the Skillbook while this one was reflecting), the pipeline reloads the fresh Skillbook, re-applies the same UpdateBatch, and retries — up to 3 attempts.
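The retry-with-reload loop can be sketched as follows, with the loader and writer injected as parameters (these signatures are placeholders, not the repo's actual helpers):

```typescript
interface UpdateBatch {
  reasoning: string;
  operations: unknown[];
}

interface Skillbook {
  applyUpdate(batch: UpdateBatch): void;
}

// On a version conflict, reload the fresh Skillbook, re-apply the same
// batch, and try again; give up after maxAttempts.
async function persistWithRetry(
  userId: string,
  batch: UpdateBatch,
  load: (userId: string) => Promise<{ skillbook: Skillbook; version: number }>,
  persist: (userId: string, sb: Skillbook, expectedVersion: number) => Promise<boolean>,
  maxAttempts = 3,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { skillbook, version } = await load(userId);
    skillbook.applyUpdate(batch);
    if (await persist(userId, skillbook, version)) return true;
    // Conflict: another interaction bumped the version first; loop reloads.
  }
  return false;
}
```

Because the batch is re-applied to the freshly loaded Skillbook, concurrent learnings merge instead of overwriting each other.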


The Dual Feedback Loop

Immediate Feedback (Seconds)

User overrides → feedbackSignalMap["overridden"] → Reflector →
  "Strategy 'impulse-check-generic' was too generic, user felt lectured" →
  SkillManager → TAG harmful, ADD "time-cost-framing" strategy

Retrospective Feedback (Hours/Days)

Ghost Cards resurface past purchases for reflection:

"Still happy with those headphones?" → "Worth it" →
  satisfactionToFeedbackSignal("worth_it", "accepted") →
  "positive_reinforcement — user confirms good decision" →
  Reflector → TAG helpful on the strategy that was used

Why both loops matter: Immediate feedback captures decision quality — did the user listen? Retrospective feedback captures outcome quality — was the decision actually good? A user might override the Guardian (immediate negative signal) but later regret the purchase (retrospective negative signal), confirming the Guardian was right. Or they might accept (immediate positive) but regret (retrospective negative), revealing the strategy was persuasive but unhelpful.
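That 2x2 of immediate and retrospective signals can be written out explicitly. The function below is our reading of the matrix, for illustration only; it is not code from the repo:

```typescript
type Immediate = "accepted" | "overridden";
type Retrospective = "worth_it" | "regret";

// Interpretation sketch: net verdict on the *strategy* given both signals.
function strategyVerdict(immediate: Immediate, retro: Retrospective): string {
  if (immediate === "accepted" && retro === "worth_it") {
    return "helpful"; // user listened and the decision held up
  }
  if (immediate === "accepted" && retro === "regret") {
    return "persuasive-but-unhelpful"; // strategy worked, outcome didn't
  }
  if (immediate === "overridden" && retro === "regret") {
    return "guardian-was-right"; // user ignored it and regretted the buy
  }
  return "harmful"; // overridden + worth_it: strategy misfired on a fine purchase
}
```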


The Counterfactual Proof

Rising acceptance rates alone could be data ordering artifacts. The real proof of learning is the counterfactual:

Interaction A (Day 1):
  Trigger: Late-night electronics, $80
  Strategy: "impulse-check-generic" → "Do you really need this?"
  Result: Override (user annoyed)

Learning cycle runs...

Interaction B (Day 2):
  Trigger: Late-night electronics, $60 (same pattern)
  Strategy: "time-cost-framing" → "That's 4 hours of your hourly rate"
  Result: Wait 24h (user paused)

Same trigger type. Different strategy. Different outcome. This is visible in Opik traces — filter by guardian:therapist:override to see Interaction A, then guardian:therapist:wait to see Interaction B. The learning:skillbook_update trace between them shows the pivot.


Vendored ACE Package

The ACE framework is vendored from kayba-ai/ace-ts@f790a4a at packages/ace/ with three modifications for Pause's build toolchain:

| Change | Why |
| --- | --- |
| Stripped `.js` extensions from all imports | Turbopack can't resolve `.js` imports to `.ts` sources in workspace packages |
| Stubbed `sentence-transformers` dynamic import | Python-only dependency; Turbopack statically resolves `import()` even behind guards |
| Migrated to Zod v4 (`z.record(z.string(), z.any())`) | Zod v4 requires an explicit key type in `z.record()` |

The vendored package passes full Biome linting (with targeted overrides for upstream patterns) and type-checks under verbatimModuleSyntax: true.


ACE + Opik: The Observable Learning Loop

Every ACE operation generates an Opik trace (see the Opik Deep Dive):

| ACE Stage | Opik Trace | What's Captured |
| --- | --- | --- |
| Reflection | `learning:reflection` | Analysis text, helpful/harmful skill IDs, new learnings count |
| Skill curation | `learning:skillbook_update` | Operation count, before/after skill count, delta, reasoning, full operations list |
| Satisfaction feedback | `learning:satisfaction_feedback` | Original outcome, mapped signal, reflection analysis |

The Skillbook snapshot is included in update traces, so you can reconstruct the Skillbook's state at any point in time by walking the trace history.
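A sketch of that reconstruction, assuming each update trace carries a timestamp and a snapshot (the trace shape below is simplified for illustration):

```typescript
interface SkillbookSnapshot {
  skillCount: number;
}

interface TraceRecord {
  name: string;
  timestamp: number;
  snapshot: SkillbookSnapshot;
}

// Walk update traces chronologically; the latest one at or before `atTime`
// is the Skillbook state in effect at that moment.
function skillbookStateAt(
  traces: TraceRecord[],
  atTime: number,
): SkillbookSnapshot | null {
  const updates = traces
    .filter((t) => t.name === "learning:skillbook_update" && t.timestamp <= atTime)
    .sort((a, b) => a.timestamp - b.timestamp);
  return updates.length > 0 ? updates[updates.length - 1].snapshot : null;
}
```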

The ACE package also has its own Opik integration layer (packages/ace/src/observability/) with:

  • OpikIntegration class for framework-level tracing
  • maybeTrack() decorator for conditional method tracing
  • Graceful degradation when Opik is unavailable

What We Learned

1. Optimistic locking is essential for Skillbook persistence

Multiple interactions can trigger learning simultaneously. Without version-based locking, later writes silently overwrite earlier learnings. The retry-with-reload pattern ensures no learning is lost.

2. Two temporal feedback loops are more powerful than one

Immediate feedback tells you what the user decided. Retrospective feedback tells you if it was the right decision. The Ghost Card loop catches cases where the Guardian's strategy was persuasive but ultimately unhelpful — a signal you'd never get from click-through rates alone.

3. The adapter layer saved us repeatedly

When the upstream ACE API changed (Zod v4, import paths), fixes were isolated to one file. When Turbopack broke on .js extensions, the fix was in the vendored package — the adapter layer shielded the application code entirely.

4. Seed scripts prove learning more convincingly than live demos

The demo-rookie → demo-pro seed script transition creates a controlled before/after that demonstrates learning in seconds. Without seeds, you'd need 10+ live interactions — impossible in a 5-minute demo. The seed scripts use Skillbook class methods (Skillbook.dumps(), Skillbook.fromDict()) for type-safe serialization.

5. Strategy selection is LLM-driven, not hardcoded

The system prompt includes the Skillbook context with effectiveness scores. The LLM chooses which strategy to use based on skill ratings — we don't hardcode "if score > X, use strategy Y." This means the learning is genuinely emergent, not just a threshold lookup.


The Learning Arc in Numbers

| State | Skills | Dominant Strategy | Acceptance Rate |
| --- | --- | --- | --- |
| Rookie (seed) | 4 generic | `impulse-check-generic` | ~30% |
| After 5 interactions | 6-7 mixed | Transitioning | ~50% |
| Pro (seed) | 8-10 learned | `time-cost-framing` | ~70%+ |

The Skillbook grows from generic seeds into personalized strategies through real LLM-powered reflection — not hardcoded progression.


File Reference

| File | Role |
| --- | --- |
| `packages/ace/` | Vendored ACE framework (Skillbook, Reflector, SkillManager, Agent) |
| `packages/ace/src/observability/` | ACE's own Opik integration layer |
| `apps/web/src/lib/server/ace.ts` | Adapter layer: all ACE imports flow through here |
| `apps/web/src/lib/server/learning.ts` | Learning pipeline: reflection, skill curation, persistence |
| `apps/web/src/lib/server/ghost-cards.ts` | Ghost Card satisfaction feedback signals |
| `apps/web/src/app/api/ai/feedback/route.ts` | Feedback collection + async learning trigger |
| `apps/web/src/app/api/ai/ghost-cards/[id]/route.ts` | Retrospective satisfaction feedback |
| `packages/db/src/schema/` | Skillbook table with JSONB column + version for optimistic locking |

Built with Next.js 16 + ACE Framework + Opik + Vercel AI SDK v6 + Neon PostgreSQL
