Stephen Goldberg for Harper

Build Your Own AI Ops Assistant — Part 5: The Knowledge Loop

This is Part 5 of a 6-part series. Part 4 covers the Slack integration.


Vector Search, Feedback & Self-Improvement

This is the part that separates a demo from a production system. Most AI assistants are stateless; they forget everything between conversations. Harper Eye gets smarter every time your team uses it. Good answers get cached and returned instantly. Bad answers get flagged, degraded, and eventually purged.

Here's what the knowledge loop looks like after a few weeks of real usage:

Dashboard showing 29 KB articles built from positive feedback, 24 feedback items shaping future responses, and 0 degraded entries — the knowledge loop working as designed

29 knowledge base articles, all created automatically when engineers clicked "Helpful." 24 feedback items teaching the system what to avoid. Zero degraded entries, meaning the self-healing pipeline is working. That's institutional knowledge being built by your team without anyone having to write a wiki page. The knowledge loop is the compound interest of institutional knowledge.


How the Knowledge Loop Works

Flowchart showing an AI feedback loop architecture: an engineer’s question triggers a vector similarity search (HNSW). High similarity (≥0.85) returns a cached answer instantly, medium similarity (0.70–0.84) injects context for LLM synthesis, and low similarity (<0.70) triggers full orchestration. Outputs include “Helpful” and “Not Helpful” buttons. Helpful responses are stored as knowledge entries for faster future answers, while negative feedback is stored separately with decay rules that downgrade or delete entries if negative ratios exceed 30% or 50%, creating a continuous improvement cycle.

The beauty of this system: no one has to manually curate the knowledge base. Engineers just click thumbs up or thumbs down. The embeddings and similarity scores handle the rest.
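At its core, the routing is a three-way threshold check on the top similarity score. A minimal sketch (the thresholds match the constants defined in lib/knowledge-base.js below; the function name is illustrative):

```javascript
// Tiered routing on the top similarity score — a sketch of the flowchart above.
function routeByScore(score) {
  if (score >= 0.85) return 'cached';      // exact match: return the stored answer instantly
  if (score >= 0.70) return 'context';     // partial match: inject as context for Claude
  return 'orchestrate';                    // no match: run the full pipeline
}
```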


Step 1: The Knowledge Base — Storage & Search

This is the core of the knowledge loop. Create lib/knowledge-base.js:

import { tables } from 'harperdb';
import crypto from 'crypto';
import { generateEmbedding } from './embeddings.js';

const SIMILARITY_HIGH = 0.85;   // Exact match — return cached
const SIMILARITY_LOW = 0.70;    // Partial match — inject as context
const NEGATIVE_FEEDBACK_THRESHOLD = 0.80;
const NEGATIVE_RATIO_DOWNGRADE = 0.3;  // 30%+ negative → downgrade
const NEGATIVE_RATIO_DELETE = 0.5;     // 50%+ negative → delete
const MIN_NEGATIVES_TO_DELETE = 2;     // Need at least 2 negatives

/**
 * Search the knowledge base for verified answers similar to the query.
 * Returns: { match: 'exact' | 'partial' | 'none', entry?, score?, degraded? }
 */
export async function searchKnowledgeBase(queryText, precomputedEmbedding = null) {
  let embedding;
  try {
    embedding = precomputedEmbedding ?? await generateEmbedding(queryText);
  } catch (err) {
    console.error('Embedding failed, skipping KB search:', err.message);
    return { match: 'none' };
  }

  try {
    // Vector similarity search using Harper's built-in HNSW index
    const results = tables.KnowledgeEntry.search({
      sort: { attribute: 'queryEmbedding', target: embedding },
      limit: 1,
    });

    const entries = [];
    for await (const entry of results) {
      entries.push(entry);
    }

    if (entries.length === 0) return { match: 'none' };

    const topEntry = entries[0];
    const storedEmbedding = topEntry.queryEmbedding;
    if (!storedEmbedding?.length) return { match: 'none' };

    // Compute cosine similarity
    const score = cosineSimilarity(embedding, storedEmbedding);

    if (score >= SIMILARITY_HIGH) {
      // Check if the entry has been degraded by negative feedback
      const negativeCount = topEntry.negativeCount ?? 0;
      const total = (topEntry.useCount ?? 0) + negativeCount;

      if (negativeCount > 0 && total > 0) {
        const ratio = negativeCount / total;
        if (ratio >= NEGATIVE_RATIO_DOWNGRADE) {
          // Downgrade: don't return as exact, let Claude regenerate
          return {
            match: 'partial',
            entry: serializeEntry(topEntry),
            score,
            degraded: true,
          };
        }
      }

      // Genuine exact match — increment use count and return
      await updateUseCount(topEntry);
      return { match: 'exact', entry: serializeEntry(topEntry), score };
    }

    if (score >= SIMILARITY_LOW) {
      return { match: 'partial', entry: serializeEntry(topEntry), score };
    }

    return { match: 'none' };
  } catch (err) {
    console.error('KB search failed:', err.message);
    return { match: 'none' };
  }
}

Let me highlight what's happening with Harper's vector search:

tables.KnowledgeEntry.search({
  sort: { attribute: 'queryEmbedding', target: embedding },
  limit: 1,
});

That's it. One call. Harper's built-in HNSW index handles approximate nearest-neighbor search across all stored embeddings. No Pinecone client, no pgvector extension, no separate vector database. You defined @indexed(type: "HNSW", distance: "cosine") in your schema, and Harper handles the rest.
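For reference, the relevant part of that schema looks something like the following. This is a sketch abridged to the fields this article's code touches; the @indexed directive is quoted from the text above, but the exact field types and full schema live in the earlier parts of the series:

```graphql
type KnowledgeEntry @table {
  id: ID @primaryKey
  query: String
  # HNSW vector index with cosine distance, as described above
  queryEmbedding: [Float] @indexed(type: "HNSW", distance: "cosine")
  answer: String
  useCount: Int
  negativeCount: Int
}
```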


Step 2: Storing Verified Knowledge

When an engineer clicks "Helpful," we store the question-answer pair with a vector embedding. Future similar questions will match against it.

/**
 * Store a verified answer in the knowledge base.
 * Called when user clicks thumbs-up.
 */
export async function storeKnowledgeEntry({
  query,
  answer,
  sources,
  originalIncidentId,
  approvedByUserId,
  channelId,
}) {
  const embedding = await generateEmbedding(query);

  const entry = {
    id: crypto.randomUUID(),
    query,
    queryEmbedding: embedding,
    answer: typeof answer === 'string' ? answer : JSON.stringify(answer),
    sources: Array.isArray(sources) ? sources : [],
    originalIncidentId,
    approvedByUserId,
    approvedAt: new Date().toISOString(),
    channelId,
    useCount: 0,
    lastUsedAt: null,
    negativeCount: 0,
  };

  await tables.KnowledgeEntry.put(entry);
  console.log(`KB entry stored: ${entry.id} for "${query.slice(0, 80)}"`);
  return entry;
}

Once stored, the embedding is automatically added to the HNSW index. The next time someone asks a semantically similar question, searchKnowledgeBase() will find it.


Step 3: The Negative Feedback Loop

This is where the system self-heals. Negative feedback does two things:

  1. Warns Claude about past failures so it doesn't repeat the same mistake
  2. Degrades bad KB entries so they stop being returned as cached answers

/**
 * Search for past negative feedback on similar queries.
 * Used by the orchestrator to warn Claude about past failures.
 */
export async function searchNegativeFeedback(queryText, precomputedEmbedding = null) {
  let embedding;
  try {
    embedding = precomputedEmbedding ?? await generateEmbedding(queryText);
  } catch (err) {
    return [];
  }

  try {
    const results = tables.NegativeFeedback.search({
      sort: { attribute: 'queryEmbedding', target: embedding },
      limit: 3,
    });

    const matches = [];
    for await (const entry of results) {
      const stored = entry.queryEmbedding;
      if (!stored?.length) continue;

      const score = cosineSimilarity(embedding, stored);
      if (score >= NEGATIVE_FEEDBACK_THRESHOLD) {
        matches.push({
          category: entry.category,
          details: entry.details,
          originalQuery: entry.originalQuery,
          score,
        });
      }
    }
    return matches;
  } catch (err) {
    console.error('Negative feedback search failed:', err.message);
    return [];
  }
}

/**
 * Store negative feedback with embedding.
 * If linked to a KB entry, degrades that entry.
 */
export async function storeNegativeFeedback({
  queryId,
  originalQuery,
  category,
  details,
  userId,
  channelId,
  knowledgeEntryId,
}) {
  const embedding = await generateEmbedding(originalQuery);

  const record = {
    id: crypto.randomUUID(),
    queryId,
    originalQuery,
    queryEmbedding: embedding,
    category,
    details: details || null,
    userId,
    channelId,
    knowledgeEntryId: knowledgeEntryId || null,
    createdAt: new Date().toISOString(),
  };

  await tables.NegativeFeedback.put(record);

  // Degrade the linked KB entry if one exists
  if (knowledgeEntryId) {
    await degradeKnowledgeEntry(knowledgeEntryId);
  }

  return record;
}
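What does the orchestrator do with the matches from searchNegativeFeedback()? One plausible shape (the helper name and prompt wording here are illustrative, not taken from the series' orchestrator code) is to fold them into Claude's system prompt as explicit warnings:

```javascript
// Hypothetical helper: turn searchNegativeFeedback() matches into a prompt section.
function formatFailureWarnings(matches) {
  if (!matches.length) return '';
  const lines = matches.map(
    (m) => `- [${m.category}] "${m.originalQuery}": ${m.details ?? 'no details provided'}`
  );
  return [
    'Similar past answers were rated unhelpful:',
    ...lines,
    'Avoid repeating these mistakes.',
  ].join('\n');
}
```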

The degradation logic is where it gets interesting:

async function degradeKnowledgeEntry(entryId) {
  try {
    const entry = await tables.KnowledgeEntry.get(entryId);
    if (!entry) return;

    const newNegativeCount = (entry.negativeCount ?? 0) + 1;
    const total = (entry.useCount ?? 0) + newNegativeCount;
    const ratio = newNegativeCount / total;

    if (newNegativeCount >= MIN_NEGATIVES_TO_DELETE && ratio >= NEGATIVE_RATIO_DELETE) {
      // This answer is doing more harm than good — remove it
      await tables.KnowledgeEntry.delete(entryId);
      console.log(`KB entry ${entryId} DELETED (ratio: ${ratio.toFixed(2)})`);
    } else {
      // Just increment the counter
      await tables.KnowledgeEntry.put({
        ...entry,
        negativeCount: newNegativeCount,
      });
      console.log(`KB entry ${entryId} negative count → ${newNegativeCount} (ratio: ${ratio.toFixed(2)})`);
    }
  } catch (err) {
    console.error('Failed to degrade KB entry:', err.message);
  }
}

The thresholds are tuned for safety:

  • 30% negative ratio → the "exact" match gets downgraded to "partial," so Claude regenerates a fresh answer using the old one as reference
  • 50% negative ratio AND at least 2 negatives → the entry is deleted entirely
  • This means a single bad vote can't kill an entry that's been helpful 10 times

The result: bad knowledge expires naturally. Good knowledge compounds. No human curation required.
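Pulled out as a pure function, the arithmetic behind those bullets is easy to check. This restates the logic of degradeKnowledgeEntry and the search-time downgrade above rather than adding new behavior:

```javascript
// Same thresholds as the constants in lib/knowledge-base.js.
function degradeDecision(useCount, negativeCount) {
  const total = useCount + negativeCount;
  const ratio = total > 0 ? negativeCount / total : 0;
  if (negativeCount >= 2 && ratio >= 0.5) return 'delete';  // doing more harm than good
  if (ratio >= 0.3) return 'downgrade';                     // stop serving as an exact match
  return 'keep';                                            // still trustworthy
}
```

One bad vote against ten helpful uses gives a ratio of 1/11 ≈ 0.09, safely below every threshold, which is exactly the "single bad vote can't kill a good entry" property.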


Step 4: Cosine Similarity

The math behind the matching:

function cosineSimilarity(a, b) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

We compute this manually after Harper's HNSW search because Harper returns the nearest vectors but doesn't expose the similarity score directly. The HNSW index does the heavy lifting (finding the right candidates from potentially thousands of entries); we just compute the final score on the top result.
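A quick sanity check of the helper, with the function repeated verbatim so the snippet runs standalone:

```javascript
function cosineSimilarity(a, b) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Identical direction → 1, orthogonal → 0, opposite → -1.
console.log(cosineSimilarity([3, 4], [3, 4]));  // 1
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1
```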


Step 5: The Use Count Tracker

Every time a cached answer is returned, we increment its use count. This serves two purposes: analytics (which answers are most valuable?) and feedback ratio calculation (a bad answer that's been used 100 times needs more than 2 complaints to be deleted).

async function updateUseCount(entry) {
  try {
    const existing = await tables.KnowledgeEntry.get(entry.id);
    if (!existing) return;
    await tables.KnowledgeEntry.put({
      ...existing,
      useCount: (existing.useCount ?? 0) + 1,
      lastUsedAt: new Date().toISOString(),
    });
  } catch (err) {
    console.error('Failed to update KB use count:', err.message);
  }
}

function serializeEntry(entry) {
  return {
    id: entry.id,
    query: entry.query,
    answer: entry.answer,
    sources: entry.sources,
    approvedByUserId: entry.approvedByUserId,
    approvedAt: entry.approvedAt,
    useCount: entry.useCount,
    negativeCount: entry.negativeCount ?? 0,
  };
}

Notice serializeEntry strips the embedding vector before returning. Those are 768-element float arrays; there is no need to send them downstream or serialize them in responses.
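To confirm the shape, here's serializeEntry run against a fake record (the function is copied from above; the sample entry is made up):

```javascript
function serializeEntry(entry) {
  return {
    id: entry.id,
    query: entry.query,
    answer: entry.answer,
    sources: entry.sources,
    approvedByUserId: entry.approvedByUserId,
    approvedAt: entry.approvedAt,
    useCount: entry.useCount,
    negativeCount: entry.negativeCount ?? 0,
  };
}

const fake = {
  id: 'kb-1',
  query: 'how do we fail over redis?',
  answer: 'sample answer',
  sources: [],
  queryEmbedding: new Array(768).fill(0.1), // the 768-float vector we don't want downstream
  useCount: 3,
};

console.log('queryEmbedding' in serializeEntry(fake)); // false — the vector stays behind
```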


The Full Picture

Here's what your Harper Eye knowledge pipeline looks like now:

| Scenario | What Happens | Speed |
| --- | --- | --- |
| New question, no KB match | Full orchestration: embed → parallel search → Claude → respond | 5-15s |
| Similar question asked before (verified) | KB exact match → return cached answer | <1s |
| Similar question, but answer was bad | KB match downgraded → Claude regenerates with old answer as context | 5-15s |
| Same mistake pattern detected | Negative feedback injected into Claude prompt → avoids repeating error | 5-15s |
| Terrible answer accumulates complaints | KB entry auto-deleted → system reverts to fresh orchestration | Automatic |

Every interaction makes the system better. Every thumbs-up builds institutional knowledge. Every thumbs-down teaches the system what to avoid. This is the compound interest of an AI assistant that works for your company, not a vendor's.


What's Next

In Part 6, we deploy to production, walk through the actual cost breakdown with real numbers, and explore extensions — PagerDuty webhooks for automatic incident analysis, a web dashboard for knowledge base management, and expert routing that knows who on your team to page.
