<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rost</title>
    <description>The latest articles on DEV Community by Rost (@rosgluk).</description>
    <link>https://dev.to/rosgluk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3544400%2F04dd81bf-749e-4055-971f-316c0134e76c.jpg</url>
      <title>DEV Community: Rost</title>
      <link>https://dev.to/rosgluk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rosgluk"/>
    <language>en</language>
    <item>
      <title>AI for Knowledge Management: Real Workflows That Hold Up</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Sun, 31 May 2026 13:51:08 +0000</pubDate>
      <link>https://dev.to/rosgluk/ai-for-knowledge-management-real-workflows-that-hold-up-3ag</link>
      <guid>https://dev.to/rosgluk/ai-for-knowledge-management-real-workflows-that-hold-up-3ag</guid>
      <description>&lt;p&gt;AI is not replacing knowledge management; it is changing the shape of it for both individuals and teams.&lt;/p&gt;

&lt;p&gt;Microsoft's Work Trend Index describes a move toward hybrid teams of humans and agents, and NIST's AI RMF argues that trustworthy AI systems need explicit roles, evaluation, and oversight rather than vague automation.&lt;br&gt;
Those ideas fit neatly beside the human-centred practices in the site's &lt;a href="https://www.glukhov.org/knowledge-management/" rel="noopener noreferrer"&gt;Knowledge Management in 2026 pillar&lt;/a&gt;, which focuses on tools and methods long before any model is involved.&lt;/p&gt;

&lt;p&gt;That is exactly the right frame for knowledge work: AI is best treated as an enrichment layer over notes, docs, runbooks, and research, not as a magical second brain that works without structure. A useful mental model is the one developed in &lt;a href="https://www.glukhov.org/knowledge-management/foundations/pkm-vs-rag-vs-wiki-vs-memory-systems/" rel="noopener noreferrer"&gt;PKM vs RAG vs Wiki vs Memory Systems&lt;/a&gt;, where human note systems, shared wikis, retrieval pipelines, and agent memory each play a distinct role instead of collapsing into a single tool.&lt;/p&gt;

&lt;p&gt;The slightly opinionated version is this: if your notes are chaotic, AI will not rescue them. It will often make the chaos more fluent. Good knowledge management still starts with capture, naming, ownership, and source discipline. What AI changes is what you can do after capture: compress, extract, link, retrieve, and repackage information at useful speed. That view fits both modern prompting guidance, which recommends small, well-scoped tasks, and chunking guidance that preserves semantic units for retrieval instead of flattening everything into one blob.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why AI changes knowledge management
&lt;/h2&gt;

&lt;p&gt;The core shift is from static archives to active memory. Embeddings convert text into vectors that reflect relatedness and are commonly used for search, clustering, and recommendations. Retrieval systems can then surface semantically similar material even when the query shares few or no keywords with the source text. In practical terms, that means a note about "incident review" can still find a runbook chunk titled "post-deployment outage steps" without brittle exact-match rules.&lt;/p&gt;

&lt;p&gt;This is why AI-augmented knowledge management is worth doing now. The enabling pieces are no longer exotic: embedding APIs are mainstream, vector stores are standard, local embedding models are easy to run, and production databases such as Postgres can do both exact and approximate nearest-neighbour search with pgvector. The result is not artificial knowledge in the philosophical sense. It is a much more practical thing: better recall, better compression, and better context at the moment someone needs to think, especially when paired with solid representation choices from work such as &lt;a href="https://www.glukhov.org/knowledge-management/foundations/retrieval-vs-representation/" rel="noopener noreferrer"&gt;Retrieval vs Representation in Knowledge Systems&lt;/a&gt;. If your next step is implementation detail, the &lt;a href="https://www.glukhov.org/rag/" rel="noopener noreferrer"&gt;RAG cluster&lt;/a&gt; covers chunking, retrieval, reranking, and production patterns in depth.&lt;/p&gt;
&lt;h2&gt;
  
  
  Workflow patterns that actually work
&lt;/h2&gt;

&lt;p&gt;The patterns that hold up in production are boring in the best way. They use AI for bounded transformations, not vague autonomy. In practice, three patterns show up again and again: summarisation, extraction, and linking suggestions. Those map neatly to what current tools do well: summarise within a clear scope, extract structured data with schemas, and compute semantic relatedness through embeddings and retrieval. They also map cleanly onto the layered view of knowledge systems behind concepts such as &lt;a href="https://www.glukhov.org/knowledge-management/foundations/second-brain/" rel="noopener noreferrer"&gt;second brain workflows&lt;/a&gt; and &lt;a href="https://www.glukhov.org/knowledge-management/knowledge-systems-architectures/compiled-knowledge/what-is-llm-wiki/" rel="noopener noreferrer"&gt;LLM Wiki style compiled knowledge&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Summaries that preserve decisions
&lt;/h3&gt;

&lt;p&gt;Summarisation works best when it stays close to the source and preserves the parts humans actually need later: decisions, unresolved questions, owners, dates, and links back to the original material. OpenAI's enterprise prompting guidance explicitly recommends "one prompt, one deliverable", simple headings, and clear success criteria. That is a good discipline for knowledge work too: summarise one meeting, one document, or one research item at a time, then store the summary beside the source. Do not ask a model to "summarise my knowledge base" and expect anything trustworthy.&lt;/p&gt;

&lt;p&gt;A real workflow looks like this: capture meeting notes or a PDF, run a scoped summary prompt, store the summary with source references, then add a human check before it becomes canonical. If the source is a rich PDF, multimodal parsing can matter because slide decks and exported web pages often contain layout cues that plain text extraction misses. OpenAI's PDF parsing cookbook shows a practical split between text extraction and page-image analysis for turning rich PDFs into retrievable content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Context
You are assisting with team knowledge capture.

# Instructions
Summarise this meeting note in:
- 5 key points
- decisions made
- open questions
- actions with owners
- terms that should link to existing notes

# Constraints
- Do not invent details
- If something is unclear, mark it as uncertain
- Include the source note ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extraction that creates reusable fields
&lt;/h3&gt;

&lt;p&gt;Extraction is where AI starts to feel genuinely infrastructural. Instead of storing only prose, you ask the model to populate reusable fields such as entities, systems, APIs, owners, action items, products, dates, claims, or risk tags. OpenAI's Structured Outputs feature is designed to keep responses aligned to a JSON Schema, and Ollama offers the same pattern locally with schema-based JSON output. That matters because useful knowledge systems are made of fields you can sort, filter, compare, and validate, not just paragraphs that sound clever.&lt;/p&gt;

&lt;p&gt;OpenAI's long-document entity extraction example follows the right operational pattern: chunk the document, extract the relevant facts from each chunk, and then combine results. That same workflow works for postmortems, research papers, product docs, customer interviews, and support transcripts. In practice, I would extract more than named entities: I would also pull "needs follow-up", "contradicts existing note", and "candidate for evergreen note" because those fields create action, not just metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"note-2026-05-22-incident-review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Short summary here."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"service-a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"oauth"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ops"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rotate keys"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"due"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-24"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"related_terms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"token refresh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deployment checklist"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Linking that turns notes into a graph
&lt;/h3&gt;

&lt;p&gt;Link suggestions are the quiet workhorse of AI for knowledge management. Embeddings are explicitly used for search, clustering, and recommendations, which makes them a natural fit for related notes, similar incidents, see also, and you may want to merge these two docs features. Semantic retrieval is especially good at surfacing conceptually related content even when wording differs. That makes it far better than folder hierarchies alone for large note sets and technical documentation.&lt;/p&gt;

&lt;p&gt;Dense semantic search should not be your only retrieval signal, though. Exact identifiers still matter: function names, package names, issue IDs, error codes, SKUs, regulation numbers. Google Research has shown that hybrid retrieval, which combines semantic and lexical signals, improves recall because each method finds relevant material the other misses. In a technical knowledge base, that is not an academic detail. It is the difference between finding the conceptually related design note and also finding the exact migration command someone needs at 2 a.m.&lt;/p&gt;

&lt;p&gt;If you are already on Postgres, pgvector is the pragmatic option. It stores vectors with the rest of your data, supports exact search by default, and offers approximate indexing through HNSW and IVFFlat when you need more speed and can tolerate some recall trade-off. That is enough to build related-content suggestions, semantic search, and note deduplication without adding a separate vector database on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The human plus AI loop
&lt;/h2&gt;

&lt;p&gt;The model that actually works is not human or AI. It is capture -&amp;gt; AI enrich -&amp;gt; human refine. Microsoft describes the broader shift as humans working with assistants and then agent teams, while NIST's AI RMF and Playbook stress clearly defined human roles, responsibilities, and oversight in human-AI configurations. For knowledge management, that means humans remain accountable for the canonical note, the source of truth, and the final merge or publication decision. AI does the first-pass compression and cross-linking; humans do the judgement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;capture -&amp;gt; parse -&amp;gt; chunk -&amp;gt; embed -&amp;gt; enrich -&amp;gt; review -&amp;gt; publish
             |         |        |
             |         |        +-&amp;gt; related notes
             |         +-&amp;gt; retrieval index
             +-&amp;gt; structure-aware extraction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This division of labour is more than cautious process design. It matches how risk accumulates. NIST notes that understanding the limitations of human-AI interaction improves AI risk management, and that roles in oversight and use should be clearly differentiated. In practice, that means the model can draft titles, tags, summaries, and candidate links, but a person should approve anything that changes taxonomy, publishes external content, or overwrites an existing note. If you let the model silently rewrite your knowledge base, you are not building memory. You are outsourcing editorial control to a probabilistic system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tool choices that matter
&lt;/h2&gt;

&lt;p&gt;The base layer is embeddings plus retrieval. OpenAI's embeddings guide frames embeddings as a way to measure relatedness between text strings, while the Retrieval API handles semantic search over your data through vector stores. For many teams, that is the minimum viable stack for AI-augmented knowledge management: parse content, chunk it well, embed it, and retrieve the right fragments before synthesis. If you only do one serious thing this quarter, make it retrieval-backed recall instead of a chat wrapper over raw documents.&lt;/p&gt;

&lt;p&gt;Local models are the right answer when privacy, offline use, or cost control dominate. Ollama documents both local embeddings and structured outputs, and its product pages emphasise that data stays yours and that workloads can run entirely offline. That makes local-first pipelines sensible for internal notes, engineering runbooks, and sensitive research archives. My bias is simple: use local models for indexing, classification, and routine enrichment; reach for hosted APIs when you need stronger reasoning, multimodal extraction, or the best available model quality.&lt;/p&gt;

&lt;p&gt;Do not ignore parsing and chunking. Unstructured's chunking docs recommend building chunks from semantic document elements rather than raw character boundaries when possible, and OpenAI's PDF cookbook shows why rich-document parsing matters for RAG. Structure-aware PDF work goes further: naive parsing can destroy tables, scramble reading order, and strip hierarchical headings, while structure-aware parsing preserves paragraphs, tables, and document hierarchy. In knowledge management, that is the difference between an index that understands your corpus and one that merely tokenises it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations worth respecting
&lt;/h2&gt;

&lt;p&gt;Hallucination is still the obvious risk, but the more useful framing is insufficient context. RAG exists because large language models can hallucinate, use stale knowledge, and produce answers with weak traceability; retrieval helps by grounding generation in external knowledge. Even so, Google Research found that models often answer incorrectly instead of abstaining when the provided context is not sufficient. That matters for knowledge management because "I found something similar" is not the same as "I found enough to answer". Your system should preserve source references, expose uncertainty, and prefer abstention over confident fabrication.&lt;/p&gt;

&lt;p&gt;Long context does not remove the need for retrieval discipline. The 2023 "Lost in the Middle" paper showed that model performance could degrade when relevant information sat in the middle of long inputs, and newer Google results show that at least some newer models have improved substantially on simple needle-in-a-haystack retrieval near context limits. The sober lesson is not "long context solves it" or "long context is useless". It is that you should test your actual workflows and corpus, because position effects, task type, and document structure still matter.&lt;/p&gt;

&lt;p&gt;Loss of structure is the quieter failure mode, and in technical documentation it can be worse than hallucination because it poisons retrieval before the model even starts reasoning. Structure-aware PDF research shows that naive parsing can split tables, destroy their internal meaning, and break reading order, while semantic chunking systems try to preserve coherent document elements. If your source material includes tables, diagrams, code examples, or multi-column layouts, your parser is part of your knowledge system, not a boring preprocessing detail.&lt;/p&gt;

&lt;p&gt;So the practical rule is this: keep the human editorial loop, preserve source links, use schemas for extraction, and treat retrieval quality as a product feature. AI does not replace PKM, team docs, or knowledge architecture. It changes the leverage. Used well, it turns raw notes into searchable, linkable, structured memory. Used badly, it turns your documentation into high-speed drift.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>knowledgemanagement</category>
      <category>rag</category>
    </item>
    <item>
      <title>Multi-Tenancy Database Patterns with examples in Go</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Thu, 28 May 2026 13:13:31 +0000</pubDate>
      <link>https://dev.to/rosgluk/multi-tenancy-database-patterns-with-examples-in-go-1kje</link>
      <guid>https://dev.to/rosgluk/multi-tenancy-database-patterns-with-examples-in-go-1kje</guid>
      <description>&lt;p&gt;&lt;a href="https://www.glukhov.org/app-architecture/multitenancy/multi-tenant-database-patterns/" rel="noopener noreferrer"&gt;Multi-tenancy&lt;/a&gt; is a fundamental architectural pattern for SaaS applications, allowing multiple customers (tenants) to share the same application infrastructure while maintaining data isolation.&lt;/p&gt;

&lt;p&gt;Choosing the right database pattern is crucial for scalability, security, and operational efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of Multi-Tenancy Patterns
&lt;/h2&gt;

&lt;p&gt;When designing a multi-tenant application, you have three primary database architecture patterns to choose from:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Shared Database, Shared Schema&lt;/strong&gt; (most common)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shared Database, Separate Schema&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Separate Database per Tenant&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each pattern has distinct characteristics, trade-offs, and use cases. Let's explore each in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: Shared Database, Shared Schema
&lt;/h2&gt;

&lt;p&gt;This is the most common multi-tenancy pattern, where all tenants share the same database and schema, with a &lt;code&gt;tenant_id&lt;/code&gt; column used to distinguish tenant data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┐
│     Single Database                 │
│  ┌───────────────────────────────┐  │
│  │  Shared Schema                │  │
│  │  - users (tenant_id, ...)     │  │
│  │  - orders (tenant_id, ...)    │  │
│  │  - products (tenant_id, ...)  │  │
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementation Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When implementing multi-tenant patterns, understanding SQL fundamentals is crucial. For a comprehensive reference on SQL commands and syntax, check out our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/sql-cheatsheet/" rel="noopener noreferrer"&gt;SQL Cheatsheet&lt;/a&gt;. Here's how to set up the shared schema pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Users table with tenant_id&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;tenants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Index on tenant_id for performance&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_users_tenant_id&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Row-Level Security (PostgreSQL example)&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more PostgreSQL-specific features and commands, including RLS policies, schema management, and performance tuning, refer to our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application-Level Filtering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When working with Go applications, choosing the right ORM can significantly impact your multi-tenant implementation. The examples below use GORM, but there are several excellent options available. For a detailed comparison of Go ORMs including GORM, Ent, Bun, and sqlc, see our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;comprehensive guide to Go ORMs for PostgreSQL&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Example in Go with GORM&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetUserByEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;
    &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tenant_id = ? AND email = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;First&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Middleware to set tenant context&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TenantMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;tenantID&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;extractTenantID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// From subdomain, header, or JWT&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;next&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Shared Schema Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lowest cost&lt;/strong&gt;: Single database instance, minimal infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easiest operations&lt;/strong&gt;: One database to backup, monitor, and maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple schema changes&lt;/strong&gt;: Migrations apply to all tenants at once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for high tenant count&lt;/strong&gt;: Efficient resource utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-tenant analytics&lt;/strong&gt;: Easy to aggregate data across tenants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Shared Schema Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weaker isolation&lt;/strong&gt;: Data leakage risk if queries forget tenant_id filter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noisy neighbor&lt;/strong&gt;: One tenant's heavy workload can affect others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited customization&lt;/strong&gt;: All tenants share the same schema&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance challenges&lt;/strong&gt;: Harder to meet strict data isolation requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup complexity&lt;/strong&gt;: Can't restore individual tenant data easily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Shared Schema Best For&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SaaS applications with many small-to-medium tenants&lt;/li&gt;
&lt;li&gt;Applications where tenants don't need custom schemas&lt;/li&gt;
&lt;li&gt;Cost-sensitive startups&lt;/li&gt;
&lt;li&gt;When tenant count is high (thousands+)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pattern 2: Shared Database, Separate Schema
&lt;/h2&gt;

&lt;p&gt;Each tenant gets their own schema within the same database, providing better isolation while sharing infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate Schema Architecture&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────┐
│     Single Database                 │
│  ┌──────────┐  ┌──────────┐         │
│  │ Schema A │  │ Schema B │  ...    │
│  │ (Tenant1)│  │ (Tenant2)│         │
│  └──────────┘  └──────────┘         │
└─────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate Schema Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL schemas are a powerful feature for multi-tenancy. For detailed information on PostgreSQL schema management, connection strings, and database administration commands, consult our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create schema for tenant&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;tenant_123&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Set search path for tenant operations&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;search_path&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;tenant_123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create tables in tenant schema&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tenant_123&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Application Connection Management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managing database connections efficiently is critical for multi-tenant applications. The connection management code below uses GORM, but you might want to explore other ORM options. For a thorough comparison of Go ORMs including connection pooling, performance characteristics, and use cases, refer to our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;Go ORMs comparison guide&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Connection string with schema search path&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetTenantDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;initializeDB&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SET search_path TO tenant_%d, public"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Or use PostgreSQL connection string&lt;/span&gt;
&lt;span class="c"&gt;// postgresql://user:pass@host/db?search_path=tenant_123&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate Schema Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Better isolation&lt;/strong&gt;: Schema-level separation reduces data leakage risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization&lt;/strong&gt;: Each tenant can have different table structures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moderate cost&lt;/strong&gt;: Still single database instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier per-tenant backups&lt;/strong&gt;: Can backup individual schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better for compliance&lt;/strong&gt;: Stronger than shared schema pattern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Separate Schema Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema management complexity&lt;/strong&gt;: Migrations must run per tenant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection overhead&lt;/strong&gt;: Need to set search_path per connection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited scalability&lt;/strong&gt;: Schema count limits (PostgreSQL ~10k schemas)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-tenant queries&lt;/strong&gt;: More complex, requires dynamic schema references&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource limits&lt;/strong&gt;: Still shared database resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Separate Schema Best For&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Medium-scale SaaS (dozens to hundreds of tenants)&lt;/li&gt;
&lt;li&gt;When tenants need schema customization&lt;/li&gt;
&lt;li&gt;Applications needing better isolation than shared schema&lt;/li&gt;
&lt;li&gt;When compliance requirements are moderate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pattern 3: Separate Database per Tenant
&lt;/h2&gt;

&lt;p&gt;Each tenant gets their own complete database instance, providing maximum isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate Database Architecture&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│  Database 1  │  │  Database 2  │  │  Database 3  │
│  (Tenant A)  │  │  (Tenant B)  │  │  (Tenant C)  │
└──────────────┘  └──────────────┘  └──────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate Database Implementation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create database for tenant&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;tenant_enterprise_corp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Connect to tenant database&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="n"&gt;tenant_enterprise_corp&lt;/span&gt;

&lt;span class="c1"&gt;-- Create tables (no tenant_id needed!)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dynamic Connection Management&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Connection pool manager&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TenantDBManager&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;pools&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;
    &lt;span class="n"&gt;mu&lt;/span&gt;    &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RWMutex&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TenantDBManager&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mu&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// Double-check after acquiring write lock&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;exists&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Create new connection&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"host=localhost user=dbuser password=dbpass dbname=tenant_%d sslmode=disable"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate Database Pros&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Maximum isolation&lt;/strong&gt;: Complete data separation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best security&lt;/strong&gt;: No risk of cross-tenant data access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full customization&lt;/strong&gt;: Each tenant can have completely different schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent scaling&lt;/strong&gt;: Scale tenant databases individually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy compliance&lt;/strong&gt;: Meets strictest data isolation requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant backups&lt;/strong&gt;: Simple, independent backup/restore&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No noisy neighbors&lt;/strong&gt;: Tenant workloads don't affect each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Separate Database Cons&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Highest cost&lt;/strong&gt;: Multiple database instances require more resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational complexity&lt;/strong&gt;: Managing many databases (backups, monitoring, migrations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection limits&lt;/strong&gt;: Each database instance has connection limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-tenant analytics&lt;/strong&gt;: Requires data federation or ETL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration complexity&lt;/strong&gt;: Must run migrations across all databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource overhead&lt;/strong&gt;: More memory, CPU, and storage needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Separate Database Best For&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise SaaS with high-value customers&lt;/li&gt;
&lt;li&gt;Strict compliance requirements (HIPAA, GDPR, SOC 2)&lt;/li&gt;
&lt;li&gt;When tenants need significant customization&lt;/li&gt;
&lt;li&gt;Low to medium tenant count (dozens to low hundreds)&lt;/li&gt;
&lt;li&gt;When tenants have very different data models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;Regardless of the pattern chosen, security is paramount:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Row-Level Security (RLS)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL RLS automatically filters queries by tenant, providing a database-level security layer. This feature is particularly powerful for multi-tenant applications. For more details on PostgreSQL RLS, security policies, and other advanced PostgreSQL features, see our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Enable RLS&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Policy to isolate by tenant&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="n"&gt;tenant_isolation&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_setting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app.current_tenant'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Application sets tenant context&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_tenant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'123'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Application-Level Filtering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Always filter by tenant_id in application code. The examples below use GORM, but different ORMs have their own approaches to query building. For guidance on choosing the right ORM for your multi-tenant application, check our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;comparison of Go ORMs&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// ❌ BAD - Missing tenant filter&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"email = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;First&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// ✅ GOOD - Always include tenant filter&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tenant_id = ? AND email = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;First&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// ✅ BETTER - Use scopes or middleware&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Scopes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TenantScope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"email = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;First&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Connection Pooling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use connection poolers that support tenant context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// PgBouncer with transaction pooling&lt;/span&gt;
&lt;span class="c"&gt;// Or use application-level connection routing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Audit Logging&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Track all tenant data access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;AuditLog&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;        &lt;span class="kt"&gt;uint&lt;/span&gt;
    &lt;span class="n"&gt;TenantID&lt;/span&gt;  &lt;span class="kt"&gt;uint&lt;/span&gt;
    &lt;span class="n"&gt;UserID&lt;/span&gt;    &lt;span class="kt"&gt;uint&lt;/span&gt;
    &lt;span class="n"&gt;Action&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Table&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;RecordID&lt;/span&gt;  &lt;span class="kt"&gt;uint&lt;/span&gt;
    &lt;span class="n"&gt;Timestamp&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;
    &lt;span class="n"&gt;IPAddress&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Indexing Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Proper indexing is crucial for multi-tenant database performance. Understanding SQL indexing strategies, including composite indexes and partial indexes, is essential. For a comprehensive reference on SQL commands including CREATE INDEX and query optimization, see our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/sql-cheatsheet/" rel="noopener noreferrer"&gt;SQL Cheatsheet&lt;/a&gt;. For PostgreSQL-specific indexing features and performance tuning, refer to our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Composite indexes for tenant queries&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_tenant_created&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_tenant_status&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Partial indexes for common tenant-specific queries&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_orders_active_tenant&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'active'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Query Optimization&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Use prepared statements for tenant queries&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SELECT * FROM users WHERE tenant_id = $1 AND email = $2"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Batch operations per tenant&lt;/span&gt;
&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tenant_id = ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Use connection pooling per tenant (for separate database pattern)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Effective database management tools are essential for monitoring multi-tenant applications. You'll need to track query performance, resource usage, and database health across all tenants. For comparing database management tools that can help with this, check out our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/dbeaver-vs-beekeeper/" rel="noopener noreferrer"&gt;DBeaver vs Beekeeper comparison&lt;/a&gt;. Both tools offer excellent features for managing and monitoring PostgreSQL databases in multi-tenant environments.&lt;/p&gt;

&lt;p&gt;Monitor per-tenant metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query performance per tenant&lt;/li&gt;
&lt;li&gt;Resource usage per tenant&lt;/li&gt;
&lt;li&gt;Connection counts per tenant&lt;/li&gt;
&lt;li&gt;Database size per tenant&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Migration Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Shared Schema Pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When implementing database migrations, your choice of ORM affects how you handle schema changes. The examples below use GORM's AutoMigrate feature, but different ORMs have different migration strategies. For detailed information on how various Go ORMs handle migrations and schema management, see our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;Go ORMs comparison&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Migrations apply to all tenants automatically&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;Migrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;gorm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoMigrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Product&lt;/span&gt;&lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Separate Schema/Database Pattern&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Migrations must run per tenant&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;MigrateAllTenants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantIDs&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tenantIDs&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;GetTenantDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AutoMigrate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;{});&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tenant %d: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tenantID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Decision Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Shared Schema&lt;/th&gt;
&lt;th&gt;Separate Schema&lt;/th&gt;
&lt;th&gt;Separate DB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low-Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operational Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Tenant Count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1000+&lt;/td&gt;
&lt;td&gt;10-1000&lt;/td&gt;
&lt;td&gt;1-100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Hybrid Approach
&lt;/h2&gt;

&lt;p&gt;You can combine patterns for different tenant tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Small tenants: Shared schema&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"standard"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;GetSharedDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Enterprise tenants: Separate database&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tier&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"enterprise"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;GetTenantDB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Always filter by tenant&lt;/strong&gt;: Never trust application code alone; use RLS when possible. Understanding SQL fundamentals helps ensure proper query construction—refer to our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/sql-cheatsheet/" rel="noopener noreferrer"&gt;SQL Cheatsheet&lt;/a&gt; for query best practices.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor tenant resource usage&lt;/strong&gt;: Identify and throttle noisy neighbors. Use database management tools like those compared in our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/dbeaver-vs-beekeeper/" rel="noopener noreferrer"&gt;DBeaver vs Beekeeper guide&lt;/a&gt; to track performance metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement tenant context middleware&lt;/strong&gt;: Centralize tenant extraction and validation. Your ORM choice affects how you implement this—see our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;Go ORMs comparison&lt;/a&gt; for different approaches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use connection pooling&lt;/strong&gt;: Efficiently manage database connections. PostgreSQL-specific connection pooling strategies are covered in our &lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan for tenant migration&lt;/strong&gt;: Ability to move tenants between patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement soft delete&lt;/strong&gt;: Use deleted_at instead of hard deletes for tenant data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit everything&lt;/strong&gt;: Log all tenant data access for compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test isolation&lt;/strong&gt;: Regular security audits to prevent cross-tenant data leakage&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Choosing the right multi-tenancy database pattern depends on your specific requirements for isolation, cost, scalability, and operational complexity. The Shared Database, Shared Schema pattern works well for most SaaS applications, while Separate Database per Tenant is necessary for enterprise customers with strict compliance needs.&lt;/p&gt;

&lt;p&gt;Start with the simplest pattern that meets your requirements, and plan for migration to a more isolated pattern as your needs evolve. Always prioritize security and data isolation, regardless of the pattern chosen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/current/ddl-rowsecurity.html" rel="noopener noreferrer"&gt;PostgreSQL Row-Level Security Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/apn/apn-multi-tenant-saas-database-architecture/" rel="noopener noreferrer"&gt;Multi-Tenant SaaS Database Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/sql-database/saas-tenancy-app-design-patterns" rel="noopener noreferrer"&gt;Designing Multi-Tenant Databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;Comparing Go ORMs for PostgreSQL: GORM vs Ent vs Bun vs sqlc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/developer-tools/database-tools/postgresql-cheatsheet/" rel="noopener noreferrer"&gt;PostgreSQL Cheatsheet: A Developer’s Quick Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/developer-tools/database-tools/dbeaver-vs-beekeeper/" rel="noopener noreferrer"&gt;DBeaver vs Beekeeper - SQL Database Management Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/developer-tools/database-tools/sql-cheatsheet/" rel="noopener noreferrer"&gt;SQL Cheatsheet - most useful SQL commands&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>sql</category>
      <category>devops</category>
      <category>dev</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Zettelkasten for Developers: A Practical Method That Works</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Mon, 25 May 2026 13:09:20 +0000</pubDate>
      <link>https://dev.to/rosgluk/zettelkasten-for-developers-a-practical-method-that-works-3ij</link>
      <guid>https://dev.to/rosgluk/zettelkasten-for-developers-a-practical-method-that-works-3ij</guid>
      <description>&lt;p&gt;Developers do not usually suffer from a lack of information. We suffer from too much of it.&lt;/p&gt;

&lt;p&gt;There are API docs, pull requests, production incidents, design discussions, meeting notes, architecture diagrams, code comments, Slack threads, research papers, experiments, bookmarks, and half-finished ideas sitting in five different tools. The hard part is not saving information. The hard part is turning it into reusable thinking.&lt;/p&gt;

&lt;p&gt;That is where Zettelkasten becomes useful.&lt;/p&gt;

&lt;p&gt;A Zettelkasten is often described as a note-taking system, but that undersells it. Used well, it is a &lt;a href="https://www.glukhov.org/knowledge-management/" rel="noopener noreferrer"&gt;personal knowledge system&lt;/a&gt; for developing ideas over time. For developers, it can become a practical bridge between code, architecture, debugging, learning, and writing.&lt;/p&gt;

&lt;p&gt;The opinionated part is this: most developers should not use Zettelkasten as a romantic productivity hobby. Do not build a beautiful note museum. Build a working system that helps you solve problems, explain systems, and make better engineering decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Zettelkasten?
&lt;/h2&gt;

&lt;p&gt;Zettelkasten means "slip box". The method is associated with sociologist Niklas Luhmann, who used a large collection of linked notes to develop ideas and write extensively.&lt;/p&gt;

&lt;p&gt;The important lesson is not that he used paper cards. The important lesson is that his notes were not isolated files. Each note had a clear idea, a place in the system, and links to other notes. Over time, the system became more valuable because connections accumulated.&lt;/p&gt;

&lt;p&gt;For developers, the modern version is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write one useful idea per note.&lt;/li&gt;
&lt;li&gt;Link it to related notes.&lt;/li&gt;
&lt;li&gt;Use those links to grow explanations, decisions, patterns, and articles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is it. The rest is implementation detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Developers Struggle With Knowledge Overload
&lt;/h2&gt;

&lt;p&gt;Software development creates knowledge that is both detailed and temporary.&lt;/p&gt;

&lt;p&gt;You learn why a cache invalidation bug happened. You discover a weird edge case in a framework. You compare two queueing strategies. You debug a production outage. You understand why a legacy service behaves strangely. You read a great article about distributed tracing.&lt;/p&gt;

&lt;p&gt;Then, two months later, you vaguely remember that you once knew the answer.&lt;/p&gt;

&lt;p&gt;The usual developer knowledge stack makes this worse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bookmarks store sources, not understanding.&lt;/li&gt;
&lt;li&gt;Folders force early categorization.&lt;/li&gt;
&lt;li&gt;Wikis become stale when nobody owns them.&lt;/li&gt;
&lt;li&gt;TODO lists mix tasks with ideas.&lt;/li&gt;
&lt;li&gt;Code comments explain local details, not broader concepts.&lt;/li&gt;
&lt;li&gt;Chat messages disappear into history.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Zettelkasten helps because it treats knowledge as a network, not a warehouse. If that framing sounds familiar from reading about &lt;a href="https://www.glukhov.org/knowledge-management/foundations/second-brain/" rel="noopener noreferrer"&gt;building a second brain&lt;/a&gt;, that is not a coincidence — both methods attack the same gap between capture and reuse, but Zettelkasten's discipline of atomic notes and explicit links gives developers a more granular handle on technical ideas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Principles of Zettelkasten
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Atomic Notes
&lt;/h3&gt;

&lt;p&gt;An atomic note contains one idea.&lt;/p&gt;

&lt;p&gt;Not one topic. Not one article summary. Not one giant page called "PostgreSQL". One idea.&lt;/p&gt;

&lt;p&gt;For example, these are too broad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PostgreSQL notes
Kubernetes
Caching
System design
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;These are closer to atomic:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Partial indexes reduce write overhead when queries target a small subset&lt;br&gt;
Kubernetes readiness probes protect traffic routing, not container startup&lt;br&gt;
Write-through caching improves consistency but increases write latency&lt;br&gt;
Idempotency keys turn retries into safe operations&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Atomic notes are powerful because they are easier to link. A huge page can only be linked as a vague topic. A focused note can be connected to an exact concept, decision, bug, or system.&lt;/p&gt;

&lt;p&gt;A good developer note should usually answer one of these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the idea?&lt;/li&gt;
&lt;li&gt;When does it matter?&lt;/li&gt;
&lt;li&gt;What tradeoff does it expose?&lt;/li&gt;
&lt;li&gt;Where have I seen it in real code?&lt;/li&gt;
&lt;li&gt;What other concept does it connect to?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Linking
&lt;/h3&gt;

&lt;p&gt;Links are the heart of the system.&lt;/p&gt;

&lt;p&gt;The point is not to create a pretty graph. The point is to make ideas reusable.&lt;/p&gt;

&lt;p&gt;When you write a note about idempotency keys, link it to notes about retries, distributed systems, payment processing, message queues, API design, and incident prevention. When you write a note about database migrations, link it to deploy safety, rollback strategy, backward compatibility, and feature flags.&lt;/p&gt;

&lt;p&gt;A link should usually mean one of these things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This explains the same concept from another angle."&lt;/li&gt;
&lt;li&gt;"This is a practical example of the idea."&lt;/li&gt;
&lt;li&gt;"This is a tradeoff or counterpoint."&lt;/li&gt;
&lt;li&gt;"This concept depends on that concept."&lt;/li&gt;
&lt;li&gt;"This note belongs in a larger argument."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid lazy links. Linking every note to every other note creates noise. The best links are intentional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Emergence
&lt;/h3&gt;

&lt;p&gt;Emergence is the part of Zettelkasten that sounds mystical, but it is practical.&lt;/p&gt;

&lt;p&gt;You do not need to design the perfect structure upfront. You add useful notes, connect them honestly, and let clusters appear over time.&lt;/p&gt;

&lt;p&gt;After a few months, you may notice that many notes connect around topics like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API reliability&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Developer experience&lt;/li&gt;
&lt;li&gt;Event-driven architecture&lt;/li&gt;
&lt;li&gt;Database performance&lt;/li&gt;
&lt;li&gt;Technical debt&lt;/li&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Security reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those clusters become future articles, internal docs, design principles, conference talks, onboarding material, or better engineering decisions.&lt;/p&gt;

&lt;p&gt;This is why Zettelkasten is different from a folder hierarchy. Folders ask you to decide where knowledge belongs before you fully understand it. Links let knowledge belong to multiple contexts.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Developer Adaptation Of Zettelkasten
&lt;/h2&gt;

&lt;p&gt;Classic Zettelkasten advice often comes from academic writing — the &lt;a href="https://www.glukhov.org/knowledge-management/foundations/personal-knowledge-management/" rel="noopener noreferrer"&gt;personal knowledge management&lt;/a&gt; literature covers that tradition well. Developers need a slightly different version.&lt;/p&gt;

&lt;p&gt;A developer Zettelkasten should connect three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Concepts&lt;/li&gt;
&lt;li&gt;Code&lt;/li&gt;
&lt;li&gt;Systems&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Concepts
&lt;/h3&gt;

&lt;p&gt;Concept notes explain reusable ideas.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Backpressure prevents fast producers from overwhelming slow consumers&lt;br&gt;
Optimistic locking detects conflicting writes without blocking readers&lt;br&gt;
Circuit breakers protect dependencies from repeated failing calls&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;These notes should be written in your own words. Copying documentation is not enough. The value comes from forcing yourself to explain the concept clearly.&lt;/p&gt;

&lt;p&gt;A useful concept note can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A short explanation&lt;/li&gt;
&lt;li&gt;A concrete example&lt;/li&gt;
&lt;li&gt;A tradeoff&lt;/li&gt;
&lt;li&gt;A link to a related pattern&lt;/li&gt;
&lt;li&gt;A link to a real system where you used it&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code
&lt;/h3&gt;

&lt;p&gt;Code notes capture practical implementation knowledge.&lt;/p&gt;

&lt;p&gt;They are not random snippet dumps. A snippet is useful only when it explains a decision or pattern.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`markdown&lt;/p&gt;

&lt;h2&gt;
  
  
  Idempotent request handling with a database constraint
&lt;/h2&gt;

&lt;p&gt;The safest implementation is often a unique constraint on the idempotency key.&lt;br&gt;
The application can retry safely because duplicate requests resolve to the same&lt;br&gt;
stored result instead of creating a second side effect.&lt;/p&gt;

&lt;p&gt;Related:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[[Retries need idempotent operations]]&lt;/li&gt;
&lt;li&gt;[[Database constraints are concurrency control]]&lt;/li&gt;
&lt;li&gt;[[Payment APIs should treat network failure as unknown outcome]]
`&lt;code&gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good code notes explain why the code works, when to use it, and what can go wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Systems
&lt;/h3&gt;

&lt;p&gt;System notes connect abstract ideas to your actual architecture.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
The billing service uses idempotency keys because payment provider calls may&lt;br&gt;
succeed even when our HTTP client times out.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This note can link to:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Idempotency keys turn retries into safe operations&lt;br&gt;
Timeouts do not prove failure&lt;br&gt;
Payment APIs should model unknown outcomes&lt;br&gt;
Outbox pattern separates database writes from external side effects&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is where Zettelkasten becomes valuable for senior engineering work. It helps you build a memory of why systems are shaped the way they are.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Capture Fleeting Notes
&lt;/h3&gt;

&lt;p&gt;A fleeting note is a rough capture. It does not need to be polished.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Look into why readiness probe failed during deploy.&lt;br&gt;
Maybe retries made the duplicate invoice bug worse.&lt;br&gt;
Good quote from incident review: timeout is not failure.&lt;br&gt;
Research: Postgres partial index for active rows only.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Use whatever is fastest: Obsidian daily note, Logseq journal, a text file, mobile notes, or a scratch buffer.&lt;/p&gt;

&lt;p&gt;The rule is simple: capture quickly, process later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Process Notes Into Permanent Notes
&lt;/h3&gt;

&lt;p&gt;Processing is where the value appears.&lt;/p&gt;

&lt;p&gt;Turn rough notes into clear, reusable notes. Rewrite in your own words. Give each note a title that states the idea.&lt;/p&gt;

&lt;p&gt;Bad title:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Retries&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Better title:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Retries are safe only when the operation is idempotent&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Bad note:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Need idempotency for retries.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Better note:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Retries can turn a temporary network problem into duplicate side effects.&lt;br&gt;
A retry is safe only when the operation can run more than once and still&lt;br&gt;
produce the same business result. For APIs, this often requires an&lt;br&gt;
idempotency key, a unique constraint, or a stored request result.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Add Links While The Context Is Fresh
&lt;/h3&gt;

&lt;p&gt;After writing the note, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does this explain?&lt;/li&gt;
&lt;li&gt;What does this depend on?&lt;/li&gt;
&lt;li&gt;Where have I seen this in code?&lt;/li&gt;
&lt;li&gt;What is the opposite view?&lt;/li&gt;
&lt;li&gt;What system would benefit from this?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add only the links that help future you think.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Create Index Notes Or Maps Of Content
&lt;/h3&gt;

&lt;p&gt;Once a cluster grows, create an index note.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`markdown&lt;/p&gt;

&lt;h1&gt;
  
  
  API Reliability
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Core ideas
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[[Retries are safe only when the operation is idempotent]]&lt;/li&gt;
&lt;li&gt;[[Timeouts do not prove failure]]&lt;/li&gt;
&lt;li&gt;[[Circuit breakers reduce pressure on failing dependencies]]&lt;/li&gt;
&lt;li&gt;[[Rate limits protect shared resources]]&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation patterns
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[[Idempotency keys turn retries into safe operations]]&lt;/li&gt;
&lt;li&gt;[[Outbox pattern separates persistence from delivery]]&lt;/li&gt;
&lt;li&gt;[[Dead letter queues preserve failed messages for inspection]]&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System examples
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[[Billing service payment retry design]]&lt;/li&gt;
&lt;li&gt;[[Webhook delivery failure handling]]
`&lt;code&gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you navigation without forcing everything into folders.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Use Notes To Produce Output
&lt;/h3&gt;

&lt;p&gt;A Zettelkasten should produce something.&lt;/p&gt;

&lt;p&gt;For developers, output can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture decision records&lt;/li&gt;
&lt;li&gt;Design documents&lt;/li&gt;
&lt;li&gt;Blog posts&lt;/li&gt;
&lt;li&gt;Debugging guides&lt;/li&gt;
&lt;li&gt;Onboarding docs&lt;/li&gt;
&lt;li&gt;Pull request explanations&lt;/li&gt;
&lt;li&gt;Internal talks&lt;/li&gt;
&lt;li&gt;Refactoring plans&lt;/li&gt;
&lt;li&gt;Incident review insights&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your notes never influence your work, the system is too decorative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Note Types For Developers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fleeting Notes
&lt;/h3&gt;

&lt;p&gt;Temporary notes for quick capture.&lt;/p&gt;

&lt;p&gt;Use them for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ideas during coding&lt;/li&gt;
&lt;li&gt;Debugging observations&lt;/li&gt;
&lt;li&gt;Meeting fragments&lt;/li&gt;
&lt;li&gt;Questions&lt;/li&gt;
&lt;li&gt;Bookmarks to process later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Delete or convert them quickly. Do not let them become a swamp.&lt;/p&gt;

&lt;h3&gt;
  
  
  Literature Notes
&lt;/h3&gt;

&lt;p&gt;Notes about external sources.&lt;/p&gt;

&lt;p&gt;For developers, a source can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Blog article&lt;/li&gt;
&lt;li&gt;RFC&lt;/li&gt;
&lt;li&gt;Source code&lt;/li&gt;
&lt;li&gt;Conference talk&lt;/li&gt;
&lt;li&gt;GitHub issue&lt;/li&gt;
&lt;li&gt;Postmortem&lt;/li&gt;
&lt;li&gt;Book chapter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep source notes separate from your own permanent notes. A source note says, "This source said this." A permanent note says, "I understand this idea this way."&lt;/p&gt;

&lt;h3&gt;
  
  
  Permanent Notes
&lt;/h3&gt;

&lt;p&gt;These are the core of the Zettelkasten.&lt;/p&gt;

&lt;p&gt;A permanent note should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Atomic&lt;/li&gt;
&lt;li&gt;Written in your own words&lt;/li&gt;
&lt;li&gt;Linked to related notes&lt;/li&gt;
&lt;li&gt;Useful without needing the original source&lt;/li&gt;
&lt;li&gt;Stable enough to revisit later&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Project Notes
&lt;/h3&gt;

&lt;p&gt;Project notes are allowed, but do not confuse them with permanent notes.&lt;/p&gt;

&lt;p&gt;A project note might be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Migrate billing worker to queue v2&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It can link to permanent notes like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Backpressure prevents queue consumers from collapsing&lt;br&gt;
Outbox pattern separates persistence from delivery&lt;br&gt;
Feature flags reduce deployment risk&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Projects end. Concepts stay.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Obsidian
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.glukhov.org/knowledge-management/tools/obsidian-for-personal-knowledge-management/" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt; works well for developer Zettelkasten because it uses local Markdown files and supports internal links.&lt;/p&gt;

&lt;p&gt;A simple Obsidian structure:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
notes/&lt;br&gt;
  fleeting/&lt;br&gt;
  sources/&lt;br&gt;
  permanent/&lt;br&gt;
  maps/&lt;br&gt;
  projects/&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example note:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`markdown&lt;/p&gt;

&lt;h1&gt;
  
  
  Timeouts do not prove failure
&lt;/h1&gt;

&lt;p&gt;A timeout means the client stopped waiting. It does not prove the server failed.&lt;br&gt;
The operation may have succeeded, failed, or still be running.&lt;/p&gt;

&lt;p&gt;This matters for payment APIs, job queues, and any external side effect.&lt;/p&gt;

&lt;p&gt;Related:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[[Retries are safe only when the operation is idempotent]]&lt;/li&gt;
&lt;li&gt;[[Idempotency keys turn retries into safe operations]]&lt;/li&gt;
&lt;li&gt;[[External side effects need reconciliation]]
`&lt;code&gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Obsidian is a good fit if you like file ownership, plain text, and editor-like workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logseq
&lt;/h3&gt;

&lt;p&gt;Logseq is useful if you prefer outlining, daily journals, and block-level references.&lt;/p&gt;

&lt;p&gt;Its block model works well for capturing small units of thought. You can write rough notes in the journal, then promote useful blocks into permanent notes.&lt;/p&gt;

&lt;p&gt;Example Logseq-style workflow:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`text&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeout during payment request does not prove payment failure.

&lt;ul&gt;
&lt;li&gt;This should become a permanent note about unknown outcomes.&lt;/li&gt;
&lt;li&gt;Related: [[Idempotency]], [[Retries]], [[Payment APIs]]
`&lt;code&gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Logseq is a good fit if your thinking starts as outlines and you like block references. For a side-by-side comparison of both tools across workflow style, sync options, and plugin ecosystems, &lt;a href="https://www.glukhov.org/knowledge-management/tools/obsidian-vs-logseq-comparison/" rel="noopener noreferrer"&gt;Obsidian vs Logseq&lt;/a&gt; maps the trade-offs clearly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Plain Markdown And Git
&lt;/h3&gt;

&lt;p&gt;You do not need a special app.&lt;/p&gt;

&lt;p&gt;A Git repository of Markdown files can be enough:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
knowledge/&lt;br&gt;
  permanent/&lt;br&gt;
  sources/&lt;br&gt;
  maps/&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Use normal Markdown links:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;markdown&lt;br&gt;
[Retries are safe only when operations are idempotent](../permanent/retries-safe-only-with-idempotency.md)&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This approach is boring, durable, and developer-friendly. That is a compliment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Naming Notes
&lt;/h2&gt;

&lt;p&gt;Prefer titles that make claims.&lt;/p&gt;

&lt;p&gt;Weak titles:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Caching&lt;br&gt;
Queues&lt;br&gt;
OAuth&lt;br&gt;
PostgreSQL indexes&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Strong titles:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Cache invalidation is a coordination problem&lt;br&gt;
Queues hide latency but do not remove work&lt;br&gt;
OAuth access tokens should be short lived&lt;br&gt;
Partial indexes are useful when queries target a subset&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;A claim-based title makes the note easier to understand and easier to link.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Put In A Developer Zettelkasten
&lt;/h2&gt;

&lt;p&gt;Good candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture principles&lt;/li&gt;
&lt;li&gt;Debugging lessons&lt;/li&gt;
&lt;li&gt;Production incident insights&lt;/li&gt;
&lt;li&gt;API design rules&lt;/li&gt;
&lt;li&gt;Database patterns&lt;/li&gt;
&lt;li&gt;Security assumptions&lt;/li&gt;
&lt;li&gt;Performance tradeoffs&lt;/li&gt;
&lt;li&gt;Framework edge cases&lt;/li&gt;
&lt;li&gt;Refactoring heuristics&lt;/li&gt;
&lt;li&gt;Testing strategies&lt;/li&gt;
&lt;li&gt;Deployment lessons&lt;/li&gt;
&lt;li&gt;Code review patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Poor candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw meeting transcripts&lt;/li&gt;
&lt;li&gt;Unprocessed bookmarks&lt;/li&gt;
&lt;li&gt;Huge copied documentation pages&lt;/li&gt;
&lt;li&gt;Random snippets with no explanation&lt;/li&gt;
&lt;li&gt;Task lists&lt;/li&gt;
&lt;li&gt;Secrets&lt;/li&gt;
&lt;li&gt;Credentials&lt;/li&gt;
&lt;li&gt;Anything that belongs in official company documentation only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A personal Zettelkasten can reference work, but it should not become an unsafe shadow copy of private systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Over-Structuring Too Early
&lt;/h3&gt;

&lt;p&gt;Developers love structure. That is sometimes a problem.&lt;/p&gt;

&lt;p&gt;Do not spend the first week designing folders, tags, templates, naming conventions, dashboards, and automation. You do not yet know what structure your notes need.&lt;/p&gt;

&lt;p&gt;Start with a small number of note types:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
fleeting&lt;br&gt;
sources&lt;br&gt;
permanent&lt;br&gt;
maps&lt;br&gt;
projects&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let complexity earn its place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Treating It Like Folders
&lt;/h3&gt;

&lt;p&gt;A Zettelkasten is not a better folder tree.&lt;/p&gt;

&lt;p&gt;If every note belongs to exactly one folder and has no meaningful links, you have built a filing cabinet. That may still be useful, but it is not Zettelkasten.&lt;/p&gt;

&lt;p&gt;The value comes from connections:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
API retries -&amp;gt; idempotency -&amp;gt; database constraints -&amp;gt; payment safety -&amp;gt; incident prevention&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That chain is more useful than a folder called "Backend".&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Saving Instead Of Thinking
&lt;/h3&gt;

&lt;p&gt;Copying is not learning.&lt;/p&gt;

&lt;p&gt;A saved paragraph from documentation may help later, but a rewritten explanation helps now. The act of restating an idea in your own words is where understanding improves.&lt;/p&gt;

&lt;p&gt;A good rule:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Do not create a permanent note until you can explain the idea without copying.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Linking Everything
&lt;/h3&gt;

&lt;p&gt;Too many links are as bad as too few.&lt;/p&gt;

&lt;p&gt;Do not link words just because they exist. Link ideas because the relationship matters.&lt;/p&gt;

&lt;p&gt;A useful link should help future you answer:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Why is this connected?&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: Confusing Tags With Structure
&lt;/h3&gt;

&lt;p&gt;Tags are useful for status and broad grouping:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`text&lt;/p&gt;

&lt;h1&gt;
  
  
  todo
&lt;/h1&gt;

&lt;h1&gt;
  
  
  source
&lt;/h1&gt;

&lt;h1&gt;
  
  
  security
&lt;/h1&gt;

&lt;h1&gt;
  
  
  draft
&lt;/h1&gt;

&lt;p&gt;`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;But tags should not carry the whole system. If you rely only on tags, you lose the richer meaning of direct links.&lt;/p&gt;

&lt;p&gt;A link says:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
This idea relates to that idea in a specific way.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;A tag usually says:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
This belongs to a broad bucket.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Both are useful. They are not the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 6: Never Producing Output
&lt;/h3&gt;

&lt;p&gt;A Zettelkasten that never produces output becomes a private archive.&lt;/p&gt;

&lt;p&gt;Output does not have to mean public writing. It can be a design doc, an incident review, a better pull request, or a clear explanation to a teammate.&lt;/p&gt;

&lt;p&gt;The system should make your thinking easier to reuse.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Minimal Template
&lt;/h2&gt;

&lt;p&gt;Use a small template. Resist the urge to create a form with fifteen fields.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`markdown&lt;/p&gt;

&lt;h1&gt;
  
  
  Title as a claim
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Idea
&lt;/h2&gt;

&lt;p&gt;Explain the idea in your own words.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;Describe the practical impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Show a code, system, or debugging example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;Mention limits, risks, or counterpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[[Related note]]&lt;/li&gt;
&lt;li&gt;[[Another related note]]
`&lt;code&gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many notes, even this is too much. A title, a paragraph, and three links can be enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: From Bug To Zettelkasten Notes
&lt;/h2&gt;

&lt;p&gt;Imagine you fixed a bug where users were charged twice after a timeout.&lt;/p&gt;

&lt;p&gt;A weak note would be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Payment bug - retries caused duplicate charge.&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;A stronger set of notes might be:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Timeouts do not prove failure&lt;br&gt;
Retries are safe only when the operation is idempotent&lt;br&gt;
Idempotency keys turn retries into safe operations&lt;br&gt;
Payment APIs should model unknown outcomes&lt;br&gt;
Database constraints are concurrency control&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now the bug has become reusable engineering knowledge.&lt;/p&gt;

&lt;p&gt;Later, those notes can support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A postmortem&lt;/li&gt;
&lt;li&gt;A design doc for payment retries&lt;/li&gt;
&lt;li&gt;A blog post about idempotency&lt;/li&gt;
&lt;li&gt;A checklist for external API integrations&lt;/li&gt;
&lt;li&gt;A code review comment&lt;/li&gt;
&lt;li&gt;A safer implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the practical value of Zettelkasten.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Weekly Maintenance Routine
&lt;/h2&gt;

&lt;p&gt;You do not need a complicated review process.&lt;/p&gt;

&lt;p&gt;Once a week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Process rough notes.&lt;/li&gt;
&lt;li&gt;Delete notes that no longer matter.&lt;/li&gt;
&lt;li&gt;Convert useful ideas into permanent notes.&lt;/li&gt;
&lt;li&gt;Add missing links.&lt;/li&gt;
&lt;li&gt;Promote clusters into map notes.&lt;/li&gt;
&lt;li&gt;Pick one note and turn it into output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keep it lightweight. The system should support development, not compete with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Rules
&lt;/h2&gt;

&lt;p&gt;Use these rules to keep the system healthy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One idea per note.&lt;/li&gt;
&lt;li&gt;Write titles as claims.&lt;/li&gt;
&lt;li&gt;Prefer links over folders.&lt;/li&gt;
&lt;li&gt;Keep source notes separate from your own ideas.&lt;/li&gt;
&lt;li&gt;Connect notes to real code and real systems.&lt;/li&gt;
&lt;li&gt;Create map notes only when a cluster exists.&lt;/li&gt;
&lt;li&gt;Delete low-value notes.&lt;/li&gt;
&lt;li&gt;Do not automate before you understand your workflow.&lt;/li&gt;
&lt;li&gt;Use the system to produce something.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Zettelkasten Is Not Worth It
&lt;/h2&gt;

&lt;p&gt;Zettelkasten is not the answer to every problem.&lt;/p&gt;

&lt;p&gt;It may be overkill if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You only need a task manager.&lt;/li&gt;
&lt;li&gt;You rarely revisit technical ideas.&lt;/li&gt;
&lt;li&gt;You do not write, teach, design, or document.&lt;/li&gt;
&lt;li&gt;Your notes are mostly short-lived project details.&lt;/li&gt;
&lt;li&gt;You are using it to avoid doing the actual work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is most useful when your work depends on compounding understanding.&lt;/p&gt;

&lt;p&gt;That includes senior engineering, architecture, technical leadership, debugging complex systems, writing, consulting, research, and learning deeply over many years.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;For developers, Zettelkasten is not about collecting notes. It is about building a thinking environment.&lt;/p&gt;

&lt;p&gt;The method works best when it stays practical: atomic notes, meaningful links, real examples, and regular output. Connect concepts to code. Connect code to systems. Connect systems to decisions.&lt;/p&gt;

&lt;p&gt;Do not try to build the perfect second brain. Build a useful one.&lt;/p&gt;

&lt;p&gt;A good developer Zettelkasten should help you answer better questions:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;text&lt;br&gt;
Where have I seen this problem before?&lt;br&gt;
What concept explains this bug?&lt;br&gt;
What tradeoff are we making?&lt;br&gt;
What pattern applies here?&lt;br&gt;
What should I write down so I do not relearn this again?&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That is enough.&lt;/p&gt;

</description>
      <category>obsidian</category>
      <category>logseq</category>
      <category>knowledgemanagement</category>
    </item>
    <item>
      <title>OpenClaw vs Hermes Agent: Stars, Downloads &amp; Usage 2026</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Mon, 25 May 2026 13:09:06 +0000</pubDate>
      <link>https://dev.to/rosgluk/openclaw-vs-hermes-agent-stars-downloads-usage-2026-b07</link>
      <guid>https://dev.to/rosgluk/openclaw-vs-hermes-agent-stars-downloads-usage-2026-b07</guid>
      <description>&lt;p&gt;Open-source AI agent frameworks are exploding in popularity on GitHub.&lt;br&gt;
Two projects at the core of the &lt;a href="https://www.glukhov.org/ai-systems/" rel="noopener noreferrer"&gt;self-hosted AI systems&lt;/a&gt; ecosystem — &lt;strong&gt;OpenClaw&lt;/strong&gt; and &lt;strong&gt;Hermes Agent&lt;/strong&gt; — have pulled so far ahead that the rest of the field is fighting for a distant third place.&lt;/p&gt;

&lt;p&gt;Here is the full picture as of May 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Leaderboard
&lt;/h2&gt;

&lt;p&gt;Star counts are live data fetched from the GitHub API on &lt;strong&gt;May 21, 2026&lt;/strong&gt;. Repos are sorted by current stars, descending.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;GitHub repo&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Stars&lt;/th&gt;
&lt;th&gt;Releases last 30 days&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;&lt;code&gt;openclaw/openclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;373,616&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;62&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;&lt;code&gt;NousResearch/hermes-agent&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;160,175&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Nanobot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/HKUDS/nanobot" rel="noopener noreferrer"&gt;&lt;code&gt;HKUDS/nanobot&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;42,873&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;AstrBot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/AstrBotDevs/AstrBot" rel="noopener noreferrer"&gt;&lt;code&gt;AstrBotDevs/AstrBot&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;32,709&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;ZeroClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/zeroclaw-labs/zeroclaw" rel="noopener noreferrer"&gt;&lt;code&gt;zeroclaw-labs/zeroclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;31,500&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;NanoClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/nanocoai/nanoclaw" rel="noopener noreferrer"&gt;&lt;code&gt;nanocoai/nanoclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;29,143&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;PicoClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/sipeed/picoclaw" rel="noopener noreferrer"&gt;&lt;code&gt;sipeed/picoclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;29,121&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;AionUi&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/iOfficeAI/AionUi" rel="noopener noreferrer"&gt;&lt;code&gt;iOfficeAI/AionUi&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;26,025&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;NemoClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/NVIDIA/NemoClaw" rel="noopener noreferrer"&gt;&lt;code&gt;NVIDIA/NemoClaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;20,571&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;OpenFang&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/RightNow-AI/openfang" rel="noopener noreferrer"&gt;&lt;code&gt;RightNow-AI/openfang&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;17,599&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥5&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;LangBot&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/langbot-app/LangBot" rel="noopener noreferrer"&gt;&lt;code&gt;langbot-app/LangBot&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;16,084&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;memU&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/NevaMind-AI/memU" rel="noopener noreferrer"&gt;&lt;code&gt;NevaMind-AI/memU&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;13,672&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;IronClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/nearai/ironclaw" rel="noopener noreferrer"&gt;&lt;code&gt;nearai/ironclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;12,305&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Moltworker&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/cloudflare/moltworker" rel="noopener noreferrer"&gt;&lt;code&gt;cloudflare/moltworker&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;9,899&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;MemOS&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/MemTensor/MemOS" rel="noopener noreferrer"&gt;&lt;code&gt;MemTensor/MemOS&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;9,246&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;ClawWork&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/HKUDS/ClawWork" rel="noopener noreferrer"&gt;&lt;code&gt;HKUDS/ClawWork&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;8,111&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;NullClaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/nullclaw/nullclaw" rel="noopener noreferrer"&gt;&lt;code&gt;nullclaw/nullclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Zig&lt;/td&gt;
&lt;td&gt;7,603&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;MimicLaw&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/memovai/mimiclaw" rel="noopener noreferrer"&gt;&lt;code&gt;memovai/mimiclaw&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;5,422&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Moltis&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/moltis-org/moltis" rel="noopener noreferrer"&gt;&lt;code&gt;moltis-org/moltis&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;2,697&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Clawra&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/SumeLabs/clawra" rel="noopener noreferrer"&gt;&lt;code&gt;SumeLabs/clawra&lt;/code&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;2,298&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  OpenClaw: 373k Stars and Still Growing
&lt;/h2&gt;

&lt;p&gt;OpenClaw is a personal AI assistant framework built in TypeScript. It runs entirely on the user's own device and connects to over 50 messaging platforms — WhatsApp, Telegram, Slack, Discord, and more — through a single unified interface.&lt;/p&gt;

&lt;p&gt;The project launched in November 2025 but truly ignited on &lt;strong&gt;January 30, 2026&lt;/strong&gt;, reaching 100,000 stars within 48 hours of its relaunch. By April 2026 it had overtaken React to become the most-starred software repository in GitHub's history. At the time of this writing it sits at &lt;strong&gt;373,616 stars&lt;/strong&gt;, 72,000+ forks, and 360 contributors.&lt;/p&gt;

&lt;p&gt;The release cadence is extraordinary: &lt;strong&gt;62 tagged releases in the last 30 days&lt;/strong&gt; puts it in a category of its own in terms of iteration speed. The full arc of how OpenClaw grew from a weekend prototype to GitHub's most-starred repository — including the economics behind the viral spike and the April 2026 subscription cutoff that reshaped weekly growth — is detailed in the &lt;a href="https://www.glukhov.org/ai-systems/openclaw/openclaw-rise-and-fall-timeline/" rel="noopener noreferrer"&gt;OpenClaw rise and fall timeline&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hermes Agent: The Challenger
&lt;/h2&gt;

&lt;p&gt;Nous Research's Hermes Agent markets itself as "the agent that grows with you." It is a self-improving AI agent built in Python with a built-in learning loop — it creates new skills from experience, searches past conversations for relevant context, and can run on a range of infrastructure options from local hardware to cloud.&lt;/p&gt;

&lt;p&gt;Created in July 2025 and now at &lt;strong&gt;160,175 stars&lt;/strong&gt;, Hermes Agent recently surpassed OpenClaw as the world's most-used open-source AI agent &lt;strong&gt;by daily token processing&lt;/strong&gt; on OpenRouter — though OpenClaw still leads in cumulative all-time usage. The gap between the two in raw GitHub stars remains large (over 200k), but Hermes Agent's trajectory is notably steeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mid-field: The 20k–45k Band
&lt;/h2&gt;

&lt;p&gt;The third through eighth positions are all clustered between 26k and 43k stars, making ranking changes here frequent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nanobot&lt;/strong&gt; (HKUDS, 42,873 ⭐) — Python, lightweight graph-based task orchestration from the HKU Data Science lab.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AstrBot&lt;/strong&gt; (AstrBotDevs, 32,709 ⭐) — Python, multi-platform chatbot framework with active release history (11 releases in the last 30 days).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZeroClaw&lt;/strong&gt; (zeroclaw-labs, 31,500 ⭐) — Rust, systems-level agent runtime targeting low-latency deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NanoClaw&lt;/strong&gt; (nanocoai, 29,143 ⭐) — TypeScript, recently migrated from &lt;code&gt;qwibitai/nanoclaw&lt;/code&gt; to &lt;code&gt;nanocoai/nanoclaw&lt;/code&gt;; the rename caused a brief star-count gap in trackers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PicoClaw&lt;/strong&gt; (Sipeed, 29,121 ⭐) — Go, embedded-friendly agent framework. Only 22 stars separate it from NanoClaw.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AionUi&lt;/strong&gt; (iOfficeAI, 26,025 ⭐) — TypeScript, focuses on agentic UI generation with a visual workflow editor.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Language Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Repos in top 20&lt;/th&gt;
&lt;th&gt;Total stars&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;294,897&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;470,155&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;51,502&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;29,121&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zig&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,603&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;5,422&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;TypeScript leads in total star weight — largely because of OpenClaw itself — while Python holds the most individual projects. Rust is carving out a niche in the performance-sensitive tier (ZeroClaw, OpenFang, IronClaw).&lt;/p&gt;

&lt;h2&gt;
  
  
  Release Velocity vs Star Count
&lt;/h2&gt;

&lt;p&gt;High star counts do not always mean high release velocity. Several top-starred repos (NemoClaw, memU, ClawWork, Clawra, MimicLaw) show zero releases in the last 30 days — they may be in maintenance mode or experiencing slower development cycles.&lt;/p&gt;

&lt;p&gt;AstrBot stands out in the mid-field with 11 releases in 30 days, suggesting active feature development. OpenFang (≥5) and Moltis (≥3) are also moving quickly relative to their star counts, which may signal emerging momentum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notable Moves Since Last Snapshot
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NanoClaw&lt;/strong&gt; renamed org from &lt;code&gt;qwibitai&lt;/code&gt; to &lt;code&gt;nanocoai&lt;/code&gt;; updated link in the table above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NemoClaw&lt;/strong&gt; language corrected to TypeScript (previously listed as JavaScript in older data).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AionUi&lt;/strong&gt; gained ~800 stars, moving from 8th to a stronger 8th position.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MemOS&lt;/strong&gt; crossed 9,000 stars.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  OpenRouter Usage Rankings
&lt;/h2&gt;

&lt;p&gt;GitHub stars measure mindshare; OpenRouter token volume measures actual runtime usage. The two charts tell different stories.&lt;/p&gt;

&lt;p&gt;The table below shows the &lt;strong&gt;global daily ranking on OpenRouter&lt;/strong&gt; as of &lt;strong&gt;May 21, 2026&lt;/strong&gt;, filtered to apps and agents that have opted into usage attribution. Counts are daily tokens processed through the platform.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;App / Agent&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Daily tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;Personal / CLI Agents&lt;/td&gt;
&lt;td&gt;458 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Personal / CLI Agents&lt;/td&gt;
&lt;td&gt;173 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Kilo Code&lt;/td&gt;
&lt;td&gt;CLI / IDE Agents&lt;/td&gt;
&lt;td&gt;163 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Descript&lt;/td&gt;
&lt;td&gt;Video Generation&lt;/td&gt;
&lt;td&gt;68.1 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;CLI Agents&lt;/td&gt;
&lt;td&gt;64.1 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;pi&lt;/td&gt;
&lt;td&gt;CLI Agents&lt;/td&gt;
&lt;td&gt;58 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Janitor AI&lt;/td&gt;
&lt;td&gt;Roleplay&lt;/td&gt;
&lt;td&gt;28.4 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;ISEKAI ZERO&lt;/td&gt;
&lt;td&gt;Game&lt;/td&gt;
&lt;td&gt;26.8 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;CSS AI Pro&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;25.4 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;td&gt;IDE / CLI Agents&lt;/td&gt;
&lt;td&gt;23.5 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Roo Code&lt;/td&gt;
&lt;td&gt;IDE / Cloud Agents&lt;/td&gt;
&lt;td&gt;20.1 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;Lemonade&lt;/td&gt;
&lt;td&gt;Programming App&lt;/td&gt;
&lt;td&gt;20 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Mira&lt;/td&gt;
&lt;td&gt;Personal Agents&lt;/td&gt;
&lt;td&gt;15.2 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;VidMuse&lt;/td&gt;
&lt;td&gt;Video Generation&lt;/td&gt;
&lt;td&gt;13.3 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;AA-LCR Benchmark&lt;/td&gt;
&lt;td&gt;Research&lt;/td&gt;
&lt;td&gt;9.42 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;SillyTavern&lt;/td&gt;
&lt;td&gt;Roleplay&lt;/td&gt;
&lt;td&gt;7.84 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;OpenHands&lt;/td&gt;
&lt;td&gt;CLI Agents&lt;/td&gt;
&lt;td&gt;7.21 B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Nous Research API&lt;/td&gt;
&lt;td&gt;General Chat&lt;/td&gt;
&lt;td&gt;6.63 B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Gaps in rank numbers (e.g., no 7 or 17) reflect apps without public attribution at the time of retrieval; OpenRouter lists 60 apps in total.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  All-time cumulative tokens
&lt;/h3&gt;

&lt;p&gt;The daily leader and the all-time leader have swapped since earlier in the year. As of today Hermes Agent has also overtaken OpenClaw on the all-time chart — a milestone that crossed some time after the May 10 daily flip.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;App / Agent&lt;/th&gt;
&lt;th&gt;All-time tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;8.14 T&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;7.18 T&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kilo Code&lt;/td&gt;
&lt;td&gt;5.21 T&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;2.6 T&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What the numbers mean
&lt;/h3&gt;

&lt;p&gt;The gap between Hermes Agent (458 B daily) and OpenClaw (173 B) is now wider than it was on May 10, when the flip first happened at 224 B vs 186 B. Hermes has more than doubled its daily volume in 11 days; OpenClaw's daily volume has declined.&lt;/p&gt;

&lt;p&gt;The architecture difference explains a lot of this. OpenClaw is session-native — it resets between runs, which means every session re-pays the full context-stuffing cost. Hermes is a persistent runtime with a &lt;a href="https://www.glukhov.org/ai-systems/hermes/hermes-agent-memory-system/" rel="noopener noreferrer"&gt;three-layer memory system&lt;/a&gt; (identity snapshot, SQLite FTS5 session database, self-written procedural skill files). Once a skill is written, repeat tasks cost a fraction of the tokens.&lt;/p&gt;

&lt;p&gt;For the coding-agent sub-category specifically, the top five are Hermes Agent, OpenClaw, Kilo Code, Claude Code, and &lt;code&gt;pi&lt;/code&gt;. Cline (#11) and Roo Code (#12) round out the open-source coding-agent tier, both crossing 20 B daily tokens.&lt;/p&gt;

&lt;p&gt;The driver of Hermes's May acceleration was the &lt;strong&gt;v0.13.0 "Tenacity" release&lt;/strong&gt; (May 7, 2026): 864 commits, 588 merged PRs, 295 contributors. That release shipped a Kanban-style durable multi-agent task board with heartbeat monitoring and hallucination recovery, plus eight P0 security fixes and Google Chat as the 20th messaging integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Health
&lt;/h2&gt;

&lt;p&gt;GitHub repository metrics reveal a sharp contrast in project maturity and maintenance style between the two leaders.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Issue close rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;89.9 %&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;37.2 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contributors&lt;/td&gt;
&lt;td&gt;360&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forks&lt;/td&gt;
&lt;td&gt;72,696&lt;/td&gt;
&lt;td&gt;26,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Releases shipped (total)&lt;/td&gt;
&lt;td&gt;82+&lt;/td&gt;
&lt;td&gt;14+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disclosed CVEs (2026 YTD)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;9 in 4 days&lt;/strong&gt; (March 2026)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worst CVE severity&lt;/td&gt;
&lt;td&gt;CVSS 9.9&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exposed public instances&lt;/td&gt;
&lt;td&gt;135,000+ across 82 countries&lt;/td&gt;
&lt;td&gt;Not separately tracked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security response (v0.13.0)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;8 P0 fixes, default redaction on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenClaw's 89.9 % issue close rate reflects a well-staffed, responsive maintainer team — the highest of any project in this space. Its release cadence (62 in the last 30 days alone) is exceptional, but that velocity has a cost: roughly a quarter of updates reportedly break response delivery on at least one channel, and the March 2026 CVE cluster (nine issues in four days, the worst at CVSS 9.9) forced emergency patching at scale. Shadowserver confirmed over 135,000 exposed Gateway instances across 82 countries in the same window. The OpenClaw team does publish fixes fast; the problem is that a community of this size patches slowly.&lt;/p&gt;

&lt;p&gt;Hermes Agent's 37.2 % issue close rate is the expected profile for a three-month-old project with a backlog accumulating faster than it can be triaged. The security record so far is clean — zero disclosed agent-specific CVEs as of May 2026 — though that partly reflects fewer eyes on the codebase. The v0.13.0 "Tenacity" release shipped eight P0 fixes proactively, before any public disclosure, which is a good signal of security culture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ecosystem Size
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Package downloads
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Registry&lt;/th&gt;
&lt;th&gt;Weekly downloads&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;openclaw&lt;/code&gt; (main)&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5,344,931&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@tencent-weixin/openclaw-weixin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;230,903&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@ollama/openclaw-web-search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;160,221&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@paperclipai/adapter-openclaw-gateway&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;159,310&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@larksuite/openclaw-lark&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;npm&lt;/td&gt;
&lt;td&gt;115,964&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;hermes-agent&lt;/code&gt; (main)&lt;/td&gt;
&lt;td&gt;PyPI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;53,134&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The raw numbers are not directly comparable — npm counts installs on every &lt;code&gt;npm install&lt;/code&gt; (including CI runs), while PyPI counts pip installs. OpenClaw also has a larger ecosystem of third-party adapter packages that each pull in the core. Even so, the order-of-magnitude difference reflects OpenClaw's deeper penetration of automated pipelines and developer toolchains.&lt;/p&gt;

&lt;p&gt;Hermes Agent at 53,000 PyPI downloads per week is not a small number for a three-month-old Python tool. Its install rate has grown roughly linearly with the GitHub star count.&lt;/p&gt;

&lt;h3&gt;
  
  
  Skills and integrations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Third-party skill marketplace&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;ClawHub — 44,000+ skills&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None yet (self-generated only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging integrations (official)&lt;/td&gt;
&lt;td&gt;50+ channels&lt;/td&gt;
&lt;td&gt;20 channels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community repositories (GitHub)&lt;/td&gt;
&lt;td&gt;Large (untracked by maintainers)&lt;/td&gt;
&lt;td&gt;80+ quality-filtered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill libraries (community)&lt;/td&gt;
&lt;td&gt;Embedded in ClawHub&lt;/td&gt;
&lt;td&gt;17 curated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent orchestration frameworks&lt;/td&gt;
&lt;td&gt;Built-in ACP swarm&lt;/td&gt;
&lt;td&gt;9 third-party&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External memory providers&lt;/td&gt;
&lt;td&gt;Via skills&lt;/td&gt;
&lt;td&gt;8 native&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;ClawHub is OpenClaw's most durable moat: 44,000 community-maintained skills covering integrations, automations, and workflows that would take months to replicate. Hermes's answer is to generate skills from its own task completions rather than pull them from a marketplace — a fundamentally different philosophy that pays off on deep, repeated tasks but leaves gaps on long-tail integrations. The eight external memory backends Hermes ships natively — Honcho, OpenViking, Mem0, Hindsight, and four more — are compared in detail in &lt;a href="https://www.glukhov.org/ai-systems/memory/agent-memory-providers/" rel="noopener noreferrer"&gt;Agent Memory Providers Compared&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One security note on ClawHub: in Q1 2026 Koi Security identified 341 malicious entries in the registry, prompting OpenClaw to add a verification layer to the skill submission pipeline. A detailed guide to vetting skills, understanding which are safe to install, and navigating ClawHub's quality tiers is in &lt;a href="https://www.glukhov.org/ai-systems/openclaw/skills/" rel="noopener noreferrer"&gt;OpenClaw Skills Ecosystem and Practical Production Picks&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Sentiment
&lt;/h2&gt;

&lt;p&gt;A synthesis of Reddit threads across r/homeautomation, r/selfhosted, and r/MachineLearning (compiled by &lt;a href="https://kilo.ai/articles/openclaw-vs-hermes-what-reddit-says" rel="noopener noreferrer"&gt;kilo.ai&lt;/a&gt;) breaks down operator preferences as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stance&lt;/th&gt;
&lt;th&gt;Share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stay on OpenClaw&lt;/td&gt;
&lt;td&gt;35 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switched fully to Hermes&lt;/td&gt;
&lt;td&gt;30 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run both side by side&lt;/td&gt;
&lt;td&gt;20 %&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Withholding judgment on Hermes&lt;/td&gt;
&lt;td&gt;15 %&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 15 % holding off on Hermes are primarily concerned about what some users characterise as coordinated promotion activity from newly created accounts in Hermes-related threads — a pattern common to fast-growing projects but notable enough that veteran community members flag it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Top OpenClaw complaints (by upvote volume)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Release breakage&lt;/strong&gt; — most-upvoted complaint has 305 votes: &lt;em&gt;"Every single update ships more bugs and problems than before."&lt;/em&gt; An estimated 25 % of releases break response delivery on at least one channel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory drift&lt;/strong&gt; — agents forget prior instructions across sessions, requiring users to re-establish context manually.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-host friction&lt;/strong&gt; — disproportionate time spent on Docker configuration, SSH setup, and YAML tuning relative to actual agent work.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Top Hermes Agent complaints (by frequency)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unreliable self-evaluation&lt;/strong&gt; — the agent occasionally reports task success when the outcome was a partial failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill file overwriting&lt;/strong&gt; — auto-improvement rewrites manually tuned skill files, discarding intentional customisation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration gaps&lt;/strong&gt; — ClawHub has a skill for almost everything; Hermes does not, and self-generation takes time to catch up.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "run both" pattern (20 % of operators) is the most architecturally interesting: OpenClaw as the channel-and-routing layer up front, with Hermes as the deep-specialist backend. Messages arrive via Telegram or Slack, OpenClaw routes them, and the tasks where compounding matters are dispatched to a Hermes instance that has been improving on exactly those workflows for weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Search Interest Trend
&lt;/h2&gt;

&lt;p&gt;Tracking the &lt;strong&gt;project growth leaderboard&lt;/strong&gt; (weekly new GitHub stars, a cleaner signal than raw star count) shows a clear momentum reversal as of May 2026.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Weekly star growth&lt;/th&gt;
&lt;th&gt;Leaderboard position&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claw-code&lt;/td&gt;
&lt;td&gt;+7,000&lt;/td&gt;
&lt;td&gt;#1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;+3,800&lt;/td&gt;
&lt;td&gt;#3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;+1,700&lt;/td&gt;
&lt;td&gt;#11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenClaw had a +40,000/week peak in early February 2026 during the post-relaunch explosion. At +1,700/week in May, it is still growing in absolute terms — 373k stars does not happen without weekly adds — but it has settled into a mature project cadence, not a growth sprint.&lt;/p&gt;

&lt;p&gt;Hermes Agent at +3,800/week is the fastest-growing agent runtime on the leaderboard right now, despite having less than half of OpenClaw's cumulative stars. Its growth curve is steeper than OpenClaw's was at the same age (week 12 post-launch).&lt;/p&gt;

&lt;p&gt;The broader &lt;strong&gt;search interest trend&lt;/strong&gt; corroborates the star-growth pattern. Queries for "Hermes Agent" and "hermes-agent install" have been rising consistently since the February launch; "OpenClaw" search volume peaked in late January and has been flat-to-declining since. The intersection point — where Hermes search volume equals OpenClaw's — has not yet been reached, but the trajectories suggest it will cross sometime in Q3 2026 if current rates hold.&lt;/p&gt;

&lt;p&gt;The HN community has also shifted: threads about OpenClaw now centre on security hardening, transport trust (Telegram's lack of default end-to-end encryption), and maintenance overhead. Threads about Hermes Agent are still mostly "how do I set this up for X" — an earlier-stage energy that reflects a project still in its adoption phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openclawai.io" rel="noopener noreferrer"&gt;OpenClaw official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent — Nous Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openrouter.ai/apps" rel="noopener noreferrer"&gt;OpenRouter app rankings (live)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openclawchronicles.com" rel="noopener noreferrer"&gt;OpenClaw Chronicles — community news&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermesatlas.com" rel="noopener noreferrer"&gt;Hermes Atlas — monthly state reports&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hermes</category>
      <category>openclaw</category>
      <category>community</category>
      <category>selfhosting</category>
    </item>
    <item>
      <title>Go Unit Testing: Structure &amp; Best Practices</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Sun, 24 May 2026 02:28:55 +0000</pubDate>
      <link>https://dev.to/rosgluk/go-unit-testing-structure-best-practices-1k19</link>
      <guid>https://dev.to/rosgluk/go-unit-testing-structure-best-practices-1k19</guid>
      <description>&lt;p&gt;&lt;a href="https://www.glukhov.org/app-architecture/testing-architecture/unit-testing-in-go/" rel="noopener noreferrer"&gt;Go's built-in testing package&lt;/a&gt;&lt;br&gt;
provides a powerful, minimalist framework for writing unit tests without external dependencies.&lt;br&gt;
Here are the testing fundamentals, project structure, and advanced patterns to build reliable Go applications.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Testing Matters in Go
&lt;/h2&gt;

&lt;p&gt;Go's philosophy emphasizes simplicity and reliability. The standard library includes the &lt;code&gt;testing&lt;/code&gt; package, making unit testing a first-class citizen in the Go ecosystem. Well-tested Go code improves maintainability, catches bugs early, and provides documentation through examples. If you're new to Go, check out our &lt;a href="https://www.glukhov.org/developer-tools/cheatsheets/golang-cheatsheet/" rel="noopener noreferrer"&gt;Go Cheat Sheet&lt;/a&gt; for a quick reference of the language fundamentals.&lt;/p&gt;

&lt;p&gt;Key benefits of Go testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Built-in support&lt;/strong&gt;: No external frameworks required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast execution&lt;/strong&gt;: Concurrent test execution by default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple syntax&lt;/strong&gt;: Minimal boilerplate code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich tooling&lt;/strong&gt;: Coverage reports, benchmarks, and profiling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD friendly&lt;/strong&gt;: Easy integration with automated pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Project Structure for Go Tests
&lt;/h2&gt;

&lt;p&gt;Go tests live alongside your production code with a clear naming convention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;myproject/
├── go.mod
├── main.go
├── calculator.go
├── calculator_test.go
├── utils/
│   ├── helper.go
│   └── helper_test.go
└── models/
    ├── user.go
    └── user_test.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key conventions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test files end with &lt;code&gt;_test.go&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tests are in the same package as the code (or use &lt;code&gt;_test&lt;/code&gt; suffix for black-box testing)&lt;/li&gt;
&lt;li&gt;Each source file can have a corresponding test file&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Package Testing Approaches
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;White-box testing&lt;/strong&gt; (same package):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"testing"&lt;/span&gt;
&lt;span class="c"&gt;// Can access unexported functions and variables&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Black-box testing&lt;/strong&gt; (external package):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;calculator_test&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"testing"&lt;/span&gt;
    &lt;span class="s"&gt;"myproject/calculator"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// Can only access exported functions (recommended for public APIs)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Test Structure
&lt;/h2&gt;

&lt;p&gt;Every test function follows this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"testing"&lt;/span&gt;

&lt;span class="c"&gt;// Test function must start with "Test"&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Add(2, 3) = %d; want %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Testing.T methods:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;t.Error()&lt;/code&gt; / &lt;code&gt;t.Errorf()&lt;/code&gt;: Mark test as failed but continue&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;t.Fatal()&lt;/code&gt; / &lt;code&gt;t.Fatalf()&lt;/code&gt;: Mark test as failed and stop immediately&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;t.Log()&lt;/code&gt; / &lt;code&gt;t.Logf()&lt;/code&gt;: Log output (only shown with &lt;code&gt;-v&lt;/code&gt; flag)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;t.Skip()&lt;/code&gt; / &lt;code&gt;t.Skipf()&lt;/code&gt;: Skip the test&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;t.Parallel()&lt;/code&gt;: Run test in parallel with other parallel tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;t.Log&lt;/code&gt; is for human-readable test diagnostics. In running services, &lt;strong&gt;log/slog&lt;/strong&gt; and JSON-friendly records are usually a better match for aggregation and incident debugging. See &lt;a href="https://www.glukhov.org/observability/logging/structured-logging-go-slog/" rel="noopener noreferrer"&gt;Structured Logging in Go with slog for Observability and Alerting&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table-Driven Tests: The Go Way
&lt;/h2&gt;

&lt;p&gt;Table-driven tests are the idiomatic Go approach for testing multiple scenarios. With &lt;a href="https://www.glukhov.org/app-architecture/code-architecture/generics-in-go/" rel="noopener noreferrer"&gt;Go generics&lt;/a&gt;, you can also create type-safe test helpers that work across different data types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestCalculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;
        &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;     &lt;span class="kt"&gt;int&lt;/span&gt;
        &lt;span class="n"&gt;op&lt;/span&gt;       &lt;span class="kt"&gt;string&lt;/span&gt;
        &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
        &lt;span class="n"&gt;wantErr&lt;/span&gt;  &lt;span class="kt"&gt;bool&lt;/span&gt;
    &lt;span class="p"&gt;}{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"addition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"subtraction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"multiplication"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"division"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"division by zero"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantErr&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calculate() error = %v, wantErr %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantErr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calculate(%d, %d, %q) = %d; want %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single test function for multiple scenarios&lt;/li&gt;
&lt;li&gt;Easy to add new test cases&lt;/li&gt;
&lt;li&gt;Clear documentation of expected behavior&lt;/li&gt;
&lt;li&gt;Better test organization and maintainability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Running Tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run tests in current directory&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt;

&lt;span class="c"&gt;# Run tests with verbose output&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt;

&lt;span class="c"&gt;# Run tests in all subdirectories&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; ./...

&lt;span class="c"&gt;# Run specific test&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-run&lt;/span&gt; TestAdd

&lt;span class="c"&gt;# Run tests matching pattern&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-run&lt;/span&gt; TestCalculate/addition

&lt;span class="c"&gt;# Run tests in parallel (default is GOMAXPROCS)&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-parallel&lt;/span&gt; 4

&lt;span class="c"&gt;# Run tests with timeout&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-timeout&lt;/span&gt; 30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Coverage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run tests with coverage&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-cover&lt;/span&gt;

&lt;span class="c"&gt;# Generate coverage profile&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-coverprofile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coverage.out

&lt;span class="c"&gt;# View coverage in browser&lt;/span&gt;
go tool cover &lt;span class="nt"&gt;-html&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coverage.out

&lt;span class="c"&gt;# Show coverage by function&lt;/span&gt;
go tool cover &lt;span class="nt"&gt;-func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coverage.out

&lt;span class="c"&gt;# Set coverage mode (set, count, atomic)&lt;/span&gt;
go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-covermode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;count &lt;span class="nt"&gt;-coverprofile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;coverage.out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Useful Flags
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;-short&lt;/code&gt;: Run tests marked with &lt;code&gt;if testing.Short()&lt;/code&gt; checks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-race&lt;/code&gt;: Enable race detector (finds concurrent access issues)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-cpu&lt;/code&gt;: Specify GOMAXPROCS values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-count n&lt;/code&gt;: Run each test n times&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-failfast&lt;/code&gt;: Stop on first test failure&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Test Helpers and Setup/Teardown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Helper Functions
&lt;/h3&gt;

&lt;p&gt;Mark helper functions with &lt;code&gt;t.Helper()&lt;/code&gt; to improve error reporting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;got&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;want&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Helper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c"&gt;// This line is reported as the caller&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;got&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;want&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"got %d, want %d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;got&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;want&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestMath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assertEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Error line points here&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setup and Teardown
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestMain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Setup code here&lt;/span&gt;
    &lt;span class="n"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// Run tests&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// Teardown code here&lt;/span&gt;
    &lt;span class="n"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Fixtures
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;setupTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"setup test case"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"teardown test case"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestSomething&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;teardown&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;setupTestCase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Test code here&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Mocking and Dependency Injection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Interface-Based Mocking
&lt;/h3&gt;

&lt;p&gt;When testing code that interacts with databases, using interfaces makes it easy to create mock implementations. If you're working with PostgreSQL in Go, see our &lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;comparison of Go ORMs&lt;/a&gt; for choosing the right database library with good testability.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Production code&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;UserService&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="n"&gt;Database&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUserName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Test code&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MockDatabase&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MockDatabase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="n"&gt;ok&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"user not found"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestGetUserName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mockDB&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;MockDatabase&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;UserService&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;mockDB&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUserName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"unexpected error: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"got %s, want Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Popular Testing Libraries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Testify
&lt;/h3&gt;

&lt;p&gt;The most popular Go testing library for assertions and mocks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/stretchr/testify/assert"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/stretchr/testify/mock"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestWithTestify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"they should be equal"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NotNil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Mock example&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;MockDB&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mock&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;MockDB&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Called&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Other Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gomock&lt;/strong&gt;: Google's mocking framework with code generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;httptest&lt;/strong&gt;: Standard library for testing HTTP handlers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;testcontainers-go&lt;/strong&gt;: Integration testing with Docker containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ginkgo/gomega&lt;/strong&gt;: BDD-style testing framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When testing integrations with external services like AI models, you'll need to mock or stub those dependencies. For example, if you're using &lt;a href="https://www.glukhov.org/llm-hosting/ollama/using-ollama-in-go/" rel="noopener noreferrer"&gt;Ollama in Go&lt;/a&gt;, consider creating interface wrappers to make your code more testable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Tests
&lt;/h2&gt;

&lt;p&gt;Go includes built-in support for benchmarks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;BenchmarkAdd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Run benchmarks&lt;/span&gt;
&lt;span class="c"&gt;// go test -bench=. -benchmem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output shows iterations per second and memory allocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write table-driven tests&lt;/strong&gt;: Use the slice of structs pattern for multiple test cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use t.Run for subtests&lt;/strong&gt;: Better organization and can run subtests selectively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test exported functions first&lt;/strong&gt;: Focus on public API behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep tests simple&lt;/strong&gt;: Each test should verify one thing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use meaningful test names&lt;/strong&gt;: Describe what is being tested and expected outcome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't test implementation details&lt;/strong&gt;: Test behavior, not internals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use interfaces for dependencies&lt;/strong&gt;: Makes mocking easier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aim for high coverage, but quality over quantity&lt;/strong&gt;: 100% coverage doesn't mean bug-free&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run tests with -race flag&lt;/strong&gt;: Catch concurrency issues early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use TestMain for expensive setup&lt;/strong&gt;: Avoid repeating setup in each test&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Example: Complete Test Suite
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"errors"&lt;/span&gt;
    &lt;span class="s"&gt;"testing"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ID&lt;/span&gt;    &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Email&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;ValidateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"name cannot be empty"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Email&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"email cannot be empty"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Test file: user_test.go&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestValidateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;    &lt;span class="kt"&gt;string&lt;/span&gt;
        &lt;span class="n"&gt;user&lt;/span&gt;    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;
        &lt;span class="n"&gt;wantErr&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;
        &lt;span class="n"&gt;errMsg&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;}{&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"valid user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;wantErr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"empty name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;wantErr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;errMsg&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"name cannot be empty"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;"empty email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ID&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Email&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;wantErr&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="no"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;errMsg&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;"email cannot be empty"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;ValidateUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantErr&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ValidateUser() error = %v, wantErr %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantErr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errMsg&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ValidateUser() error message = %v, want %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;errMsg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Useful Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pkg.go.dev/testing" rel="noopener noreferrer"&gt;Official Go Testing Package Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://go.dev/blog/subtests" rel="noopener noreferrer"&gt;Go Blog: Table-Driven Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/stretchr/testify" rel="noopener noreferrer"&gt;Testify GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/golang/mock" rel="noopener noreferrer"&gt;GoMock Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://quii.gitbook.io/learn-go-with-tests/" rel="noopener noreferrer"&gt;Learn Go with Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://go.dev/blog/cover" rel="noopener noreferrer"&gt;Go Code Coverage Tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/developer-tools/cheatsheets/golang-cheatsheet/" rel="noopener noreferrer"&gt;Go Cheat Sheet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/observability/logging/structured-logging-go-slog/" rel="noopener noreferrer"&gt;Structured Logging in Go with slog for Observability and Alerting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/app-architecture/data-access/comparing-go-orms-gorm-ent-bun-sqlc/" rel="noopener noreferrer"&gt;Comparing Go ORMs for PostgreSQL: GORM vs Ent vs Bun vs sqlc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/llm-hosting/ollama/using-ollama-in-go/" rel="noopener noreferrer"&gt;Go SDKs for Ollama - comparison with examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/developer-tools/cli-tools/go-cli-applications-with-cobra-and-viper/" rel="noopener noreferrer"&gt;Building CLI Applications in Go with Cobra &amp;amp; Viper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/app-architecture/code-architecture/generics-in-go/" rel="noopener noreferrer"&gt;Go Generics: Use Cases and Patterns&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Go's testing framework provides everything needed for comprehensive unit testing with minimal setup. By following Go idioms like table-driven tests, using interfaces for mocking, and leveraging built-in tools, you can create maintainable, reliable test suites that grow with your codebase.&lt;/p&gt;

&lt;p&gt;These testing practices apply to all types of Go applications, from web services to &lt;a href="https://www.glukhov.org/developer-tools/cli-tools/go-cli-applications-with-cobra-and-viper/" rel="noopener noreferrer"&gt;CLI applications built with Cobra &amp;amp; Viper&lt;/a&gt;. Testing command-line tools requires similar patterns with additional focus on testing input/output and flag parsing.&lt;/p&gt;

&lt;p&gt;Start with simple tests, gradually add coverage, and remember that testing is an investment in code quality and developer confidence. The Go community's emphasis on testing makes it easier to maintain projects long-term and collaborate effectively with team members.&lt;/p&gt;

&lt;p&gt;See the &lt;a href="https://www.glukhov.org/app-architecture/" rel="noopener noreferrer"&gt;App Architecture hub&lt;/a&gt; for related guides on Go project structure, dependency injection, API design, and integration patterns.&lt;/p&gt;

</description>
      <category>go</category>
      <category>dev</category>
      <category>devops</category>
    </item>
    <item>
      <title>Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Sun, 24 May 2026 00:31:07 +0000</pubDate>
      <link>https://dev.to/rosgluk/qwen-36-27b-and-35b-mtp-vs-standard-on-16gb-gpu-42jd</link>
      <guid>https://dev.to/rosgluk/qwen-36-27b-and-35b-mtp-vs-standard-on-16gb-gpu-42jd</guid>
      <description>&lt;p&gt;I tested Speculative decoding (Multi-Token Prediction, MTP) performance in Qwen 3.6 27B and 35B on an RTX 4080 with 16 GB VRAM.&lt;/p&gt;

&lt;p&gt;For a broader view of token speeds and VRAM trade-offs across more models on the same hardware, see &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/best-llm-on-16gb-vram-gpu/" rel="noopener noreferrer"&gt;16 GB VRAM LLM benchmarks with llama.cpp&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MTP (Multi-Token Prediction) Is
&lt;/h2&gt;

&lt;p&gt;Multi-Token Prediction is a form of speculative decoding built directly into certain model checkpoints. Instead of predicting one token per forward pass, the model carries extra "MTP heads" that propose several future tokens in a single step — then verifies them in parallel. If the guesses are accepted, the effective throughput rises without changing the output quality.&lt;/p&gt;

&lt;p&gt;The Qwen 3.6 family ships both &lt;strong&gt;standard&lt;/strong&gt; GGUF files and &lt;strong&gt;MTP&lt;/strong&gt;-enabled variants. In llama.cpp, MTP is activated through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--spec-type&lt;/span&gt; draft-mtp &lt;span class="nt"&gt;--spec-draft-n-max&lt;/span&gt; 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--spec-draft-n-max&lt;/code&gt; is the key tuning knob. It sets how many speculative tokens the MTP head proposes at each step. Higher values give a potential speed boost but cost extra VRAM for the draft buffers — a real constraint on 16 GB cards.&lt;/p&gt;

&lt;h2&gt;
  
  
  What and How I Tested
&lt;/h2&gt;

&lt;p&gt;I tested how the two Qwen 3.6 models behave with MTP enabled versus standard decoding on a &lt;strong&gt;GPU with 16 GB VRAM (RTX 4080)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To fit model weights and KV cache into VRAM I used heavily quantised variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Qwen3.6-27B-UD-IQ3_XXS&lt;/code&gt; and &lt;code&gt;Qwen3.6-27B-UD-IQ3_XXS-MTP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Qwen3.6-35B-A3B-UD-IQ3_S&lt;/code&gt; and &lt;code&gt;Qwen3.6-35B-A3B-UD-IQ3_S-MTP&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two context budgets are tracked per run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Avg Ctx&lt;/strong&gt; — the context size at which llama.cpp occupies ~14.8 GB VRAM, leaving other apps (Xorg, GNOME Shell, Cursor) a comfortable ~500 MB buffer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max Ctx&lt;/strong&gt; — the largest context llama.cpp could allocate given that the same desktop apps already hold ~500 MB VRAM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A key reason for keeping the average context at a practical target is that &lt;a href="https://www.glukhov.org/ai-systems/hermes/" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; — which I use as the primary AI assistant connecting to llama.cpp on this machine — &lt;strong&gt;requires at least 64 K context&lt;/strong&gt; by default and will reject models with a smaller window at startup. Models below that threshold cannot maintain enough working memory for multi-step tool-calling workflows. For llama.cpp this means passing &lt;code&gt;--ctx-size 65536&lt;/code&gt; or larger. Any MTP configuration that compresses the average usable context significantly below 64 K is therefore unsuitable for daily Hermes workloads, which is why the Avg Ctx numbers in the tables below are the most decision-relevant ones.&lt;/p&gt;

&lt;p&gt;Both KV cache quantisation levels were tested: &lt;strong&gt;q8&lt;/strong&gt; (higher quality, more VRAM) and &lt;strong&gt;q5&lt;/strong&gt; (lower VRAM, longer context). Be aware that moving from q8 to q5 KV cache can cause a noticeable quality drop — in my testing the degradation was significant enough to make q5 unsuitable for my workloads. The speed and context numbers for q5 are included for completeness, but you should test response quality on your own tasks before committing to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qwen 3.6 27B MTP vs Standard
&lt;/h2&gt;

&lt;h3&gt;
  
  
  KV Cache q8
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MTP max 1&lt;/th&gt;
&lt;th&gt;MTP max 2&lt;/th&gt;
&lt;th&gt;MTP max 3&lt;/th&gt;
&lt;th&gt;MTP max 4&lt;/th&gt;
&lt;th&gt;Standard (IQ3_XXS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Speed&lt;/td&gt;
&lt;td&gt;148 t/s&lt;/td&gt;
&lt;td&gt;151 t/s&lt;/td&gt;
&lt;td&gt;148 t/s&lt;/td&gt;
&lt;td&gt;147 t/s&lt;/td&gt;
&lt;td&gt;200 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gen Speed&lt;/td&gt;
&lt;td&gt;65 t/s&lt;/td&gt;
&lt;td&gt;75 t/s&lt;/td&gt;
&lt;td&gt;73 t/s&lt;/td&gt;
&lt;td&gt;75 t/s&lt;/td&gt;
&lt;td&gt;45 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg Ctx&lt;/td&gt;
&lt;td&gt;40 K&lt;/td&gt;
&lt;td&gt;40 K&lt;/td&gt;
&lt;td&gt;40 K&lt;/td&gt;
&lt;td&gt;30 K&lt;/td&gt;
&lt;td&gt;80 K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Ctx&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;50 K&lt;/td&gt;
&lt;td&gt;100 K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With q8 KV cache, MTP at &lt;code&gt;--spec-draft-n-max 2&lt;/code&gt; delivers &lt;strong&gt;~67 % faster generation&lt;/strong&gt; (75 vs 45 t/s) at the cost of halving the average context window from 80 K to 40 K. Prompt ingestion speed drops from 200 to ~150 t/s because MTP requires device-to-host transfers during the prefill phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  KV Cache q5
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MTP max 1&lt;/th&gt;
&lt;th&gt;MTP max 2&lt;/th&gt;
&lt;th&gt;MTP max 3&lt;/th&gt;
&lt;th&gt;MTP max 4&lt;/th&gt;
&lt;th&gt;Standard (IQ3_XXS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Speed&lt;/td&gt;
&lt;td&gt;145 t/s&lt;/td&gt;
&lt;td&gt;144 t/s&lt;/td&gt;
&lt;td&gt;141 t/s&lt;/td&gt;
&lt;td&gt;139 t/s&lt;/td&gt;
&lt;td&gt;191 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gen Speed&lt;/td&gt;
&lt;td&gt;57 t/s&lt;/td&gt;
&lt;td&gt;62 t/s&lt;/td&gt;
&lt;td&gt;67 t/s&lt;/td&gt;
&lt;td&gt;66 t/s&lt;/td&gt;
&lt;td&gt;41 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg Ctx&lt;/td&gt;
&lt;td&gt;70 K&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;50 K&lt;/td&gt;
&lt;td&gt;130 K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Ctx&lt;/td&gt;
&lt;td&gt;100 K&lt;/td&gt;
&lt;td&gt;100 K&lt;/td&gt;
&lt;td&gt;90 K&lt;/td&gt;
&lt;td&gt;80 K&lt;/td&gt;
&lt;td&gt;160 K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Switching to q5 KV cache recovers meaningful context: &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt; gives 70 K average context at 57 t/s — a &lt;strong&gt;39 % generation speedup&lt;/strong&gt; over standard decoding while still keeping the context window at a useful size. At &lt;code&gt;--spec-draft-n-max 3&lt;/code&gt; the context drops to 60 K but generation reaches 67 t/s (&lt;strong&gt;+63 %&lt;/strong&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Qwen 3.6 27B Takeaway
&lt;/h3&gt;

&lt;p&gt;MTP is genuinely useful for the 27B dense model. The sweet spot on 16 GB VRAM is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;q8 KV + &lt;code&gt;--spec-draft-n-max 2&lt;/code&gt;&lt;/strong&gt; — best raw speed (75 t/s), context down to 40–60 K&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;q5 KV + &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt;&lt;/strong&gt; — best speed-vs-context balance (57 t/s, 70 K avg context)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Qwen 3.6 35B MTP vs Standard
&lt;/h2&gt;

&lt;p&gt;The 35B model is a Mixture-of-Experts (MoE) architecture (&lt;code&gt;35B-A3B&lt;/code&gt; means 35B total parameters, ~3B active per token). MoE models usually benefit more from MTP because the sparse routing keeps the MTP head computationally cheap relative to a full forward pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  KV Cache q8
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MTP max 1&lt;/th&gt;
&lt;th&gt;MTP max 2&lt;/th&gt;
&lt;th&gt;MTP max 3&lt;/th&gt;
&lt;th&gt;MTP max 4&lt;/th&gt;
&lt;th&gt;Standard (IQ3_S)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Speed&lt;/td&gt;
&lt;td&gt;277 t/s&lt;/td&gt;
&lt;td&gt;277 t/s&lt;/td&gt;
&lt;td&gt;265 t/s&lt;/td&gt;
&lt;td&gt;275 t/s&lt;/td&gt;
&lt;td&gt;368 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gen Speed&lt;/td&gt;
&lt;td&gt;186 t/s&lt;/td&gt;
&lt;td&gt;189 t/s&lt;/td&gt;
&lt;td&gt;180 t/s&lt;/td&gt;
&lt;td&gt;171 t/s&lt;/td&gt;
&lt;td&gt;146 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg Ctx&lt;/td&gt;
&lt;td&gt;15 K&lt;/td&gt;
&lt;td&gt;10 K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;80 K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Ctx&lt;/td&gt;
&lt;td&gt;80 K&lt;/td&gt;
&lt;td&gt;70 K&lt;/td&gt;
&lt;td&gt;60 K&lt;/td&gt;
&lt;td&gt;50 K&lt;/td&gt;
&lt;td&gt;150 K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The MoE architecture delivers impressive raw generation speed with MTP (+27 % at max 1, +29 % at max 2 vs standard 146 t/s). But the practical problem is the average context. With q8 KV cache, even &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt; only gives 15 K average context — barely enough for modest tasks. Higher draft depths have no viable average context at all on a 16 GB card.&lt;/p&gt;

&lt;p&gt;This is the central VRAM cost question for MTP on consumer hardware: the extra draft buffers eat into the remaining VRAM budget directly, and the 35B-A3B model with q8 KV cache leaves very little headroom.&lt;/p&gt;

&lt;h3&gt;
  
  
  KV Cache q5
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MTP max 1&lt;/th&gt;
&lt;th&gt;MTP max 2&lt;/th&gt;
&lt;th&gt;MTP max 3&lt;/th&gt;
&lt;th&gt;MTP max 4&lt;/th&gt;
&lt;th&gt;Standard (IQ3_S)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Speed&lt;/td&gt;
&lt;td&gt;264 t/s&lt;/td&gt;
&lt;td&gt;266 t/s&lt;/td&gt;
&lt;td&gt;270 t/s&lt;/td&gt;
&lt;td&gt;264 t/s&lt;/td&gt;
&lt;td&gt;343 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gen Speed&lt;/td&gt;
&lt;td&gt;151 t/s&lt;/td&gt;
&lt;td&gt;147 t/s&lt;/td&gt;
&lt;td&gt;137 t/s&lt;/td&gt;
&lt;td&gt;131 t/s&lt;/td&gt;
&lt;td&gt;122 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg Ctx&lt;/td&gt;
&lt;td&gt;10 K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;120 K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Ctx&lt;/td&gt;
&lt;td&gt;120 K&lt;/td&gt;
&lt;td&gt;110 K&lt;/td&gt;
&lt;td&gt;110 K&lt;/td&gt;
&lt;td&gt;80 K&lt;/td&gt;
&lt;td&gt;200 K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;q5 KV cache only marginally improves the average context story. &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt; gives 10 K average context at 151 t/s. Standard decoding at q5 gives 122 t/s with 120 K average context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Qwen 3.6 35B Takeaway
&lt;/h3&gt;

&lt;p&gt;On a 16 GB GPU the 35B MoE model with MTP faces a hard wall: the average usable context collapses to 10–15 K tokens, making it impractical for real workloads. Standard decoding at 122–146 t/s with 80–120 K context is significantly more useful.&lt;/p&gt;

&lt;p&gt;If you have &lt;strong&gt;24 GB+ VRAM&lt;/strong&gt;, the 35B + MTP combination becomes much more attractive — the context window issue disappears and you keep the speed benefit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right &lt;code&gt;--spec-draft-n-max&lt;/code&gt; Value
&lt;/h2&gt;

&lt;p&gt;The question of how many speculative tokens to propose per step (&lt;code&gt;--spec-draft-n-max&lt;/code&gt;) does not have a single right answer — it depends on both model architecture and available VRAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;27B dense&lt;/strong&gt; on 16 GB: &lt;code&gt;--spec-draft-n-max 2&lt;/code&gt; with q8 KV is the fastest, &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt; with q5 KV is the most context-friendly.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;35B MoE&lt;/strong&gt; on 16 GB: &lt;code&gt;--spec-draft-n-max 1&lt;/code&gt; is the only option that keeps any usable context, and even then only marginally.&lt;/li&gt;
&lt;li&gt;Higher values (&lt;code&gt;3&lt;/code&gt;, &lt;code&gt;4&lt;/code&gt;) increase VRAM pressure without proportional speed gains — at max 4 you're spending roughly the same extra VRAM as max 2 but gen speed doesn't keep pace.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Enable MTP in llama.cpp
&lt;/h2&gt;

&lt;p&gt;Make sure you use an MTP-enabled GGUF (the filename contains &lt;code&gt;MTP&lt;/code&gt;). If you are new to llama.cpp flags, the &lt;a href="https://www.glukhov.org/llm-hosting/llama-cpp/" rel="noopener noreferrer"&gt;llama.cpp Quickstart with CLI and Server&lt;/a&gt; covers all the fundamentals. Then launch llama-server or llama-cli with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; Qwen3.6-27B-UD-IQ3_XXS-MTP.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 40000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 99 &lt;span class="nt"&gt;--flash-attn&lt;/span&gt; on &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cache-type-k&lt;/span&gt; q8_0 &lt;span class="nt"&gt;--cache-type-v&lt;/span&gt; q8_0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--spec-type&lt;/span&gt; draft-mtp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--spec-draft-n-max&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For q5 KV cache, replace &lt;code&gt;q8_0&lt;/code&gt; with &lt;code&gt;q5_1&lt;/code&gt; or &lt;code&gt;q5_0&lt;/code&gt; and adjust &lt;code&gt;--ctx-size&lt;/code&gt; upward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; Qwen3.6-27B-UD-IQ3_XXS-MTP.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 80000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ngl&lt;/span&gt; 99 &lt;span class="nt"&gt;--flash-attn&lt;/span&gt; on &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cache-type-k&lt;/span&gt; q5_1 &lt;span class="nt"&gt;--cache-type-v&lt;/span&gt; q5_1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--spec-type&lt;/span&gt; draft-mtp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--spec-draft-n-max&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MTP is activated automatically once llama.cpp sees the MTP heads in the GGUF file and &lt;code&gt;--spec-type draft-mtp&lt;/code&gt; is set.&lt;br&gt;
So standard Qwen3.6-27B-UD-IQ3_XXS.gguf will not work in MTP mode, you will need Qwen3.6-27B-UD-IQ3_XXS-MTP.gguf.&lt;br&gt;
But the Qwen3.6-27B-UD-IQ3_XXS-MTP.gguf can work in both Speculative decoding mode and autoregressive one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;On a 16 GB GPU (RTX 4080), with these quants, llama.cpp's MTP is a clear win for &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt; and a net negative for &lt;strong&gt;Qwen 3.6 35B&lt;/strong&gt; in practical use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.6 27B (IQ3_XXS) — MTP is worthwhile:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;q8 KV + MTP max 2 → &lt;strong&gt;~67 % faster generation&lt;/strong&gt;, context 40–60 K (vs 80–100 K without MTP)&lt;/li&gt;
&lt;li&gt;q5 KV + MTP max 1 → &lt;strong&gt;~39 % faster generation&lt;/strong&gt;, context 70–100 K (vs 130–160 K without MTP)&lt;/li&gt;
&lt;li&gt;Good balance of speed and VRAM efficiency at &lt;code&gt;--spec-draft-n-max 2&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Qwen 3.6 35B (IQ3_S) — MTP is not practical at 16 GB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generation speed is 27–29 % higher but average context collapses to 10–15 K at q8, 10 K at q5&lt;/li&gt;
&lt;li&gt;Standard decoding at 122–146 t/s with 80–120 K context is more useful for real tasks&lt;/li&gt;
&lt;li&gt;The situation improves substantially on 24 GB+ VRAM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On paper, q5 KV cache is the obvious answer for maximising context window while keeping MTP speed gains — but in practice the quality drop moving from q8 to q5 can be significant. Test q5 on your own tasks before adopting it; for my workloads the degradation was unacceptable, and q8 with a tighter context budget remains the better trade-off.&lt;/p&gt;

&lt;p&gt;For the wider picture of LLM serving options and infrastructure trade-offs, see the &lt;a href="https://www.glukhov.org/llm-hosting/" rel="noopener noreferrer"&gt;LLM Hosting in 2026&lt;/a&gt; pillar and &lt;a href="https://www.glukhov.org/llm-performance/" rel="noopener noreferrer"&gt;LLM Performance in 2026&lt;/a&gt;. If you are tuning Qwen 3.6 sampler settings alongside MTP, the &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/agentic-inference-parameters-reference/" rel="noopener noreferrer"&gt;Agentic LLM Inference Parameters Reference for Qwen 3.6 and Gemma 4&lt;/a&gt; is a useful companion.&lt;/p&gt;

</description>
      <category>selfhosting</category>
      <category>llm</category>
      <category>ai</category>
      <category>llamacpp</category>
    </item>
    <item>
      <title>Unload All llama.cpp Router Models Without Restarting</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Wed, 20 May 2026 01:00:03 +0000</pubDate>
      <link>https://dev.to/rosgluk/unload-all-llamacpp-router-models-without-restarting-8dj</link>
      <guid>https://dev.to/rosgluk/unload-all-llamacpp-router-models-without-restarting-8dj</guid>
      <description>&lt;p&gt;&lt;a href="https://www.glukhov.org/llm-hosting/llama-cpp/llama-server-router-mode/" rel="noopener noreferrer"&gt;llama.cpp router mode&lt;/a&gt; is one of the most useful changes to &lt;code&gt;llama-server&lt;/code&gt; in years. It finally gives local LLM operators something close to the model management experience people expect from Ollama, while keeping the raw performance and low-level control that make &lt;a href="https://www.glukhov.org/llm-hosting/llama-cpp/" rel="noopener noreferrer"&gt;llama.cpp&lt;/a&gt; worth using in the first place.&lt;/p&gt;

&lt;p&gt;But there is one sharp edge: unloading everything is not a single magic button in the HTTP API.&lt;/p&gt;

&lt;p&gt;The router can list models. It can load a model. It can unload a model. It can evict the least recently used model when &lt;code&gt;--models-max&lt;/code&gt; is reached. What it does not currently document as a first-class endpoint is a universal &lt;code&gt;unload all models now&lt;/code&gt; call.&lt;/p&gt;

&lt;p&gt;That is not a real blocker. The correct pattern is simple, explicit, and scriptable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask the router which models exist.&lt;/li&gt;
&lt;li&gt;Filter the models whose status is &lt;code&gt;loaded&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;/models/unload&lt;/code&gt; once per loaded model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the approach I recommend for serious &lt;a href="https://www.glukhov.org/llm-hosting/" rel="noopener noreferrer"&gt;local LLM workflows&lt;/a&gt;. It is boring, visible, and easy to debug. That is exactly what you want when your goal is to free VRAM without restarting the whole inference service.&lt;/p&gt;

&lt;h2&gt;
  
  
  What llama.cpp router mode actually does
&lt;/h2&gt;

&lt;p&gt;In classic &lt;code&gt;llama-server&lt;/code&gt; usage, you start one server with one model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; ./models/qwen3-8b.gguf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Router mode changes that model. Instead of binding the server to one GGUF file, the router becomes a coordinator for multiple models. It can discover models from a cache or from a models directory, load them on demand, route requests to the correct model, and unload models when needed.&lt;/p&gt;

&lt;p&gt;A typical router-mode startup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--models-dir&lt;/span&gt; ./models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--models-max&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important option here is &lt;code&gt;--models-max&lt;/code&gt;. It controls how many models may be loaded at the same time. If the limit is reached, llama.cpp can evict the least recently used model. That is useful, but it is not a substitute for a deliberate unload operation. LRU eviction is reactive. An unload script is operational control.&lt;/p&gt;

&lt;p&gt;My opinionated take: if you run local models for real work, you should treat router mode like an inference process manager, not like a toy chat server. Explicit lifecycle operations matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model management endpoints you need
&lt;/h2&gt;

&lt;p&gt;The main endpoint for discovery is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That endpoint returns the models known to the router and their current lifecycle status. The exact JSON shape can vary slightly between builds, so inspect your own response before writing automation.&lt;/p&gt;

&lt;p&gt;A common response shape looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3-8b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loaded"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama-3.2-3b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unloaded"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To unload one model, call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/models/unload &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"qwen3-8b"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the primitive operation. Everything else in this article builds on that.&lt;/p&gt;

&lt;h2&gt;
  
  
  There is no documented unload all endpoint
&lt;/h2&gt;

&lt;p&gt;This is the part that trips people up.&lt;/p&gt;

&lt;p&gt;You might expect something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/models/unload-all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not build around that assumption. The documented operation is per model. You pass a model identifier to &lt;code&gt;/models/unload&lt;/code&gt;, and llama.cpp unloads that one model.&lt;/p&gt;

&lt;p&gt;This is not necessarily bad API design. A per-model operation is safer. It makes the caller decide what should be unloaded. It also avoids surprising production behavior where one admin request accidentally kills every warm model being used by other clients.&lt;/p&gt;

&lt;p&gt;For a workstation, an unload-all shortcut would be convenient. For a multi-user inference box, explicit loops are better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unload one model first
&lt;/h2&gt;

&lt;p&gt;Before automating anything, test the exact model identifier your router expects.&lt;/p&gt;

&lt;p&gt;First list models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick one loaded model from the output, then unload it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/models/unload &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"qwen3-8b"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the model list again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model status changes to &lt;code&gt;unloaded&lt;/code&gt;, your endpoint, port, and model identifier are correct.&lt;/p&gt;

&lt;p&gt;If it does not work, do not guess. Inspect the JSON. Router aliases, GGUF filenames, and model IDs are often not the same string.&lt;/p&gt;

&lt;h2&gt;
  
  
  Unload all loaded models with curl and jq
&lt;/h2&gt;

&lt;p&gt;Once the single-model unload works, the unload-all pattern is just a shell loop.&lt;/p&gt;

&lt;p&gt;Use this when your &lt;code&gt;/models&lt;/code&gt; response has &lt;code&gt;.data[].id&lt;/code&gt; and &lt;code&gt;.data[].status&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models &lt;span class="se"&gt;\&lt;/span&gt;
| jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.data[] | select(.status == "loaded") | .id'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
| &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; model&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Unloading: &lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/models/unload &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;model&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      | jq
  &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the whole trick. It is not glamorous, but it is the right shape for an admin operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It only unloads models that are actually loaded.&lt;/li&gt;
&lt;li&gt;It prints what it is doing.&lt;/li&gt;
&lt;li&gt;It fails model by model instead of hiding everything behind one opaque action.&lt;/li&gt;
&lt;li&gt;It works from cron, systemd hooks, SSH, or CI jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A reusable script for production use
&lt;/h2&gt;

&lt;p&gt;For anything you run more than twice, stop pasting one-liners. Save a script.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;llama-router-unload-all.sh&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;http&lt;/span&gt;://localhost:8080&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;models_json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="s2"&gt;/models"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;loaded_models&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$models_json&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.data[] | select(.status == "loaded") | .id'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$loaded_models&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No loaded models found."&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s\n'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$loaded_models&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; model&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;continue

  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Unloading: &lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

  curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="s2"&gt;/models/unload"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;model&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    | jq

&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Done. Current model state:"&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="s2"&gt;/models"&lt;/span&gt; | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make it executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x llama-router-unload-all.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it against the default local server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./llama-router-unload-all.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it against another host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LLAMA_SERVER_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://192.168.1.50:8080 ./llama-router-unload-all.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the version I would actually keep in a tools directory. It uses &lt;code&gt;curl -f&lt;/code&gt; so HTTP errors fail the script, and it lets you override the server URL without editing the file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adapting the script to your JSON shape
&lt;/h2&gt;

&lt;p&gt;Do not blindly assume every llama.cpp build returns the exact same fields forever. Router mode is still evolving, and your build may expose a slightly different JSON shape.&lt;/p&gt;

&lt;p&gt;Start by inspecting the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script uses this filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.data[] | select(.status == "loaded") | .id'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your model identifier is in &lt;code&gt;.name&lt;/code&gt;, change it to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.data[] | select(.status == "loaded") | .name'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your status field uses another value, adjust the filter accordingly. The principle is what matters: select loaded models, extract the identifier accepted by &lt;code&gt;/models/unload&lt;/code&gt;, then call unload for each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why models may load again after you unload them
&lt;/h2&gt;

&lt;p&gt;This is the most common source of confusion.&lt;/p&gt;

&lt;p&gt;Router mode supports on-demand loading. If a client sends a chat completion request for a model that is currently unloaded, the router may load it again automatically.&lt;/p&gt;

&lt;p&gt;That means this sequence is possible:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You unload every model.&lt;/li&gt;
&lt;li&gt;Open WebUI, a test script, or an agent sends a request.&lt;/li&gt;
&lt;li&gt;llama.cpp loads the requested model again.&lt;/li&gt;
&lt;li&gt;You think unload failed, but it did not.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is operational, not technical. Stop client traffic first if your goal is to keep VRAM free.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop benchmark scripts.&lt;/li&gt;
&lt;li&gt;Pause agents and cron jobs.&lt;/li&gt;
&lt;li&gt;Close or disconnect Open WebUI sessions.&lt;/li&gt;
&lt;li&gt;Disable health checks that accidentally perform real model requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unloading is not a firewall. If clients keep asking for models, router mode is doing its job by serving them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open WebUI and the Eject button
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.glukhov.org/llm-hosting/llm-frontends/open-webui-overview-quickstart-and-alternatives/" rel="noopener noreferrer"&gt;Open WebUI&lt;/a&gt; can integrate with llama.cpp model unload support. When the provider is configured as &lt;code&gt;llama.cpp&lt;/code&gt;, Open WebUI can show loaded-model state and expose an Eject action for admins.&lt;/p&gt;

&lt;p&gt;Under the hood, that action calls Open WebUI's own unload API, which then calls llama.cpp's &lt;code&gt;/models/unload&lt;/code&gt; endpoint on the configured connection.&lt;/p&gt;

&lt;p&gt;That is nice for manual operation, but I would still keep the shell script. A UI button is convenient. A script is auditable, repeatable, and usable on a headless box at 2 AM.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use unload all
&lt;/h2&gt;

&lt;p&gt;Unloading every loaded model is useful when you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free GPU memory before starting a larger model.&lt;/li&gt;
&lt;li&gt;Reset a development box without restarting &lt;code&gt;llama-server&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Prepare for a benchmark run with a clean memory state.&lt;/li&gt;
&lt;li&gt;Drain local inference workloads before maintenance.&lt;/li&gt;
&lt;li&gt;Recover from a messy session where too many models were warmed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not the right tool when active users are depending on warm models. In that case, tune &lt;code&gt;--models-max&lt;/code&gt;, use deliberate routing, and let LRU eviction do part of the work. If you need smarter timeout-based unloading with per-model lifecycle control, &lt;a href="https://www.glukhov.org/llm-hosting/llama-swap/" rel="noopener noreferrer"&gt;llama-swap&lt;/a&gt; is a purpose-built proxy that layers exactly that on top of any &lt;code&gt;llama-server&lt;/code&gt; setup.&lt;/p&gt;

&lt;p&gt;My rule is simple: use LRU for normal pressure, use explicit unload for operator intent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The models endpoint returns 404
&lt;/h3&gt;

&lt;p&gt;You may not be running a router-capable build, or you may be calling the wrong port.&lt;/p&gt;

&lt;p&gt;Check the server process and available options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="nt"&gt;--help&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then test both endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models | jq
curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/v1/models | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/v1/models&lt;/code&gt; endpoint is the OpenAI-compatible model list. The &lt;code&gt;/models&lt;/code&gt; endpoint is the router model-management endpoint. They are related, but they are not the same thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  jq is not installed
&lt;/h3&gt;

&lt;p&gt;Install it before scripting JSON parsing.&lt;/p&gt;

&lt;p&gt;On Ubuntu or Debian:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install &lt;/span&gt;jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On macOS with Homebrew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The unload call returns an error
&lt;/h3&gt;

&lt;p&gt;Most failures come from passing the wrong model identifier. Use the exact identifier returned by &lt;code&gt;/models&lt;/code&gt;, not the filename you think should work.&lt;/p&gt;

&lt;p&gt;Also check whether your model name contains quotes, slashes, or spaces. The script above handles normal strings well, but unusual names may require more careful JSON construction.&lt;/p&gt;

&lt;p&gt;For maximum safety, you can build the POST body with &lt;code&gt;jq&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;--arg&lt;/span&gt; model &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'{model: $model}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A more defensive unload loop would use that body instead of hand-escaped JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  VRAM is not freed immediately
&lt;/h3&gt;

&lt;p&gt;First confirm the model status changed. Then check whether another request reloaded it. Also remember that GPU memory tools can lag or report allocator behavior rather than instant application-level intent.&lt;/p&gt;

&lt;p&gt;The practical test is simple: stop traffic, unload models, list model status, then inspect GPU memory. For measured VRAM usage across model sizes and context windows on llama.cpp, the &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/best-llm-on-16gb-vram-gpu/" rel="noopener noreferrer"&gt;16 GB VRAM llama.cpp benchmarks&lt;/a&gt; give concrete figures to sanity-check against.&lt;/p&gt;

&lt;h2&gt;
  
  
  A safer JSON body version
&lt;/h2&gt;

&lt;p&gt;If your model identifiers contain unusual characters, use &lt;code&gt;jq&lt;/code&gt; to generate the JSON request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8080/models &lt;span class="se"&gt;\&lt;/span&gt;
| jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.data[] | select(.status == "loaded") | .id'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
| &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; model&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Unloading: &lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nv"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;--arg&lt;/span&gt; model &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'{model: $model}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/models/unload &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      | jq
  &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the version to use if your models are named with repository-style identifiers, custom aliases, or paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;llama.cpp router mode is a big step forward for local LLM operations. It gives you dynamic loading, model switching, and memory-aware eviction without giving up the directness of &lt;code&gt;llama-server&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But do not wait for a perfect unload-all endpoint. The clean solution already exists: list loaded models and unload them one by one.&lt;/p&gt;

&lt;p&gt;That pattern is explicit. It is scriptable. It works over SSH. It plays nicely with Open WebUI. And most importantly, it frees VRAM without restarting the router.&lt;/p&gt;

&lt;p&gt;For local AI infrastructure, that is exactly the kind of boring control surface you want.&lt;/p&gt;

</description>
      <category>cheatsheet</category>
      <category>selfhosting</category>
      <category>llm</category>
    </item>
    <item>
      <title>LLM Wiki - Compiled Knowledge That RAG Cannot Replace</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Mon, 18 May 2026 09:22:56 +0000</pubDate>
      <link>https://dev.to/rosgluk/llm-wiki-compiled-knowledge-that-rag-cannot-replace-8op</link>
      <guid>https://dev.to/rosgluk/llm-wiki-compiled-knowledge-that-rag-cannot-replace-8op</guid>
      <description>&lt;p&gt;The premise is simple: compiled knowledge is more reusable than retrieved fragments.&lt;br&gt;
RAG became the default answer to a straightforward question - how do I give an LLM access to external knowledge?&lt;/p&gt;



&lt;p&gt;And the usual architecture is by now familiar.&lt;br&gt;
Take documents, split them into chunks, embed the chunks, store them in a vector database, retrieve relevant pieces at query time, and pass them into the model. That pattern is useful, but it is also overused. RAG is very good at access and not automatically good at structure. It can find relevant fragments but does not create a stable understanding of a domain, it can retrieve context but does not decide what the canonical explanation is, and it can answer from documents but does not maintain a living knowledge base.&lt;/p&gt;

&lt;p&gt;LLM Wiki is not just another retrieval pattern but a different way to think about knowledge architecture entirely. Instead of asking the model to synthesize from raw chunks every time a question is asked, an LLM Wiki uses the model earlier in the pipeline, performing synthesis at ingest time and storing the result as structured, readable, linked knowledge.&lt;/p&gt;

&lt;p&gt;A good shorthand is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG retrieves knowledge at query time.&lt;/li&gt;
&lt;li&gt;LLM Wiki compiles knowledge at ingest time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction changes cost, latency, quality, maintenance, governance, and failure modes - and it is the central reason LLM Wiki deserves its own architecture category.&lt;/p&gt;
&lt;h2&gt;
  
  
  RAG optimizes retrieval, not representation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.glukhov.org/rag/" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; is powerful because it lets a language model use information outside its training data, making it useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;company documentation&lt;/li&gt;
&lt;li&gt;product manuals&lt;/li&gt;
&lt;li&gt;technical support&lt;/li&gt;
&lt;li&gt;internal search&lt;/li&gt;
&lt;li&gt;research assistants&lt;/li&gt;
&lt;li&gt;policy lookup&lt;/li&gt;
&lt;li&gt;code documentation&lt;/li&gt;
&lt;li&gt;knowledge base chatbots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But RAG has a structural weakness: it often treats knowledge as a pile of retrievable fragments rather than a structured model of a domain.&lt;/p&gt;

&lt;p&gt;A typical RAG system works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collect documents.&lt;/li&gt;
&lt;li&gt;Split them into chunks.&lt;/li&gt;
&lt;li&gt;Create embeddings.&lt;/li&gt;
&lt;li&gt;Store the chunks in a vector database.&lt;/li&gt;
&lt;li&gt;Retrieve similar chunks for each query.&lt;/li&gt;
&lt;li&gt;Ask the LLM to answer using those chunks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This works well for many questions, but it also creates repeated interpretation work for complex ones. Every time a user asks something conceptually rich, the system has to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve fragments&lt;/li&gt;
&lt;li&gt;decide which fragments matter&lt;/li&gt;
&lt;li&gt;infer relationships&lt;/li&gt;
&lt;li&gt;resolve contradictions&lt;/li&gt;
&lt;li&gt;build a temporary explanation&lt;/li&gt;
&lt;li&gt;produce an answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then that synthesis disappears and the next query starts from scratch. This is fine when questions are simple, but it becomes wasteful when the same concepts are repeatedly reconstructed from raw fragments.&lt;/p&gt;

&lt;p&gt;The most common RAG mistake is assuming that better retrieval equals better knowledge. Sometimes that is true, but often it is not, because &lt;a href="https://www.glukhov.org/knowledge-management/foundations/retrieval-vs-representation/" rel="noopener noreferrer"&gt;retrieval and representation solve different problems&lt;/a&gt;. Retrieval answers which pieces of text are relevant; representation answers how knowledge should be structured in the first place. A RAG system can retrieve five accurate chunks about a topic and still fail because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the chunks are outdated&lt;/li&gt;
&lt;li&gt;the documents contradict each other&lt;/li&gt;
&lt;li&gt;the important concept is spread across pages&lt;/li&gt;
&lt;li&gt;the source uses inconsistent terminology&lt;/li&gt;
&lt;li&gt;the answer requires synthesis, not lookup&lt;/li&gt;
&lt;li&gt;there is no canonical page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG is an access layer, not a knowledge model by itself, and an LLM Wiki exists precisely because some knowledge should be represented before it is retrieved.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is an LLM Wiki?
&lt;/h2&gt;

&lt;p&gt;An LLM Wiki is a knowledge system where a language model helps transform source material into structured wiki-like knowledge. Instead of storing only raw documents and retrieving chunks later, the system creates derived knowledge artifacts such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;topic pages&lt;/li&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;glossaries&lt;/li&gt;
&lt;li&gt;concept pages&lt;/li&gt;
&lt;li&gt;entity pages&lt;/li&gt;
&lt;li&gt;cross-links&lt;/li&gt;
&lt;li&gt;comparisons&lt;/li&gt;
&lt;li&gt;contradiction notes&lt;/li&gt;
&lt;li&gt;source references&lt;/li&gt;
&lt;li&gt;decision records&lt;/li&gt;
&lt;li&gt;explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output is usually human-readable and, in many implementations, stored as plain Markdown, which matters because Markdown makes the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inspectable&lt;/li&gt;
&lt;li&gt;portable&lt;/li&gt;
&lt;li&gt;editable&lt;/li&gt;
&lt;li&gt;versionable&lt;/li&gt;
&lt;li&gt;easy to diff&lt;/li&gt;
&lt;li&gt;compatible with static sites and PKM tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea is not that the LLM magically knows everything but that the LLM helps maintain a structured layer over the source material, acting as a structuring assistant rather than the final authority.&lt;/p&gt;
&lt;h2&gt;
  
  
  The core idea
&lt;/h2&gt;

&lt;p&gt;The core idea of LLM Wiki is ingest-time knowledge synthesis. In a RAG system, synthesis usually happens when a user asks a question; in an LLM Wiki, synthesis happens earlier, during ingestion, before any question has been asked.&lt;/p&gt;

&lt;p&gt;A simplified pipeline looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sources
  -&amp;gt; ingest
  -&amp;gt; summarize
  -&amp;gt; structure
  -&amp;gt; link
  -&amp;gt; maintain
  -&amp;gt; query or browse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system does not wait until query time to figure out what the knowledge means - it creates a reusable structure in advance, which makes LLM Wiki closer to a compiled knowledge base than a search pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Imagine you have 60 articles about local LLM hosting. A RAG system might split them into chunks and retrieve relevant sections when you ask about the differences between Ollama, vLLM, llama.cpp, and SGLang, then let the LLM assemble an answer from those retrieved fragments.&lt;/p&gt;

&lt;p&gt;An LLM Wiki system does something different. At ingest time, it creates structured pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ollama.md&lt;/li&gt;
&lt;li&gt;vllm.md&lt;/li&gt;
&lt;li&gt;llama-cpp.md&lt;/li&gt;
&lt;li&gt;sglang.md&lt;/li&gt;
&lt;li&gt;local-llm-hosting-overview.md&lt;/li&gt;
&lt;li&gt;inference-backends-comparison.md&lt;/li&gt;
&lt;li&gt;gpu-memory-and-context-length.md&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it links them. When you later ask a question, the system is not starting from raw fragments but from a structured knowledge layer that was already assembled before the question arrived - and for conceptual and comparative questions, that difference in quality is significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  How LLM Wiki works
&lt;/h2&gt;

&lt;p&gt;There is no single official implementation, but most LLM Wiki systems follow the same conceptual stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source collection
&lt;/h3&gt;

&lt;p&gt;The system starts with source material - blog posts, PDFs, Markdown notes, technical documentation, transcripts, papers, meeting notes, bookmarks, code comments, and README files - which should be preserved as a separate layer, distinct from the generated wiki. This matters because generated wiki pages are derived knowledge, not original truth, and a serious LLM Wiki should always maintain links back to sources so that every generated page can answer the basic question: where did this claim come from?&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingestion and extraction
&lt;/h3&gt;

&lt;p&gt;During ingestion, the system reads source material and extracts useful knowledge. It may identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;main topics&lt;/li&gt;
&lt;li&gt;entities and tools&lt;/li&gt;
&lt;li&gt;definitions&lt;/li&gt;
&lt;li&gt;claims&lt;/li&gt;
&lt;li&gt;decisions&lt;/li&gt;
&lt;li&gt;examples&lt;/li&gt;
&lt;li&gt;contradictions between sources&lt;/li&gt;
&lt;li&gt;open questions&lt;/li&gt;
&lt;li&gt;recurring concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage is where LLM Wiki starts to differ from ordinary RAG: while RAG usually chunks documents for retrieval, LLM Wiki tries to understand and reshape the material conceptually rather than just making it searchable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summarization
&lt;/h3&gt;

&lt;p&gt;The system creates summaries, but useful summaries are not just shorter versions of text - they should preserve the structure of the argument. A weak summary says "this document discusses local LLM hosting tools." A useful summary says "this document compares local LLM hosting tools by deployment complexity, GPU usage, API compatibility, and production readiness, positioning Ollama as easy for local use, vLLM as stronger for server workloads, and llama.cpp as flexible for quantized models."&lt;/p&gt;

&lt;p&gt;For technical knowledge, a summary should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what problem it solves&lt;/li&gt;
&lt;li&gt;what assumptions it makes&lt;/li&gt;
&lt;li&gt;what tradeoffs it contains&lt;/li&gt;
&lt;li&gt;what dependencies it has&lt;/li&gt;
&lt;li&gt;what is still uncertain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where LLMs are genuinely useful, because they are good at compressing messy prose into structured explanations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structuring
&lt;/h3&gt;

&lt;p&gt;Summaries alone are not enough - the system must also decide where knowledge belongs, which is the representation layer. Common structures include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;topic pages&lt;/li&gt;
&lt;li&gt;concept pages&lt;/li&gt;
&lt;li&gt;index pages&lt;/li&gt;
&lt;li&gt;comparison pages&lt;/li&gt;
&lt;li&gt;glossary entries&lt;/li&gt;
&lt;li&gt;how-to pages&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;decision records&lt;/li&gt;
&lt;li&gt;maps of related pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A pile of summaries is not a wiki; a wiki needs page boundaries, links, and recurring structure, and a good LLM Wiki is not measured by page count but by whether pages become genuinely reusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linking
&lt;/h3&gt;

&lt;p&gt;Links define the shape of the knowledge system. In a normal document archive, relationships are often implicit; in an LLM Wiki, they should become explicit. Useful link types include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;concept to concept&lt;/li&gt;
&lt;li&gt;article to summary&lt;/li&gt;
&lt;li&gt;tool to comparison&lt;/li&gt;
&lt;li&gt;problem to solution&lt;/li&gt;
&lt;li&gt;architecture to implementation&lt;/li&gt;
&lt;li&gt;source to derived page&lt;/li&gt;
&lt;li&gt;glossary term to detailed page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one of the most important differences between LLM Wiki and basic summarization: summaries reduce text, but links build a knowledge graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Review and correction
&lt;/h3&gt;

&lt;p&gt;This stage is optional only in toy systems; in serious systems, human review is essential. The review process should check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether summaries are faithful&lt;/li&gt;
&lt;li&gt;whether links are useful&lt;/li&gt;
&lt;li&gt;whether claims are sourced&lt;/li&gt;
&lt;li&gt;whether pages are duplicated&lt;/li&gt;
&lt;li&gt;whether concepts are misplaced&lt;/li&gt;
&lt;li&gt;whether outdated information is marked&lt;/li&gt;
&lt;li&gt;whether generated pages overstate certainty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Wiki can reduce human effort, but it should never remove human responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Wiki vs RAG
&lt;/h2&gt;

&lt;p&gt;The cleanest distinction between LLM Wiki and RAG is timing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query-time synthesis
&lt;/h3&gt;

&lt;p&gt;In RAG, the system retrieves information when a user asks a question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query
  -&amp;gt; retrieve chunks
  -&amp;gt; assemble context
  -&amp;gt; generate answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is flexible and works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the corpus is large&lt;/li&gt;
&lt;li&gt;information changes often&lt;/li&gt;
&lt;li&gt;questions are unpredictable&lt;/li&gt;
&lt;li&gt;you need broad coverage&lt;/li&gt;
&lt;li&gt;you cannot curate everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it may be less coherent for conceptual questions, because the model has to synthesize from fragments each time, which can produce inconsistent answers across similar queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ingest-time synthesis
&lt;/h3&gt;

&lt;p&gt;In LLM Wiki, the system performs synthesis before the question arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sources
  -&amp;gt; summarize
  -&amp;gt; structure
  -&amp;gt; link
  -&amp;gt; query or browse later
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is less flexible but more coherent, and it works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the corpus is manageable&lt;/li&gt;
&lt;li&gt;the domain is stable&lt;/li&gt;
&lt;li&gt;concepts repeat&lt;/li&gt;
&lt;li&gt;human readability matters&lt;/li&gt;
&lt;li&gt;you want reusable synthesis&lt;/li&gt;
&lt;li&gt;you want a maintained knowledge layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The main differences
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;LLM Wiki&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main timing&lt;/td&gt;
&lt;td&gt;Query time&lt;/td&gt;
&lt;td&gt;Ingest time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main operation&lt;/td&gt;
&lt;td&gt;Retrieve chunks&lt;/td&gt;
&lt;td&gt;Compile knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best corpus&lt;/td&gt;
&lt;td&gt;Large and changing&lt;/td&gt;
&lt;td&gt;Curated and stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Generated answer&lt;/td&gt;
&lt;td&gt;Structured knowledge pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Search index or vector DB&lt;/td&gt;
&lt;td&gt;Markdown or wiki structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strength&lt;/td&gt;
&lt;td&gt;Flexible access&lt;/td&gt;
&lt;td&gt;Reusable synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weakness&lt;/td&gt;
&lt;td&gt;Fragmented context&lt;/td&gt;
&lt;td&gt;Maintenance drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human readability&lt;/td&gt;
&lt;td&gt;Often indirect&lt;/td&gt;
&lt;td&gt;Usually direct&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Complementary, not mutually exclusive
&lt;/h3&gt;

&lt;p&gt;The debate should not be framed as "LLM Wiki or RAG" - that is the wrong question. LLM Wiki does not replace RAG in most production systems; both have distinct and complementary roles. A well-designed system may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw documents
  -&amp;gt; source store
  -&amp;gt; LLM Wiki synthesis
  -&amp;gt; reviewed knowledge pages
  -&amp;gt; search index
  -&amp;gt; RAG over source and synthesis
  -&amp;gt; answer with citations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In that architecture, LLM Wiki improves the representation layer and RAG improves the access layer. Use RAG for retrieval over large and changing corpora, use LLM Wiki for compiled synthesis over stable and curated knowledge, and use both together when you need scale and coherence at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Wiki vs adjacent systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LLM Wiki vs summarization
&lt;/h3&gt;

&lt;p&gt;A weak LLM Wiki is just a folder of generated summaries, and that is not enough. Summarization compresses content; LLM Wiki structures it. A real LLM Wiki needs stable pages, links, concepts, indexes, source tracking, revision history, maintenance workflows, and conflict detection - the wiki part matters as much as the LLM part.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Wiki vs knowledge graph
&lt;/h3&gt;

&lt;p&gt;A knowledge graph represents entities and relationships explicitly, while an LLM Wiki creates a softer, document-oriented graph through Markdown pages and links. A mature system can use both: the wiki provides human-readable explanations and the knowledge graph provides precisely structured, machine-queryable relationships.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Wiki vs agent memory
&lt;/h3&gt;

&lt;p&gt;LLM Wiki is also different from &lt;a href="https://www.glukhov.org/ai-systems/memory/agent-memory-providers/" rel="noopener noreferrer"&gt;AI memory&lt;/a&gt;. Memory stores context that affects future behavior, while an LLM Wiki stores structured knowledge that can be read, searched, reviewed, and linked by both humans and systems.&lt;/p&gt;

&lt;p&gt;Memory might remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the user prefers Go examples&lt;/li&gt;
&lt;li&gt;the project avoids ORMs&lt;/li&gt;
&lt;li&gt;the agent tried a command yesterday&lt;/li&gt;
&lt;li&gt;a bug investigation failed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An LLM Wiki might store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what Go database access patterns exist&lt;/li&gt;
&lt;li&gt;how sqlc compares with GORM&lt;/li&gt;
&lt;li&gt;why outbox patterns matter&lt;/li&gt;
&lt;li&gt;how RAG differs from memory systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory is behavioral context; LLM Wiki is represented knowledge - and mixing the two leads to systems that are hard to inspect, audit, or maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  When LLM Wiki works well
&lt;/h2&gt;

&lt;p&gt;LLM Wiki works best for stable domains, personal research, curated corpora, technical documentation, and situations where repeated synthesis over the same material is wasteful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stable domains
&lt;/h3&gt;

&lt;p&gt;LLM Wiki works best when the domain does not change every hour. Good examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;technical concepts&lt;/li&gt;
&lt;li&gt;research notes&lt;/li&gt;
&lt;li&gt;learning material&lt;/li&gt;
&lt;li&gt;architecture patterns&lt;/li&gt;
&lt;li&gt;book notes&lt;/li&gt;
&lt;li&gt;model comparison notes&lt;/li&gt;
&lt;li&gt;internal engineering principles&lt;/li&gt;
&lt;li&gt;curated documentation&lt;/li&gt;
&lt;li&gt;personal knowledge bases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If knowledge is stable enough to summarize without becoming stale within days, LLM Wiki can deliver lasting value that compounds as the wiki grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Research synthesis
&lt;/h3&gt;

&lt;p&gt;Research synthesis is one of the strongest use cases, because researchers often read many sources and repeatedly ask the same meta-questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are the main ideas?&lt;/li&gt;
&lt;li&gt;Which sources agree?&lt;/li&gt;
&lt;li&gt;Which sources conflict?&lt;/li&gt;
&lt;li&gt;What concepts repeat?&lt;/li&gt;
&lt;li&gt;What is the current state of the topic?&lt;/li&gt;
&lt;li&gt;What should I read next?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Wiki helps turn that research material into reusable structure - topic pages, comparison pages, contradiction notes, and related links - so the researcher does not have to rebuild the same mental map every time they return to a domain. It is especially useful when working with papers, technical articles, transcripts, documentation, notes, and experiment logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Personal knowledge systems
&lt;/h3&gt;

&lt;p&gt;LLM Wiki fits naturally with &lt;a href="https://www.glukhov.org/knowledge-management/foundations/pkm-vs-rag-vs-wiki-vs-memory-systems/" rel="noopener noreferrer"&gt;PKM and the broader knowledge systems spectrum&lt;/a&gt; and &lt;a href="https://www.glukhov.org/knowledge-management/foundations/second-brain/" rel="noopener noreferrer"&gt;second brain&lt;/a&gt; workflows because a personal knowledge system already contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;notes&lt;/li&gt;
&lt;li&gt;links&lt;/li&gt;
&lt;li&gt;unfinished ideas&lt;/li&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;references&lt;/li&gt;
&lt;li&gt;topic maps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An LLM can help maintain the structure by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarizing long notes&lt;/li&gt;
&lt;li&gt;proposing links&lt;/li&gt;
&lt;li&gt;creating topic pages&lt;/li&gt;
&lt;li&gt;detecting duplicate concepts&lt;/li&gt;
&lt;li&gt;extracting glossary terms&lt;/li&gt;
&lt;li&gt;generating index pages&lt;/li&gt;
&lt;li&gt;identifying gaps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The human remains the editor, which is the right relationship between human judgment and machine assistance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical blogging
&lt;/h3&gt;

&lt;p&gt;A technical blog can use LLM Wiki ideas internally even without building a full automated system. A well-structured site can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pillar pages&lt;/li&gt;
&lt;li&gt;cluster index pages&lt;/li&gt;
&lt;li&gt;topic summaries&lt;/li&gt;
&lt;li&gt;related article maps&lt;/li&gt;
&lt;li&gt;glossary pages&lt;/li&gt;
&lt;li&gt;comparison pages&lt;/li&gt;
&lt;li&gt;canonical explainers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not only SEO but knowledge representation: a well-structured technical blog becomes more valuable when articles are connected into a durable knowledge structure that both humans and AI systems can navigate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Small team knowledge bases
&lt;/h3&gt;

&lt;p&gt;LLM Wiki can work well for small teams with curated knowledge, including engineering decisions, product architecture, onboarding notes, support playbooks, internal standards, postmortems, and runbooks. The key condition is governance: someone must review and maintain the generated structure, because without clear ownership the wiki decays into noise regardless of how well it was initially generated.&lt;/p&gt;

&lt;h2&gt;
  
  
  When LLM Wiki is a poor fit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Highly dynamic data
&lt;/h3&gt;

&lt;p&gt;LLM Wiki is weaker when information changes constantly. Live inventory, pricing feeds, incident status, financial market data, rapidly changing support tickets, and real-time logs are all better served by retrieval or direct API access. Compiling fast-moving data into static summaries is counterproductive unless you have a strong refresh process that keeps the compiled layer in sync with reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Large unmanaged corpora
&lt;/h3&gt;

&lt;p&gt;LLM Wiki does not automatically scale to millions of documents. At large scale, the difficult problems extend well beyond generation and include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;access control&lt;/li&gt;
&lt;li&gt;data lineage&lt;/li&gt;
&lt;li&gt;ownership&lt;/li&gt;
&lt;li&gt;deduplication&lt;/li&gt;
&lt;li&gt;indexing&lt;/li&gt;
&lt;li&gt;freshness tracking&lt;/li&gt;
&lt;li&gt;evaluation&lt;/li&gt;
&lt;li&gt;governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple Markdown wiki is not equipped to address those needs, and at enterprise scale, LLM Wiki may become one layer inside a larger knowledge architecture rather than the whole system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low-quality sources
&lt;/h3&gt;

&lt;p&gt;LLM Wiki cannot reliably fix bad sources. If the source material is contradictory, outdated, low quality, duplicated, incomplete, or badly scoped, generated pages may look polished but be wrong. This is dangerous precisely because a clean generated page creates false confidence - the formatting signals quality even when the underlying content does not justify it.&lt;/p&gt;

&lt;h3&gt;
  
  
  No review process
&lt;/h3&gt;

&lt;p&gt;LLM Wiki without review is risky because generated structure creates authority. A bad answer in RAG may affect one query, but a bad generated wiki page may affect many future queries, readers, and agents that retrieve from it. The model may overgeneralize, miss exceptions, invent structure, merge incompatible ideas, hide uncertainty, create misleading links, or summarize outdated material as though it were current - so for any knowledge that actually matters, human review is not optional.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and failure modes
&lt;/h2&gt;

&lt;p&gt;The main risks of building an LLM Wiki are stale summaries, hallucinated synthesis baked into the knowledge base, weak source tracking, maintenance cost, and false confidence in generated structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Maintenance drift
&lt;/h3&gt;

&lt;p&gt;Knowledge drift happens when generated pages stop matching the underlying sources. This can happen because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sources changed&lt;/li&gt;
&lt;li&gt;new sources were added&lt;/li&gt;
&lt;li&gt;old pages were not refreshed&lt;/li&gt;
&lt;li&gt;summaries were edited manually&lt;/li&gt;
&lt;li&gt;links became outdated&lt;/li&gt;
&lt;li&gt;model output changed over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drift is the central operational risk of LLM Wiki, and a good system needs explicit refresh and validation workflows to catch it before it propagates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hallucinated synthesis
&lt;/h3&gt;

&lt;p&gt;RAG can hallucinate at answer time, but LLM Wiki can hallucinate at ingest time, which is more subtle and more dangerous. If a generated wiki page contains a wrong synthesis, future users may treat that page as ground truth, and future AI systems may retrieve it and amplify the mistake further. Generated structure needs provenance, and every important claim should link back to its original sources so the hallucination can be caught during review rather than silently embedded in the knowledge base.&lt;/p&gt;

&lt;h3&gt;
  
  
  Over-structuring
&lt;/h3&gt;

&lt;p&gt;Once you have an LLM that can create pages cheaply, it is tempting to create too many of them. You can end up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;empty taxonomy&lt;/li&gt;
&lt;li&gt;duplicate concepts&lt;/li&gt;
&lt;li&gt;shallow pages&lt;/li&gt;
&lt;li&gt;meaningless links&lt;/li&gt;
&lt;li&gt;generated clutter&lt;/li&gt;
&lt;li&gt;fake completeness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A useful wiki is not measured by page count but by whether pages are actually reused, linked, and updated over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unclear ownership
&lt;/h3&gt;

&lt;p&gt;The model cannot own the page. A serious system needs clear ownership rules covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who reviews pages&lt;/li&gt;
&lt;li&gt;who approves updates&lt;/li&gt;
&lt;li&gt;who deletes stale pages&lt;/li&gt;
&lt;li&gt;who resolves contradictions&lt;/li&gt;
&lt;li&gt;who decides canonical structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that clarity, LLM Wiki becomes another abandoned knowledge base - well-intentioned, well-generated, and quietly ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1. Personal LLM Wiki
&lt;/h3&gt;

&lt;p&gt;The personal pattern is the simplest and most practical version, best suited for individuals.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;notes and sources
  -&amp;gt; LLM assisted summaries
  -&amp;gt; Markdown pages
  -&amp;gt; manual review
  -&amp;gt; [Obsidian](https://www.glukhov.org/knowledge-management/tools/obsidian-for-personal-knowledge-management/ "Using Obsidian for Personal Knowledge Management") or static site
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works well for researchers, writers, engineers, technical bloggers, students, and consultants, where the value comes from reducing repeated synthesis and making personal knowledge easier to navigate without requiring any team coordination or governance infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2. Team LLM Wiki
&lt;/h3&gt;

&lt;p&gt;The team pattern is best for small groups and needs more governance than the personal version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;team docs
  -&amp;gt; ingest workflow
  -&amp;gt; generated draft pages
  -&amp;gt; review queue
  -&amp;gt; published wiki
  -&amp;gt; search or RAG layer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The review queue is critical here, because generated knowledge should never be published directly into a team source of truth without a human checkpoint - even a lightweight review process catches the most dangerous hallucinations before they become institutional knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3. LLM Wiki plus RAG
&lt;/h3&gt;

&lt;p&gt;This is often the most balanced architecture, giving you both raw source access and compiled synthesis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw sources
  -&amp;gt; LLM Wiki pages
  -&amp;gt; reviewed knowledge base
  -&amp;gt; search index
  -&amp;gt; RAG over raw and compiled knowledge
  -&amp;gt; cited answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The RAG system can retrieve from original documents, generated summaries, topic pages, comparison pages, and glossary entries, which makes retrieval quality significantly stronger than operating over raw documents alone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4. LLM Wiki as site architecture
&lt;/h3&gt;

&lt;p&gt;For a technical website, LLM Wiki ideas can guide content structure even without automation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;articles
  -&amp;gt; pillar pages
  -&amp;gt; topic maps
  -&amp;gt; comparisons
  -&amp;gt; internal links
  -&amp;gt; search and AI access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This turns a blog into a knowledge system where articles are not just posts but nodes in a structured map - a significant difference for both reader experience and machine-readable discoverability.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Wiki design principles
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Keep raw sources separate
&lt;/h3&gt;

&lt;p&gt;Never lose the original source. Generated pages should not replace source documents but sit above them - the source layer provides evidence, the wiki layer provides interpretation, and losing the original means losing the ability to verify, challenge, or update the interpretation derived from it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Markdown where possible
&lt;/h3&gt;

&lt;p&gt;Markdown is boring and excellent. It is portable, readable, diffable, versionable, easy to edit, friendly to static sites, and friendly to PKM tools. Boring formats survive longer than clever platforms, which means a Markdown-based LLM Wiki built today will still be usable long after whatever proprietary database you might have chosen has gone through multiple breaking migrations. For syntax reference, see the &lt;a href="https://www.glukhov.org/documentation-tools/markdown/markdown-cheatsheet/" rel="noopener noreferrer"&gt;Markdown Cheatsheet&lt;/a&gt; and the guide to &lt;a href="https://www.glukhov.org/documentation-tools/markdown/markdown-codeblocks/" rel="noopener noreferrer"&gt;Markdown Code Blocks&lt;/a&gt;, which are especially relevant when structuring wiki pages that include technical content.&lt;/p&gt;

&lt;h3&gt;
  
  
  Track provenance
&lt;/h3&gt;

&lt;p&gt;Every generated page should answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What sources created this?&lt;/li&gt;
&lt;li&gt;When was it generated?&lt;/li&gt;
&lt;li&gt;When was it reviewed?&lt;/li&gt;
&lt;li&gt;What changed?&lt;/li&gt;
&lt;li&gt;Who approved it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without provenance, trust collapses over time as pages drift further from their origins. A practical page schema might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;title
summary
status
sources
last_reviewed
related_pages
concepts
open_questions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For technical content, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;applies_to
version
examples
tradeoffs
failure_modes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For research content, add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claims
evidence
contradictions
confidence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prefer fewer better pages
&lt;/h3&gt;

&lt;p&gt;Do not generate a page for every minor idea. Prefer strong concept pages, useful comparison pages, topic indexes, canonical summaries, and glossary entries that earn their place. A small useful wiki with twenty well-maintained pages beats a large generated mess with two hundred pages nobody reads or updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make links meaningful
&lt;/h3&gt;

&lt;p&gt;Links should explain relationships rather than just connect pages at random. Useful link types include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;related concept&lt;/li&gt;
&lt;li&gt;depends on&lt;/li&gt;
&lt;li&gt;contrasts with&lt;/li&gt;
&lt;li&gt;example of&lt;/li&gt;
&lt;li&gt;source for&lt;/li&gt;
&lt;li&gt;expands on&lt;/li&gt;
&lt;li&gt;implementation of&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Random links create noise and erode reader trust in the structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mark uncertainty
&lt;/h3&gt;

&lt;p&gt;LLM Wiki pages should not pretend all knowledge is equally certain. Useful status markers include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;confirmed&lt;/li&gt;
&lt;li&gt;likely&lt;/li&gt;
&lt;li&gt;disputed&lt;/li&gt;
&lt;li&gt;outdated&lt;/li&gt;
&lt;li&gt;needs review&lt;/li&gt;
&lt;li&gt;source conflict&lt;/li&gt;
&lt;li&gt;generated summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These markers protect readers from false confidence and give maintainers a clear signal about which pages need attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to evaluate an LLM Wiki
&lt;/h2&gt;

&lt;p&gt;Do not only ask whether the generated pages look impressive - ask whether they improve knowledge work. Useful evaluation questions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can users find concepts faster?&lt;/li&gt;
&lt;li&gt;Are repeated questions answered better?&lt;/li&gt;
&lt;li&gt;Are source links preserved?&lt;/li&gt;
&lt;li&gt;Are contradictions easier to see?&lt;/li&gt;
&lt;li&gt;Are pages reused?&lt;/li&gt;
&lt;li&gt;Are summaries accurate?&lt;/li&gt;
&lt;li&gt;Is stale content detected?&lt;/li&gt;
&lt;li&gt;Does the wiki reduce repeated synthesis?&lt;/li&gt;
&lt;li&gt;Does it help humans write or decide?&lt;/li&gt;
&lt;li&gt;Does it improve RAG answer quality?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is no to most of these, the wiki is decoration regardless of how many pages it contains.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM Wiki and knowledge management
&lt;/h2&gt;

&lt;p&gt;LLM Wiki belongs in &lt;a href="https://www.glukhov.org/knowledge-management/" rel="noopener noreferrer"&gt;knowledge management&lt;/a&gt; because it is fundamentally about representation, not primarily about model hosting, vector search, or agent execution. It answers a different question: how should knowledge be structured so that humans and AI systems can reuse it? That places it in the knowledge systems architecture layer, connecting naturally to PKM, wikis, RAG, agent memory, knowledge graphs, technical publishing, and research synthesis.&lt;/p&gt;

&lt;p&gt;A clean layer model looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human thinking - PKM, explore and develop ideas&lt;/li&gt;
&lt;li&gt;Shared knowledge - Wiki, maintain canonical pages&lt;/li&gt;
&lt;li&gt;Compiled knowledge - LLM Wiki, generate structured synthesis&lt;/li&gt;
&lt;li&gt;Machine access - RAG, retrieve context at query time&lt;/li&gt;
&lt;li&gt;Agent continuity - Memory, persist behavior and preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Wiki occupies the compiled knowledge layer, and that position is what makes it useful - it is the layer that turns a pile of documents into something both humans and machines can navigate and reason over.&lt;/p&gt;

&lt;h2&gt;
  
  
  My opinionated take
&lt;/h2&gt;

&lt;p&gt;LLM Wiki is important, but the hype is slightly wrong - it is not a RAG killer, but a reminder that knowledge representation matters. The industry spent years optimizing retrieval pipelines, and that work was necessary, but many systems still retrieve from badly structured knowledge. Better embeddings and better rerankers help, but they cannot fully compensate for a weak knowledge layer.&lt;/p&gt;

&lt;p&gt;LLM Wiki pushes the conversation back toward structure by asking better questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are the core concepts?&lt;/li&gt;
&lt;li&gt;What is canonical?&lt;/li&gt;
&lt;li&gt;How do ideas connect?&lt;/li&gt;
&lt;li&gt;What should be summarized once?&lt;/li&gt;
&lt;li&gt;What should be retrieved fresh?&lt;/li&gt;
&lt;li&gt;What should be reviewed by humans?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the right conversation, and the future is not just better vector search but layered knowledge systems where representation, retrieval, and memory each play a distinct and well-understood role.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LLM Wiki is an architecture pattern for compiled knowledge that uses language models to help transform source material into structured, linked, reusable knowledge before questions are asked. Its core workflow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summarize
  -&amp;gt; structure
  -&amp;gt; link
  -&amp;gt; review
  -&amp;gt; reuse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared with RAG, the main difference is timing: RAG performs synthesis at query time, while LLM Wiki performs synthesis at ingest time, which makes it valuable for stable domains, research synthesis, personal knowledge bases, technical blogs, and curated team knowledge.&lt;/p&gt;

&lt;p&gt;But it has real limitations. It can drift when sources change, hallucinate when model output is wrong, create false confidence when review is absent, and collapse into noise when ownership is unclear. Used badly, it becomes another abandoned wiki. Used well, it becomes the representation layer between raw documents and AI systems - not a replacement for RAG, but the missing layer that makes retrieval worth using.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;AWS - What Is Retrieval Augmented Generation?&lt;/a&gt; - AWS foundational overview of how RAG pipelines are constructed and when they are appropriate.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.ibm.com/think/topics/retrieval-augmented-generation" rel="noopener noreferrer"&gt;IBM - Retrieval Augmented Generation&lt;/a&gt; - IBM overview of RAG architecture, covering grounding, hallucination reduction, and enterprise use cases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/use-cases/retrieval-augmented-generation" rel="noopener noreferrer"&gt;Google Cloud - Retrieval Augmented Generation&lt;/a&gt; - Google Cloud perspective on RAG use cases, system design, and integration with vector search.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://atlan.com/know/llm-wiki-vs-rag-knowledge-base/" rel="noopener noreferrer"&gt;Atlan - LLM Wiki vs RAG Knowledge Base&lt;/a&gt; - Practical comparison of LLM Wiki and RAG approaches from a data catalog perspective.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ranjankumar.in/llm-wiki-synthesis-time-decision-rag-agentic-memory" rel="noopener noreferrer"&gt;Ranjan Kumar - LLM Wiki, Synthesis Time, RAG, and Agentic Memory&lt;/a&gt; - In-depth discussion of the timing distinction between synthesis approaches and how they fit into agentic architectures.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/vishalmysore/rag-vs-agent-memory-vs-llm-wiki-a-practical-comparison-1oo6"&gt;Dev.to - RAG vs Agent Memory vs LLM Wiki&lt;/a&gt; - Practical comparison of all three knowledge patterns with implementation notes.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.starmorph.com/blog/karpathy-llm-wiki-knowledge-base-guide" rel="noopener noreferrer"&gt;Starmorph - Karpathy LLM Wiki Knowledge Base Guide&lt;/a&gt; - Guide inspired by Andrej Karpathy's framing of LLM Wiki as a compiled knowledge system.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.mindstudio.ai/blog/llm-wiki-vs-rag-knowledge-base" rel="noopener noreferrer"&gt;MindStudio - LLM Wiki vs RAG Knowledge Base&lt;/a&gt; - MindStudio perspective on choosing between LLM Wiki and RAG for AI assistant knowledge.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>wiki</category>
      <category>knowledgemanagement</category>
      <category>rag</category>
      <category>aisystems</category>
    </item>
    <item>
      <title>Retrieval vs Representation in Knowledge Systems</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Mon, 18 May 2026 09:22:43 +0000</pubDate>
      <link>https://dev.to/rosgluk/retrieval-vs-representation-in-knowledge-systems-5e49</link>
      <guid>https://dev.to/rosgluk/retrieval-vs-representation-in-knowledge-systems-5e49</guid>
      <description>&lt;p&gt;Most modern knowledge systems optimize retrieval, and that is understandable.&lt;br&gt;
Search is visible, easy to demo, and feels magical when it works. Type a question, get an answer.&lt;/p&gt;



&lt;p&gt;But retrieval is only one half of the problem. The deeper question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What shape does the knowledge have before anything tries to retrieve it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is representation — the structure behind the knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;notes&lt;/li&gt;
&lt;li&gt;pages&lt;/li&gt;
&lt;li&gt;schemas&lt;/li&gt;
&lt;li&gt;graphs&lt;/li&gt;
&lt;li&gt;entities&lt;/li&gt;
&lt;li&gt;relationships&lt;/li&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;taxonomies&lt;/li&gt;
&lt;li&gt;source boundaries&lt;/li&gt;
&lt;li&gt;canonical versions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Retrieval asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can I find something relevant?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Representation asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is the knowledge organized in a way that makes sense?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These are not the same problem. A RAG system with poor representation becomes a fast interface to a messy archive. It can retrieve fragments, but it cannot fix broken structure. It can quote documents, but it cannot decide which one is canonical. It can assemble context, but it cannot guarantee that the underlying knowledge is coherent.&lt;/p&gt;

&lt;p&gt;This is why &lt;a href="https://www.glukhov.org/knowledge-management/knowledge-systems-architectures/compiled-knowledge/what-is-llm-wiki/" rel="noopener noreferrer"&gt;LLM Wiki&lt;/a&gt; style systems are interesting: they shift effort from query time to ingest time. Instead of only retrieving chunks when a user asks a question, they attempt to pre-structure knowledge into pages, concepts, summaries, and links. That does not make RAG obsolete — it means retrieval and representation are different layers, and good knowledge systems need both.&lt;/p&gt;
&lt;h2&gt;
  
  
  The core distinction
&lt;/h2&gt;

&lt;p&gt;Retrieval is about access; representation is about meaning.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;How do I find the right information?&lt;/td&gt;
&lt;td&gt;search, embeddings, BM25, reranking, vector stores&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Representation&lt;/td&gt;
&lt;td&gt;How is knowledge structured?&lt;/td&gt;
&lt;td&gt;notes, wikis, graphs, schemas, ontologies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;How do I use the knowledge?&lt;/td&gt;
&lt;td&gt;synthesis, comparison, inference, decision making&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A weak system often jumps straight to retrieval; a strong system first asks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What are the core concepts?&lt;/li&gt;
&lt;li&gt;What is the canonical source?&lt;/li&gt;
&lt;li&gt;What relationships matter?&lt;/li&gt;
&lt;li&gt;What changes over time?&lt;/li&gt;
&lt;li&gt;What should be retrieved?&lt;/li&gt;
&lt;li&gt;What should already be represented?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the difference between search over documents and an actual knowledge system.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why retrieval became dominant
&lt;/h2&gt;

&lt;p&gt;Retrieval became dominant because it maps well to the modern AI stack. A typical &lt;a href="https://www.glukhov.org/rag/" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; pipeline looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load documents&lt;/li&gt;
&lt;li&gt;Split them into chunks&lt;/li&gt;
&lt;li&gt;Generate embeddings&lt;/li&gt;
&lt;li&gt;Store vectors&lt;/li&gt;
&lt;li&gt;Retrieve relevant chunks&lt;/li&gt;
&lt;li&gt;Optionally rerank them&lt;/li&gt;
&lt;li&gt;Put them into an LLM prompt&lt;/li&gt;
&lt;li&gt;Generate an answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pipeline is practical: it is relatively easy to build, works with messy documents, scales to large corpora, avoids retraining models, and gives LLMs access to current information. That is why RAG became the default pattern for "AI over documents."&lt;/p&gt;

&lt;p&gt;But there is a trap:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG improves access to knowledge. It does not automatically improve the knowledge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your content is duplicated, outdated, contradictory, badly chunked, or poorly named, retrieval will surface those problems — often with confidence.&lt;/p&gt;
&lt;h2&gt;
  
  
  What representation means
&lt;/h2&gt;

&lt;p&gt;Representation is the way knowledge is shaped before retrieval happens. It answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this knowledge stored as documents, notes, entities, or facts?&lt;/li&gt;
&lt;li&gt;Are relationships explicit or implicit?&lt;/li&gt;
&lt;li&gt;Are there canonical pages?&lt;/li&gt;
&lt;li&gt;Are there summaries?&lt;/li&gt;
&lt;li&gt;Are concepts linked?&lt;/li&gt;
&lt;li&gt;Is the system organized by topic, workflow, time, or ownership?&lt;/li&gt;
&lt;li&gt;Can a human maintain it?&lt;/li&gt;
&lt;li&gt;Can a machine reason over it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Representation is not decoration — it determines what kind of operations are possible.&lt;/p&gt;
&lt;h2&gt;
  
  
  Forms of representation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Documents
&lt;/h3&gt;

&lt;p&gt;Documents are the most common representation. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;articles&lt;/li&gt;
&lt;li&gt;PDFs&lt;/li&gt;
&lt;li&gt;manuals&lt;/li&gt;
&lt;li&gt;reports&lt;/li&gt;
&lt;li&gt;README files&lt;/li&gt;
&lt;li&gt;support pages&lt;/li&gt;
&lt;li&gt;blog posts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Documents are easy for humans to write, but they are often hard for machines to use because they mix facts, narrative, context, examples, opinions, outdated sections, and repeated explanations into the same container. Documents are good containers, but they are not always good knowledge structures.&lt;/p&gt;
&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;p&gt;Notes are more flexible than documents. They can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;atomic&lt;/li&gt;
&lt;li&gt;linked&lt;/li&gt;
&lt;li&gt;private&lt;/li&gt;
&lt;li&gt;unfinished&lt;/li&gt;
&lt;li&gt;concept focused&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A note system, such as a &lt;a href="https://www.glukhov.org/knowledge-management/foundations/pkm-vs-rag-vs-wiki-vs-memory-systems/" rel="noopener noreferrer"&gt;PKM&lt;/a&gt; or &lt;a href="https://www.glukhov.org/knowledge-management/foundations/second-brain/" rel="noopener noreferrer"&gt;second brain&lt;/a&gt;, can represent evolving knowledge better than a polished document repository. Good notes capture thinking in progress; bad notes become an unsearchable junk drawer.&lt;/p&gt;
&lt;h3&gt;
  
  
  Wikis
&lt;/h3&gt;

&lt;p&gt;Wikis represent knowledge as maintained pages. A good wiki has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stable pages&lt;/li&gt;
&lt;li&gt;clear topics&lt;/li&gt;
&lt;li&gt;internal links&lt;/li&gt;
&lt;li&gt;ownership&lt;/li&gt;
&lt;li&gt;canonical answers&lt;/li&gt;
&lt;li&gt;update patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wiki is stronger than a loose document dump because it gives knowledge a home. "Deployment checklist" lives in one place. "Incident response" lives in one place. "RAG architecture" lives in one place. That matters because retrieval works better when knowledge has a stable structure.&lt;/p&gt;
&lt;h3&gt;
  
  
  Knowledge graphs
&lt;/h3&gt;

&lt;p&gt;Knowledge graphs represent knowledge as entities and relationships. Instead of storing only text, they model things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Person works on Project&lt;/li&gt;
&lt;li&gt;Model supports ContextLength&lt;/li&gt;
&lt;li&gt;Page depends on Concept&lt;/li&gt;
&lt;li&gt;Service connects to Database&lt;/li&gt;
&lt;li&gt;Tool implements Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Graphs are powerful because relationships become explicit, which helps with traversal, dependency analysis, entity resolution, lineage, reasoning, and recommendations. But graphs are expensive to maintain and they are not magic — a bad graph is just structured confusion.&lt;/p&gt;
&lt;h3&gt;
  
  
  Schemas and ontologies
&lt;/h3&gt;

&lt;p&gt;Schemas define expected structure; ontologies go further and define types, relations, and constraints. They answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What kinds of things exist?&lt;/li&gt;
&lt;li&gt;What properties do they have?&lt;/li&gt;
&lt;li&gt;How can they relate?&lt;/li&gt;
&lt;li&gt;What rules apply?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is useful when correctness matters, such as in medical knowledge, legal knowledge, enterprise data catalogs, product taxonomies, and compliance systems. The tradeoff is rigidity: the more formal the representation, the more expensive it is to evolve.&lt;/p&gt;
&lt;h3&gt;
  
  
  LLM-generated representations
&lt;/h3&gt;

&lt;p&gt;Modern systems increasingly use LLMs to create representations. Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;extracted entities&lt;/li&gt;
&lt;li&gt;topic pages&lt;/li&gt;
&lt;li&gt;concept maps&lt;/li&gt;
&lt;li&gt;synthetic FAQs&lt;/li&gt;
&lt;li&gt;document outlines&lt;/li&gt;
&lt;li&gt;cross-links&lt;/li&gt;
&lt;li&gt;glossary entries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where LLM Wiki style systems sit. They use the model not only to answer queries but to pre-process and structure knowledge before the query happens. RAG says "retrieve relevant chunks at query time"; LLM Wiki says "compile useful knowledge structures at ingest time." Both patterns can coexist in the same architecture.&lt;/p&gt;
&lt;h2&gt;
  
  
  What retrieval means
&lt;/h2&gt;

&lt;p&gt;Retrieval is the process of finding relevant information. Common retrieval methods include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keyword search&lt;/li&gt;
&lt;li&gt;full text search&lt;/li&gt;
&lt;li&gt;vector search&lt;/li&gt;
&lt;li&gt;hybrid search&lt;/li&gt;
&lt;li&gt;metadata filtering&lt;/li&gt;
&lt;li&gt;graph traversal&lt;/li&gt;
&lt;li&gt;reranking&lt;/li&gt;
&lt;li&gt;query rewriting&lt;/li&gt;
&lt;li&gt;agentic search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Retrieval is not one thing — it is a layered stack of complementary methods.&lt;/p&gt;
&lt;h3&gt;
  
  
  Keyword search
&lt;/h3&gt;

&lt;p&gt;Keyword search matches terms and is still useful because it is predictable, debuggable, fast, and good for exact terms, IDs, error messages, names, and code. Its weakness is semantic mismatch: if the user searches "how to stop repeated answers" but the document says "presence penalty", keyword search may miss the best result.&lt;/p&gt;
&lt;h3&gt;
  
  
  Vector search
&lt;/h3&gt;

&lt;p&gt;Vector search retrieves by semantic similarity. It is useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wording differs&lt;/li&gt;
&lt;li&gt;concepts are fuzzy&lt;/li&gt;
&lt;li&gt;users ask natural language questions&lt;/li&gt;
&lt;li&gt;documents use inconsistent terminology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its weakness is precision — vector search can retrieve things that feel related but are not actually correct, which is especially risky in technical systems.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hybrid search
&lt;/h3&gt;

&lt;p&gt;Hybrid search combines keyword and vector retrieval, which is often better than either alone. Keyword search catches exact matches; vector search catches conceptual matches. For technical knowledge bases, hybrid retrieval is usually a strong default.&lt;/p&gt;
&lt;h3&gt;
  
  
  Reranking
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.glukhov.org/rag/reranking/reranking-with-embedding-models/" rel="noopener noreferrer"&gt;Reranking&lt;/a&gt; takes an initial set of retrieved results and reorders them using a stronger model. This improves quality because the first retrieval step is often broad. A typical pattern retrieves 50 chunks, reranks to the top 5 or 10, then passes only the best context to the LLM. Reranking is one of the most practical ways to improve RAG quality.&lt;/p&gt;
&lt;h3&gt;
  
  
  Agentic retrieval
&lt;/h3&gt;

&lt;p&gt;Agentic retrieval turns search into a process. Instead of one query, an agent may:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask an initial question&lt;/li&gt;
&lt;li&gt;Search&lt;/li&gt;
&lt;li&gt;Inspect results&lt;/li&gt;
&lt;li&gt;Reformulate the query&lt;/li&gt;
&lt;li&gt;Search again&lt;/li&gt;
&lt;li&gt;Compare sources&lt;/li&gt;
&lt;li&gt;Synthesize an answer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is closer to research than search. It is useful for complex questions, but it is slower and harder to control.&lt;/p&gt;
&lt;h2&gt;
  
  
  Retrieval without representation is fragile
&lt;/h2&gt;

&lt;p&gt;A retrieval system can only retrieve what exists. It cannot reliably fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unclear concepts&lt;/li&gt;
&lt;li&gt;duplicate pages&lt;/li&gt;
&lt;li&gt;inconsistent terminology&lt;/li&gt;
&lt;li&gt;stale documentation&lt;/li&gt;
&lt;li&gt;missing source ownership&lt;/li&gt;
&lt;li&gt;contradictory statements&lt;/li&gt;
&lt;li&gt;weak internal linking&lt;/li&gt;
&lt;li&gt;bad document boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the most common mistake in RAG projects: teams build a vector database and expect it to become a knowledge system. A vector database is not a knowledge architecture — it is an access layer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Representation without retrieval is isolated
&lt;/h2&gt;

&lt;p&gt;The opposite failure also exists. You can have a beautifully structured knowledge base that nobody can find. This happens with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;over-designed wikis&lt;/li&gt;
&lt;li&gt;deep folder trees&lt;/li&gt;
&lt;li&gt;rigid taxonomies&lt;/li&gt;
&lt;li&gt;poorly indexed documentation&lt;/li&gt;
&lt;li&gt;private note systems with no discovery&lt;/li&gt;
&lt;li&gt;graphs without usable interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Representation gives knowledge structure; retrieval gives knowledge reach. You need both.&lt;/p&gt;
&lt;h2&gt;
  
  
  The tradeoff map
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Speed vs coherence
&lt;/h3&gt;

&lt;p&gt;Retrieval is fast to build and representation takes longer. If you need a prototype, retrieval wins; if you need long-term trust, representation matters more.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Better starting point&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fast Q&amp;amp;A over many docs&lt;/td&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stable technical knowledge&lt;/td&gt;
&lt;td&gt;Representation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exploratory research&lt;/td&gt;
&lt;td&gt;PKM plus retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise assistant&lt;/td&gt;
&lt;td&gt;Structured corpus plus RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent memory&lt;/td&gt;
&lt;td&gt;Representation plus selective retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A pure RAG prototype can be built quickly, but a reliable knowledge system takes curation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Flexibility vs consistency
&lt;/h3&gt;

&lt;p&gt;Loose documents are flexible; structured knowledge is consistent. Flexibility helps when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the domain changes quickly&lt;/li&gt;
&lt;li&gt;knowledge is incomplete&lt;/li&gt;
&lt;li&gt;users are exploring&lt;/li&gt;
&lt;li&gt;the system is personal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consistency helps when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple people rely on it&lt;/li&gt;
&lt;li&gt;answers must be trusted&lt;/li&gt;
&lt;li&gt;workflows depend on it&lt;/li&gt;
&lt;li&gt;AI systems consume it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more people or agents depend on knowledge, the more representation matters.&lt;/p&gt;
&lt;h3&gt;
  
  
  Recall vs precision
&lt;/h3&gt;

&lt;p&gt;Retrieval systems often optimize recall first, which means finding anything that might be relevant. But good answers need precision, which means finding the best evidence rather than merely related evidence. Representation improves precision by making concepts and boundaries clearer — a well-structured page is easier to retrieve accurately than a random paragraph buried inside a long document.&lt;/p&gt;
&lt;h3&gt;
  
  
  Ingest-time cost vs query-time cost
&lt;/h3&gt;

&lt;p&gt;RAG usually pushes work to query time. At query time, the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rewrites the query&lt;/li&gt;
&lt;li&gt;retrieves chunks&lt;/li&gt;
&lt;li&gt;reranks results&lt;/li&gt;
&lt;li&gt;assembles context&lt;/li&gt;
&lt;li&gt;asks the model to reason over fragments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Wiki style systems push more work to ingest time. At ingest time, the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reads sources&lt;/li&gt;
&lt;li&gt;extracts concepts&lt;/li&gt;
&lt;li&gt;writes summaries&lt;/li&gt;
&lt;li&gt;creates pages&lt;/li&gt;
&lt;li&gt;links related ideas&lt;/li&gt;
&lt;li&gt;maintains structure&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Expensive step&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;Query time&lt;/td&gt;
&lt;td&gt;Flexible retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Wiki&lt;/td&gt;
&lt;td&gt;Ingest time&lt;/td&gt;
&lt;td&gt;Pre-compiled structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge graph&lt;/td&gt;
&lt;td&gt;Modeling time&lt;/td&gt;
&lt;td&gt;Explicit relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wiki&lt;/td&gt;
&lt;td&gt;Maintenance time&lt;/td&gt;
&lt;td&gt;Canonical knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these is universally better — they optimize different costs.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why LLM Wiki exists
&lt;/h2&gt;

&lt;p&gt;LLM Wiki exists because retrieval alone often repeats work. In a normal RAG system, every query may force the model to interpret raw fragments again:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve chunks about a topic&lt;/li&gt;
&lt;li&gt;Ask the LLM to infer the concept&lt;/li&gt;
&lt;li&gt;Generate an answer&lt;/li&gt;
&lt;li&gt;Forget the synthesis&lt;/li&gt;
&lt;li&gt;Repeat next time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;LLM Wiki says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop re-deriving the same synthesis. Compile it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of only storing raw documents, it creates structured pages that summarize and connect knowledge, which can improve coherence, reuse, token efficiency, human readability, and long-term maintenance. But it has a cost: the system must maintain the wiki, and if the wiki is wrong, stale, or hallucinated, the structure becomes dangerous.&lt;/p&gt;
&lt;h2&gt;
  
  
  RAG hallucination vs bad representation
&lt;/h2&gt;

&lt;p&gt;People often blame the LLM when a RAG system gives a bad answer, and sometimes that is correct. But many failures are actually retrieval or representation failures.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure mode 1. Correct document, wrong chunk
&lt;/h3&gt;

&lt;p&gt;The answer exists, but &lt;a href="https://www.glukhov.org/rag/retrieval/chunking-strategies-in-rag/" rel="noopener noreferrer"&gt;chunking&lt;/a&gt; splits it badly. The model receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;half of a paragraph&lt;/li&gt;
&lt;li&gt;missing context&lt;/li&gt;
&lt;li&gt;a table without explanation&lt;/li&gt;
&lt;li&gt;a definition without constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM fills those gaps, which looks like hallucination, but the deeper problem is broken representation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure mode 2. Related chunk, wrong answer
&lt;/h3&gt;

&lt;p&gt;Vector search retrieves something semantically similar but operationally wrong. The query asks about production deployment; the retrieved chunk discusses local development. The terms overlap but the meaning differs, so the model answers with local setup instructions for a production problem. This is retrieval imprecision.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure mode 3. Conflicting sources
&lt;/h3&gt;

&lt;p&gt;Two documents disagree — one old, one new. The retrieval system returns both, and the LLM merges them into a confident but invalid answer. This is not just a retrieval problem but a representation problem, because the knowledge base lacks canonical state.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure mode 4. No concept model
&lt;/h3&gt;

&lt;p&gt;The system has many documents but no model of the domain. It does not know that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"agent memory" differs from "RAG"&lt;/li&gt;
&lt;li&gt;"wiki" differs from "PKM"&lt;/li&gt;
&lt;li&gt;"embedding search" differs from "full text search"&lt;/li&gt;
&lt;li&gt;"deployment" differs from "hosting"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without conceptual representation, retrieval becomes fuzzy matching.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure mode 5. Generated structure becomes fake authority
&lt;/h3&gt;

&lt;p&gt;LLM Wiki systems have their own failure mode. If an LLM generates a clean page from bad sources, the result can look more authoritative than the original material. This is dangerous: a polished hallucination is worse than a messy source document. Any generated representation needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source links&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;li&gt;update rules&lt;/li&gt;
&lt;li&gt;confidence markers&lt;/li&gt;
&lt;li&gt;ownership&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Design implications
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Optimize retrieval when the corpus is large and dynamic
&lt;/h3&gt;

&lt;p&gt;Retrieval should be the priority when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the corpus is huge&lt;/li&gt;
&lt;li&gt;documents change frequently&lt;/li&gt;
&lt;li&gt;users ask many unpredictable questions&lt;/li&gt;
&lt;li&gt;you need broad coverage&lt;/li&gt;
&lt;li&gt;perfect structure is unrealistic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples: support knowledge bases, enterprise document search, research assistants, internal chat over many files, legal discovery, and customer service bots. In these cases, invest in strong retrieval:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hybrid search&lt;/li&gt;
&lt;li&gt;metadata filters&lt;/li&gt;
&lt;li&gt;reranking&lt;/li&gt;
&lt;li&gt;query rewriting&lt;/li&gt;
&lt;li&gt;source citation&lt;/li&gt;
&lt;li&gt;evaluation sets&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Optimize representation when coherence matters
&lt;/h3&gt;

&lt;p&gt;Representation should be the priority when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;knowledge must be trusted&lt;/li&gt;
&lt;li&gt;answers must be consistent&lt;/li&gt;
&lt;li&gt;concepts are reused often&lt;/li&gt;
&lt;li&gt;the domain has clear structure&lt;/li&gt;
&lt;li&gt;multiple systems depend on it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples: architecture knowledge, product documentation, compliance rules, API references, operational runbooks, curated research collections, and technical blog clusters. In these cases, invest in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;canonical pages&lt;/li&gt;
&lt;li&gt;glossary terms&lt;/li&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;internal links&lt;/li&gt;
&lt;li&gt;ownership&lt;/li&gt;
&lt;li&gt;versioning&lt;/li&gt;
&lt;li&gt;review cadence&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Optimize both when AI systems depend on knowledge
&lt;/h3&gt;

&lt;p&gt;If an AI agent depends on the knowledge, retrieval alone is usually not enough. Agents need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stable context&lt;/li&gt;
&lt;li&gt;clear task rules&lt;/li&gt;
&lt;li&gt;durable memory&lt;/li&gt;
&lt;li&gt;structured references&lt;/li&gt;
&lt;li&gt;source boundaries&lt;/li&gt;
&lt;li&gt;update behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For agentic systems, representation becomes part of system design. A coding agent does not only need to retrieve "some docs" — it needs to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project conventions&lt;/li&gt;
&lt;li&gt;architecture decisions&lt;/li&gt;
&lt;li&gt;command patterns&lt;/li&gt;
&lt;li&gt;forbidden dependencies&lt;/li&gt;
&lt;li&gt;testing workflow&lt;/li&gt;
&lt;li&gt;deployment rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of that belongs in RAG, some belongs in memory, and some belongs in structured project documentation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Practical decision framework
&lt;/h2&gt;
&lt;h3&gt;
  
  
  If the problem is finding information
&lt;/h3&gt;

&lt;p&gt;Optimize retrieval. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Find relevant pages."&lt;/li&gt;
&lt;li&gt;"Answer questions over documents."&lt;/li&gt;
&lt;li&gt;"Search across many PDFs."&lt;/li&gt;
&lt;li&gt;"Locate similar support tickets."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;full text search&lt;/li&gt;
&lt;li&gt;vector search&lt;/li&gt;
&lt;li&gt;hybrid retrieval&lt;/li&gt;
&lt;li&gt;reranking&lt;/li&gt;
&lt;li&gt;metadata filtering&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  If the problem is making knowledge coherent
&lt;/h3&gt;

&lt;p&gt;Optimize representation. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Create a canonical explanation."&lt;/li&gt;
&lt;li&gt;"Resolve duplicate pages."&lt;/li&gt;
&lt;li&gt;"Define the domain model."&lt;/li&gt;
&lt;li&gt;"Build a stable knowledge base."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wiki pages&lt;/li&gt;
&lt;li&gt;concept maps&lt;/li&gt;
&lt;li&gt;taxonomies&lt;/li&gt;
&lt;li&gt;knowledge graphs&lt;/li&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;schemas&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  If the problem is repeated synthesis
&lt;/h3&gt;

&lt;p&gt;Use compiled representation. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"We answer the same conceptual questions repeatedly."&lt;/li&gt;
&lt;li&gt;"The system keeps re-summarizing the same sources."&lt;/li&gt;
&lt;li&gt;"We need a stable synthesis layer."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM Wiki&lt;/li&gt;
&lt;li&gt;curated summaries&lt;/li&gt;
&lt;li&gt;topic pages&lt;/li&gt;
&lt;li&gt;human-reviewed generated pages&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  If the problem is adaptive continuity
&lt;/h3&gt;

&lt;p&gt;Use memory. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The agent should remember user preferences."&lt;/li&gt;
&lt;li&gt;"The coding agent should remember project conventions."&lt;/li&gt;
&lt;li&gt;"The assistant should continue work across sessions."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent memory&lt;/li&gt;
&lt;li&gt;preference stores&lt;/li&gt;
&lt;li&gt;episodic memory&lt;/li&gt;
&lt;li&gt;semantic memory&lt;/li&gt;
&lt;li&gt;project memory&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How this applies to a technical blog
&lt;/h2&gt;

&lt;p&gt;A technical blog can be more than a sequence of posts — it can become a represented knowledge system. Articles are documents, categories are weak taxonomy, internal links are graph edges, pillar pages are canonical summaries, series pages are curated pathways, and search is retrieval. If you only publish isolated posts, retrieval has to work harder. If you build strong representation, retrieval becomes easier.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clear cluster boundaries&lt;/li&gt;
&lt;li&gt;stable slugs&lt;/li&gt;
&lt;li&gt;canonical pages&lt;/li&gt;
&lt;li&gt;comparison pages&lt;/li&gt;
&lt;li&gt;glossary-style explainers&lt;/li&gt;
&lt;li&gt;internal links&lt;/li&gt;
&lt;li&gt;structured metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why site architecture matters — not just for SEO, but because it is knowledge representation. The &lt;a href="https://www.glukhov.org/knowledge-management/" rel="noopener noreferrer"&gt;Knowledge Management&lt;/a&gt; cluster on this site is itself an example of representation-first publishing.&lt;/p&gt;
&lt;h2&gt;
  
  
  How this applies to RAG
&lt;/h2&gt;

&lt;p&gt;RAG quality depends heavily on representation. A well-structured source corpus improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;chunk quality&lt;/li&gt;
&lt;li&gt;retrieval accuracy&lt;/li&gt;
&lt;li&gt;citation quality&lt;/li&gt;
&lt;li&gt;answer consistency&lt;/li&gt;
&lt;li&gt;evaluation clarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before building a complex RAG pipeline, ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Are the source documents current?&lt;/li&gt;
&lt;li&gt;Are duplicates removed?&lt;/li&gt;
&lt;li&gt;Are important concepts clearly named?&lt;/li&gt;
&lt;li&gt;Are pages scoped correctly?&lt;/li&gt;
&lt;li&gt;Are tables and code blocks retrievable?&lt;/li&gt;
&lt;li&gt;Are canonical answers obvious?&lt;/li&gt;
&lt;li&gt;Are document boundaries meaningful?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the answer is no, better embeddings will only help so much.&lt;/p&gt;
&lt;h2&gt;
  
  
  How this applies to LLM Wiki
&lt;/h2&gt;

&lt;p&gt;LLM Wiki is a representation-first pattern. It is useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the corpus is small or medium sized&lt;/li&gt;
&lt;li&gt;knowledge is stable enough to summarize&lt;/li&gt;
&lt;li&gt;repeated synthesis is expensive&lt;/li&gt;
&lt;li&gt;humans benefit from readable pages&lt;/li&gt;
&lt;li&gt;you want structure before retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is less useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the corpus is massive&lt;/li&gt;
&lt;li&gt;content changes constantly&lt;/li&gt;
&lt;li&gt;freshness is more important than coherence&lt;/li&gt;
&lt;li&gt;governance is weak&lt;/li&gt;
&lt;li&gt;generated summaries cannot be reviewed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM Wiki is not a replacement for RAG but a different layer, and a strong system can use both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LLM Wiki creates structured summaries.&lt;/li&gt;
&lt;li&gt;RAG retrieves from raw sources and wiki pages.&lt;/li&gt;
&lt;li&gt;Human review keeps the representation trustworthy.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Suggested architecture patterns
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Pattern 1. Retrieval first
&lt;/h3&gt;

&lt;p&gt;Use when speed matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;documents
  -&amp;gt; chunks
  -&amp;gt; embeddings
  -&amp;gt; retrieval
  -&amp;gt; LLM answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prototypes&lt;/li&gt;
&lt;li&gt;broad search&lt;/li&gt;
&lt;li&gt;large corpora&lt;/li&gt;
&lt;li&gt;early experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weakness: coherence depends on source quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2. Representation first
&lt;/h3&gt;

&lt;p&gt;Use when trust matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sources
  -&amp;gt; curated pages
  -&amp;gt; internal links
  -&amp;gt; maintained knowledge base
  -&amp;gt; search or RAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;documentation&lt;/li&gt;
&lt;li&gt;technical knowledge&lt;/li&gt;
&lt;li&gt;long-term content&lt;/li&gt;
&lt;li&gt;team knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weakness: requires maintenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3. Compiled knowledge
&lt;/h3&gt;

&lt;p&gt;Use when repeated synthesis matters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw sources
  -&amp;gt; LLM extraction
  -&amp;gt; generated summaries
  -&amp;gt; topic pages
  -&amp;gt; reviewed knowledge base
  -&amp;gt; retrieval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM Wiki systems&lt;/li&gt;
&lt;li&gt;research collections&lt;/li&gt;
&lt;li&gt;personal knowledge bases&lt;/li&gt;
&lt;li&gt;stable domains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weakness: generated structure must be audited.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4. Hybrid knowledge architecture
&lt;/h3&gt;

&lt;p&gt;Use when building serious systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw documents
  -&amp;gt; structured knowledge layer
  -&amp;gt; search index
  -&amp;gt; retrieval and reranking
  -&amp;gt; AI answer
  -&amp;gt; feedback and maintenance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;production RAG&lt;/li&gt;
&lt;li&gt;internal knowledge systems&lt;/li&gt;
&lt;li&gt;AI assistants&lt;/li&gt;
&lt;li&gt;technical publishing systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weakness: more moving parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation questions
&lt;/h2&gt;

&lt;p&gt;To evaluate retrieval, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the system find the right source?&lt;/li&gt;
&lt;li&gt;Did it rank the right source highly?&lt;/li&gt;
&lt;li&gt;Did it retrieve enough context?&lt;/li&gt;
&lt;li&gt;Did it avoid irrelevant context?&lt;/li&gt;
&lt;li&gt;Did the answer cite the correct source?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To evaluate representation, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the knowledge structured clearly?&lt;/li&gt;
&lt;li&gt;Is there a canonical page?&lt;/li&gt;
&lt;li&gt;Are concepts named consistently?&lt;/li&gt;
&lt;li&gt;Are relationships explicit?&lt;/li&gt;
&lt;li&gt;Is the content maintained?&lt;/li&gt;
&lt;li&gt;Can humans and machines both use it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not evaluate a knowledge system only by answer quality — a good answer can hide a bad structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The opinionated rule
&lt;/h2&gt;

&lt;p&gt;If your system fails occasionally, improve retrieval. If it fails repeatedly in the same conceptual area, improve representation.&lt;/p&gt;

&lt;p&gt;Bad retrieval misses the right information. Bad representation means the right information does not really exist in a usable shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Retrieval and representation solve different problems: retrieval gives access, representation gives structure. RAG is powerful because it makes external knowledge available to LLMs at query time, but RAG does not automatically make knowledge coherent, canonical, or maintained. That is why wikis, &lt;a href="https://www.glukhov.org/knowledge-management/foundations/pkm-vs-rag-vs-wiki-vs-memory-systems/" rel="noopener noreferrer"&gt;PKM systems&lt;/a&gt;, knowledge graphs, and LLM Wiki style systems still matter.&lt;/p&gt;

&lt;p&gt;The future is not retrieval vs representation but layered knowledge systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;representation for structure&lt;/li&gt;
&lt;li&gt;retrieval for access&lt;/li&gt;
&lt;li&gt;memory for continuity&lt;/li&gt;
&lt;li&gt;reasoning for synthesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building a serious knowledge system, do not start with the vector database. Start with the shape of the knowledge, then decide how it should be retrieved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/use-cases/retrieval-augmented-generation" rel="noopener noreferrer"&gt;Google Cloud — Retrieval Augmented Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;AWS — What Is Retrieval Augmented Generation?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/topics/retrieval-augmented-generation" rel="noopener noreferrer"&gt;IBM — Retrieval Augmented Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/topics/knowledge-management" rel="noopener noreferrer"&gt;IBM — Knowledge Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://atlan.com/know/llm-wiki-vs-rag-knowledge-base/" rel="noopener noreferrer"&gt;Atlan — LLM Wiki vs RAG Knowledge Base&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/vishalmysore/rag-vs-agent-memory-vs-llm-wiki-a-practical-comparison-1oo6"&gt;Dev.to — RAG vs Agent Memory vs LLM Wiki&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.starmorph.com/blog/karpathy-llm-wiki-knowledge-base-guide" rel="noopener noreferrer"&gt;Starmorph — Karpathy LLM Wiki Knowledge Base Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>knowledgemanagement</category>
      <category>wiki</category>
      <category>architecture</category>
      <category>rag</category>
    </item>
    <item>
      <title>PKM vs RAG vs Wiki vs Memory Systems Explained Clearly</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Sun, 17 May 2026 08:50:06 +0000</pubDate>
      <link>https://dev.to/rosgluk/pkm-vs-rag-vs-wiki-vs-memory-systems-explained-clearly-4b1c</link>
      <guid>https://dev.to/rosgluk/pkm-vs-rag-vs-wiki-vs-memory-systems-explained-clearly-4b1c</guid>
      <description>&lt;p&gt;PKM, RAG, wikis, and AI memory systems are often discussed as if they solve the same problem.&lt;br&gt;
They do not.&lt;br&gt;
They all deal with knowledge, but they operate at different layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PKM helps humans think.&lt;/li&gt;
&lt;li&gt;Wikis help groups preserve shared knowledge.&lt;/li&gt;
&lt;li&gt;RAG helps machines retrieve external knowledge.&lt;/li&gt;
&lt;li&gt;Memory systems help AI agents persist context over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Confusing these systems leads to bad architecture.&lt;/p&gt;

&lt;p&gt;You get wikis full of personal scratch notes, RAG systems without a source of truth, memory layers pretending to be databases, and PKM tools overloaded with automation they were never designed to handle.&lt;/p&gt;

&lt;p&gt;A better model is to see them as different parts of a knowledge systems spectrum.&lt;/p&gt;

&lt;p&gt;This article compares PKM, RAG, wikis, and AI memory systems by structure, retrieval, ownership, evolution, and real-world use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Primary user&lt;/th&gt;
&lt;th&gt;Main purpose&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PKM&lt;/td&gt;
&lt;td&gt;Individual&lt;/td&gt;
&lt;td&gt;Develop personal knowledge&lt;/td&gt;
&lt;td&gt;Thinking, learning, synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wiki&lt;/td&gt;
&lt;td&gt;Team or public group&lt;/td&gt;
&lt;td&gt;Maintain shared knowledge&lt;/td&gt;
&lt;td&gt;Documentation, policies, reference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;Machine system&lt;/td&gt;
&lt;td&gt;Retrieve context for generation&lt;/td&gt;
&lt;td&gt;AI answers over external data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI memory&lt;/td&gt;
&lt;td&gt;AI agent&lt;/td&gt;
&lt;td&gt;Persist context over time&lt;/td&gt;
&lt;td&gt;Long-running agents and personalization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most important distinction is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;PKM and wikis structure knowledge. RAG retrieves knowledge. Memory systems evolve agent context.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the core mental model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why these systems are confused
&lt;/h2&gt;

&lt;p&gt;They overlap in visible behavior.&lt;/p&gt;

&lt;p&gt;All of them can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;store notes&lt;/li&gt;
&lt;li&gt;retrieve information&lt;/li&gt;
&lt;li&gt;answer questions&lt;/li&gt;
&lt;li&gt;organize references&lt;/li&gt;
&lt;li&gt;connect ideas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they differ in intent.&lt;/p&gt;

&lt;p&gt;A PKM system is not just a private wiki.&lt;br&gt;
A wiki is not just a RAG database.&lt;br&gt;
A RAG pipeline is not an AI memory.&lt;br&gt;
An AI memory system is not a replacement for structured documentation.&lt;/p&gt;

&lt;p&gt;The confusion comes from treating "knowledge" as one thing.&lt;/p&gt;

&lt;p&gt;In practice, knowledge has multiple layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture&lt;/li&gt;
&lt;li&gt;Structure&lt;/li&gt;
&lt;li&gt;Retrieval&lt;/li&gt;
&lt;li&gt;Interpretation&lt;/li&gt;
&lt;li&gt;Reuse&lt;/li&gt;
&lt;li&gt;Evolution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Different systems optimize different stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four paradigms
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1. PKM
&lt;/h2&gt;

&lt;p&gt;PKM stands for &lt;a href="https://www.glukhov.org/knowledge-management/foundations/personal-knowledge-management/" rel="noopener noreferrer"&gt;personal knowledge management&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is the practice of capturing, organizing, connecting, and using knowledge for personal work.&lt;/p&gt;

&lt;p&gt;Typical PKM systems include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/knowledge-management/tools/obsidian-for-personal-knowledge-management/" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Logseq&lt;/li&gt;
&lt;li&gt;Notion&lt;/li&gt;
&lt;li&gt;plain Markdown folders&lt;/li&gt;
&lt;li&gt;Zettelkasten systems&lt;/li&gt;
&lt;li&gt;second brain systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PKM is human driven.&lt;/p&gt;

&lt;p&gt;The goal is not just storage. The goal is better thinking.&lt;/p&gt;

&lt;h3&gt;
  
  
  What PKM is good at
&lt;/h3&gt;

&lt;p&gt;PKM works well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;learning a new domain&lt;/li&gt;
&lt;li&gt;developing original ideas&lt;/li&gt;
&lt;li&gt;connecting notes over time&lt;/li&gt;
&lt;li&gt;writing articles or books&lt;/li&gt;
&lt;li&gt;tracking personal research&lt;/li&gt;
&lt;li&gt;building a second brain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good PKM system is messy in a useful way. It supports unfinished thoughts, partial ideas, private context, and evolving concepts.&lt;/p&gt;

&lt;p&gt;This is why PKM is not the same as documentation.&lt;/p&gt;

&lt;p&gt;Documentation wants clarity.&lt;br&gt;
PKM tolerates ambiguity.&lt;/p&gt;

&lt;h3&gt;
  
  
  PKM failure modes
&lt;/h3&gt;

&lt;p&gt;PKM often fails when it becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a dumping ground&lt;/li&gt;
&lt;li&gt;a folder taxonomy project&lt;/li&gt;
&lt;li&gt;a productivity aesthetic&lt;/li&gt;
&lt;li&gt;a tool optimization hobby&lt;/li&gt;
&lt;li&gt;a private archive nobody uses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The main risk is collection without synthesis.&lt;/p&gt;

&lt;p&gt;If you only save information, you do not have a knowledge system. You have a personal landfill.&lt;/p&gt;

&lt;h3&gt;
  
  
  Opinionated take
&lt;/h3&gt;

&lt;p&gt;PKM should optimize for reuse, not capture.&lt;/p&gt;

&lt;p&gt;Capturing everything feels productive, but it creates debt. The real value appears when notes become connected, rewritten, compressed, and used in output.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Wiki
&lt;/h2&gt;

&lt;p&gt;A wiki is a structured knowledge base designed for shared reference.&lt;/p&gt;

&lt;p&gt;Typical wiki systems include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.glukhov.org/knowledge-management/self-hosted-knowledge/dokuwiki-selfhosted-wiki-alternatives/" rel="noopener noreferrer"&gt;DokuWiki&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;MediaWiki&lt;/li&gt;
&lt;li&gt;Confluence&lt;/li&gt;
&lt;li&gt;BookStack&lt;/li&gt;
&lt;li&gt;Git based documentation sites&lt;/li&gt;
&lt;li&gt;internal company knowledge bases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wiki is usually more formal than PKM.&lt;/p&gt;

&lt;p&gt;It should answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What do we know, and where is the current version?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What wikis are good at
&lt;/h3&gt;

&lt;p&gt;Wikis work well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;team documentation&lt;/li&gt;
&lt;li&gt;operational runbooks&lt;/li&gt;
&lt;li&gt;product knowledge&lt;/li&gt;
&lt;li&gt;policy documents&lt;/li&gt;
&lt;li&gt;technical reference&lt;/li&gt;
&lt;li&gt;onboarding material&lt;/li&gt;
&lt;li&gt;stable domain knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wiki is a social contract.&lt;/p&gt;

&lt;p&gt;It says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This page is the place where this knowledge lives.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That makes ownership and maintenance critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wiki failure modes
&lt;/h3&gt;

&lt;p&gt;Wikis often fail because they become stale.&lt;/p&gt;

&lt;p&gt;Common problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no page owners&lt;/li&gt;
&lt;li&gt;outdated screenshots&lt;/li&gt;
&lt;li&gt;duplicate pages&lt;/li&gt;
&lt;li&gt;unclear canonical versions&lt;/li&gt;
&lt;li&gt;too much hierarchy&lt;/li&gt;
&lt;li&gt;no maintenance rhythm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wiki with old information is worse than no wiki, because it creates false confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Opinionated take
&lt;/h3&gt;

&lt;p&gt;A wiki should be boring.&lt;/p&gt;

&lt;p&gt;That is a compliment.&lt;/p&gt;

&lt;p&gt;A good wiki is not where ideas are born. It is where stable knowledge is preserved after it becomes useful to others.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. RAG
&lt;/h2&gt;

&lt;p&gt;RAG stands for &lt;a href="https://www.glukhov.org/rag/" rel="noopener noreferrer"&gt;retrieval augmented generation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It is an AI architecture where a system retrieves relevant external information before asking a language model to generate an answer.&lt;/p&gt;

&lt;p&gt;A basic RAG pipeline usually has:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Documents&lt;/li&gt;
&lt;li&gt;Chunking&lt;/li&gt;
&lt;li&gt;Embeddings or search index&lt;/li&gt;
&lt;li&gt;Retrieval&lt;/li&gt;
&lt;li&gt;Optional reranking&lt;/li&gt;
&lt;li&gt;Prompt assembly&lt;/li&gt;
&lt;li&gt;LLM generation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;RAG is machine driven.&lt;/p&gt;

&lt;p&gt;The goal is not to create knowledge. The goal is to give a model relevant context at query time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What RAG is good at
&lt;/h3&gt;

&lt;p&gt;RAG works well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;question answering over documents&lt;/li&gt;
&lt;li&gt;internal search assistants&lt;/li&gt;
&lt;li&gt;support bots&lt;/li&gt;
&lt;li&gt;technical documentation assistants&lt;/li&gt;
&lt;li&gt;compliance lookup&lt;/li&gt;
&lt;li&gt;research over large corpora&lt;/li&gt;
&lt;li&gt;connecting LLMs to updated information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG is especially useful when the model cannot or should not memorize the information.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG failure modes
&lt;/h3&gt;

&lt;p&gt;RAG often fails when teams treat it as magic search.&lt;/p&gt;

&lt;p&gt;Common problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bad chunking&lt;/li&gt;
&lt;li&gt;weak retrieval&lt;/li&gt;
&lt;li&gt;noisy context&lt;/li&gt;
&lt;li&gt;missing metadata&lt;/li&gt;
&lt;li&gt;no source of truth&lt;/li&gt;
&lt;li&gt;stale documents&lt;/li&gt;
&lt;li&gt;weak evaluation&lt;/li&gt;
&lt;li&gt;no human feedback loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG does not fix bad knowledge management.&lt;/p&gt;

&lt;p&gt;If the underlying content is fragmented, outdated, or contradictory, the RAG system will surface that mess with confidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Opinionated take
&lt;/h3&gt;

&lt;p&gt;RAG is not a knowledge strategy.&lt;/p&gt;

&lt;p&gt;RAG is an access strategy.&lt;/p&gt;

&lt;p&gt;It helps machines access knowledge, but it does not decide what knowledge is valid, maintained, canonical, or useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. AI memory systems
&lt;/h2&gt;

&lt;p&gt;AI memory systems give agents persistent context beyond a single prompt or conversation.&lt;/p&gt;

&lt;p&gt;They may store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user preferences&lt;/li&gt;
&lt;li&gt;past decisions&lt;/li&gt;
&lt;li&gt;long-term facts&lt;/li&gt;
&lt;li&gt;task history&lt;/li&gt;
&lt;li&gt;summaries&lt;/li&gt;
&lt;li&gt;reflections&lt;/li&gt;
&lt;li&gt;extracted entities&lt;/li&gt;
&lt;li&gt;episodic memories&lt;/li&gt;
&lt;li&gt;semantic memories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples and related ideas include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MemGPT style memory tiers&lt;/li&gt;
&lt;li&gt;long-term agent memory&lt;/li&gt;
&lt;li&gt;episodic memory&lt;/li&gt;
&lt;li&gt;semantic memory&lt;/li&gt;
&lt;li&gt;vector memory&lt;/li&gt;
&lt;li&gt;profile memory&lt;/li&gt;
&lt;li&gt;tool state memory&lt;/li&gt;
&lt;li&gt;reflective agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI memory is agent driven.&lt;/p&gt;

&lt;p&gt;The goal is continuity.&lt;/p&gt;

&lt;h3&gt;
  
  
  What AI memory is good at
&lt;/h3&gt;

&lt;p&gt;AI memory systems work well for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;personal assistants&lt;/li&gt;
&lt;li&gt;long-running coding agents&lt;/li&gt;
&lt;li&gt;research agents&lt;/li&gt;
&lt;li&gt;customer support agents&lt;/li&gt;
&lt;li&gt;tutoring systems&lt;/li&gt;
&lt;li&gt;workflow automation&lt;/li&gt;
&lt;li&gt;persistent companions&lt;/li&gt;
&lt;li&gt;multi-session task execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory matters when the system must behave as if it remembers.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI memory failure modes
&lt;/h3&gt;

&lt;p&gt;Memory systems are dangerous when unmanaged.&lt;/p&gt;

&lt;p&gt;Common problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remembering wrong facts&lt;/li&gt;
&lt;li&gt;storing too much&lt;/li&gt;
&lt;li&gt;privacy risk&lt;/li&gt;
&lt;li&gt;stale preferences&lt;/li&gt;
&lt;li&gt;poor memory ranking&lt;/li&gt;
&lt;li&gt;memory poisoning&lt;/li&gt;
&lt;li&gt;no forgetting mechanism&lt;/li&gt;
&lt;li&gt;confusing memory with truth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A memory system needs governance.&lt;/p&gt;

&lt;p&gt;It should answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should be remembered?&lt;/li&gt;
&lt;li&gt;Who approved it?&lt;/li&gt;
&lt;li&gt;How long should it live?&lt;/li&gt;
&lt;li&gt;When should it be forgotten?&lt;/li&gt;
&lt;li&gt;How is it corrected?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Opinionated take
&lt;/h3&gt;

&lt;p&gt;AI memory is not just long context.&lt;/p&gt;

&lt;p&gt;Long context lets a model see more at once.&lt;br&gt;
Memory decides what survives across time.&lt;/p&gt;

&lt;p&gt;Those are different problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core differences table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;PKM&lt;/th&gt;
&lt;th&gt;Wiki&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;AI memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary user&lt;/td&gt;
&lt;td&gt;Individual&lt;/td&gt;
&lt;td&gt;Team or public group&lt;/td&gt;
&lt;td&gt;AI system&lt;/td&gt;
&lt;td&gt;AI agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main function&lt;/td&gt;
&lt;td&gt;Thinking&lt;/td&gt;
&lt;td&gt;Shared reference&lt;/td&gt;
&lt;td&gt;Query time retrieval&lt;/td&gt;
&lt;td&gt;Persistent context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge state&lt;/td&gt;
&lt;td&gt;Evolving&lt;/td&gt;
&lt;td&gt;Stabilized&lt;/td&gt;
&lt;td&gt;Retrieved&lt;/td&gt;
&lt;td&gt;Adaptive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;Flexible&lt;/td&gt;
&lt;td&gt;Explicit&lt;/td&gt;
&lt;td&gt;Index based&lt;/td&gt;
&lt;td&gt;Learned or extracted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval style&lt;/td&gt;
&lt;td&gt;Human search and linking&lt;/td&gt;
&lt;td&gt;Navigation and search&lt;/td&gt;
&lt;td&gt;Semantic or hybrid retrieval&lt;/td&gt;
&lt;td&gt;Relevance plus salience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ownership&lt;/td&gt;
&lt;td&gt;Personal&lt;/td&gt;
&lt;td&gt;Page or team owners&lt;/td&gt;
&lt;td&gt;System maintainers&lt;/td&gt;
&lt;td&gt;Agent or user controlled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time horizon&lt;/td&gt;
&lt;td&gt;Long term personal&lt;/td&gt;
&lt;td&gt;Long term shared&lt;/td&gt;
&lt;td&gt;Query time&lt;/td&gt;
&lt;td&gt;Multi-session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best output&lt;/td&gt;
&lt;td&gt;Insight&lt;/td&gt;
&lt;td&gt;Reliable reference&lt;/td&gt;
&lt;td&gt;Grounded answer&lt;/td&gt;
&lt;td&gt;Continuity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main risk&lt;/td&gt;
&lt;td&gt;Hoarding&lt;/td&gt;
&lt;td&gt;Staleness&lt;/td&gt;
&lt;td&gt;Bad retrieval&lt;/td&gt;
&lt;td&gt;Bad memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good metric&lt;/td&gt;
&lt;td&gt;Reuse in thinking&lt;/td&gt;
&lt;td&gt;Trust and freshness&lt;/td&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;Helpful continuity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Structure vs retrieval vs evolution
&lt;/h2&gt;

&lt;p&gt;The simplest way to understand these systems is to compare what they optimize. The architectural implications of that distinction are explored in depth in &lt;a href="https://www.glukhov.org/knowledge-management/foundations/retrieval-vs-representation/" rel="noopener noreferrer"&gt;Retrieval vs Representation in Knowledge Systems&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  PKM optimizes personal evolution
&lt;/h3&gt;

&lt;p&gt;PKM is about how your understanding changes.&lt;/p&gt;

&lt;p&gt;You collect material, rewrite it, connect it, and turn it into something useful.&lt;/p&gt;

&lt;p&gt;The output is often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a better mental model&lt;/li&gt;
&lt;li&gt;a written article&lt;/li&gt;
&lt;li&gt;a decision&lt;/li&gt;
&lt;li&gt;a research direction&lt;/li&gt;
&lt;li&gt;a reusable insight&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PKM is not primarily about fast lookup. It is about long-term sensemaking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wikis optimize shared structure
&lt;/h3&gt;

&lt;p&gt;Wikis are about stable knowledge.&lt;/p&gt;

&lt;p&gt;They ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the current answer?&lt;/li&gt;
&lt;li&gt;Who owns it?&lt;/li&gt;
&lt;li&gt;Where should people go?&lt;/li&gt;
&lt;li&gt;What should be updated?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A wiki works when people trust it.&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG optimizes machine retrieval
&lt;/h3&gt;

&lt;p&gt;RAG is about retrieving the right context at the right time.&lt;/p&gt;

&lt;p&gt;It asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What documents are relevant?&lt;/li&gt;
&lt;li&gt;Which chunks should be used?&lt;/li&gt;
&lt;li&gt;How much context fits?&lt;/li&gt;
&lt;li&gt;What should the model cite?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG works when retrieval quality is high and the source corpus is trustworthy.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI memory optimizes continuity
&lt;/h3&gt;

&lt;p&gt;Memory systems are about persistence across sessions.&lt;/p&gt;

&lt;p&gt;They ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should the agent remember?&lt;/li&gt;
&lt;li&gt;What should be forgotten?&lt;/li&gt;
&lt;li&gt;Which memory matters now?&lt;/li&gt;
&lt;li&gt;How should memory change behavior?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory works when it improves future behavior without polluting the agent with stale or incorrect context.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use PKM
&lt;/h2&gt;

&lt;p&gt;Use PKM when the knowledge is personal, unfinished, or exploratory.&lt;/p&gt;

&lt;p&gt;Good scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;learning distributed systems&lt;/li&gt;
&lt;li&gt;planning articles&lt;/li&gt;
&lt;li&gt;researching LLM architecture&lt;/li&gt;
&lt;li&gt;collecting book notes&lt;/li&gt;
&lt;li&gt;building a &lt;a href="https://www.glukhov.org/knowledge-management/foundations/second-brain/" rel="noopener noreferrer"&gt;second brain&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;tracking personal experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use PKM when you are still thinking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;You are learning about RAG evaluation.&lt;/p&gt;

&lt;p&gt;You collect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;articles&lt;/li&gt;
&lt;li&gt;benchmark notes&lt;/li&gt;
&lt;li&gt;diagrams&lt;/li&gt;
&lt;li&gt;implementation ideas&lt;/li&gt;
&lt;li&gt;failures from your own experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This belongs in PKM first.&lt;/p&gt;

&lt;p&gt;Later, once the knowledge stabilizes, you may publish an article or turn it into documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use a wiki
&lt;/h2&gt;

&lt;p&gt;Use a wiki when knowledge must be shared and maintained.&lt;/p&gt;

&lt;p&gt;Good scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;team onboarding&lt;/li&gt;
&lt;li&gt;API documentation&lt;/li&gt;
&lt;li&gt;operational runbooks&lt;/li&gt;
&lt;li&gt;architecture decision records&lt;/li&gt;
&lt;li&gt;product knowledge&lt;/li&gt;
&lt;li&gt;deployment instructions&lt;/li&gt;
&lt;li&gt;support procedures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a wiki when others need a reliable answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;Your team has one correct way to deploy a Hugo site to S3 and CloudFront.&lt;/p&gt;

&lt;p&gt;That does not belong only in someone's private notes.&lt;/p&gt;

&lt;p&gt;It belongs in a wiki or documentation system with clear ownership.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use RAG
&lt;/h2&gt;

&lt;p&gt;Use RAG when an AI system needs access to external knowledge at query time.&lt;/p&gt;

&lt;p&gt;Good scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;chatbot over documentation&lt;/li&gt;
&lt;li&gt;search assistant over internal docs&lt;/li&gt;
&lt;li&gt;support assistant over help articles&lt;/li&gt;
&lt;li&gt;legal or compliance assistant&lt;/li&gt;
&lt;li&gt;research over large document sets&lt;/li&gt;
&lt;li&gt;developer assistant over code docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use RAG when the problem is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model needs information that lives outside its weights.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;You have hundreds of technical articles and want an assistant to answer questions using them.&lt;/p&gt;

&lt;p&gt;RAG is a good fit.&lt;/p&gt;

&lt;p&gt;But only if the documents are clean enough to retrieve from.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use AI memory
&lt;/h2&gt;

&lt;p&gt;Use AI memory when an agent needs continuity.&lt;/p&gt;

&lt;p&gt;Good scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding agents that remember project conventions&lt;/li&gt;
&lt;li&gt;personal assistants that remember preferences&lt;/li&gt;
&lt;li&gt;research agents that continue long investigations&lt;/li&gt;
&lt;li&gt;tutoring agents that remember student progress&lt;/li&gt;
&lt;li&gt;support agents that remember prior interactions&lt;/li&gt;
&lt;li&gt;autonomous agents that track goals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use memory when the system must improve across time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;A coding agent should remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the project uses Go&lt;/li&gt;
&lt;li&gt;tests run with a specific command&lt;/li&gt;
&lt;li&gt;the user prefers minimal dependencies&lt;/li&gt;
&lt;li&gt;database migrations follow a convention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not just retrieval. It is persistent operating context.&lt;/p&gt;

&lt;h2&gt;
  
  
  How these systems combine
&lt;/h2&gt;

&lt;p&gt;The most useful systems are hybrids.&lt;/p&gt;

&lt;p&gt;A mature knowledge architecture might look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;PKM for personal exploration&lt;/li&gt;
&lt;li&gt;Wiki for stable shared knowledge&lt;/li&gt;
&lt;li&gt;RAG for machine access&lt;/li&gt;
&lt;li&gt;AI memory for long-running agent continuity&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each layer has a job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1. PKM to wiki
&lt;/h2&gt;

&lt;p&gt;This is the human knowledge pipeline.&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture notes privately&lt;/li&gt;
&lt;li&gt;Connect ideas&lt;/li&gt;
&lt;li&gt;Distill insights&lt;/li&gt;
&lt;li&gt;Publish stable knowledge&lt;/li&gt;
&lt;li&gt;Maintain as shared reference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is how personal research becomes organizational knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;You research self-hosted knowledge tools in Obsidian.&lt;/p&gt;

&lt;p&gt;After testing DokuWiki, Nextcloud, and static Markdown systems, you write a stable guide in your site or team wiki.&lt;/p&gt;

&lt;p&gt;PKM created the insight.&lt;br&gt;
The wiki preserves the result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2. Wiki to RAG
&lt;/h2&gt;

&lt;p&gt;This is the machine access pipeline.&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Maintain canonical wiki pages&lt;/li&gt;
&lt;li&gt;Index them&lt;/li&gt;
&lt;li&gt;Retrieve relevant sections&lt;/li&gt;
&lt;li&gt;Generate grounded answers&lt;/li&gt;
&lt;li&gt;Link back to sources&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is one of the cleanest RAG patterns.&lt;/p&gt;

&lt;p&gt;The wiki remains the source of truth.&lt;br&gt;
RAG becomes the access layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;A support bot answers questions using a product wiki.&lt;/p&gt;

&lt;p&gt;The bot should not replace the wiki. It should cite and route users back to the canonical pages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3. RAG plus memory
&lt;/h2&gt;

&lt;p&gt;This is the agent continuity pipeline.&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;RAG retrieves external facts&lt;/li&gt;
&lt;li&gt;Memory stores user or task context&lt;/li&gt;
&lt;li&gt;The agent combines both&lt;/li&gt;
&lt;li&gt;Future behavior improves&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;RAG answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What does the knowledge base say?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Memory answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What matters about this user, project, or task?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;A coding agent uses RAG to retrieve framework docs.&lt;/p&gt;

&lt;p&gt;It uses memory to remember that your project avoids ORMs, prefers sqlc, and uses structured logging.&lt;/p&gt;

&lt;p&gt;Those are different knowledge types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 4. PKM plus AI assistant
&lt;/h2&gt;

&lt;p&gt;This is the hybrid thinking pipeline.&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Human captures notes&lt;/li&gt;
&lt;li&gt;AI summarizes and suggests links&lt;/li&gt;
&lt;li&gt;Human edits and validates&lt;/li&gt;
&lt;li&gt;Knowledge becomes more structured&lt;/li&gt;
&lt;li&gt;Some pages graduate to wiki or publication&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The AI augments the PKM system, but it should not own the truth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;An AI assistant can suggest connections between notes about RAG, memory systems, and LLM Wiki.&lt;/p&gt;

&lt;p&gt;But the human decides which connections are meaningful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common architecture mistakes
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Mistake 1. Treating RAG as a wiki
&lt;/h2&gt;

&lt;p&gt;RAG is not a knowledge base.&lt;/p&gt;

&lt;p&gt;It does not automatically create a canonical structure. It retrieves from whatever exists.&lt;/p&gt;

&lt;p&gt;If the source documents are bad, RAG becomes a confident interface to bad knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 2. Treating memory as a database
&lt;/h2&gt;

&lt;p&gt;AI memory is selective context, not general storage.&lt;/p&gt;

&lt;p&gt;A database stores records.&lt;br&gt;
Memory changes behavior.&lt;/p&gt;

&lt;p&gt;If you need exact facts, use a database or knowledge base.&lt;br&gt;
If you need continuity, use memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 3. Treating PKM as documentation
&lt;/h2&gt;

&lt;p&gt;PKM can be messy.&lt;/p&gt;

&lt;p&gt;Documentation should not be.&lt;/p&gt;

&lt;p&gt;Private notes can contain half-formed ideas. Shared documentation should contain stable, maintained knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 4. Treating a wiki as a thinking tool
&lt;/h2&gt;

&lt;p&gt;A wiki can support thinking, but it is not ideal for early exploration.&lt;/p&gt;

&lt;p&gt;If every early thought must become a polished page, people stop writing.&lt;/p&gt;

&lt;p&gt;Use PKM for rough thinking. Use wikis for durable knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistake 5. Treating long context as memory
&lt;/h2&gt;

&lt;p&gt;Long context is not memory.&lt;/p&gt;

&lt;p&gt;It only helps while the context is present.&lt;/p&gt;

&lt;p&gt;Memory persists, selects, updates, and sometimes forgets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision guide
&lt;/h2&gt;

&lt;p&gt;Use this simple decision model.&lt;/p&gt;

&lt;h3&gt;
  
  
  If the knowledge is private and evolving
&lt;/h3&gt;

&lt;p&gt;Use PKM.&lt;/p&gt;

&lt;h3&gt;
  
  
  If the knowledge is shared and stable
&lt;/h3&gt;

&lt;p&gt;Use a wiki.&lt;/p&gt;

&lt;h3&gt;
  
  
  If an AI needs to answer from external documents
&lt;/h3&gt;

&lt;p&gt;Use RAG.&lt;/p&gt;

&lt;h3&gt;
  
  
  If an agent needs continuity over time
&lt;/h3&gt;

&lt;p&gt;Use memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you need all four
&lt;/h3&gt;

&lt;p&gt;Build a layered system.&lt;/p&gt;

&lt;p&gt;Do not force one tool to do every job.&lt;/p&gt;

&lt;h2&gt;
  
  
  The knowledge systems spectrum
&lt;/h2&gt;

&lt;p&gt;These systems form a spectrum from human thinking to AI continuity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Human thought&lt;/td&gt;
&lt;td&gt;PKM&lt;/td&gt;
&lt;td&gt;Explore and synthesize&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared structure&lt;/td&gt;
&lt;td&gt;Wiki&lt;/td&gt;
&lt;td&gt;Preserve and maintain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Machine access&lt;/td&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;Retrieve and generate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent continuity&lt;/td&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Persist and adapt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The direction matters.&lt;/p&gt;

&lt;p&gt;Knowledge often starts as personal thought, becomes shared structure, is indexed for machine retrieval, and then becomes part of persistent agent behavior.&lt;/p&gt;

&lt;p&gt;That is the modern knowledge stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where LLM Wiki fits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.glukhov.org/knowledge-management/knowledge-systems-architectures/compiled-knowledge/what-is-llm-wiki/" rel="noopener noreferrer"&gt;LLM Wiki&lt;/a&gt; style systems sit between wiki and AI architecture.&lt;/p&gt;

&lt;p&gt;They are not classic RAG.&lt;/p&gt;

&lt;p&gt;Instead of retrieving chunks only at query time, they attempt to pre-structure knowledge into pages, summaries, entities, and links.&lt;/p&gt;

&lt;p&gt;That makes them closer to compiled knowledge systems.&lt;/p&gt;

&lt;p&gt;A useful placement:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Position&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Wiki&lt;/td&gt;
&lt;td&gt;Human maintained structured knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;td&gt;Query time machine retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM Wiki&lt;/td&gt;
&lt;td&gt;Ingest time machine structured knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Agent persistent context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is why LLM Wiki belongs near knowledge systems architecture, not inside ordinary RAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical examples
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Example 1. Personal technical blog
&lt;/h2&gt;

&lt;p&gt;A technical blogger might use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PKM for research notes&lt;/li&gt;
&lt;li&gt;Hugo site as published knowledge&lt;/li&gt;
&lt;li&gt;internal linking as wiki-like structure&lt;/li&gt;
&lt;li&gt;RAG later for site search&lt;/li&gt;
&lt;li&gt;AI memory for writing assistant preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a strong architecture.&lt;/p&gt;

&lt;p&gt;It keeps human judgment at the center while still allowing AI support.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2. Engineering team
&lt;/h2&gt;

&lt;p&gt;An engineering team might use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PKM for individual learning&lt;/li&gt;
&lt;li&gt;wiki for standards and runbooks&lt;/li&gt;
&lt;li&gt;RAG assistant for internal docs&lt;/li&gt;
&lt;li&gt;memory for coding agents working inside repositories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The wiki should remain canonical.&lt;/p&gt;

&lt;p&gt;The RAG assistant should not invent process.&lt;br&gt;
The memory layer should remember project preferences, not replace architecture decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 3. AI research workflow
&lt;/h2&gt;

&lt;p&gt;A researcher might use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PKM for paper notes&lt;/li&gt;
&lt;li&gt;wiki for stable summaries&lt;/li&gt;
&lt;li&gt;RAG for literature search&lt;/li&gt;
&lt;li&gt;memory for long-running research agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works because each layer handles a different time scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and governance
&lt;/h2&gt;

&lt;p&gt;Knowledge systems become risky when they store sensitive or stale information.&lt;/p&gt;

&lt;h3&gt;
  
  
  PKM governance
&lt;/h3&gt;

&lt;p&gt;Questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should stay private?&lt;/li&gt;
&lt;li&gt;What should be published?&lt;/li&gt;
&lt;li&gt;What should be deleted?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Wiki governance
&lt;/h3&gt;

&lt;p&gt;Questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who owns each page?&lt;/li&gt;
&lt;li&gt;When was it last reviewed?&lt;/li&gt;
&lt;li&gt;What is canonical?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  RAG governance
&lt;/h3&gt;

&lt;p&gt;Questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which sources are indexed?&lt;/li&gt;
&lt;li&gt;Are answers cited?&lt;/li&gt;
&lt;li&gt;How is retrieval evaluated?&lt;/li&gt;
&lt;li&gt;What content is excluded?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Memory governance
&lt;/h3&gt;

&lt;p&gt;Questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is remembered?&lt;/li&gt;
&lt;li&gt;Can users inspect memory?&lt;/li&gt;
&lt;li&gt;Can users delete memory?&lt;/li&gt;
&lt;li&gt;How are wrong memories corrected?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Memory needs the strictest governance because it can silently influence future behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  SEO and content strategy note
&lt;/h2&gt;

&lt;p&gt;If you run a technical site, this distinction is not only architectural. It is also editorial.&lt;/p&gt;

&lt;p&gt;You can map content like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PKM pages explain human knowledge practices.&lt;/li&gt;
&lt;li&gt;Wiki pages explain structured knowledge systems.&lt;/li&gt;
&lt;li&gt;RAG pages explain retrieval engineering.&lt;/li&gt;
&lt;li&gt;Memory pages explain persistent AI behavior.&lt;/li&gt;
&lt;li&gt;Architecture pages compare and connect the paradigms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives your site a clean authority mesh instead of a pile of loosely related AI articles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final conclusion
&lt;/h2&gt;

&lt;p&gt;PKM, RAG, wikis, and AI memory systems are not competitors.&lt;/p&gt;

&lt;p&gt;They are different answers to different questions.&lt;/p&gt;

&lt;p&gt;PKM asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I think better over time?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A wiki asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What do we know, and where is the trusted version?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;RAG asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What external context should the model use right now?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AI memory asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What should this agent remember for the future?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you separate those questions, the architecture becomes obvious.&lt;/p&gt;

&lt;p&gt;Use PKM for thinking.&lt;br&gt;
Use wikis for shared truth.&lt;br&gt;
Use RAG for retrieval.&lt;br&gt;
Use memory for continuity.&lt;/p&gt;

&lt;p&gt;The future is not one knowledge system that replaces all others.&lt;/p&gt;

&lt;p&gt;The future is layered knowledge architecture. For tools, methods, and self-hosted platforms across the full &lt;a href="https://www.glukhov.org/knowledge-management/" rel="noopener noreferrer"&gt;knowledge management&lt;/a&gt; spectrum, the cluster pillar maps the territory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources and further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/use-cases/retrieval-augmented-generation" rel="noopener noreferrer"&gt;https://cloud.google.com/use-cases/retrieval-augmented-generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;https://aws.amazon.com/what-is/retrieval-augmented-generation/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/topics/retrieval-augmented-generation" rel="noopener noreferrer"&gt;https://www.ibm.com/think/topics/retrieval-augmented-generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/topics/knowledge-management" rel="noopener noreferrer"&gt;https://www.ibm.com/think/topics/knowledge-management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2310.08560" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2310.08560&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.memgpt.ai/" rel="noopener noreferrer"&gt;https://research.memgpt.ai/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://zettelkasten.de/posts/building-a-second-brain-and-zettelkasten/" rel="noopener noreferrer"&gt;https://zettelkasten.de/posts/building-a-second-brain-and-zettelkasten/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>wiki</category>
      <category>ai</category>
      <category>knowledgemanagement</category>
    </item>
    <item>
      <title>Agentic LLM Inference Parameters Reference for Qwen and Gemma</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Sun, 17 May 2026 02:27:20 +0000</pubDate>
      <link>https://dev.to/rosgluk/agentic-llm-inference-parameters-reference-for-qwen-and-gemma-4nkh</link>
      <guid>https://dev.to/rosgluk/agentic-llm-inference-parameters-reference-for-qwen-and-gemma-4nkh</guid>
      <description>&lt;p&gt;This page is a &lt;strong&gt;practical reference for agentic LLM inference tuning&lt;/strong&gt; (temperature, top_p, top_k, penalties, and how they interact in multi-step and tool-heavy workflows).&lt;/p&gt;

&lt;p&gt;It sits alongside the broader &lt;a href="https://www.glukhov.org/llm-performance/" rel="noopener noreferrer"&gt;LLM performance engineering hub&lt;/a&gt; and matches best with a clear &lt;a href="https://www.glukhov.org/llm-hosting/" rel="noopener noreferrer"&gt;LLM hosting and serving story&lt;/a&gt;—throughput and scheduling still dominate when the model is starved, but unstable sampling burns retries and output tokens before the GPU does.&lt;/p&gt;

&lt;p&gt;This page consolidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vendor recommended parameters&lt;/li&gt;
&lt;li&gt;embedded defaults from GGUF and APIs&lt;/li&gt;
&lt;li&gt;real-world community findings&lt;/li&gt;
&lt;li&gt;agentic workflow optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right now it is focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Qwen 3.6&lt;/strong&gt; (dense and MoE)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4&lt;/strong&gt; (dense and MoE)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you run terminal agents such as OpenCode, pair this reference with &lt;a href="https://www.glukhov.org/ai-devtools/opencode/llms-comparison/" rel="noopener noreferrer"&gt;local LLM behavior in OpenCode&lt;/a&gt; so workload-level results and sampler defaults stay aligned.&lt;/p&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Provide a single place to configure models for &lt;strong&gt;agent loops, coding, and multi-step reasoning&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  TLDR Reference Table - All models (agentic defaults)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;temp&lt;/th&gt;
&lt;th&gt;top_p&lt;/th&gt;
&lt;th&gt;top_k&lt;/th&gt;
&lt;th&gt;presence_penalty&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 27B&lt;/td&gt;
&lt;td&gt;thinking general&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 27B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 35B MoE&lt;/td&gt;
&lt;td&gt;thinking&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen 3.5 35B MoE&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.6&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B&lt;/td&gt;
&lt;td&gt;general&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B MoE&lt;/td&gt;
&lt;td&gt;general&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B MoE&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;coding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What "Agentic Inference" Actually Means
&lt;/h2&gt;

&lt;p&gt;Most parameter guides assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;chat&lt;/li&gt;
&lt;li&gt;single-shot completion&lt;/li&gt;
&lt;li&gt;human interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agentic systems are different.&lt;/p&gt;

&lt;p&gt;They require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-step reasoning&lt;/li&gt;
&lt;li&gt;tool calling&lt;/li&gt;
&lt;li&gt;consistent outputs&lt;/li&gt;
&lt;li&gt;low error propagation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This changes tuning priorities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core shift
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat&lt;/td&gt;
&lt;td&gt;natural language quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creative&lt;/td&gt;
&lt;td&gt;diversity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agentic&lt;/td&gt;
&lt;td&gt;consistency + reasoning stability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Qwen 3.6 Tuning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dense vs MoE matters
&lt;/h3&gt;

&lt;p&gt;Qwen is one of the few families where:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MoE requires different penalties&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Dense (27B)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;stable&lt;/li&gt;
&lt;li&gt;predictable&lt;/li&gt;
&lt;li&gt;no routing complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recommended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;presence_penalty = 0.0&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  MoE (35B-A3B)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;expert routing per token&lt;/li&gt;
&lt;li&gt;risk of repetition loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recommended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;presence_penalty = 1.5 (general)&lt;/li&gt;
&lt;li&gt;0.0 for coding&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why this matters
&lt;/h3&gt;

&lt;p&gt;MoE models can get stuck reusing the same experts.&lt;/p&gt;

&lt;p&gt;Presence penalty helps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;diversify token paths&lt;/li&gt;
&lt;li&gt;improve reasoning exploration&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Qwen Agentic Coding Setup
&lt;/h2&gt;

&lt;p&gt;This is where most people get it wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Correct setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;temperature = 0.6&lt;/li&gt;
&lt;li&gt;top_p = 0.95&lt;/li&gt;
&lt;li&gt;top_k = 20&lt;/li&gt;
&lt;li&gt;presence_penalty = 0.0&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why low temperature works
&lt;/h3&gt;

&lt;p&gt;Coding agents need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic outputs&lt;/li&gt;
&lt;li&gt;repeatable tool calls&lt;/li&gt;
&lt;li&gt;stable formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Higher temperature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;breaks JSON&lt;/li&gt;
&lt;li&gt;introduces hallucinated APIs&lt;/li&gt;
&lt;li&gt;increases retries&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Gemma 4 Tuning
&lt;/h2&gt;

&lt;p&gt;Gemma behaves differently.&lt;/p&gt;

&lt;h3&gt;
  
  
  No official defaults
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;model cards are empty&lt;/li&gt;
&lt;li&gt;configs are implicit&lt;/li&gt;
&lt;li&gt;real tuning comes from:

&lt;ul&gt;
&lt;li&gt;Google AI Studio&lt;/li&gt;
&lt;li&gt;GGUF defaults&lt;/li&gt;
&lt;li&gt;community benchmarks&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Counter-Intuitive Finding
&lt;/h2&gt;

&lt;p&gt;Gemma 4 performs better with &lt;strong&gt;higher temperature&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observed behavior
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Temp&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.5&lt;/td&gt;
&lt;td&gt;poor reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;stable baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1.2 to 1.5&lt;/td&gt;
&lt;td&gt;best coding performance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This contradicts standard advice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why high temperature works here
&lt;/h2&gt;

&lt;p&gt;Hypothesis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training distribution favors exploration&lt;/li&gt;
&lt;li&gt;reasoning mode depends on diversity&lt;/li&gt;
&lt;li&gt;model compensates for lack of explicit chain-of-thought control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;higher temperature improves solution search space&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Gemma Agentic Coding Setup
&lt;/h2&gt;

&lt;p&gt;Recommended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temperature = 1.2&lt;/li&gt;
&lt;li&gt;top_p = 0.95&lt;/li&gt;
&lt;li&gt;top_k = 65&lt;/li&gt;
&lt;li&gt;penalties = 0.0&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Important
&lt;/h3&gt;

&lt;p&gt;Do not apply traditional "low temp for code" rule blindly.&lt;/p&gt;

&lt;p&gt;Gemma is an exception.&lt;/p&gt;




&lt;h2&gt;
  
  
  Thinking Mode and Agent Systems
&lt;/h2&gt;

&lt;p&gt;Both Qwen and Gemma support reasoning modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it matters
&lt;/h3&gt;

&lt;p&gt;Agent loops require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intermediate reasoning&lt;/li&gt;
&lt;li&gt;error recovery&lt;/li&gt;
&lt;li&gt;multi-step planning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical rule
&lt;/h3&gt;

&lt;p&gt;Always enable thinking mode for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding agents&lt;/li&gt;
&lt;li&gt;tool use&lt;/li&gt;
&lt;li&gt;multi-step tasks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Parameter Strategy by Use Case
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Coding agents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;prioritize determinism&lt;/li&gt;
&lt;li&gt;minimize penalties&lt;/li&gt;
&lt;li&gt;stable sampling&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reasoning agents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;moderate temperature&lt;/li&gt;
&lt;li&gt;allow exploration&lt;/li&gt;
&lt;li&gt;preserve structure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tool calling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;strict formatting&lt;/li&gt;
&lt;li&gt;low randomness&lt;/li&gt;
&lt;li&gt;consistent token patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Schema and JSON tooling are orthogonal to logits; combine these sampling rules with &lt;a href="https://www.glukhov.org/llm-performance/ollama/llm-structured-output-with-ollama-in-python-and-go/" rel="noopener noreferrer"&gt;structured output patterns for Ollama and Qwen3&lt;/a&gt; so validators see fewer retries.&lt;/p&gt;




&lt;h2&gt;
  
  
  Vendor Defaults vs Reality
&lt;/h2&gt;

&lt;p&gt;Vendor defaults are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;safe&lt;/li&gt;
&lt;li&gt;generic&lt;/li&gt;
&lt;li&gt;not optimized&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Community findings often show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better performance&lt;/li&gt;
&lt;li&gt;task-specific tuning&lt;/li&gt;
&lt;li&gt;architecture-aware adjustments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example
&lt;/h3&gt;

&lt;p&gt;Gemma:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;official: no guidance&lt;/li&gt;
&lt;li&gt;community: high temperature improves coding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;official: inconsistent sections&lt;/li&gt;
&lt;li&gt;community: standardized values converge&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Practical Deployment Notes
&lt;/h2&gt;

&lt;p&gt;Under concurrency, queueing and memory splits interact with retries as much as sampling does—read &lt;a href="https://www.glukhov.org/llm-performance/ollama/how-ollama-handles-parallel-requests/" rel="noopener noreferrer"&gt;how Ollama handles parallel requests&lt;/a&gt; alongside the presets above.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ollama
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;works well for both families&lt;/li&gt;
&lt;li&gt;verify GPU compatibility&lt;/li&gt;
&lt;li&gt;defaults may differ from reference&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  vLLM
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;supports advanced sampling&lt;/li&gt;
&lt;li&gt;stable for production&lt;/li&gt;
&lt;li&gt;use explicit parameters&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  llama.cpp
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;requires sampler ordering&lt;/li&gt;
&lt;li&gt;always enable jinja for modern models&lt;/li&gt;
&lt;li&gt;incorrect sampler chain reduces output quality&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;there is no universal parameter set&lt;/li&gt;
&lt;li&gt;architecture matters more than model size&lt;/li&gt;
&lt;li&gt;agentic systems require different tuning than chat&lt;/li&gt;
&lt;li&gt;community benchmarks are often ahead of vendors&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Opinion
&lt;/h2&gt;

&lt;p&gt;Most parameter guides are outdated.&lt;/p&gt;

&lt;p&gt;They assume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;chat use&lt;/li&gt;
&lt;li&gt;low temperature for code&lt;/li&gt;
&lt;li&gt;static configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern models break those assumptions.&lt;/p&gt;

&lt;p&gt;If you are building agentic systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;treat inference tuning as a first-class system design problem&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not a config file.&lt;/p&gt;




&lt;h2&gt;
  
  
  Future Direction
&lt;/h2&gt;

&lt;p&gt;This reference will evolve into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-model deep dives&lt;/li&gt;
&lt;li&gt;agent-specific configs&lt;/li&gt;
&lt;li&gt;benchmarking-backed tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;inference is where model capability becomes system performance&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>hermes</category>
      <category>openclaw</category>
      <category>opencode</category>
      <category>cheatsheet</category>
    </item>
    <item>
      <title>LLM Structured Output Validation in Python That Holds Up</title>
      <dc:creator>Rost</dc:creator>
      <pubDate>Fri, 15 May 2026 01:26:29 +0000</pubDate>
      <link>https://dev.to/rosgluk/llm-structured-output-validation-in-python-that-holds-up-3inc</link>
      <guid>https://dev.to/rosgluk/llm-structured-output-validation-in-python-that-holds-up-3inc</guid>
      <description>&lt;p&gt;Most LLM "structured output" tutorials are unserious.&lt;br&gt;
They teach you to ask for JSON politely and then hope the model behaves.&lt;br&gt;
That is not validation.&lt;br&gt;
That is optimism with braces.&lt;/p&gt;



&lt;p&gt;OpenAI's own docs make the distinction explicit. JSON mode gives you valid JSON, while Structured Outputs enforces schema adherence, and OpenAI recommends using Structured Outputs instead of JSON mode when possible.&lt;/p&gt;

&lt;p&gt;That still does not make the payload trustworthy. JSON Schema defines structure and allowed values, Pydantic gives you typed validation in Python, and OpenAI explicitly notes that a schema-valid response can still contain incorrect values. On top of that, refusals and incomplete outputs can bypass the shape you expected. In production, structured output validation is a pipeline, not a toggle. The same boundary also has to live inside the wider story of throughput, retries, and scheduler limits on the &lt;a href="https://www.glukhov.org/llm-performance/" rel="noopener noreferrer"&gt;LLM performance engineering hub&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Structured output validation is a contract
&lt;/h2&gt;

&lt;p&gt;Structured output validation for LLMs means you define the shape of the answer up front, constrain the model to produce that shape when possible, and then validate the result again before your application trusts it. In practical terms, that means checking required fields, types, enums, closed object shapes, and domain rules before the payload touches your database, UI, queue, or downstream service. JSON Schema exists for exactly this kind of structural validation, Pydantic is built to validate untrusted data against Python type hints, and Python's &lt;code&gt;jsonschema&lt;/code&gt; library gives you a direct way to validate an instance against a schema.&lt;/p&gt;

&lt;p&gt;There is also a clean split between two common use cases. If the model is supposed to answer the user in a structured format, use a structured response format. If the model is supposed to call your application's tools or functions, use function calling. OpenAI's docs spell out that distinction, and for function calling they recommend enabling &lt;code&gt;strict: true&lt;/code&gt; so the arguments reliably adhere to the function schema.&lt;/p&gt;

&lt;p&gt;My strong opinion is simple. Treat every structured LLM response as an API boundary. Once you start thinking in terms of contracts instead of prompts, the architecture gets cleaner, the bugs get cheaper, and the whole "why did the model invent a new field in production" problem mostly disappears. That is the real answer to "what is structured output validation for LLMs" and it is a much better answer than "ask the model nicely for JSON."&lt;/p&gt;
&lt;h2&gt;
  
  
  JSON mode is not validation
&lt;/h2&gt;

&lt;p&gt;If you remember only one thing from this article, make it this. JSON mode is not schema validation. OpenAI's Help Center says JSON mode will not guarantee the output matches any specific schema, only that it is valid JSON and parses without errors. The Structured Outputs guide says the same thing in a cleaner way. Both JSON mode and Structured Outputs can produce valid JSON, but only Structured Outputs enforces schema adherence.&lt;/p&gt;

&lt;p&gt;That difference matters more than people admit. In its Structured Outputs launch post, OpenAI reported that &lt;code&gt;gpt-4o-2024-08-06&lt;/code&gt; with Structured Outputs scored 100 percent on its complex JSON schema evals, while &lt;code&gt;gpt-4-0613&lt;/code&gt; scored under 40 percent. You do not need to treat those numbers as universal truth to see the broader point. Schema enforcement changes the failure surface from "anything could happen" to "the contract is much tighter."&lt;/p&gt;

&lt;p&gt;There are still edge cases, and pretending otherwise is how toy demos become pager duty. OpenAI documents that the model can refuse an unsafe request, and those refusals are surfaced outside your normal schema path. It also documents incomplete responses, including cases such as hitting &lt;code&gt;max_output_tokens&lt;/code&gt; or a content filter interruption. So the FAQ "is JSON mode enough for reliable LLM output" has a short answer and a longer one. The short answer is no. The longer answer is that even strict structured output still needs explicit failure handling.&lt;/p&gt;
&lt;h2&gt;
  
  
  Where structured output still breaks
&lt;/h2&gt;

&lt;p&gt;Schema enforcement shrinks the problem. It does not delete it. In real traffic you still see broken or surprising payloads for reasons that have little to do with your prompt wording.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure shapes worth designing for
&lt;/h3&gt;

&lt;p&gt;Models and clients disagree about details. You can get extra prose before or after the JSON, Markdown fenced blocks around the payload, or a tool call whose name is valid but whose arguments are JSON that does not match your Pydantic model. Streaming makes it worse because you might validate a half-finished buffer. Defensive code should assume "string in, maybe JSON inside" rather than "bytes on the wire already match my model."&lt;/p&gt;
&lt;h3&gt;
  
  
  Provider and API differences
&lt;/h3&gt;

&lt;p&gt;Not every host exposes the same structured-output surface. One stack might give you a first-class schema-bound completion, another might only guarantee JSON syntax, and local runtimes might lag behind hosted APIs. That is one reason the FAQ "how do you validate LLM JSON in Python" starts with provider enforcement when it exists and still ends with Python-side validation. For a wider view of how vendors compare, see the &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/structured-output-comparison-popular-llm-providers/" rel="noopener noreferrer"&gt;structured output comparison across popular LLM providers&lt;/a&gt;. If you run models locally, the same validation pipeline applies after you normalize the wire format, for example after extraction with Ollama as in &lt;a href="https://www.glukhov.org/llm-performance/ollama/llm-structured-output-with-ollama-in-python-and-go/" rel="noopener noreferrer"&gt;structured LLM output with Ollama in Python and Go&lt;/a&gt;. When a runtime still wraps JSON with odd prefixes or reasoning traces, expect the same class of parser failures described in &lt;a href="https://www.glukhov.org/llm-performance/ollama/ollama-gpt-oss-structured-output-issues/" rel="noopener noreferrer"&gt;Ollama GPT-OSS structured output issues&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Python stack that actually works
&lt;/h2&gt;

&lt;p&gt;My recommendation is boring on purpose. First, let the model provider enforce the structural contract when it can. Second, validate the returned payload in Python with Pydantic. Third, use explicit business-rule validation for facts that a schema alone cannot prove. Fourth, test the contract with fixtures and adversarial examples instead of waving at a playground screenshot and calling it done. OpenAI's Structured Outputs docs, Pydantic's validator model, Python's &lt;code&gt;jsonschema&lt;/code&gt; tooling, and OpenAI's own structured-output eval examples all point in that direction.&lt;/p&gt;

&lt;p&gt;Pydantic is the right center of gravity for Python. It lets you model the output as normal Python types, generate JSON Schema with &lt;code&gt;model_json_schema()&lt;/code&gt;, and validate raw JSON with &lt;code&gt;model_validate_json()&lt;/code&gt;. Pydantic's docs also note that &lt;code&gt;model_validate_json()&lt;/code&gt; is generally the better path than doing &lt;code&gt;json.loads(...)&lt;/code&gt; first and then validating, because that two-step route adds extra parsing work in Python.&lt;/p&gt;

&lt;p&gt;If you keep standalone schema files in your repo, or you want CI to validate fixture payloads independently of model code, Python's &lt;code&gt;jsonschema&lt;/code&gt; package gives you the simplest possible contract check with &lt;code&gt;jsonschema.validate(...)&lt;/code&gt;. If you want that in pre-commit, &lt;code&gt;check-jsonschema&lt;/code&gt; exists specifically as a CLI and pre-commit hook built on &lt;code&gt;jsonschema&lt;/code&gt;. That is a very good fit for teams that want schema changes reviewed like code changes.&lt;/p&gt;

&lt;p&gt;Frameworks can reduce plumbing, but they do not remove the need for actual validation. LangChain now auto-selects provider-native structured output when the provider supports it and falls back to a tool strategy otherwise. Instructor layers Pydantic response models, validation, retries, and multi-provider support on top of model calls. Guardrails focuses on validators and input-output guard layers. Useful tools, all of them. But the schema and the business rules still belong to you. If you are choosing between higher-level libraries, the &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/baml-vs-instruct-for-structured-output-llm-in-python/" rel="noopener noreferrer"&gt;BAML vs Instructor comparison for Python&lt;/a&gt; is a useful companion to this article.&lt;/p&gt;
&lt;h2&gt;
  
  
  A minimal OpenAI and Pydantic example
&lt;/h2&gt;

&lt;p&gt;The smallest production-worthy example has a few non-negotiables. Use a closed set of enum-like values where possible. Forbid extra keys. Add field descriptions so the schema is understandable to humans and more legible to the model. Keep the root object explicit and boring. OpenAI recommends clear names plus titles and descriptions for important keys, JSON Schema uses &lt;code&gt;enum&lt;/code&gt; to restrict values, and Pydantic can close the object shape with &lt;code&gt;extra="forbid"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forbid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abuse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Support ticket category.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Operational urgency.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;needs_human&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Whether a human should review the case.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A one sentence summary of the issue.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-2024-08-06&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify support tickets. Return only the structured result.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer reports duplicate charges after refreshing checkout.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;text_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_parsed&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two details in that example are easy to miss and absolutely worth caring about. &lt;code&gt;extra="forbid"&lt;/code&gt; on the Pydantic side mirrors the JSON Schema idea of &lt;code&gt;additionalProperties: false&lt;/code&gt;, which is also a requirement for strict tool schemas in OpenAI's function-calling docs. And enums are not cosmetic. They are one of the simplest ways to stop the model from inventing a value your code does not understand.&lt;/p&gt;

&lt;p&gt;The OpenAI Python SDK supports &lt;code&gt;client.responses.parse(...)&lt;/code&gt; with a Pydantic model supplied as &lt;code&gt;text_format&lt;/code&gt;, and the parsed object is returned on &lt;code&gt;response.output_parsed&lt;/code&gt;. The same SDK also supports &lt;code&gt;client.chat.completions.parse(...)&lt;/code&gt;, where the parsed object lives on &lt;code&gt;message.parsed&lt;/code&gt;. If you want direct structured data extraction with minimal glue, those helpers are the cleanest starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parse, normalize, then validate
&lt;/h2&gt;

&lt;p&gt;Structured Outputs and &lt;code&gt;model_validate_json&lt;/code&gt; remove a lot of parsing pain when the stack is aligned end to end. The moment you support a provider that returns plain chat text, a model that wraps JSON in fences, or a logging path that stores the raw completion string, you want one choke point that turns text into a dict before Pydantic runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_json_from_llm_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```

&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Common "Sure, here is the JSON:" prefix before the object.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ticket_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_json_from_llm_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_completion_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That helper is intentionally boring. It handles fenced "&lt;br&gt;
&lt;br&gt;
&lt;code&gt;json ...&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
" blocks and a leading natural-language preamble when the payload is still a single top-level object. It is not a full JSON extractor. If the model nests braces inside string values, naive slicing can break, and the right fix is usually stricter prompting, schema-bound completions, or a dedicated parser library.&lt;/p&gt;
&lt;h3&gt;
  
  
  Streaming completions
&lt;/h3&gt;

&lt;p&gt;If you stream chat tokens, do not run &lt;code&gt;json.loads&lt;/code&gt; or &lt;code&gt;model_validate_json&lt;/code&gt; on every delta. Buffer until the API reports a finished message (check your client for the stream termination or &lt;code&gt;finish_reason&lt;/code&gt;), concatenate the text, then parse once. The same rule applies when tool-call arguments arrive in chunks. You only validate after the arguments string is complete.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;completion_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;raw_completion_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_completion_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can still pass &lt;code&gt;raw_completion_text&lt;/code&gt; through &lt;code&gt;parse_json_from_llm_text&lt;/code&gt; first when you expect fences or chatter around the JSON.&lt;/p&gt;

&lt;p&gt;Once you own plain-string parsing, the next constraint is often not Python but the provider's JSON Schema dialect and what the remote API actually accepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provider schema limits (before you get clever in Python)
&lt;/h2&gt;

&lt;p&gt;Do not blindly dump any schema generator output into an API and assume every JSON Schema feature is supported. OpenAI supports a subset of JSON Schema, requires all fields to be required for Structured Outputs, requires the root to be an object rather than a top-level &lt;code&gt;anyOf&lt;/code&gt;, and documents limits on nesting depth and total property count. Keep the provider-facing schema simple. That is not a compromise. That is good engineering.&lt;/p&gt;

&lt;p&gt;If you need a provider-agnostic validation path, or you want to validate stored fixtures and mocks, Pydantic plus &lt;code&gt;jsonschema&lt;/code&gt; is still a great combination.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jsonschema&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;validate&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;validate_json&lt;/span&gt;

&lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_json_schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checkout duplicates charges after refresh.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern is especially handy in tests, contract fixtures, and integrations where the model provider does not offer native structured output enforcement. Just remember that a locally generated schema may be broader than a given provider's supported subset, so "valid locally" does not automatically mean "accepted by every LLM API." Also note that some providers preprocess and cache schema artifacts, so the first request for a new schema can be slower than warm requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool calls are a second contract
&lt;/h2&gt;

&lt;p&gt;Function or tool calling is the other major structured-output shape. The model chooses a name and passes arguments that should match a JSON Schema you control. OpenAI recommends &lt;code&gt;strict: true&lt;/code&gt; on tool definitions so arguments stay aligned with that schema. In agent-heavy stacks, bad sampling turns into invalid tool JSON fast; keep sampler settings aligned with multi-step work using the &lt;a href="https://www.glukhov.org/llm-performance/benchmarks/agentic-inference-parameters-reference/" rel="noopener noreferrer"&gt;agentic inference parameters reference for Qwen and Gemma&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The snippets below assume you already mapped the provider's tool-call object into a &lt;code&gt;name&lt;/code&gt; string and an &lt;code&gt;arguments&lt;/code&gt; dict, for example by parsing &lt;code&gt;tool_calls[].function&lt;/code&gt; on chat completions (JSON string arguments become &lt;code&gt;json.loads&lt;/code&gt; first). &lt;code&gt;dispatch_tool&lt;/code&gt; is the step after that normalization.&lt;/p&gt;

&lt;p&gt;Two practical rules help in Python. First, validate the tool name against an explicit allowlist before you route execution. Second, validate the arguments dict with the same Pydantic model you use in tests, not with ad hoc key access. The failure mode you are avoiding is "valid JSON arguments, wrong shape for the tool that fired," which slips past string checks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;ToolHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dispatch_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ToolHandler&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsupported tool &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model_cls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_cls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ToolHandler&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classify_ticket&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queued as &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern keeps routing and validation in one place. Your real handlers will be richer, but the split should stay the same: allowed names, typed arguments, then side effects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schema validation still needs business rules
&lt;/h2&gt;

&lt;p&gt;A valid object is not the same thing as a correct object. OpenAI says this directly. Structured Outputs does not prevent mistakes inside the values of the JSON object. That is why the FAQ "why do schema validation and business-rule validation both matter" has a blunt answer. Because a response can match the schema perfectly and still be wrong in a way that hurts the business.&lt;/p&gt;

&lt;p&gt;Here is a realistic example. The structure can be valid, but the pricing logic can still be nonsense.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;decimal&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing_extensions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Self&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_validator&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Offer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;model_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConfigDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forbid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EUR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GBP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;original_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Decimal&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;discounted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;

    &lt;span class="nd"&gt;@model_validator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_discount_logic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Self&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discounted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;original_amount&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original_amount is required when discounted is true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;original_amount&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original_amount must be greater than amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That validator does something schemas alone often do poorly in real systems. It checks cross-field semantics after the whole model has been parsed. Pydantic's &lt;code&gt;model_validator&lt;/code&gt; exists exactly for this kind of whole-object validation. Notice the &lt;code&gt;Decimal | None&lt;/code&gt; field without a default. That keeps the field present while still allowing &lt;code&gt;null&lt;/code&gt;, which matches OpenAI's documented pattern for optional-like values under strict Structured Outputs.&lt;/p&gt;

&lt;p&gt;If you want validation failures to feed back into the model automatically, Instructor is a practical layer on top of Pydantic. Its docs describe a retry loop where validation errors are captured, formatted as feedback, and used to ask the model to try again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;

&lt;span class="n"&gt;retrying_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;instructor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;offer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retrying_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Offer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the offer from this text. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Was 49.00 USD, now 19.00 USD.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the few conveniences I will happily recommend. Automatic retries tied to real validation errors are useful. Silent coercion is not. Instructor's model layer, retry docs, and validation docs all lean into that same idea, and they are right to do so.&lt;/p&gt;

&lt;p&gt;You can implement the same idea without a framework. The loop is small. Ask the model, validate with Pydantic, and if validation fails, send the error details back in a follow-up user message and ask for corrected JSON only. Cap attempts, log the final failure, and surface a controlled error to callers. When you already rely on &lt;code&gt;responses.parse&lt;/code&gt; or other schema-bound helpers, you may rarely exercise this path. It still matters for JSON mode, older chat endpoints, or any gateway that hands you a raw string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return only JSON that matches the ticket schema.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer reports duplicate charges after refreshing checkout.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;ticket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-2024-08-06&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validation failed with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Return corrected JSON only.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhausted structured output retries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;ticket&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In real services you would attach tracing IDs, redact customer text in logs, and distinguish recoverable validation errors from refusals or incomplete responses. The important part is that the retry is driven by real validator output, not by a generic "try again" message.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test, retry, and fail closed
&lt;/h2&gt;

&lt;p&gt;What should happen when LLM validation fails? Not a shrug. Reject the payload, log the failure, retry with bounded attempts if the task is worth retrying, and fail closed instead of normalizing garbage into something that only looks acceptable. This is also where many teams forget to handle refusals and incomplete outputs explicitly, even though the provider docs tell them those paths exist.&lt;/p&gt;

&lt;p&gt;For OpenAI's Responses API, failure handling should be first-class code, not an afterthought. The variable is &lt;code&gt;response&lt;/code&gt; from &lt;code&gt;client.responses.create&lt;/code&gt; or &lt;code&gt;parse&lt;/code&gt;, not &lt;code&gt;completion&lt;/code&gt; from chat streaming elsewhere in this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;incomplete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incomplete_details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refusal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refusal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not defensive over-engineering. It is directly aligned with the documented failure modes. If the model refuses, you are not holding a schema-valid payload. If the response is incomplete, you are not holding a schema-valid payload. Treat both as explicit branches in your control flow.&lt;/p&gt;

&lt;p&gt;You should also test the contract outside the model call itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jsonschema&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;validate&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;validate_json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ValidationError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_ticket_fixture_matches_schema&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checkout duplicates charges after refresh.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_json_schema&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_discount_logic_rejects_broken_offer&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;Offer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;19.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discounted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_ticket_rejects_unknown_category_string&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer wants a refund.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_ticket_rejects_extra_keys&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ValidationError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;TicketClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Broken flow.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the right shape of test strategy for LLM output validation in Python. Validate golden fixtures with &lt;code&gt;jsonschema&lt;/code&gt; so every field in the contract is exercised. Validate semantics with Pydantic, then add adversarial cases such as illegal enum strings, forbidden extra keys, and cross-field contradictions you care about. If you snapshot real model outputs, scrub PII and treat them as regression fixtures.&lt;/p&gt;

&lt;p&gt;If your team lives in the OpenAI stack, the Evals API also includes structured-output evaluation recipes specifically for testing and iterating on tasks that depend on machine-readable formats. And if you keep raw schema files in the repo, wire &lt;code&gt;check-jsonschema&lt;/code&gt; into CI or pre-commit. Ship contracts, not vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production checks that save you later
&lt;/h2&gt;

&lt;p&gt;When validation fails, the FAQ answer is blunt. Reject the payload, log why, retry with targeted feedback when the task is worth another attempt, and fail closed instead of coercing bad data into a queue.&lt;/p&gt;

&lt;p&gt;A short operations checklist helps teams avoid repeat incidents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log schema version or a hash of the JSON Schema you sent to the provider so you can replay failures accurately.&lt;/li&gt;
&lt;li&gt;Redact model inputs and outputs in logs. Structured logs are useless if they leak customer text.&lt;/li&gt;
&lt;li&gt;Emit counters or metrics for refusal rate, incomplete response rate, validation failure rate, and repair success rate. Spikes there beat guessing when a model or prompt change shipped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Broader &lt;a href="https://www.glukhov.org/observability/observability-for-llm-systems/" rel="noopener noreferrer"&gt;observability for LLM systems&lt;/a&gt; guidance helps wire those signals into dashboards, traces, and SLO reviews once the counters exist.&lt;/p&gt;

&lt;p&gt;The best practice is not complicated. Use provider-side Structured Outputs or strict tool schemas when you can. Normalize raw text when you must. Mirror the contract in Python with Pydantic. Add business-rule validation for what the schema cannot prove. Handle refusals and incomplete responses as normal branches. Test the contract until it stops being a demo and starts being software. Anything less is just prompt engineering cosplay.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>llm</category>
      <category>ai</category>
      <category>aicoding</category>
    </item>
  </channel>
</rss>
