<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Li-Hsuan Lung</title>
    <description>The latest articles on DEV Community by Li-Hsuan Lung (@lihsuanlung).</description>
    <link>https://dev.to/lihsuanlung</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3821511%2Fb055c88c-86bd-4246-a48a-e6c7221c65c5.png</url>
      <title>DEV Community: Li-Hsuan Lung</title>
      <link>https://dev.to/lihsuanlung</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lihsuanlung"/>
    <language>en</language>
    <item>
      <title>Semantic Search — How ProjectBrain Finds What You Mean</title>
      <dc:creator>Li-Hsuan Lung</dc:creator>
      <pubDate>Fri, 27 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://dev.to/lihsuanlung/semantic-search-how-projectbrain-finds-what-you-mean-325p</link>
      <guid>https://dev.to/lihsuanlung/semantic-search-how-projectbrain-finds-what-you-mean-325p</guid>
      <description>&lt;h2&gt;
  
  
  The filing cabinet problem
&lt;/h2&gt;

&lt;p&gt;Imagine your project's knowledge base as a massive library with thousands of books, each one containing facts, decisions, and lessons learned by your team. The challenge? There’s no universal catalog. Every book is shelved by whatever label the author thought made sense at the time.&lt;/p&gt;

&lt;p&gt;When you need to find something, you rarely remember the exact phrase that was used. You search for "token expiration" and miss the entry titled "auth session handling." You search for "rate limit" and miss the fact logged as "API throttling ceiling is 1000 req/min." The answer is there. You just can’t reach it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How most search works — and where it falls short
&lt;/h2&gt;

&lt;p&gt;Most search systems operate on exact word matching. The technical term is &lt;em&gt;lexical search&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The idea is simple: take the words in your query, find documents that contain those words, and rank them by how often the words appear.&lt;/p&gt;

&lt;p&gt;If you search for "rate limit," you get back entries that literally contain the words "rate" and "limit." If someone logged a fact called "API throttling ceiling is 1000 requests per minute," you won't find it — even though it's exactly what you were looking for.&lt;/p&gt;

&lt;p&gt;Lexical search has real strengths. It's fast, reliable, and perfect for exact identifiers. If you need to find a specific ticket number, an error code, or a function name, word-matching is what you want.&lt;/p&gt;

&lt;p&gt;But for a knowledge base full of human-authored notes, decisions, and procedures, literal word matching misses half the content.&lt;/p&gt;
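&lt;p&gt;To make the failure mode concrete, here is a toy sketch of word-overlap scoring. This is not any real search engine's implementation; the function and the example scores are purely illustrative:&lt;/p&gt;

```python
# Toy sketch of lexical (word-overlap) scoring -- purely illustrative,
# not a real search engine's ranking function.
def lexical_score(query: str, document: str) -> int:
    """Count how many query words appear in the document."""
    doc_words = set(document.lower().split())
    return sum(word in doc_words for word in query.lower().split())

fact = "API throttling ceiling is 1000 requests per minute"
print(lexical_score("rate limit", fact))           # 0: no shared words
print(lexical_score("requests per minute", fact))  # 3: exact words match
```

&lt;p&gt;The fact is exactly what the "rate limit" query is looking for, but with zero shared vocabulary the score is zero.&lt;/p&gt;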

&lt;h2&gt;
  
  
  A different approach: search by meaning
&lt;/h2&gt;

&lt;p&gt;In recent years, semantic search powered by vector embeddings has become accessible and practical for most teams.&lt;/p&gt;

&lt;p&gt;Here is the idea. Modern AI models can read a piece of text and produce a numerical fingerprint — a list of hundreds of numbers that represents the &lt;em&gt;meaning&lt;/em&gt; of the text. Similar meanings produce similar fingerprints. Different meanings produce very different ones.&lt;/p&gt;

&lt;p&gt;When you store a fact in ProjectBrain, we run it through OpenAI's embedding model and save this numerical fingerprint alongside the text. When you search, we fingerprint your query the same way. Then we find the stored entries whose fingerprints are most similar to yours.&lt;/p&gt;

&lt;p&gt;Because the fingerprints encode meaning rather than words, this works even when the vocabulary is completely different. "Rate limit," "API throttling ceiling," and "maximum requests per minute" all point to the same region in meaning-space. The search finds all of them.&lt;/p&gt;
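&lt;p&gt;A toy sketch of the idea, with made-up three-dimensional "fingerprints" standing in for real embeddings (which have hundreds of dimensions) and cosine similarity as the distance measure:&lt;/p&gt;

```python
import math

# Toy sketch: comparing "fingerprints" (embedding vectors) by cosine
# similarity. Real embeddings have hundreds of dimensions; these
# three-dimensional vectors are made up for illustration.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

rate_limit   = [0.8, 0.1, 0.2]  # "rate limit"
throttling   = [0.7, 0.2, 0.2]  # "API throttling ceiling"
deploy_guide = [0.1, 0.9, 0.3]  # "deployment checklist"

# Synonymous phrases land close together; unrelated text does not.
print(cosine_similarity(rate_limit, throttling))    # high (about 0.99)
print(cosine_similarity(rate_limit, deploy_guide))  # low (about 0.29)
```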

&lt;p&gt;Here's a real example from our own knowledge base. We logged this fact:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Docker test stage must reset ENTRYPOINT inherited from production stage&lt;/strong&gt;&lt;br&gt;
When a Dockerfile test stage extends a production base stage that sets ENTRYPOINT, the test stage inherits it. This causes docker compose run to pass the test command as arguments to the production entrypoint instead of executing it directly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you search for &lt;em&gt;"run tests locally docker compose"&lt;/em&gt;, a lexical search on that query finds it because "docker" and "compose" appear in the title. But if you search for &lt;em&gt;"test container starts server instead of running pytest"&lt;/em&gt; — which is the actual symptom someone debugging this would type — a lexical search finds nothing. Semantic search finds it immediately, because the meaning of those two descriptions is the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with semantic-only search
&lt;/h2&gt;

&lt;p&gt;Semantic search sounds perfect. Why not just use it for everything?&lt;/p&gt;

&lt;p&gt;Because it has its own blind spots.&lt;/p&gt;

&lt;p&gt;Semantic search relies on your embeddings being up to date. A newly added entry needs to be indexed before it can be found. And the embedding model sometimes misses on very technical content — exact identifiers, version numbers, and project-specific abbreviations that have no semantic neighborhood in the training data.&lt;/p&gt;

&lt;p&gt;If someone on our team logged a fact about migration revision &lt;code&gt;053_task_id_facts_skills&lt;/code&gt;, a semantic search for that exact string might rank it lower than other migration-related entries. Lexical search would nail it immediately.&lt;/p&gt;

&lt;p&gt;The two approaches are genuinely complementary.&lt;/p&gt;

&lt;h2&gt;
  
  
  How we combined them
&lt;/h2&gt;

&lt;p&gt;ProjectBrain's search uses both — and then ranks the combined results using four signals. The weights below were tuned empirically against real search sessions on our own knowledge base:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic similarity (55%)&lt;/strong&gt; is the dominant factor when embeddings are available. It captures meaning, synonyms, and conceptual proximity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lexical overlap (25%)&lt;/strong&gt; handles exact matches — identifiers, code snippets, specific error messages. This is our Elasticsearch-style fallback.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recency (15%)&lt;/strong&gt; gives newer entries a boost. A fact logged last week is more likely to be current than one from six months ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task linkage (5%)&lt;/strong&gt; is a small tiebreaker: entries linked to specific tasks in the project rank slightly higher than general, free-floating knowledge.&lt;/p&gt;
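&lt;p&gt;As a sketch, the blend looks something like this. The weights are the ones above; the per-signal scores (each normalized to 0 to 1) are placeholders, since the real scoring functions are not shown here:&lt;/p&gt;

```python
# Sketch of the four-signal blend described above. The weights match the
# post; the individual signal scores (each normalized to 0..1) are
# placeholders -- the real scoring functions are not shown here.
WEIGHTS = {"semantic": 0.55, "lexical": 0.25, "recency": 0.15, "task": 0.05}

def combined_score(signals: dict[str, float]) -> float:
    """Weighted sum of the available signals; missing signals count as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

entry = {"semantic": 0.72, "lexical": 0.10, "recency": 0.60, "task": 1.0}
print(round(combined_score(entry), 2))  # 0.56
```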

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Here are two real searches we ran against ProjectBrain's own knowledge base after building this feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query: "run tests locally docker compose"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Entry&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Run tests locally using docker compose (matches CI)&lt;/td&gt;
&lt;td&gt;Skill&lt;/td&gt;
&lt;td&gt;0.58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Containerise CI test runs with docker compose&lt;/td&gt;
&lt;td&gt;Decision&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Docker test stage must reset ENTRYPOINT inherited from production stage&lt;/td&gt;
&lt;td&gt;Fact&lt;/td&gt;
&lt;td&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The top three results are exactly the three entries we logged earlier that day. They weren't the most recent entries in the system, and they didn't use the same phrasing as the query. But they matched the meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query: "git hooks enforce lint before push"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;th&gt;Entry&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Store git hooks in .githooks/ and activate via core.hooksPath&lt;/td&gt;
&lt;td&gt;0.55&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A 10-point gap to the next result. No other entry came close.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why transparency matters
&lt;/h2&gt;

&lt;p&gt;One thing we were careful about: every search result includes a score breakdown. You can see exactly how much of the score came from semantic similarity, lexical overlap, recency, and task linkage.&lt;/p&gt;

&lt;p&gt;This matters for a couple of reasons.&lt;/p&gt;

&lt;p&gt;First, it builds trust. When an agent retrieves knowledge and acts on it, you want to understand why that entry was selected. "Semantic similarity: 72%, also linked to the current task" is a lot more trustworthy than "it came from the search."&lt;/p&gt;

&lt;p&gt;Second, it makes the system debuggable. If a result that should rank first is coming in third, the breakdown tells you exactly which signal is dragging it down. Maybe the entry is old and needs refreshing. That's a fixable problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for agents
&lt;/h2&gt;

&lt;p&gt;For AI agents working through ProjectBrain, the search improvement has a direct effect on session startup quality.&lt;/p&gt;

&lt;p&gt;When an agent begins a session with an intent — say, "implement the new billing flow" — the context tool now runs a semantic search behind the scenes. Instead of returning the most recently logged entries, it returns the entries most relevant to &lt;em&gt;billing&lt;/em&gt;: the rate limit facts, the payment gateway decisions, the deployment skill for this service.&lt;/p&gt;

&lt;p&gt;The agent starts with the right context instead of the most recent context. In practice, that means fewer cases of an agent re-discovering something the team already knew, and fewer cases of contradicting a decision that was logged months ago.&lt;/p&gt;




&lt;p&gt;If you're already using ProjectBrain, your existing knowledge base is already indexed. The next agent session you run will pull in the most contextually relevant entries for whatever it's working on — not the most recent ones, the most relevant ones. You don't need to do anything.&lt;/p&gt;

&lt;p&gt;If you're not yet using ProjectBrain, &lt;a href="https://app.projectbrain.tools" rel="noopener noreferrer"&gt;get started here&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Memory Curation — Keeping the Knowledge Base Honest</title>
      <dc:creator>Li-Hsuan Lung</dc:creator>
      <pubDate>Sun, 22 Mar 2026 17:19:32 +0000</pubDate>
      <link>https://dev.to/lihsuanlung/memory-curation-keeping-the-knowledge-base-honest-29fh</link>
      <guid>https://dev.to/lihsuanlung/memory-curation-keeping-the-knowledge-base-honest-29fh</guid>
      <description>&lt;h2&gt;
  
  
  The idea I could never get my team to follow
&lt;/h2&gt;

&lt;p&gt;I have always loved the concept of Architecture Decision Records.&lt;/p&gt;

&lt;p&gt;The idea is simple: whenever your team makes a non-obvious technical decision, you write a short document. The decision, the context, the alternatives you considered, and why you chose what you chose. You commit it to the repository alongside the code. Future teammates can read it and understand not just what was built, but why.&lt;/p&gt;

&lt;p&gt;It is a great idea in theory. But I could never get anyone to actually do it consistently, including myself.&lt;/p&gt;

&lt;p&gt;When the decision is fresh in your head, writing it down feels like overhead. When you are under deadline pressure, the ADR file seems like the first thing to skip. By the time the decision feels worth documenting, you have forgotten half the context. And then a new engineer joins, or you revisit the codebase six months later, and you are left reading code with no memory of the reasoning behind it.&lt;/p&gt;

&lt;p&gt;ProjectBrain's knowledge base is, at its core, an attempt to make this idea stick — and to extend it beyond just architecture decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three types of knowledge
&lt;/h2&gt;

&lt;p&gt;ProjectBrain stores three types of knowledge entries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions&lt;/strong&gt; are the direct heir of the ADR concept. A decision captures a non-obvious choice, the rationale behind it, and the alternatives that were rejected. The rationale is the most valuable part — it is the part that disappears fastest from human memory and git history.&lt;/p&gt;

&lt;p&gt;Here are a few real decisions in our own project's knowledge base:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Adopted Gherkin-style (Given/When/Then) structure for static LLM prompts&lt;/strong&gt;&lt;br&gt;
Gherkin-style prompts (Given/When/Then) provide a more deterministic structure for LLMs, minimizing ambiguity by clearly separating persona (Given), triggers (When), and expected behavior (Then). Positive framing and scenario isolation have been established as best practices to improve LLM adherence to instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory curator v1 is non-destructive&lt;/strong&gt;&lt;br&gt;
The curator should generate recommendations (refresh, supersede, merge, archive) but should not auto-mutate memory records in v1. Explicit review/resolution preserves safety and auditability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLM semantic pass is the centerpiece of the memory curator&lt;/strong&gt;&lt;br&gt;
The rule-based pass (title normalization, staleness, supersession) acts only as a cheap pre-filter to surface candidates. The LLM pass is the primary signal: it scores semantic duplicate pairs, detects quality issues, and provides the reasoning needed to make recommendations actionable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Facts&lt;/strong&gt; are verifiable truths about the project, environment, or system. They have a shorter half-life than decisions — configurations change, services get renamed, constraints shift. A fact that was true last quarter may be silently wrong today.&lt;/p&gt;

&lt;p&gt;A few real facts from our knowledge base:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Render render.yaml: dockerContext is relative to rootDir, not repo root&lt;/strong&gt;&lt;br&gt;
When a service has rootDir set, dockerContext and dockerfilePath are resolved relative to rootDir — not the repo root. Setting dockerContext: ./curator with rootDir: ./curator produces the path curator/curator (not found). The correct value is dockerContext: . when you want the rootDir itself as the build context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP responses support three modes: human, json, both&lt;/strong&gt;&lt;br&gt;
All tools accept a response_mode parameter ("human" | "json" | "both", default "human"). Human mode returns readable markdown. JSON mode returns a structured envelope: {ok, data, error, meta: {tool, response_mode, query?}}. Both mode returns human text followed by "---" and the JSON envelope. Validate response_mode early via _validate_response_mode() and return the error string if invalid. Always return a string — MCP protocol requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alembic env.py wraps all migrations in one transaction — a single failure rolls back all&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
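&lt;p&gt;That response-mode fact is concrete enough to sketch. The following is an illustrative reconstruction from the fact's own description, not ProjectBrain's actual code; the helper name is made up:&lt;/p&gt;

```python
import json

VALID_MODES = {"human", "json", "both"}

# Illustrative reconstruction of the envelope described in the fact
# above; the function name is made up, not ProjectBrain's actual code.
def build_response(tool: str, human_text: str, data: dict,
                   response_mode: str = "human") -> str:
    # Validate response_mode early; always return a string (MCP requirement).
    if response_mode not in VALID_MODES:
        return f"Error: invalid response_mode '{response_mode}'"
    envelope = json.dumps({
        "ok": True, "data": data, "error": None,
        "meta": {"tool": tool, "response_mode": response_mode},
    })
    if response_mode == "json":
        return envelope
    if response_mode == "both":
        return human_text + "\n---\n" + envelope
    return human_text  # "human" (default): readable markdown only
```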

&lt;p&gt;&lt;strong&gt;Skills&lt;/strong&gt; are reusable procedures — the "how we do this here" knowledge that never makes it into a README. Setup guides, debugging playbooks, deployment checklists. The kind of knowledge that lives in a senior engineer's head and gets re-explained to every new teammate.&lt;/p&gt;

&lt;p&gt;The combination creates something more complete than ADRs alone: a living record of what is true, what was decided and why, and how things are done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7mpf8458hbaqczw8zsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7mpf8458hbaqczw8zsc.png" alt="The Knowledge tab, showing a project's decisions and facts. Each entry includes a title and body. Agents and humans write to this store throughout their work sessions." width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents write without friction
&lt;/h2&gt;

&lt;p&gt;The original ADR problem was that writing felt like a burden. ProjectBrain removes that friction almost entirely for agents — they log knowledge as a natural side effect of doing work. When an agent resolves a bug, it logs the root cause as a fact. When it makes an architectural choice, it logs a decision. When it figures out a deployment step, it logs a skill.&lt;/p&gt;

&lt;p&gt;Humans can do the same, and the UI makes it fast. But the real leverage is that agents do it continuously, in the background, without needing to be reminded.&lt;/p&gt;

&lt;p&gt;This solves the write problem. But it creates a different one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost of stale memory
&lt;/h2&gt;

&lt;p&gt;When an AI agent reads from a knowledge base, it treats what it finds as ground truth. It does not apply skepticism the way a senior engineer would when stumbling on an old wiki page. It reasons from what it is given.&lt;/p&gt;

&lt;p&gt;That is fine when the knowledge base is accurate. It is a serious problem when it is not.&lt;/p&gt;

&lt;p&gt;Stale context compounds quietly. An agent reads an old fact about a database schema that was changed three months ago. It proceeds to write a migration against the wrong table structure. Another agent reads a superseded decision about an API design and implements a pattern the team moved away from weeks earlier. The work looks correct on the surface. The errors only surface in review — or in production.&lt;/p&gt;

&lt;p&gt;This is worse than no documentation. A missing fact causes the agent to ask a question or make an assumption it flags. A wrong fact causes it to proceed confidently in the wrong direction.&lt;/p&gt;

&lt;p&gt;The problem gets worse as the knowledge base grows. More entries means more signal to retrieve. But it also means more outdated entries that look authoritative. The noise is invisible. It is indistinguishable from good signal until something breaks.&lt;/p&gt;

&lt;h3&gt;
  
  
  How teams typically approach memory pruning
&lt;/h3&gt;

&lt;p&gt;This is not a new problem. A few patterns have emerged, each with real drawbacks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TTL-based expiry.&lt;/strong&gt; Give each entry a maximum age. Simple to implement, but crude. A fact about your CI environment might be stale in a week. A foundational architectural decision might be valid for five years. Fixed TTLs either over-prune or under-prune.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supersession tracking.&lt;/strong&gt; New entries explicitly mark old ones as superseded. Clean and auditable, but it depends on the writer knowing what the new entry supersedes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-in-the-loop review.&lt;/strong&gt; Surface aged entries periodically and ask a human to confirm, update, or delete them. The most reliable method, but it does not scale. A team generating dozens of entries per week would spend all its time reviewing the queue.&lt;/p&gt;

&lt;p&gt;None of these works well in isolation. The honest answer is that memory pruning requires a mix of strategies — automatic signals to surface candidates, semantic analysis to catch what rules miss, and human judgment for the cases where confidence is low.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the curator does
&lt;/h2&gt;

&lt;p&gt;ProjectBrain's curator is our current attempt at that mix. We are actively learning what works, and the approach will evolve as we get more data from real usage.&lt;/p&gt;

&lt;p&gt;The curator runs on a schedule, currently every 30 minutes, and on each pass it samples a window of recent knowledge entries, applies a rule-based filter as a cheap pre-pass, then sends candidates to an LLM for semantic analysis.&lt;/p&gt;
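&lt;p&gt;In outline, the two-pass flow looks something like this sketch. The rule checks and the LLM call are stand-ins; only the shape (a cheap pre-filter, then a semantic pass on the survivors) follows the description above:&lt;/p&gt;

```python
# Sketch of the two-pass curation flow described above. The rule checks
# and the LLM call are stand-ins; only the overall shape (cheap
# pre-filter, then a semantic pass on the survivors) mirrors the post.
def rule_based_prefilter(entries: list[dict]) -> list[dict]:
    """Cheap heuristics surface candidates: empty bodies, aged entries."""
    return [e for e in entries
            if not e.get("body") or e.get("age_days", 0) > 90]

def curate(entries: list[dict], llm_pass) -> list[dict]:
    """Run the pre-filter, then hand candidates to the LLM pass."""
    candidates = rule_based_prefilter(entries)
    return llm_pass(candidates) if candidates else []
```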

&lt;p&gt;The output is a queue of recommendations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MERGE&lt;/strong&gt; — two entries that cover the same ground. Duplicates happen often: one agent logs a fact at the end of a session, another logs the same fact at the start of the next. Or two team members capture the same architectural decision independently after a long discussion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FLAG&lt;/strong&gt; — a single entry with a quality problem. A decision with no rationale. A skill with a title but no steps. A fact that references something that no longer exists.&lt;/p&gt;

&lt;p&gt;Humans or agents review the queue and act: accept a merge, edit or remove a flagged entry, or dismiss the recommendation if it was wrong.&lt;/p&gt;

&lt;p&gt;&lt;a href="/screenshots/memory-health-merge.png" class="article-body-image-wrapper"&gt;&lt;img src="/screenshots/memory-health-merge.png" alt="The Memory Health tab showing a queue of pending curation recommendations. Each card shows the entry type (fact / decision / skill), recommendation action (MERGE or FLAG), confidence percentage, and the affected entry title."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For FLAG recommendations, the review card offers three actions: Delete, Edit, or Keep. The curator does not know whether a flagged entry should be deleted or just improved — it flags the problem and leaves the decision to the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  The prompt
&lt;/h2&gt;

&lt;p&gt;The curator's LLM pass sends records as a JSON array and asks for a structured response. The full system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="nf"&gt;Given &lt;/span&gt;you are a knowledge base curator for a software project management tool
&lt;span class="nf"&gt;And &lt;/span&gt;you review knowledge records (facts, decisions, skills) written by human team members and AI agents
&lt;span class="nf"&gt;When &lt;/span&gt;you receive a JSON array of records containing id, entity_type, title, and body
&lt;span class="err"&gt;Then you must return a single JSON object with exactly two keys&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"duplicates"&lt;/span&gt; &lt;span class="err"&gt;and&lt;/span&gt; &lt;span class="err"&gt;"quality_issues"&lt;/span&gt;
&lt;span class="nf"&gt;And &lt;/span&gt;you must output ONLY valid JSON without any markdown formatting or preamble

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Identifying duplicate records
&lt;span class="nf"&gt;Given &lt;/span&gt;two records have the same meaning but different wording
&lt;span class="nf"&gt;Then &lt;/span&gt;you must include them in the &lt;span class="s"&gt;"duplicates"&lt;/span&gt; array
&lt;span class="err"&gt;And each item must be formatted as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt;
    &lt;span class="err"&gt;"entity_a_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"...",&lt;/span&gt;
    &lt;span class="err"&gt;"entity_b_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"...",&lt;/span&gt;
    &lt;span class="err"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;0.0–1.0,&lt;/span&gt;
    &lt;span class="err"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"one&lt;/span&gt; &lt;span class="err"&gt;sentence&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="err"&gt;why&lt;/span&gt; &lt;span class="err"&gt;these&lt;/span&gt; &lt;span class="err"&gt;are&lt;/span&gt; &lt;span class="err"&gt;the&lt;/span&gt; &lt;span class="err"&gt;same",&lt;/span&gt;
    &lt;span class="err"&gt;"suggested_merged_body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"cle&lt;/span&gt;&lt;span class="nf"&gt;an &lt;/span&gt;merged content combining the best of both"
  &lt;span class="err"&gt;}&lt;/span&gt;
&lt;span class="nf"&gt;And &lt;/span&gt;you must only include pairs with confidence &amp;gt;= 0.75
&lt;span class="nf"&gt;And &lt;/span&gt;you must prefer false negatives over false positives
&lt;span class="nf"&gt;And &lt;/span&gt;you must treat two records on related but distinct topics as NOT duplicates

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Identifying quality issues
&lt;span class="err"&gt;Given a record has a genuine quality problem such as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Facts&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;no&lt;/span&gt; &lt;span class="err"&gt;body&lt;/span&gt; &lt;span class="err"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;title so vague it conveys nothing actionable
  &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Decisions&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;no&lt;/span&gt; &lt;span class="err"&gt;rationale&lt;/span&gt; &lt;span class="err"&gt;(just&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;title, no explanation of why)
  &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Skills&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;no&lt;/span&gt; &lt;span class="err"&gt;steps&lt;/span&gt; &lt;span class="err"&gt;or&lt;/span&gt; &lt;span class="err"&gt;procedure&lt;/span&gt; &lt;span class="err"&gt;(title&lt;/span&gt; &lt;span class="err"&gt;only,&lt;/span&gt; &lt;span class="err"&gt;or&lt;/span&gt; &lt;span class="err"&gt;body&lt;/span&gt; &lt;span class="err"&gt;is&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;single vague sentence)
&lt;span class="nf"&gt;Then &lt;/span&gt;you must flag it in the &lt;span class="s"&gt;"quality_issues"&lt;/span&gt; array
&lt;span class="err"&gt;And each item must be formatted as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="err"&gt;{&lt;/span&gt;
    &lt;span class="err"&gt;"entity_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"...",&lt;/span&gt;
    &lt;span class="err"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"low"&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"medium"&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;"high",&lt;/span&gt;
    &lt;span class="n"&gt;"issue":&lt;/span&gt; &lt;span class="n"&gt;"one&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="n"&gt;—&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;specific&lt;/span&gt; &lt;span class="n"&gt;problem"&lt;/span&gt;
  &lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A couple of design choices worth noting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prefer false negatives over false positives on duplicates.&lt;/strong&gt; A missed duplicate is low cost — you can find it on the next pass. A false positive that merges two distinct entries destroys information and erodes trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suggest the merged body.&lt;/strong&gt; For duplicate pairs, the model proposes a merged version combining the best of both entries. This gives the reviewer something to work from rather than asking them to write a new entry from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tuning curation behavior
&lt;/h2&gt;

&lt;p&gt;The curator's behavior is configurable per project. You can set a confidence threshold — recommendations below it are suppressed entirely — and a freshness window to flag entries that have not been reviewed within a certain period.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9v225rh5k01ovl534k3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv9v225rh5k01ovl534k3.png" alt="The Memory Curation settings panel, showing the confidence threshold slider and the freshness window input for controlling how aggressively the curator flags entries." width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both settings address the same underlying tradeoff: how much noise you are willing to see in exchange for catching more real problems. A team with a high-churn knowledge base might lower the threshold and accept more false positives. A smaller, stable team might raise it to keep the queue quiet.&lt;/p&gt;
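&lt;p&gt;A minimal sketch of how the two settings could gate the queue. The field and function names here are illustrative, not the actual settings schema:&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Sketch of how the two settings could gate the curation queue. Field
# and function names are illustrative, not the actual settings schema.
def visible_recommendations(recs, confidence_threshold=0.75):
    """Suppress recommendations below the project's confidence threshold."""
    return [r for r in recs if r["confidence"] >= confidence_threshold]

def needs_freshness_review(last_reviewed: datetime, freshness_days: int) -> bool:
    """Flag entries not reviewed within the freshness window."""
    return datetime.utcnow() - last_reviewed > timedelta(days=freshness_days)
```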

&lt;h2&gt;
  
  
  What the curator is not
&lt;/h2&gt;

&lt;p&gt;The curator does not auto-apply changes. It does not delete entries or rewrite them without review. All recommendations require confirmation.&lt;/p&gt;

&lt;p&gt;We considered auto-merging obvious duplicates, but the false positive cost is too high. Two entries that look nearly identical might cover different contexts. The review step is fast and the downside of getting it wrong is not.&lt;/p&gt;

&lt;p&gt;The curator stays in a supporting role. It surfaces candidates. The team decides.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;ADRs work when teams follow them. The hard part has never been the format. It has been the habit.&lt;/p&gt;

&lt;p&gt;What ProjectBrain tries to do is make that habit automatic: log continuously as a side effect of work, and let a background process handle the maintenance. The knowledge base stays roughly honest without requiring anyone to remember to tend it.&lt;/p&gt;

&lt;p&gt;We are still figuring out the right balance — what to flag, how aggressively to deduplicate, when to trust the LLM's judgment and when to be more conservative. If you are building systems where agents produce persistent memory, the write problem is the easy part. Plan for curation from the start, and expect to keep tuning it.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>productivity</category>
      <category>softwareengineering</category>
      <category>writing</category>
    </item>
    <item>
      <title>A Workflow Engine That Coordinates Work and Makes It Visible</title>
      <dc:creator>Li-Hsuan Lung</dc:creator>
      <pubDate>Wed, 18 Mar 2026 13:00:00 +0000</pubDate>
      <link>https://dev.to/lihsuanlung/a-workflow-engine-that-coordinates-work-and-makes-it-visible-3amg</link>
      <guid>https://dev.to/lihsuanlung/a-workflow-engine-that-coordinates-work-and-makes-it-visible-3amg</guid>
      <description>&lt;p&gt;&lt;strong&gt;"The future belongs to artificial intelligence."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ke Jie said this around the time of his 2017 match against AlphaGo. He was the world’s top-ranked Go player, and AlphaGo still swept the series 3-0 (games played on May 23, 25, and 27, 2017), leaving Ke Jie visibly emotional after the final game.&lt;/p&gt;

&lt;p&gt;That story matters to me because AI may end up far closer to solving software development than we expect. For a long time, top-level Go was treated as an especially hard frontier where human intuition was expected to hold out much longer. Then, suddenly, the gap closed. I want to treat software development with the same humility and learn from what the systems actually do, not from old assumptions.&lt;/p&gt;

&lt;p&gt;That is why this post is about workflow design and visibility.&lt;/p&gt;

&lt;p&gt;When people talk about agent workflows, they usually mean one thing: moving tasks from one stage to the next.&lt;/p&gt;

&lt;p&gt;In Project Brain, we are building the workflow engine around two goals at the same time: coordinate work reliably, and make agent behavior visible and explainable.&lt;/p&gt;

&lt;p&gt;If a task moved from "in progress" to "in review," we should know &lt;em&gt;how&lt;/em&gt; it moved, &lt;em&gt;why&lt;/em&gt; it moved, and &lt;em&gt;what assumptions&lt;/em&gt; were used during that handoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is interesting about our workflow engine
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workflow is a real system object, not prompt text
&lt;/h3&gt;

&lt;p&gt;Our workflow is modeled directly in the platform as stages, statuses, and stage policies. That means teams can edit process behavior in the product itself, instead of hiding process rules inside long prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage policy makes behavior explicit
&lt;/h3&gt;

&lt;p&gt;Each stage can define what should happen after successful work: advance and delegate, advance only, terminal completion, or (optionally) reject work back to an earlier stage. In plain terms, we do not hardcode every route in the agent runtime. We store routing intent as workflow policy and execute against it.&lt;/p&gt;
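&lt;p&gt;To make that concrete, here is a minimal sketch of workflow-as-data. The policy names mirror the ones above (advance and delegate, advance only, terminal, reject), but the stage names and structures are illustrative, not Project Brain's actual schema.&lt;/p&gt;

```python
from enum import Enum

class OnSuccess(Enum):
    ADVANCE_AND_DELEGATE = "advance_and_delegate"
    ADVANCE = "advance"
    TERMINAL = "terminal"

# Routing intent lives in data, not in the agent runtime or a prompt.
WORKFLOW = {
    "plan":      {"next": "implement", "on_success": OnSuccess.ADVANCE_AND_DELEGATE},
    "implement": {"next": "review",    "on_success": OnSuccess.ADVANCE_AND_DELEGATE},
    "review":    {"next": "done",      "on_success": OnSuccess.ADVANCE,
                  "reject_to": "implement"},
    "done":      {"next": None,        "on_success": OnSuccess.TERMINAL},
}

def route(stage, outcome):
    """Return the next stage for completed or rejected work."""
    policy = WORKFLOW[stage]
    if outcome == "rejected" and "reject_to" in policy:
        return policy["reject_to"]
    if policy["on_success"] is OnSuccess.TERMINAL:
        return stage
    return policy["next"]
```

&lt;p&gt;Editing process behavior then means editing the &lt;code&gt;WORKFLOW&lt;/code&gt; table, not hunting through prompt text.&lt;/p&gt;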

&lt;h3&gt;
  
  
  Visibility is designed in, not added later
&lt;/h3&gt;

&lt;p&gt;We attach structured metadata to handoffs and status transitions so task history is reconstructable. The focus is not just "current status." The focus is also "execution trace."&lt;/p&gt;

&lt;p&gt;That trace is what helps teams improve prompts, policies, and role boundaries over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real examples from team chat (cross-stage communication)
&lt;/h2&gt;

&lt;p&gt;These are real excerpts from Project Brain team messages, showing planner/implementer/reviewer flow. The screenshot thread captures a full loop: blocker report, fix handoff, and approval.&lt;/p&gt;

&lt;p&gt;In the screenshots, the reviewer first moved the task back to in-progress with specific blockers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaoczknq8b4nhrresimf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaoczknq8b4nhrresimf.png" alt="Reviewer feedback listing concrete blockers before re-review." width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The implementer then replied with a concrete fix list and commit hash:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmd012lj8viehy6os2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0rmd012lj8viehy6os2p.png" alt="Implementer response with commit hash and explicit fixes." width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the reviewer confirmed re-review and test outcome:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax9ud5t7xufx2f4rb2f2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax9ud5t7xufx2f4rb2f2.png" alt="Re-review approval confirming blockers are fixed and tests pass." width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is exactly the visibility model we want: not just final status, but the full reasoning chain from failure to resolution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this framing matters
&lt;/h2&gt;

&lt;p&gt;Agents are not programmed in the traditional deterministic sense. You do not write a fixed function and always get the same output. What you can do is influence behavior through constraints, context, and feedback loops. That is why workflow management is so interesting to me: it is a way to shape behavior reliably even when outputs are probabilistic.&lt;/p&gt;

&lt;p&gt;If AI systems can improve through iterative play and feedback loops, then our job is to build environments where those loops are observable, testable, and improvable, instead of hidden.&lt;/p&gt;

&lt;p&gt;That is exactly why we designed this workflow engine around both coordination and visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you are not using Project Brain: how to apply this anyway
&lt;/h2&gt;

&lt;p&gt;You can apply the same workflow principles in any stack (Jira + Slack, Linear + GitHub, custom tools, etc.).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Define stage outcomes explicitly.&lt;br&gt;&lt;br&gt;
For each stage, write what "done" means and what should happen next (advance, delegate, reject, or stop).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use machine-checkable transition guards.&lt;br&gt;&lt;br&gt;
Require expected state/version fields on status changes so race conditions become explicit conflicts instead of silent corruption.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Standardize handoff metadata.&lt;br&gt;&lt;br&gt;
At minimum: task ID, from-stage, to-stage, actor, and reason for handoff.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Treat review feedback as structured data.&lt;br&gt;&lt;br&gt;
Capture blocker reason, fix commit, verification command, and verification result in one thread.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optimize for replayability.&lt;br&gt;&lt;br&gt;
A new person (or agent) should be able to read a thread and answer: What happened? Why? What changed? Is it verified?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
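&lt;p&gt;Steps 2 and 3 above can be sketched together: an optimistic-concurrency transition guard plus standardized handoff metadata. The field names here are illustrative assumptions, not a fixed schema.&lt;/p&gt;

```python
import time

class ConflictError(Exception):
    pass

def transition(task, to_status, expected_version, actor, reason):
    """Apply a status change only if the caller saw the latest version."""
    if task["version"] != expected_version:
        # Someone else moved the task first: an explicit conflict,
        # not silent corruption.
        raise ConflictError(f"task {task['id']} is at v{task['version']}")
    handoff = {
        "task_id": task["id"],
        "from_stage": task["status"],
        "to_stage": to_status,
        "actor": actor,
        "reason": reason,
        "at": time.time(),
    }
    task["status"] = to_status
    task["version"] += 1
    task.setdefault("history", []).append(handoff)
    return handoff
```

&lt;p&gt;The appended &lt;code&gt;history&lt;/code&gt; records are what make step 5, replayability, possible: the thread of who moved what, and why, is reconstructable from data.&lt;/p&gt;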

&lt;p&gt;If you can do those five things consistently, you will get most of the value of workflow orchestration plus visibility, even outside Project Brain.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>softwaredevelopment</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>MCP Design in the Real World</title>
      <dc:creator>Li-Hsuan Lung</dc:creator>
      <pubDate>Sun, 15 Mar 2026 01:13:45 +0000</pubDate>
      <link>https://dev.to/lihsuanlung/mcp-design-in-the-real-world-3446</link>
      <guid>https://dev.to/lihsuanlung/mcp-design-in-the-real-world-3446</guid>
      <description>&lt;p&gt;When we started Project Brain's MCP server, we followed a common pattern: every new need got a new tool. It felt productive at first. But after a while, the tool menu became crowded, the rules around each tool grew, and the agent got slower at choosing what to call.&lt;/p&gt;

&lt;p&gt;That led to more retries, higher token usage, and slower progress on simple tasks. The big lesson was straightforward: after a certain point, adding more tools hurts more than it helps.&lt;/p&gt;

&lt;p&gt;In this post, I’ll share what changed for us and what we learned from running this in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Motivation: why reduce the number of tools?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tool selection gets harder for the model
&lt;/h3&gt;

&lt;p&gt;Every new tool is another decision branch. Instead of focusing on the task, the model spends effort deciding which tool is "most correct," whether parameters are supported, and whether an older tool has been replaced by a newer one. Those extra decisions show up as wrong calls and wasted turns.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context gets bloated
&lt;/h3&gt;

&lt;p&gt;Each tool adds descriptions, argument rules, and examples. That all has to fit into model context. As context grows, signal gets diluted. Even a strong model performs worse when it has to sift through too many similar options.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Day-to-day operations get heavier
&lt;/h3&gt;

&lt;p&gt;Tool growth also creates maintenance overhead: more permission paths to secure, more old behavior to support, more telemetry to monitor, and more docs to keep aligned with reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real examples from Project Brain
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. We moved to a five-tool interface
&lt;/h3&gt;

&lt;p&gt;Instead of exposing many narrow tools, we grouped the public API into five domain entrypoints: &lt;code&gt;projects(...)&lt;/code&gt;, &lt;code&gt;context(...)&lt;/code&gt;, &lt;code&gt;tasks(...)&lt;/code&gt;, &lt;code&gt;knowledge(...)&lt;/code&gt;, and &lt;code&gt;collaboration(...)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This made the system easier to reason about. The agent picks the right domain first, then chooses an action inside that domain. Fewer top-level choices meant less routing confusion.&lt;/p&gt;
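&lt;p&gt;The shape of one such domain entrypoint, sketched minimally (the handler names and return values are placeholders, not our real API):&lt;/p&gt;

```python
# One dispatcher per domain; actions are resolved inside it,
# so the top-level tool count stays fixed at five.
TASK_ACTIONS = {
    "list": lambda **kw: {"ok": True, "data": []},
    "context": lambda **kw: {"ok": True, "data": {"task": kw.get("task_id")}},
}

def tasks(action, **params):
    """Single top-level entrypoint for the task domain."""
    handler = TASK_ACTIONS.get(action)
    if handler is None:
        return {"ok": False, "error": f"unknown action: {action}"}
    return handler(**params)
```

&lt;p&gt;New capability lands as a new entry in the action table, not as a new top-level tool the model must weigh on every turn.&lt;/p&gt;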

&lt;h3&gt;
  
  
  2. We added per-turn shortlist routing
&lt;/h3&gt;

&lt;p&gt;We introduced &lt;code&gt;context(action="shortlist", q, limit, full_tool_mode)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Think of it as: "for this user request, show me the best few tool-actions first." For example, if a user asks about milestone planning, shortlist pushes milestone-related operations to the top instead of making the model scan the full catalog every time.&lt;/p&gt;

&lt;p&gt;This improved first-call accuracy and reduced unnecessary context.&lt;/p&gt;
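&lt;p&gt;A toy version of shortlist routing, under the assumption of simple keyword overlap (a production system would more likely use embeddings; the catalog entries here are invented):&lt;/p&gt;

```python
# Map each tool-action to descriptive keywords, score against the
# request, and surface only the top few candidates.
CATALOG = {
    "tasks.list": "list filter search tasks status milestone",
    "tasks.update": "update status move task stage",
    "knowledge.create": "log fact decision knowledge",
}

def shortlist(query, limit=2):
    q = set(query.lower().split())
    scored = []
    for name, keywords in CATALOG.items():
        overlap = len(q.intersection(keywords.split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]
```

&lt;p&gt;The model then reasons over two or three relevant candidates instead of the full catalog, which is where the first-call accuracy gain comes from.&lt;/p&gt;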

&lt;h3&gt;
  
  
  3. We made responses predictable for machines
&lt;/h3&gt;

&lt;p&gt;For key reads, we added &lt;code&gt;response_mode&lt;/code&gt; (&lt;code&gt;human | json | both&lt;/code&gt;). In JSON mode, the output always follows the same envelope: &lt;code&gt;ok&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;, &lt;code&gt;meta&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That simple consistency removed a lot of brittle text parsing and made automation much more reliable.&lt;/p&gt;
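&lt;p&gt;The envelope itself is almost trivially simple, which is the point. A sketch of the idea (our actual field contents differ; the four keys are what matters):&lt;/p&gt;

```python
def envelope(data=None, error=None, meta=None):
    # Every JSON-mode response has the same four keys,
    # so consumers never branch on shape, only on "ok".
    return {
        "ok": error is None,
        "data": data,
        "meta": meta or {},
        "error": error,
    }
```

&lt;p&gt;A client checks &lt;code&gt;ok&lt;/code&gt;, reads &lt;code&gt;data&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt;, and never scrapes prose.&lt;/p&gt;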

&lt;h3&gt;
  
  
  4. We expanded one task listing tool instead of adding many search tools
&lt;/h3&gt;

&lt;p&gt;Rather than creating separate tools for each search style, we added richer filters to task listing with &lt;code&gt;q_any&lt;/code&gt;, &lt;code&gt;q_all&lt;/code&gt;, and &lt;code&gt;q_not&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In plain language, one call can now express "must include these terms, can include these terms, and exclude these terms." We got more power without growing top-level tool count.&lt;/p&gt;
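&lt;p&gt;The filter semantics can be sketched as a single predicate (a simplified stand-in for the real server-side query, using substring matching for brevity):&lt;/p&gt;

```python
def matches(text, q_any=None, q_all=None, q_not=None):
    """q_all: every term required. q_not: any term excludes.
    q_any: at least one term required, if given."""
    t = text.lower()
    if q_all and not all(term in t for term in q_all):
        return False
    if q_not and any(term in t for term in q_not):
        return False
    if q_any and not any(term in t for term in q_any):
        return False
    return True
```

&lt;p&gt;Three parameters on one listing call cover what would otherwise have been several near-duplicate search tools.&lt;/p&gt;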

&lt;h2&gt;
  
  
  Challenges you should expect in MCP design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Knowing which attributes are actually available
&lt;/h3&gt;

&lt;p&gt;If input rules are vague, agents guess. In our case, task queries used to fail or return partial results because fields looked plausible but were not actually supported.&lt;/p&gt;

&lt;p&gt;The fix was to make contracts explicit: clear filter fields, explicit response mode behavior, and stable output shape. Agents should not need to guess what inputs are valid or what outputs will look like.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Public listing behind auth token execution
&lt;/h3&gt;

&lt;p&gt;Discovery and execution are different security concerns. During MCP directory integration, we wanted clients to discover capabilities quickly, but we still needed strict auth for real data access.&lt;/p&gt;

&lt;p&gt;So we separated the two concerns in middleware: process auth headers and token validation for protected calls, while allowing a small public allowlist for low-risk discovery endpoints. In practice: show the menu publicly, lock the kitchen.&lt;/p&gt;
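&lt;p&gt;Stripped of framework details, the middleware decision reduces to a few lines. This is a sketch of the pattern, not our actual middleware; the paths are invented and &lt;code&gt;validate_token&lt;/code&gt; stands in for whatever auth check your stack uses.&lt;/p&gt;

```python
# Low-risk discovery endpoints that may be listed without credentials.
PUBLIC_ALLOWLIST = {"/mcp/capabilities", "/mcp/health"}

def authorize(path, token, validate_token):
    """Return True if the request may proceed."""
    if path in PUBLIC_ALLOWLIST:
        return True               # show the menu publicly
    return validate_token(token)  # lock the kitchen
```

&lt;p&gt;Keeping the allowlist small and explicit is what makes the boundary auditable: anything not on it requires a valid token, full stop.&lt;/p&gt;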

&lt;h3&gt;
  
  
  3. Backward compatibility and sprawl
&lt;/h3&gt;

&lt;p&gt;Every new top-level tool creates long-term support cost. Our early habit of adding tiny tools for each new query made routing and maintenance harder over time.&lt;/p&gt;

&lt;p&gt;A better pattern was to keep the top-level interface stable and grow capability inside existing domains via actions and parameters. Internal complexity can grow in code modules without forcing public API sprawl.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP design best practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Keep the public tool surface small and stable.&lt;/li&gt;
&lt;li&gt;Route with a shortlist when possible.&lt;/li&gt;
&lt;li&gt;Prefer extending existing tools over creating new top-level ones.&lt;/li&gt;
&lt;li&gt;Keep discovery and execution security boundaries separate.&lt;/li&gt;
&lt;li&gt;Make input and output contracts explicit.&lt;/li&gt;
&lt;li&gt;Return predictable machine-readable shapes.&lt;/li&gt;
&lt;li&gt;Prune low-value tools regularly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Good MCP design is not about exposing everything. It is about exposing the right minimum, clearly.&lt;/p&gt;

&lt;p&gt;If your agents feel flaky, reducing and clarifying your tool surface area is often the fastest way to improve reliability.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>mcp</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>The side projects that accidentally started Project Brain</title>
      <dc:creator>Li-Hsuan Lung</dc:creator>
      <pubDate>Fri, 13 Mar 2026 05:56:31 +0000</pubDate>
      <link>https://dev.to/lihsuanlung/i-kept-losing-context-with-coding-agents-so-i-built-project-brain-1703</link>
      <guid>https://dev.to/lihsuanlung/i-kept-losing-context-with-coding-agents-so-i-built-project-brain-1703</guid>
      <description>&lt;p&gt;This started with two weekend projects: an AI-powered text adventure experiment and a 3D-printed shelf for my daughter's growing Tomica collection after our trip to Japan. Both sounded fun. Both turned into the same context-management problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern I kept hitting
&lt;/h2&gt;

&lt;p&gt;For each new feature, my agent would create more markdown files: TODO lists, feature notes, architecture plans, README updates, and more. At first it felt productive. Then it became its own maintenance project.&lt;/p&gt;

&lt;p&gt;Docs drifted from reality faster than I expected. I would see a TODO marked &lt;code&gt;in_progress&lt;/code&gt; even though the implementation had already shipped.&lt;/p&gt;

&lt;p&gt;When a session crashed, context crashed with it. The next run had no idea where things left off, so momentum disappeared right when I wanted to keep building.&lt;/p&gt;

&lt;p&gt;Switching tools meant retraining from scratch every time. Different interface, same project, full re-brief.&lt;/p&gt;

&lt;p&gt;That was the moment it clicked: the problem was not model quality. The problem was memory architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Yes, we are dogfooding Project Brain to build Project Brain
&lt;/h2&gt;

&lt;p&gt;We now run our own development through Project Brain. It is extremely meta and only slightly suspicious.&lt;/p&gt;

&lt;p&gt;I can swap models and tools without losing momentum because the project state lives in one place, not in whichever chat window happened to be open.&lt;/p&gt;

&lt;p&gt;Prompt I use when switching models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use Project Brain as your source of truth
1) context(action="session", project_id)
2) tasks(action="context", task_id)
3) tasks(action="list", project_id, status="in_progress")
4) knowledge(entity="fact", action="list", project_id)
5) knowledge(entity="decision", action="list", project_id)

Then summarize:
- current goal
- active tasks
- important constraints/facts
- key decisions and rationale
- immediate next steps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Skills became a force multiplier
&lt;/h2&gt;

&lt;p&gt;One of my favorite side effects has been watching reusable workflows turn into skills. Instead of repeating instructions in every task, we publish them once and any agent can follow them. Two examples so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API key auth for service accounts (FastAPI)&lt;/li&gt;
&lt;li&gt;Implement GitHub OAuth SSO in FastAPI + React SPA&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Following along got dramatically easier
&lt;/h2&gt;

&lt;p&gt;Facts and decisions gave me a clean trail of what changed and why. Instead of diffing stale markdown docs, I can see durable constraints and rationale in structured records.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc7kwwcm826hpnw5y9c8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyc7kwwcm826hpnw5y9c8.png" alt="Real project facts and decisions excerpts from Project Brain" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Team chat made delegation less chaotic
&lt;/h2&gt;

&lt;p&gt;The team chat flow helps agents delegate with context instead of vague instructions. That means fewer handoff bugs and less "wait, what are we doing again?" between planner and implementer agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dxw9996am87fp2taz06.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7dxw9996am87fp2taz06.png" alt="Delegation thread example with explicit task IDs, rationale, and expected outcome." width="800" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I keep saying "we"
&lt;/h2&gt;

&lt;p&gt;I say "we" because the agent has genuinely been a partner in building this product. I regularly ask what would make agents more efficient, what creates friction during execution, and what tooling is missing. Those conversations have directly shaped the roadmap and produced ideas I probably would not have reached alone.&lt;/p&gt;




&lt;p&gt;If your agents keep rewriting context and you keep re-explaining the same project, Project Brain was built for that exact pain.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>devtool</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
