<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: tony tong</title>
    <description>The latest articles on DEV Community by tony tong (@tony_tong_66328b5fb40ae64).</description>
    <link>https://dev.to/tony_tong_66328b5fb40ae64</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979791%2Fe398c43b-617e-481e-9ae4-433c879f051d.png</url>
      <title>DEV Community: tony tong</title>
      <link>https://dev.to/tony_tong_66328b5fb40ae64</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tony_tong_66328b5fb40ae64"/>
    <language>en</language>
    <item>
      <title>From a Git clone to a working MCP server: a 30-minute Codex walkthrough</title>
      <dc:creator>tony tong</dc:creator>
      <pubDate>Thu, 11 Jun 2026 16:09:22 +0000</pubDate>
      <link>https://dev.to/tony_tong_66328b5fb40ae64/from-a-git-clone-to-a-working-mcp-server-a-30-minute-codex-walkthrough-5264</link>
      <guid>https://dev.to/tony_tong_66328b5fb40ae64/from-a-git-clone-to-a-working-mcp-server-a-30-minute-codex-walkthrough-5264</guid>
      <description>&lt;p&gt;Most MCP tutorials assume you're starting from scratch. In reality, you usually have a working tool or library and just want to expose it as a callable tool to an LLM agent. Here's the path I take that gets it done in 30 minutes of real work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Pick one tool, not five
&lt;/h2&gt;

&lt;p&gt;Don't expose your whole API surface. Pick the &lt;strong&gt;one&lt;/strong&gt; function a coding agent would actually call. For a docs site, that's &lt;code&gt;search_docs&lt;/code&gt;. For a database, that's &lt;code&gt;run_query&lt;/code&gt;. For an internal service, that's &lt;code&gt;lookup_user&lt;/code&gt;. One tool, clear input schema, real value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Write the tool description like a docstring
&lt;/h2&gt;

&lt;p&gt;The model will only call the tool well if the description is sharp. I write three sentences:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What it does&lt;/strong&gt; (verb + noun)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to use it&lt;/strong&gt; (the user signal that should trigger it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What it returns&lt;/strong&gt; (shape, not values)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;search_docs(query, top_k=5)&lt;/code&gt;: Search the company docs index. Use when the user asks a factual question about internal systems, processes, or past decisions. Returns a list of &lt;code&gt;{title, url, snippet}&lt;/code&gt; sorted by relevance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the whole tool spec. No ambiguity, no prose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Run Codex locally first
&lt;/h2&gt;

&lt;p&gt;OpenAI Codex CLI is the fastest way to validate the tool works end-to-end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nt"&gt;--approval-mode&lt;/span&gt; suggest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop into a sandboxed directory, ask Codex to use the tool. If it picks the tool, calls it correctly, and uses the result in its answer, you're done. If it doesn't, the description is bad — go back to step 2.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Wire it as an MCP server
&lt;/h2&gt;

&lt;p&gt;Now wrap the tool in a real MCP server. The minimum is a &lt;code&gt;FastMCP&lt;/code&gt; instance with one &lt;code&gt;@mcp.tool()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;
&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;internal-docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the company docs index...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;docs_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add it to your client's MCP config (Claude Desktop, Cursor, etc.). Restart, ask the same question. If it works in both Codex CLI and the host client, you have a real MCP server.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Evals, not vibes
&lt;/h2&gt;

&lt;p&gt;The last 10 minutes is an eval. Three questions your tool should answer correctly, three it should refuse to answer. If you can't list them, your tool isn't done — it's just running.&lt;/p&gt;

&lt;h2&gt;
  
  
  A small diagram
&lt;/h2&gt;

&lt;p&gt;Here's the loop I end up with. Tools, model, results, repeat.&lt;/p&gt;

&lt;p&gt;The boring truth is that the &lt;strong&gt;description&lt;/strong&gt; is 80% of the work. Once that's right, the wiring is half an hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrm7vpy1mmcrzmiqs4tn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrm7vpy1mmcrzmiqs4tn.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why RAG without context judgment is just a fancier grep</title>
      <dc:creator>tony tong</dc:creator>
      <pubDate>Thu, 11 Jun 2026 16:07:58 +0000</pubDate>
      <link>https://dev.to/tony_tong_66328b5fb40ae64/why-rag-without-context-judgment-is-just-a-fancier-grep-247j</link>
      <guid>https://dev.to/tony_tong_66328b5fb40ae64/why-rag-without-context-judgment-is-just-a-fancier-grep-247j</guid>
      <description>&lt;p&gt;I keep seeing teams ship a "RAG system" that's really a vector database with a thin wrapper. They measure recall@10, ship to production, and then wonder why the model hallucinates on documents the retriever clearly found.&lt;/p&gt;

&lt;p&gt;The retriever is doing its job. The model is doing its job. What's missing is the &lt;em&gt;context judgment&lt;/em&gt; layer in between.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval ≠ selection
&lt;/h2&gt;

&lt;p&gt;Most RAG tutorials stop at "embed the docs, embed the query, cosine, top-k". But cosine is a relevance proxy, not a usefulness proxy. A chunk can be semantically similar to the query and still actively mislead the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A pricing table from 2019 that contradicts the 2024 version&lt;/li&gt;
&lt;li&gt;A code snippet that solves a similar problem in a different language&lt;/li&gt;
&lt;li&gt;A legal disclaimer that looks like a substantive answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A naive top-k return will mix these in. The model, trained to be helpful, will dutifully stitch them together.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "context judgment" looks like
&lt;/h2&gt;

&lt;p&gt;The simplest version is a re-ranker: take top-50 from the retriever, score each chunk for &lt;em&gt;answerability&lt;/em&gt; (does this chunk actually contain the answer, or just the topic?), keep top-5. A cross-encoder does this well, costs ~50ms per chunk on a small model, and usually lifts answer quality 10-20% on my evals.&lt;/p&gt;

&lt;p&gt;The harder version is a &lt;em&gt;judge&lt;/em&gt; that filters out chunks the model is likely to misuse. Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recency checks: drop chunks that pre-date the user's "current" frame&lt;/li&gt;
&lt;li&gt;Source authority: prefer internal docs over scraped blog posts&lt;/li&gt;
&lt;li&gt;Conflict detection: if two chunks disagree, surface the conflict instead of averaging it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where it falls apart
&lt;/h2&gt;

&lt;p&gt;The trap is doing this in isolation. If the judge is just another LLM, you've moved the problem one step back. The judge also hallucinates, also misreads, also has its own blind spots. The honest framing is: &lt;em&gt;the judgment layer is where the product lives&lt;/em&gt;. A vector DB is a 5-line integration. A trustworthy RAG system is months of work on the judgment layer.&lt;/p&gt;

&lt;p&gt;That's the part most RAG marketing glosses over.&lt;/p&gt;

&lt;h2&gt;
  
  
  A small checklist
&lt;/h2&gt;

&lt;p&gt;When I'm reviewing a team's RAG pipeline, these are the questions that catch the most issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What does your retriever's top-k &lt;em&gt;look&lt;/em&gt; like, not just its top-1 score? Manually skim 20.&lt;/li&gt;
&lt;li&gt;Is the model told which chunks it should &lt;em&gt;prefer&lt;/em&gt; to use, and which it should &lt;em&gt;ignore&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Do you have an eval set that includes &lt;strong&gt;conflicting sources&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;When two chunks disagree, does the system surface the conflict or pick one silently?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you can't answer these, your RAG system is a demo, not a product.&lt;/p&gt;

</description>
      <category>productivity</category>
    </item>
  </channel>
</rss>
