<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Naveen Ayalla</title>
    <description>The latest articles on DEV Community by Naveen Ayalla (@naveenayalla1cs50).</description>
    <link>https://dev.to/naveenayalla1cs50</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3961746%2Fa85c1ba3-c53c-484a-ad47-a9ab657d84b3.png</url>
      <title>DEV Community: Naveen Ayalla</title>
      <link>https://dev.to/naveenayalla1cs50</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/naveenayalla1cs50"/>
    <language>en</language>
    <item>
      <title>I Got Tired of Claude Code Guessing Wrong, So I Built an MCP Toolkit</title>
      <dc:creator>Naveen Ayalla</dc:creator>
      <pubDate>Tue, 09 Jun 2026 00:29:07 +0000</pubDate>
      <link>https://dev.to/naveenayalla1cs50/i-got-tired-of-claude-code-guessing-wrong-so-i-built-an-mcp-toolkit-49bf</link>
      <guid>https://dev.to/naveenayalla1cs50/i-got-tired-of-claude-code-guessing-wrong-so-i-built-an-mcp-toolkit-49bf</guid>
      <description>&lt;p&gt;AI coding agents are useful, but they still have one frustrating habit:&lt;/p&gt;

&lt;p&gt;They guess.&lt;/p&gt;

&lt;p&gt;You ask something reasonable like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Where do we validate user input before inserting into the database?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And instead of knowing where to look, the agent starts reading files one by one.&lt;/p&gt;

&lt;p&gt;In a small project, that is fine.&lt;/p&gt;

&lt;p&gt;In a real production codebase with 80,000+ lines, multiple engineers, old decisions, half-renamed folders, and years of accumulated context, this gets messy fast.&lt;/p&gt;

&lt;p&gt;The agent reads a handful of files, hits context limits, and gives you an answer that sounds confident but points to the wrong part of the codebase.&lt;/p&gt;

&lt;p&gt;I got tired of that, so I built an open-source MCP toolkit to fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;MCP Server Toolkit&lt;/strong&gt;, a collection of four Model Context Protocol servers that give AI coding agents direct access to the things they need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your codebase&lt;/li&gt;
&lt;li&gt;Your database&lt;/li&gt;
&lt;li&gt;Your docs&lt;/li&gt;
&lt;li&gt;Your git history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/naveenayalla1-CS50/mcp-server-toolkit" rel="noopener noreferrer"&gt;https://github.com/naveenayalla1-CS50/mcp-server-toolkit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop making the agent guess. Give it tools that know where to look.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why MCP?
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol, or MCP, lets AI agents call external tools in a standardized way.&lt;/p&gt;

&lt;p&gt;Instead of the agent reading random files and hoping the right context fits, it can call a purpose-built tool like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_code("validate user input")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And get back file paths, line numbers, and relevant context.&lt;/p&gt;

&lt;p&gt;That means fewer wrong guesses, fewer wasted tokens, and much better answers in large codebases.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Servers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;mcp-code-search&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Searches across your repo and returns relevant matches with file paths, line numbers, and surrounding context.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Find all places where we call sendEmail

Agent calls search_code("sendEmail")

Results:
api/users.ts:89
services/email.ts:42
jobs/reminders.ts:117
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also includes targeted &lt;code&gt;read_file&lt;/code&gt; and &lt;code&gt;list_files&lt;/code&gt; tools so the agent can inspect only the files it actually needs.&lt;/p&gt;
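
&lt;p&gt;To make the idea concrete, here is a minimal sketch of what a &lt;code&gt;search_code&lt;/code&gt;-style tool could do under the hood. It is not the toolkit's actual implementation: the function name &lt;code&gt;search_code_sketch&lt;/code&gt;, its parameters, and the result shape are assumptions for illustration, and it only does literal string matching.&lt;/p&gt;

```python
import os


def search_code_sketch(root, needle, context=1, extensions=(".ts", ".py", ".md")):
    """Return matches as dicts with a relative path, 1-based line number, and context lines."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that rarely hold source the agent should read.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                lines = handle.read().splitlines()
            for index, line in enumerate(lines):
                if needle in line:
                    start = max(0, index - context)
                    end = index + context + 1
                    matches.append({
                        "path": os.path.relpath(path, root),
                        "line": index + 1,  # 1-based, like editor line numbers
                        "context": lines[start:end],
                    })
    return matches
```

&lt;p&gt;The point of the shape is that the agent gets a path and a line number back, so its next step can be a targeted read instead of opening whole files.&lt;/p&gt;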




&lt;h3&gt;
  
  
  2. &lt;code&gt;mcp-database&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Lets the agent answer natural-language questions by running read-only SQL queries against your database.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: How many users signed up in the last 7 days?

Agent runs:

SELECT count(*) FROM users
WHERE created_at &amp;gt; now() - interval '7 days';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It supports Postgres and SQLite.&lt;/p&gt;

&lt;p&gt;The database server is &lt;strong&gt;read-only by default&lt;/strong&gt;. You have to explicitly enable writable mode if you want writes.&lt;/p&gt;

&lt;p&gt;That default matters. I did not want an agent anywhere near production data with write permissions unless the developer intentionally allowed it.&lt;/p&gt;
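
&lt;p&gt;One way to get a read-only default, sketched here with Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module, is to let the database engine reject writes rather than trying to filter SQL text. This is an illustration of the principle, not how &lt;code&gt;mcp-database&lt;/code&gt; is implemented; &lt;code&gt;open_database&lt;/code&gt; and &lt;code&gt;demo&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile


def open_database(path, writable=False):
    """Open a SQLite database, read-only unless writable=True is passed explicitly."""
    mode = "rw" if writable else "ro"
    # With a file: URI and mode=ro, SQLite itself refuses any write.
    return sqlite3.connect(f"file:{path}?mode={mode}", uri=True)


def demo():
    """Create a small database, then show that the default connection cannot write."""
    path = os.path.join(tempfile.mkdtemp(), "app.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    setup.execute("INSERT INTO users (name) VALUES ('ada')")
    setup.commit()
    setup.close()

    conn = open_database(path)
    count = conn.execute("SELECT count(*) FROM users").fetchone()[0]
    try:
        conn.execute("INSERT INTO users (name) VALUES ('bob')")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite reports an attempt to write a read-only database.
        write_blocked = True
    conn.close()
    return count, write_blocked
```

&lt;p&gt;Calling &lt;code&gt;demo()&lt;/code&gt; reads the one seeded row and reports that the insert was blocked. For Postgres, a comparable guard would be a database role with only read privileges, which is an assumption about deployment rather than something this toolkit documents.&lt;/p&gt;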




&lt;h3&gt;
  
  
  3. &lt;code&gt;mcp-docs&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Indexes a folder of Markdown docs with no embedding setup and no external API.&lt;/p&gt;

&lt;p&gt;You can point it at internal docs, runbooks, API references, or project notes.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: What does our runbook say about rolling back a deployment?

Agent calls search_docs("rollback deployment")

Result:
docs/ops/deploy.md:47
"To rollback: run ./scripts/rollback.sh &amp;lt;version&amp;gt;..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works locally and does not send your docs anywhere.&lt;/p&gt;
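
&lt;p&gt;Search without embeddings can be as simple as counting shared keywords. The sketch below is an assumption about the general approach, not the actual &lt;code&gt;mcp-docs&lt;/code&gt; index: &lt;code&gt;tokenize&lt;/code&gt; and &lt;code&gt;search_docs_sketch&lt;/code&gt; are hypothetical names, and matching is exact-token only, with no stemming.&lt;/p&gt;

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs_sketch(docs, query, limit=3):
    """Rank Markdown lines by how many query terms they share.

    docs maps a file path to that file's Markdown text.
    """
    terms = tokenize(query)
    scored = []
    for path, text in docs.items():
        for number, line in enumerate(text.splitlines(), start=1):
            overlap = len(terms.intersection(tokenize(line)))
            if overlap:
                scored.append((overlap, path, number, line.strip()))
    # Highest overlap first; path and line number break ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [f"{path}:{number} {line}" for _, path, number, line in scored[:limit]]
```

&lt;p&gt;Everything here runs in-process on local text, which is what makes the "does not send your docs anywhere" property easy to keep.&lt;/p&gt;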




&lt;h3&gt;
  
  
  4. &lt;code&gt;mcp-git&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Lets the agent query git history, diffs, blame, and branches.&lt;/p&gt;

&lt;p&gt;This is useful when the agent needs to understand not just what the code does, but why it changed.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: Why was this validation added?

Agent checks git blame and recent commits for that file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
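
&lt;p&gt;A history lookup like this can be a thin wrapper around the &lt;code&gt;git&lt;/code&gt; command line. The sketch below assumes &lt;code&gt;git&lt;/code&gt; is on the PATH; the helper names &lt;code&gt;git_log_command&lt;/code&gt;, &lt;code&gt;parse_log&lt;/code&gt;, and &lt;code&gt;recent_commits&lt;/code&gt; are hypothetical and are not the &lt;code&gt;mcp-git&lt;/code&gt; tool names.&lt;/p&gt;

```python
import subprocess

# Short hash, author name, and subject, separated by tabs (%x09).
LOG_FORMAT = "%h%x09%an%x09%s"


def git_log_command(path, limit=5):
    """Build the argv for the most recent commits touching one file."""
    return ["git", "log", f"--max-count={limit}", f"--format={LOG_FORMAT}", "--", path]


def parse_log(output):
    """Turn tab-separated git log output into a list of dicts."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, author, subject = line.split("\t", 2)
        commits.append({"hash": commit_hash, "author": author, "subject": subject})
    return commits


def recent_commits(path, limit=5, cwd="."):
    """Run git in the given repository and return parsed commits for a file."""
    result = subprocess.run(
        git_log_command(path, limit),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_log(result.stdout)
```

&lt;p&gt;Passing the arguments as a list, rather than a shell string, keeps a file path supplied by the agent from being interpreted as shell syntax.&lt;/p&gt;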






&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;The fastest way to get started is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx mcp-server-toolkit@latest init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That launches an interactive setup.&lt;/p&gt;

&lt;p&gt;Pick the servers you want, provide &lt;code&gt;DATABASE_URL&lt;/code&gt; if you are using the database server, and it generates the config for Claude Code, Cursor, Windsurf, or any other MCP-compatible client.&lt;/p&gt;

&lt;p&gt;Manual Claude config example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code-search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mcp-toolkit/code-search"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mcp-toolkit/database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--read-only"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"DATABASE_URL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${DATABASE_URL}"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mcp-toolkit/docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./docs"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"git"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mcp-toolkit/git"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"."&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your MCP-compatible client and the tools are available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Made It
&lt;/h2&gt;

&lt;p&gt;This project came from a very specific frustration.&lt;/p&gt;

&lt;p&gt;AI coding agents are getting better, but they still struggle when the answer is buried inside a large repo.&lt;/p&gt;

&lt;p&gt;The problem is not always reasoning.&lt;/p&gt;

&lt;p&gt;A lot of the time, the problem is retrieval.&lt;/p&gt;

&lt;p&gt;The agent simply does not know where to look.&lt;/p&gt;

&lt;p&gt;MCP servers help solve that by giving the agent focused tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instead of:
Read random files → guess → maybe answer correctly

You get:
Search with the right tool → inspect relevant result → answer with context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a much better workflow for real codebases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building Your Own Server
&lt;/h2&gt;

&lt;p&gt;I also added &lt;code&gt;@mcp-toolkit/core&lt;/code&gt; to make building new MCP servers easier.&lt;/p&gt;

&lt;p&gt;A simple server looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mcp-toolkit/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;get_feature_flags&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Get all active feature flags for an environment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Environment name: staging or production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchFlags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can scaffold a new server with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run new-server &lt;span class="nt"&gt;--&lt;/span&gt; my-server-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;A few things stood out while building this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The MCP TypeScript SDK is solid.&lt;/strong&gt;&lt;br&gt;
Most of my time went into the actual tool logic, not the protocol plumbing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read-only defaults matter.&lt;/strong&gt;&lt;br&gt;
Especially for database access. Agents should not get write permissions by accident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zod works really well for tool input validation.&lt;/strong&gt;&lt;br&gt;
When the agent passes the wrong input shape, the error is usually clear enough that it can self-correct.&lt;/p&gt;


&lt;h2&gt;
  
  
  Roadmap
&lt;/h2&gt;

&lt;p&gt;The current version includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code search&lt;/li&gt;
&lt;li&gt;Database&lt;/li&gt;
&lt;li&gt;Docs&lt;/li&gt;
&lt;li&gt;Git&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, I am thinking about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notion&lt;/li&gt;
&lt;li&gt;Linear/Jira&lt;/li&gt;
&lt;li&gt;A small web UI for registered tools&lt;/li&gt;
&lt;li&gt;More examples for custom MCP servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contributions are welcome.&lt;/p&gt;


&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/naveenayalla1-CS50/mcp-server-toolkit" rel="noopener noreferrer"&gt;https://github.com/naveenayalla1-CS50/mcp-server-toolkit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx mcp-server-toolkit@latest init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you try it, I would love feedback, issues, or PRs.&lt;/p&gt;

&lt;p&gt;Especially if you are using Claude Code, Cursor, Windsurf, or another MCP-compatible coding agent on a large repo.&lt;/p&gt;

</description>
      <category>github</category>
      <category>node</category>
      <category>javascript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Moving RAG From Demo to Production on Databricks: A Developer-Focused Checklist</title>
      <dc:creator>Naveen Ayalla</dc:creator>
      <pubDate>Mon, 08 Jun 2026 01:10:23 +0000</pubDate>
      <link>https://dev.to/naveenayalla1cs50/-moving-rag-from-demo-to-production-on-databricks-a-developer-focused-checklist-4kbp</link>
      <guid>https://dev.to/naveenayalla1cs50/-moving-rag-from-demo-to-production-on-databricks-a-developer-focused-checklist-4kbp</guid>
      <description>&lt;p&gt;&lt;strong&gt;By Naveen Ayalla&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is adapted from my original post in the Databricks Community and is shared here for developers, data engineers, and GenAI practitioners building production AI workflows.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A RAG demo is much easier to build than a production RAG system.&lt;/p&gt;

&lt;p&gt;For a demo, you can upload documents, create embeddings, connect an LLM, ask a question, and return an answer.&lt;/p&gt;

&lt;p&gt;That is a great starting point.&lt;/p&gt;

&lt;p&gt;But production needs more than a working answer.&lt;/p&gt;

&lt;p&gt;A production RAG workflow has to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the source data trusted?&lt;/li&gt;
&lt;li&gt;Is the user allowed to access this content?&lt;/li&gt;
&lt;li&gt;Did the system retrieve the right context?&lt;/li&gt;
&lt;li&gt;Is the answer grounded in that context?&lt;/li&gt;
&lt;li&gt;Can we monitor quality, latency, cost, and failures?&lt;/li&gt;
&lt;li&gt;Who owns the data and the workflow after launch?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many GenAI projects stall after the demo stage because these questions were never answered.&lt;/p&gt;

&lt;p&gt;Below is a practical checklist I use when thinking about RAG workflows on Databricks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo vs. Production
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Demo Thinking&lt;/th&gt;
&lt;th&gt;Production Thinking&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data&lt;/td&gt;
&lt;td&gt;Use sample documents.&lt;/td&gt;
&lt;td&gt;Use trusted, current, approved data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access&lt;/td&gt;
&lt;td&gt;Assume one access level.&lt;/td&gt;
&lt;td&gt;Enforce user permissions and sensitive-data rules.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;Return similar chunks.&lt;/td&gt;
&lt;td&gt;Return the right context for the right user.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response&lt;/td&gt;
&lt;td&gt;Generate a helpful answer.&lt;/td&gt;
&lt;td&gt;Answer only from supported context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;Try a few test prompts.&lt;/td&gt;
&lt;td&gt;Measure retrieval quality, groundedness, correctness, and failures.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Check usage.&lt;/td&gt;
&lt;td&gt;Track quality, latency, cost, errors, and feedback.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ownership&lt;/td&gt;
&lt;td&gt;AI team owns everything.&lt;/td&gt;
&lt;td&gt;Data owners, platform teams, and business users share ownership.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. Start With a Narrow Use Case
&lt;/h2&gt;

&lt;p&gt;The first mistake is trying to index everything.&lt;/p&gt;

&lt;p&gt;A better starting point is one clear use case.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Help support teams answer product questions faster.&lt;/li&gt;
&lt;li&gt;Help analysts search internal documentation.&lt;/li&gt;
&lt;li&gt;Help engineers troubleshoot pipeline failures.&lt;/li&gt;
&lt;li&gt;Help business users understand policy documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A narrow use case helps you choose better data, test better questions, and measure value more clearly.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Use Data You Can Trust
&lt;/h2&gt;

&lt;p&gt;Not every document should go into a RAG system.&lt;/p&gt;

&lt;p&gt;Before indexing content, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who owns the data?&lt;/li&gt;
&lt;li&gt;Is it current?&lt;/li&gt;
&lt;li&gt;Is it approved for this use case?&lt;/li&gt;
&lt;li&gt;Does it include sensitive information?&lt;/li&gt;
&lt;li&gt;Which users should be allowed to see it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the source data is outdated or poorly governed, the generated answer will not be reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Add Metadata Early
&lt;/h2&gt;

&lt;p&gt;Metadata is easy to skip in a demo, but it becomes very useful in production.&lt;/p&gt;

&lt;p&gt;Useful metadata includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;document owner&lt;/li&gt;
&lt;li&gt;source system&lt;/li&gt;
&lt;li&gt;updated date&lt;/li&gt;
&lt;li&gt;department&lt;/li&gt;
&lt;li&gt;product name&lt;/li&gt;
&lt;li&gt;region&lt;/li&gt;
&lt;li&gt;sensitivity level&lt;/li&gt;
&lt;li&gt;access group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metadata helps with filtering, debugging, governance, and retrieval quality.&lt;/p&gt;

&lt;p&gt;For example, if two documents answer the same question but one is newer, metadata can help the system prefer the latest source.&lt;/p&gt;
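
&lt;p&gt;A minimal sketch of that tie-break, assuming each retrieved chunk carries a similarity &lt;code&gt;score&lt;/code&gt; and an &lt;code&gt;updated&lt;/code&gt; date in its metadata. Both field names and the &lt;code&gt;rank_with_recency&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
from datetime import date


def rank_with_recency(chunks, precision=1):
    """Sort by similarity score, treating near-equal scores as ties won by the newer chunk."""
    return sorted(
        chunks,
        # Rounding groups near-identical scores so the updated date decides between them.
        key=lambda chunk: (round(chunk["score"], precision), chunk["updated"]),
        reverse=True,
    )


example_chunks = [
    {"id": "old", "score": 0.91, "updated": date(2023, 1, 10)},
    {"id": "new", "score": 0.90, "updated": date(2025, 3, 2)},
    {"id": "weak", "score": 0.52, "updated": date(2026, 1, 1)},
]
ranked_ids = [chunk["id"] for chunk in rank_with_recency(example_chunks)]
```

&lt;p&gt;Here &lt;code&gt;ranked_ids&lt;/code&gt; puts the newer of the two near-tied documents first, while the weakly matching chunk stays last even though it is the most recent. Recency only breaks ties; it does not override relevance.&lt;/p&gt;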

&lt;h2&gt;
  
  
  4. Build Access Control Into Retrieval
&lt;/h2&gt;

&lt;p&gt;In enterprise RAG, access control cannot be an afterthought.&lt;/p&gt;

&lt;p&gt;If a user cannot access a document directly, they should not be able to access it through an AI assistant.&lt;/p&gt;

&lt;p&gt;This means the retrieval layer should respect permissions, sensitivity rules, and data ownership.&lt;/p&gt;

&lt;p&gt;On Databricks, this is where a governed lakehouse design becomes important. The AI workflow should follow the same governance principles as the rest of the data platform.&lt;/p&gt;
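
&lt;p&gt;The ordering matters: filter by permission first, then rank. The sketch below shows that ordering in plain Python, assuming each chunk lists its &lt;code&gt;access_groups&lt;/code&gt; in metadata. The field name and both helpers are hypothetical, and in practice the platform's governance layer should enforce this rather than application code alone.&lt;/p&gt;

```python
def allowed_chunks(chunks, user_groups):
    """Drop every chunk the user could not open directly."""
    groups = set(user_groups)
    return [chunk for chunk in chunks if groups.intersection(chunk["access_groups"])]


def retrieve(chunks, user_groups, top_k=3):
    """Filter by permission before ranking, so restricted text never reaches the prompt."""
    visible = allowed_chunks(chunks, user_groups)
    return sorted(visible, key=lambda chunk: chunk["score"], reverse=True)[:top_k]


example_chunks = [
    {"id": "hr-policy", "score": 0.9, "access_groups": ["hr"]},
    {"id": "faq", "score": 0.7, "access_groups": ["all-staff"]},
    {"id": "pricing", "score": 0.8, "access_groups": ["sales", "all-staff"]},
]
visible_ids = [chunk["id"] for chunk in retrieve(example_chunks, ["all-staff"])]
```

&lt;p&gt;The highest-scoring chunk is excluded for this user because it belongs only to the &lt;code&gt;hr&lt;/code&gt; group. Filtering after ranking or after generation would risk leaking it into the answer.&lt;/p&gt;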

&lt;h2&gt;
  
  
  5. Evaluate Retrieval and Generation Separately
&lt;/h2&gt;

&lt;p&gt;When a RAG answer is wrong, it is important to know why.&lt;/p&gt;

&lt;p&gt;The issue may be retrieval.&lt;br&gt;
The issue may be the model.&lt;br&gt;
The issue may be missing data.&lt;br&gt;
The issue may be stale content.&lt;br&gt;
The issue may be bad chunking.&lt;/p&gt;

&lt;p&gt;That is why I prefer to evaluate retrieval and answer generation separately.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Evaluation Area&lt;/th&gt;
&lt;th&gt;Main Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval quality&lt;/td&gt;
&lt;td&gt;Did the system retrieve the right context?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;Did the model use the context correctly?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This makes debugging much easier.&lt;/p&gt;
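
&lt;p&gt;As a rough sketch of what separate evaluation can look like, the two functions below score retrieval on its own and then score answers only where retrieval succeeded. The evaluation-record fields (&lt;code&gt;expected_docs&lt;/code&gt;, &lt;code&gt;retrieved_docs&lt;/code&gt;, &lt;code&gt;answer_correct&lt;/code&gt;) are assumptions about how a labeled test set might be stored.&lt;/p&gt;

```python
def retrieval_hit(example):
    """True when at least one expected document was retrieved."""
    return bool(set(example["expected_docs"]).intersection(example["retrieved_docs"]))


def retrieval_hit_rate(examples):
    """Fraction of questions where retrieval found an expected document."""
    return sum(1 for example in examples if retrieval_hit(example)) / len(examples)


def answer_accuracy_given_hit(examples):
    """Answer correctness measured only where retrieval succeeded, isolating generation."""
    with_hit = [example for example in examples if retrieval_hit(example)]
    if not with_hit:
        return None
    return sum(1 for example in with_hit if example["answer_correct"]) / len(with_hit)


example_set = [
    {"expected_docs": ["a"], "retrieved_docs": ["a", "b"], "answer_correct": True},
    {"expected_docs": ["c"], "retrieved_docs": ["d"], "answer_correct": False},
    {"expected_docs": ["e"], "retrieved_docs": ["e"], "answer_correct": False},
    {"expected_docs": ["f"], "retrieved_docs": ["g", "f"], "answer_correct": True},
]
```

&lt;p&gt;In this made-up set, the second question is a retrieval failure and the third is a generation failure. A single end-to-end accuracy number would hide that difference.&lt;/p&gt;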

&lt;h2&gt;
  
  
  6. Tell the Model When to Stop
&lt;/h2&gt;

&lt;p&gt;One of the most useful production rules is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the retrieved context is not enough, say that the information is not available instead of guessing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For internal business users, a confident wrong answer is worse than a clear limitation.&lt;/p&gt;

&lt;p&gt;A good RAG system should know when not to answer.&lt;/p&gt;
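
&lt;p&gt;One simple way to apply this rule is to refuse before the model is even called when nothing relevant was retrieved, and to repeat the rule in the prompt otherwise. The threshold value, field names, and the &lt;code&gt;build_prompt_or_abstain&lt;/code&gt; helper below are illustrative assumptions, not a recommended setting.&lt;/p&gt;

```python
NOT_AVAILABLE = "I could not find this in the available documents."


def build_prompt_or_abstain(question, chunks, min_score=0.6):
    """Return an abstention when no chunk clears the relevance threshold, else a grounded prompt."""
    strong = [chunk for chunk in chunks if chunk["score"] >= min_score]
    if not strong:
        # Nothing relevant enough was retrieved, so do not call the model at all.
        return {"abstain": True, "message": NOT_AVAILABLE}
    context = "\n\n".join(chunk["text"] for chunk in strong)
    prompt = (
        "Answer using only the context below. "
        "If the context does not contain the answer, say the information is not available.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return {"abstain": False, "prompt": prompt}
```

&lt;p&gt;The threshold should be tuned against a labeled evaluation set; a value that is too high makes the assistant refuse questions it could have answered.&lt;/p&gt;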

&lt;h2&gt;
  
  
  7. Monitor After Launch
&lt;/h2&gt;

&lt;p&gt;A RAG system changes after it goes live.&lt;/p&gt;

&lt;p&gt;Users ask new questions.&lt;br&gt;
Documents get updated.&lt;br&gt;
Models change.&lt;br&gt;
Costs change.&lt;br&gt;
Business rules change.&lt;/p&gt;

&lt;p&gt;After launch, monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user feedback&lt;/li&gt;
&lt;li&gt;failed questions&lt;/li&gt;
&lt;li&gt;retrieval quality&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;cost&lt;/li&gt;
&lt;li&gt;error rate&lt;/li&gt;
&lt;li&gt;outdated sources&lt;/li&gt;
&lt;li&gt;low-confidence answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monitoring should feed back into better data preparation, improved metadata, better prompts, and stronger evaluation datasets.&lt;/p&gt;
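
&lt;p&gt;A minimal sketch of turning per-request logs into the signals listed above, assuming each logged event records latency, cost, an error flag, and optional user feedback. The event fields and the &lt;code&gt;summarize&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
from statistics import median


def summarize(events):
    """Aggregate per-request log events into a small monitoring summary."""
    total = len(events)
    rated = [event for event in events if event["feedback"] is not None]
    negative = sum(1 for event in rated if event["feedback"] == "down")
    return {
        "requests": total,
        "error_rate": sum(1 for event in events if event["error"]) / total,
        "median_latency_ms": median(event["latency_ms"] for event in events),
        "total_cost_usd": round(sum(event["cost_usd"] for event in events), 4),
        # Only rated requests count, so silence is not mistaken for approval.
        "negative_feedback_rate": negative / len(rated) if rated else None,
    }


example_events = [
    {"latency_ms": 100, "cost_usd": 0.01, "error": False, "feedback": "up"},
    {"latency_ms": 300, "cost_usd": 0.02, "error": False, "feedback": "down"},
    {"latency_ms": 200, "cost_usd": 0.01, "error": True, "feedback": None},
    {"latency_ms": 400, "cost_usd": 0.03, "error": False, "feedback": "down"},
]
```

&lt;p&gt;Questions with negative feedback are good candidates for the evaluation dataset, which closes the loop between monitoring and evaluation.&lt;/p&gt;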

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;Production RAG is not just an LLM connected to a vector index.&lt;/p&gt;

&lt;p&gt;It is a governed data product.&lt;/p&gt;

&lt;p&gt;It needs trusted data, metadata, permissions, evaluation, monitoring, and clear ownership.&lt;/p&gt;

&lt;p&gt;Databricks can be a strong foundation for this kind of workflow because data engineering, governance, machine learning, and AI workflows can be connected through the lakehouse approach.&lt;/p&gt;

&lt;p&gt;I would like to hear from other developers and data engineers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What has been the hardest part of moving RAG from demo to production: access control, retrieval quality, evaluation, monitoring, cost, or user adoption?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Originally published in the Databricks Community: &lt;a href="https://community.databricks.com/t5/data-engineering/from-rag-demo-to-production-on-databricks-7-things-teams-should/m-p/158526#M54730" rel="noopener noreferrer"&gt;https://community.databricks.com/t5/data-engineering/from-rag-demo-to-production-on-databricks-7-things-teams-should/m-p/158526#M54730&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>gpt3</category>
      <category>datascience</category>
    </item>
    <item>
      <title>From 2 AM Failures to 10x Speed: How We Escaped the Stored Procedure Prison</title>
      <dc:creator>Naveen Ayalla</dc:creator>
      <pubDate>Sat, 06 Jun 2026 07:18:53 +0000</pubDate>
      <link>https://dev.to/naveenayalla1cs50/from-2-am-failures-to-10x-speed-how-we-escaped-the-stored-procedure-prison-3bkf</link>
      <guid>https://dev.to/naveenayalla1cs50/from-2-am-failures-to-10x-speed-how-we-escaped-the-stored-procedure-prison-3bkf</guid>
      <description>&lt;h3&gt;
  
  
  The Hook: The 2 AM Wake-Up Call
&lt;/h3&gt;

&lt;p&gt;It was 2 AM on a Tuesday. Again.&lt;/p&gt;

&lt;p&gt;My phone buzzed with the all-too-familiar alert: “SAP HANA ETL job failed — timeout exceeded.” This wasn't just a technical glitch; it was a systemic failure of an architecture pushed to its limits. Our nightly batch processes, powered by over 200 deeply nested stored procedures, had become a fragile ecosystem. For the third time that month, the finance team would be without their reports at 8 AM, and as the Data Engineering Lead, the responsibility sat squarely on my shoulders.&lt;/p&gt;

&lt;p&gt;The reality of managing these legacy systems is that you eventually stop being an engineer and start being an archaeologist. These procedures had evolved over seven years — written by consultants long gone and modified by developers who left no documentation. Debugging them felt like an archaeological excavation of technical debt. I knew that to save our SLAs (and my sleep schedule), we had to migrate to Databricks and PySpark.&lt;/p&gt;




&lt;h3&gt;
  
  
  Takeaway 1: The “Black Box” Logic is Your Biggest Liability
&lt;/h3&gt;

&lt;p&gt;In my experience, the primary danger of a legacy SAP HANA environment isn't just the performance lag; it's the existential risk of “black box” logic. When your business logic is trapped in proprietary SQLScript, it becomes a liability. Without version control or unit tests, the core of your company's data intelligence is unverified and unscalable.&lt;/p&gt;

&lt;p&gt;This creates a “proprietary prison” of vendor lock-in. When business logic is coupled so tightly to a specific database dialect, your ability to scale is limited by the physical constraints of expensive, vertically scaled hardware. From a business continuity standpoint, undocumented, nested logic isn't just annoying; it's a risk to the entire department's standing with leadership.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“The worst part? Nobody truly understood the stored procedures anymore. They had evolved over 7 years — written by consultants who left, modified by developers who forgot to document, and nested so deeply that debugging felt like archaeological excavation.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Takeaway 2: Killing Cursors is the Key to 10x Performance
&lt;/h3&gt;

&lt;p&gt;The most significant performance leap came from a fundamental shift in mindset: killing cursors. Legacy SQLScript procedures often rely on cursor loops that process one row at a time, which is inherently sequential and resource-heavy. By rewriting that logic as set-based PySpark operations, we didn't just see marginal gains; we saw a transformation.&lt;/p&gt;

&lt;p&gt;The numbers speak for themselves. In our migration, a process that once took 45 minutes using a SAP HANA cursor was slashed to just 90 seconds, a 30x improvement. But the real “aha!” moment for our team was our most complex procedure, which dropped from 94 minutes to 6 minutes, roughly a 15.7x speedup. This is the power of moving from sequential database execution to a distributed processing mindset.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SAP HANA Pattern&lt;/th&gt;
&lt;th&gt;PySpark Equivalent&lt;/th&gt;
&lt;th&gt;Why It's Faster&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cursors &amp;amp; Manual Loops&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Set-Based Operations&lt;/td&gt;
&lt;td&gt;Distributed execution across the cluster nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nested Row Calculations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Window Functions&lt;/td&gt;
&lt;td&gt;Optimized execution plans and predicate pushdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recursive CTEs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GraphFrames&lt;/td&gt;
&lt;td&gt;Efficient handling of deep hierarchical trees&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
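
&lt;p&gt;To make the first row of the table concrete, here is a minimal sketch of the same aggregation written both ways. It uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module purely so that it runs anywhere without a cluster; the &lt;code&gt;orders&lt;/code&gt; table and its columns are hypothetical and not taken from our HANA schema. In PySpark, the set-based version would typically be expressed as a &lt;code&gt;groupBy().agg()&lt;/code&gt; call.&lt;/p&gt;

```python
import sqlite3

# Hypothetical sample data: (customer, amount) rows.
ROWS = [("acme", 100.0), ("acme", 50.0), ("globex", 75.0), ("acme", 25.0)]


def build_db():
    # In-memory database so the sketch leaves nothing behind.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)
    return conn


def totals_with_cursor(conn):
    # Cursor style: fetch every row and accumulate one at a time in a loop.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_set_based(conn):
    # Set-based style: describe the result and let the engine plan the work.
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    return dict(conn.execute(query).fetchall())


conn = build_db()
cursor_totals = totals_with_cursor(conn)
set_totals = totals_set_based(conn)
print(cursor_totals == set_totals)
```

&lt;p&gt;Both functions return the same totals. The difference is that the loop fixes the order of work row by row, while the declarative query leaves the engine free to parallelize it.&lt;/p&gt;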




&lt;h3&gt;
  
  
  Takeaway 3: Cost Reductions are Not Just Marginal, They're Massive
&lt;/h3&gt;

&lt;p&gt;When I presented the billing report after our first month on the Databricks Lakehouse, the numbers didn't just meet expectations; they fundamentally changed our standing with the CFO. We cut monthly compute costs by roughly 83%, from $18,500 to just $3,200.&lt;/p&gt;

&lt;p&gt;That is a saving of &lt;strong&gt;$15,300 per month&lt;/strong&gt;, or roughly &lt;strong&gt;$183,600 per year&lt;/strong&gt;. That's more than enough to fund an additional senior headcount for the team.&lt;/p&gt;
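
&lt;p&gt;For anyone who wants to check the arithmetic, the short sketch below derives the savings from the two monthly figures quoted above. Only those two inputs come from our billing report; the variable names are just for illustration.&lt;/p&gt;

```python
# Monthly compute costs quoted in the article; everything else is derived.
old_monthly = 18_500
new_monthly = 3_200

monthly_saving = old_monthly - new_monthly
annual_saving = monthly_saving * 12
reduction_pct = monthly_saving / old_monthly * 100

print(monthly_saving, annual_saving, round(reduction_pct, 1))
```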

&lt;p&gt;The financial drain of SAP HANA comes from its vertical scaling model: you pay for high-spec, expensive nodes that sit idle or underutilized just to handle peak loads. By moving to a Lakehouse architecture, we stopped paying for peak capacity 24/7. We now pay only for the compute we actually use during transformations, leveraging elastic scaling and cost-effective cloud storage.&lt;/p&gt;




&lt;h3&gt;
  
  
  Takeaway 4: Technical “Gotchas” Live in the Details of NULLs and Decimals
&lt;/h3&gt;

&lt;p&gt;If you're planning this jump, let me tell you exactly where the bodies are buried. While the performance gains are intoxicating, the technical nuances can compromise your data integrity if you aren't disciplined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;NULL Aggregations:&lt;/strong&gt; SAP HANA and Spark handle NULLs differently. In my experience, skipping a thorough validation of these differences is the fastest way to lose the trust of your finance stakeholders.&lt;/li&gt;
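&lt;li&gt;  &lt;strong&gt;Decimal Precision:&lt;/strong&gt; The two engines do not share decimal defaults. An unparameterized SAP HANA &lt;code&gt;DECIMAL&lt;/code&gt; is a floating-point decimal with up to 34 significant digits, while Spark's &lt;code&gt;DecimalType&lt;/code&gt; defaults to precision 10 and scale 0. To avoid rounding errors that could break a balance sheet, you must explicitly declare precision and scale at the schema level in PySpark.&lt;/li&gt;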
&lt;li&gt;  &lt;strong&gt;The “Side Effect” Nightmare:&lt;/strong&gt; Stored procedures often modify data outside their declared scope. That pattern has no direct counterpart in PySpark, where DataFrame transformations return new DataFrames instead of mutating data in place. We had to reverse-engineer these hidden behaviors to ensure the new logic captured every “ghost” rule.&lt;/li&gt;
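&lt;li&gt;  &lt;strong&gt;Architect-Level Partitioning:&lt;/strong&gt; Don't rely on auto-partitioning. For my workload, explicitly partitioning by &lt;code&gt;order_year&lt;/code&gt; reduced scan times by 80%. Choose a low-cardinality column like this one; partitioning on a high-cardinality column tends to produce a large number of tiny files.&lt;/li&gt;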
&lt;/ul&gt;
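
&lt;p&gt;The decimal issue is easiest to see with a small example. The sketch below uses Python's standard &lt;code&gt;decimal&lt;/code&gt; module rather than Spark, and the amounts are made up for illustration. It shows how rounding each row to a narrower scale before aggregating can shift a total by a cent, which is the kind of drift an explicit &lt;code&gt;DecimalType(precision, scale)&lt;/code&gt; in the PySpark schema is meant to prevent.&lt;/p&gt;

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical ledger amounts stored with six decimal places.
amounts = [Decimal("0.333335"), Decimal("0.333335"), Decimal("0.333335")]


def to_scale(value, places):
    # Round to a fixed number of decimal places, like casting to a narrower scale.
    quantum = Decimal(1).scaleb(-places)
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


# Sum at full precision first, then round once at the end.
exact_total = to_scale(sum(amounts), 2)

# Round every row to two places first, then sum: each row loses a little.
rounded_total = sum(to_scale(a, 2) for a in amounts)

print(exact_total, rounded_total, exact_total - rounded_total)
```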

&lt;p&gt;To ensure safety, we mandated row-for-row validation for a full week before the cutover. The execution engine changed, but the data remained 100% identical.&lt;/p&gt;
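
&lt;p&gt;A row-for-row check can be sketched as a multiset comparison of the two outputs. The example below is plain Python with made-up rows, not our actual validation harness. On real DataFrames, a similar comparison could be built with PySpark's &lt;code&gt;exceptAll&lt;/code&gt; run in both directions.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical result sets from the legacy job and the migrated job.
legacy_rows = [
    ("2024-01", "EMEA", "1200.50"),
    ("2024-01", "APAC", "980.00"),
    ("2024-02", "EMEA", "1310.25"),
]
migrated_rows = [
    ("2024-01", "APAC", "980.00"),
    ("2024-02", "EMEA", "1310.20"),
    ("2024-01", "EMEA", "1200.50"),
]


def diff_rows(expected, actual):
    # Compare as multisets so row order is ignored but duplicates still count.
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    missing = expected_counts - actual_counts
    unexpected = actual_counts - expected_counts
    return sorted(missing.elements()), sorted(unexpected.elements())


missing, unexpected = diff_rows(legacy_rows, migrated_rows)
print("missing from migrated:", missing)
print("unexpected in migrated:", unexpected)
```

&lt;p&gt;Two empty lists mean the outputs match exactly. Here the sketch deliberately includes one mismatched amount so that both lists are non-empty.&lt;/p&gt;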




&lt;h3&gt;
  
  
  Takeaway 5: Transitioning from a Database to a Development Platform
&lt;/h3&gt;

&lt;p&gt;The real “escape” from the legacy prison isn't just about code; it's about engineering rigor. We moved from a database-centric world to a platform-centric world.&lt;/p&gt;

&lt;p&gt;By adopting Databricks, we introduced CI/CD, Git version control, and automated unit testing into our workflow. This democratization of the codebase meant that junior engineers could finally understand and modify transformations that were previously locked behind the “black box” of senior-only SQLScript knowledge.&lt;/p&gt;

&lt;p&gt;Furthermore, features like Delta Lake time travel became our “get out of jail free” card. Being able to query a previous version of a table lets us debug a failure in minutes rather than spending hours on archaeological digging, and that has completely changed our operational velocity. We are no longer limited to daily batches; the platform is ready for near real-time and streaming workloads.&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion: Your Future Sleep Schedule Depends on the Leap
&lt;/h3&gt;

&lt;p&gt;The results of this migration were definitive: we slashed our total nightly ETL runtime from 5 hours and 23 minutes to just 28 minutes, roughly an 11.5x speedup. We turned a $220k annual liability into a lean, $38k high-performance asset.&lt;/p&gt;

&lt;p&gt;By escaping the proprietary prison of stored procedures, we empowered our team, secured our data integrity, and restored our sleep schedules. The data team is no longer the bottleneck; we are the engine of the company.&lt;/p&gt;

&lt;p&gt;Here is the question for you: Is your current technical debt a managed expense, or is it a liability waiting to wake you up at 2 AM? It might be time to take the leap.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Building a PySpark and AWS Glue ETL Pipeline for Search Keyword Revenue Analysis</title>
      <dc:creator>Naveen Ayalla</dc:creator>
      <pubDate>Sun, 31 May 2026 23:10:33 +0000</pubDate>
      <link>https://dev.to/naveenayalla1cs50/building-a-pyspark-and-aws-glue-etl-pipeline-for-search-keyword-revenue-analysis-6gp</link>
      <guid>https://dev.to/naveenayalla1cs50/building-a-pyspark-and-aws-glue-etl-pipeline-for-search-keyword-revenue-analysis-6gp</guid>
      <description>&lt;p&gt;I published a public data engineering project that demonstrates a cloud-based ETL pipeline for analyzing web analytics search keyword revenue.&lt;/p&gt;

&lt;p&gt;The project uses PySpark, AWS Glue, Amazon S3, and Terraform to process hit-level web analytics data, extract external search engine domains and keywords, parse revenue, and generate a sorted reporting output.&lt;/p&gt;

&lt;p&gt;Key concepts covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch ETL pipeline design&lt;/li&gt;
&lt;li&gt;PySpark transformations&lt;/li&gt;
&lt;li&gt;AWS Glue job configuration&lt;/li&gt;
&lt;li&gt;S3 input and output workflow&lt;/li&gt;
&lt;li&gt;Revenue aggregation logic&lt;/li&gt;
&lt;li&gt;Terraform infrastructure examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a generic open-source portfolio project and does not include proprietary or company-provided data.&lt;/p&gt;

&lt;p&gt;GitHub:  &lt;a href="https://github.com/naveenayalla1-CS50/search-keyword-performance-revenue" rel="noopener noreferrer"&gt;https://github.com/naveenayalla1-CS50/search-keyword-performance-revenue&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback from data engineers and cloud data practitioners is welcome.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>showdev</category>
      <category>terraform</category>
    </item>
  </channel>
</rss>
