<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael San Martim</title>
    <description>The latest articles on DEV Community by Michael San Martim (@sanmartim).</description>
    <link>https://dev.to/sanmartim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901257%2F8667821e-9561-4565-9ab3-c214955cac32.jpg</url>
      <title>DEV Community: Michael San Martim</title>
      <link>https://dev.to/sanmartim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sanmartim"/>
    <language>en</language>
    <item>
      <title>Your AI agent needs data. Here's how to feed it without RAG.</title>
      <dc:creator>Michael San Martim</dc:creator>
      <pubDate>Tue, 28 Apr 2026 19:49:27 +0000</pubDate>
      <link>https://dev.to/sanmartim/your-ai-agent-needs-data-heres-how-to-feed-it-without-rag-3162</link>
      <guid>https://dev.to/sanmartim/your-ai-agent-needs-data-heres-how-to-feed-it-without-rag-3162</guid>
      <description>&lt;p&gt;Your AI agent is smart. It can write code, plan projects, debug errors. But ask it about &lt;em&gt;your&lt;/em&gt; data — "how many orders did we get yesterday?" — and it falls apart.&lt;/p&gt;

&lt;p&gt;The usual fix is RAG: chunk your data, embed it, stuff it into a vector store, pray the retrieval finds the right rows. &lt;strong&gt;It works 60% of the time.&lt;/strong&gt; The other 40%, the agent hallucinates numbers, mixes up customers, invents trends that don't exist.&lt;/p&gt;

&lt;p&gt;Here's the thing: your data is already in a database. In tables. With schemas. SQL exists. Why are we turning structured data into embeddings just so an LLM can guess at it?&lt;/p&gt;

&lt;h2&gt;The real problem&lt;/h2&gt;

&lt;p&gt;AI agents don't have a data problem. They have an &lt;strong&gt;access&lt;/strong&gt; problem.&lt;/p&gt;

&lt;p&gt;Your agent can't just &lt;code&gt;SELECT * FROM orders&lt;/code&gt;. It doesn't have a database connection. It doesn't know your schema. It doesn't have credentials. So you build a RAG pipeline — which is really just a worse, slower, less accurate version of SQL.&lt;/p&gt;

&lt;p&gt;What if the agent could just... query the database?&lt;/p&gt;

&lt;h2&gt;MCP changes everything&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) lets AI agents call tools. Not "tools" in the prompt-engineering sense — actual tools. Functions with parameters and return values. Claude, Cursor, Windsurf — they all support it.&lt;/p&gt;
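&lt;p&gt;Under the hood, MCP is JSON-RPC: the host asks the server which tools it has, then calls them with typed arguments. Here is a rough sketch of that round trip in plain Python (stub data and a hypothetical &lt;code&gt;describe_table&lt;/code&gt; schema of my own, not DataSpoc's actual server):&lt;/p&gt;

```python
# Minimal sketch of the MCP tool-call shape (illustrative only):
# the host sends JSON-RPC requests, the server answers with results.
import json

# What a server advertises via tools/list: name, description, input schema.
TOOLS = {
    "describe_table": {
        "description": "Get column names and types for a table",
        "inputSchema": {
            "type": "object",
            "properties": {"table": {"type": "string"}},
            "required": ["table"],
        },
    }
}

def handle(request):
    """Dispatch a JSON-RPC request the way an MCP server would."""
    if request["method"] == "tools/list":
        result = {"tools": [dict(name=n, **t) for n, t in TOOLS.items()]}
    elif request["method"] == "tools/call":
        args = request["params"]["arguments"]
        # A real server would hit the database here; we return a stub schema.
        result = {"table": args["table"],
                  "columns": {"id": "BIGINT", "total": "DOUBLE"}}
    else:
        result = {}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

resp = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": "describe_table",
                          "arguments": {"table": "orders"}}})
print(json.dumps(resp["result"], indent=2))
```

&lt;p&gt;The point is that the tool call is a structured request with a structured answer, so the model never has to guess what a function expects or returns.&lt;/p&gt;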

&lt;p&gt;So we built MCP servers for our data platform. The agent connects, discovers your tables, and runs real SQL. No embeddings. No vector stores. No hallucinated numbers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"How did sales perform last week vs the week before?"&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;list_tables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;describe_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"SELECT DATE_TRUNC('week', created_at) AS week,
                    SUM(total) AS revenue FROM orders
                    WHERE created_at &amp;gt;= CURRENT_DATE - 14
                    GROUP BY 1 ORDER BY 1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Sales were $312k last week, up 8% from $289k the week before.
        Electronics had the biggest jump at 15%."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every number comes from a real SQL query. The agent can show its work.&lt;/p&gt;

&lt;h2&gt;How we built it&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dataspoc.com" rel="noopener noreferrer"&gt;DataSpoc&lt;/a&gt; is an open-source data platform — three CLI tools connected by Parquet files in a cloud bucket. The two that matter for agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pipe&lt;/strong&gt; — ingests data from 400+ sources (databases, APIs, SaaS) to Parquet in S3/GCS/Azure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lens&lt;/strong&gt; — mounts your cloud Parquet as DuckDB views. SQL shell, Jupyter notebooks, AI queries, local cache.&lt;/p&gt;

&lt;p&gt;Both expose &lt;strong&gt;MCP servers&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dataspoc-lens[mcp] dataspoc-pipe[mcp]

dataspoc-lens mcp    &lt;span class="c"&gt;# agents query your data&lt;/span&gt;
dataspoc-pipe mcp    &lt;span class="c"&gt;# agents manage pipelines&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Setting it up (5 minutes)&lt;/h2&gt;

&lt;h3&gt;Step 1: Get data into your bucket&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dataspoc-pipe

dataspoc-pipe init
dataspoc-pipe add my-postgres
dataspoc-pipe run my-postgres
&lt;span class="c"&gt;# → Parquet files in s3://my-lake/raw/postgres/orders/...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 2: Point Lens at the bucket&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dataspoc-lens[mcp]

dataspoc-lens add-bucket s3://my-lake
dataspoc-lens catalog
&lt;span class="c"&gt;# → orders (45,000 rows, 12 columns)&lt;/span&gt;
&lt;span class="c"&gt;# → customers (12,000 rows, 8 columns)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Step 3: Connect your AI agent&lt;/h3&gt;

&lt;p&gt;Add to Claude Desktop config (&lt;code&gt;claude_desktop_config.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dataspoc-lens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dataspoc-lens"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dataspoc-pipe"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dataspoc-pipe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Your agent now has 14 tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lens tools (query):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_tables&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Discover all tables in your lake&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;describe_table&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Get column names and types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;query&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run SQL (read-only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ask&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Natural language → SQL → results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cache_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check data freshness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cache_refresh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Re-download stale tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cache_refresh_stale&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Refresh all stale at once&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pipe tools (ingestion):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_pipelines&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Show configured sources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_pipeline&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trigger ingestion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pipeline_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Check all pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pipeline_logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read execution logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pipeline_config&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;View pipeline settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;show_manifest&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;View data catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_pipeline&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Test connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Why this beats RAG for structured data&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;MCP + SQL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~60% (retrieval + LLM interpretation)&lt;/td&gt;
&lt;td&gt;~99% (SQL is deterministic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Seconds (embed → search → rerank → generate)&lt;/td&gt;
&lt;td&gt;Milliseconds (SQL query)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Freshness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stale until re-indexed&lt;/td&gt;
&lt;td&gt;Real-time (query live data)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auditability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Black box&lt;/td&gt;
&lt;td&gt;Agent shows the SQL it ran&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema changes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Re-embed everything&lt;/td&gt;
&lt;td&gt;Automatic (reads live schema)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Embedding API + vector DB + LLM&lt;/td&gt;
&lt;td&gt;Just the LLM (DuckDB is free)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;RAG is great for unstructured data — documents, emails, tickets. But for tables with rows and columns? SQL wins. Every time.&lt;/p&gt;

&lt;h2&gt;The Python SDK (for agent frameworks)&lt;/h2&gt;

&lt;p&gt;If you're building with CrewAI, LangGraph, or AutoGen:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataspoc_lens&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LensClient&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;LensClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Discover
&lt;/span&gt;    &lt;span class="n"&gt;tables&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Query
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT status, COUNT(*) FROM orders GROUP BY 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Ask in natural language
&lt;/span&gt;    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;which customers are at risk of churning?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Ensure fresh data
&lt;/span&gt;    &lt;span class="n"&gt;lens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cache_refresh_stale&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataspoc_pipe&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PipeClient&lt;/span&gt;

&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PipeClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# trigger ingestion
&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;            &lt;span class="c1"&gt;# check all pipelines
&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# read last execution
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No subprocess calls. No CLI parsing. Native Python.&lt;/p&gt;

&lt;h2&gt;The AGENT.md pattern&lt;/h2&gt;

&lt;p&gt;Every DataSpoc repo ships with an &lt;code&gt;AGENT.md&lt;/code&gt; — a skill file that documents every function, pattern, and constraint. AI agents read it and know exactly what they can do.&lt;/p&gt;

&lt;p&gt;Think of it as a README, but for machines. Drop it into Claude Code, Cursor, or any agent workflow. No prompt engineering needed.&lt;/p&gt;
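&lt;p&gt;What might such a file contain? An illustrative excerpt (my own sketch, not the shipped AGENT.md), grounded in the tools described above:&lt;/p&gt;

```markdown
# AGENT.md (illustrative excerpt)

## Tools
- `query(sql)`: read-only SQL against the DuckDB views. SELECT only.
- `describe_table(name)`: column names and types for one table.
- `cache_refresh_stale()`: re-download any stale tables.

## Constraints
- Never write to the lake; `query` rejects DDL/DML.
- Call `cache_refresh_stale` before reporting numbers to the user.
```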

&lt;h2&gt;Governed access (the security angle)&lt;/h2&gt;

&lt;p&gt;This isn't a free-for-all. The agent gets the &lt;strong&gt;same access as the human user&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only by default.&lt;/strong&gt; The &lt;code&gt;query&lt;/code&gt; MCP tool rejects DROP, DELETE, INSERT, UPDATE.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud IAM enforced.&lt;/strong&gt; The agent uses your AWS/GCP/Azure credentials. No bucket access = no data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No new credentials.&lt;/strong&gt; Same IAM permissions your analysts use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditable.&lt;/strong&gt; Every query the agent runs is real SQL you can review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No shadow data pipelines. No unaudited access. Your data governance stays intact.&lt;/p&gt;
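&lt;p&gt;The read-only guard is easy to reason about. A minimal sketch of such a check (my own version, not DataSpoc's actual implementation):&lt;/p&gt;

```python
# Sketch of a read-only SQL guard like the one the query tool applies.
import re

FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "create",
             "truncate", "attach", "copy")

def check_read_only(sql):
    """Raise PermissionError unless the statement is a single read."""
    # Strip comments so 'DROP' can't hide behind '--' or '/* */'.
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S)
    first = stripped.split(None, 1)[0].lower() if stripped.strip() else ""
    # Reject forbidden first keywords and stacked statements ('; DROP ...').
    if first in FORBIDDEN or ";" in stripped.rstrip().rstrip(";"):
        raise PermissionError(f"rejected non-read statement: {first!r}")

check_read_only("SELECT status, COUNT(*) FROM orders GROUP BY 1")  # passes
```

&lt;p&gt;A first-token check plus comment stripping catches the obvious cases; real enforcement should also lean on database- and bucket-level permissions rather than string matching alone.&lt;/p&gt;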

&lt;h2&gt;What this looks like in practice&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Data Engineer agent (via Pipe MCP):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "The sales pipeline failed last night. What happened?"

Agent: [MCP] pipeline_logs("sales")
       [MCP] validate_pipeline("sales")

Agent: "The Postgres source returned a connection timeout.
        The DB host is unreachable. Once it's back, I can
        re-run with --full to backfill the missed data."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data Analyst agent (via Lens MCP):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Build me a weekly revenue report by product line."&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;cache_refresh_stale&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;list_tables&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;describe_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"SELECT ..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Revenue this week: $312k. Electronics leads at $98k (+15%),
        followed by Software at $87k. Here's the CSV export."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ML agent (via Lens ML commands):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Can we predict which customers will churn?"&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MCP&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;"SELECT * FROM customers LIMIT 5"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;ml&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt; &lt;span class="c1"&gt;--target churn --from customers&lt;/span&gt;
       &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;ml&lt;/span&gt; &lt;span class="k"&gt;explain&lt;/span&gt; &lt;span class="c1"&gt;--model churn&lt;/span&gt;

&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"Model trained (AUC 0.87). Top predictors: days_since_order,
        support_tickets, contract_type. 23 customers flagged as
        high risk. Predictions saved to ml/predictions/churn."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Try it now&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;dataspoc-pipe dataspoc-lens[mcp]

&lt;span class="c"&gt;# Get some data in&lt;/span&gt;
dataspoc-pipe init
dataspoc-pipe add my-source
dataspoc-pipe run my-source

&lt;span class="c"&gt;# Connect your agent&lt;/span&gt;
dataspoc-lens add-bucket s3://my-data
dataspoc-lens mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or just query from the terminal first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dataspoc-lens shell
dataspoc-lens ask &lt;span class="s2"&gt;"how many orders yesterday?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Honest limitations&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch only.&lt;/strong&gt; No streaming. Pipe runs on cron, not in real time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-node.&lt;/strong&gt; DuckDB is fast but not distributed. Petabyte-scale workloads need a distributed warehouse like Snowflake.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early stage&lt;/strong&gt; (v0.2.0). API may evolve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML is commercial.&lt;/strong&gt; Pipe and Lens are Apache 2.0 — free forever.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://dataspoc.com" rel="noopener noreferrer"&gt;dataspoc.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipe:&lt;/strong&gt; &lt;a href="https://github.com/dataspoclab/dataspoc-pipe" rel="noopener noreferrer"&gt;github.com/dataspoclab/dataspoc-pipe&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lens:&lt;/strong&gt; &lt;a href="https://github.com/dataspoclab/dataspoc-lens" rel="noopener noreferrer"&gt;github.com/dataspoclab/dataspoc-lens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/dataspoc-pipe/" rel="noopener noreferrer"&gt;dataspoc-pipe&lt;/a&gt; | &lt;a href="https://pypi.org/project/dataspoc-lens/" rel="noopener noreferrer"&gt;dataspoc-lens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://dataspoc.com/getting-started/overview/" rel="noopener noreferrer"&gt;dataspoc.com/getting-started/overview/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Your data is in tables. Let your agent query it like tables. Stop turning SQL into embeddings.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built in Brazil. Open source. For humans and AI agents.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
