<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prashanth Velidandi</title>
    <description>The latest articles on DEV Community by Prashanth Velidandi (@pmv_inferx).</description>
    <link>https://dev.to/pmv_inferx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3382940%2F7c22cb83-8e48-49dd-8ec0-2aa7fb44b3e5.jpg</url>
      <title>DEV Community: Prashanth Velidandi</title>
      <link>https://dev.to/pmv_inferx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pmv_inferx"/>
    <language>en</language>
    <item>
      <title>Why AI Skills Are Broken - and How We Fixed the Architecture</title>
      <dc:creator>Prashanth Velidandi</dc:creator>
      <pubDate>Fri, 19 Jun 2026 14:52:54 +0000</pubDate>
      <link>https://dev.to/pmv_inferx/why-ai-skills-are-broken-and-how-we-fixed-the-architecture-4om7</link>
      <guid>https://dev.to/pmv_inferx/why-ai-skills-are-broken-and-how-we-fixed-the-architecture-4om7</guid>
      <description>&lt;p&gt;&lt;strong&gt;The promise of AI skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Anthropic introduced SKILL.md files for Claude in late 2025, it changed how developers think about AI agents. Instead of hardcoding every instruction into a system prompt, you write a skill file, drop it in a folder, and your agent knows what to do. Simple. Elegant. Powerful.&lt;/p&gt;

&lt;p&gt;The ecosystem exploded. skills.sh now has 600,000+ skills. Developers are building, sharing, and shipping faster than ever.&lt;/p&gt;

&lt;p&gt;But there's a structural problem nobody is talking about.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The three failures of local skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The monolithic model tax&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you configure an agent today — Claude Code, Cursor, OpenClaw — you pick one model. That model handles everything. Summarizing a routine email. Reviewing complex legal contracts. Generating pricing strategy. Same model. Same price.&lt;/p&gt;

&lt;p&gt;As much as 80% of your context window can be consumed by tool definitions, system prompts, and conversation history, not your actual task. You're paying premium prices for overhead that delivers minimal user value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The security problem nobody talks about&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Local skills run with full system privileges. They can read your files, invoke shell commands, access cloud credentials, and open network connections.&lt;/p&gt;

&lt;p&gt;A comprehensive study of 31,132 publicly available skills found that 26.1% contain at least one security vulnerability. Skills bundling executable scripts are 2.12x more likely to contain vulnerabilities.&lt;/p&gt;

&lt;p&gt;One malicious skill — or even a benign skill manipulated through prompt injection — can exfiltrate SSH keys, access cloud credentials, or delete critical data. The attack surface is enormous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The context bottleneck&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complex workflows, deep domain knowledge, and large reference materials cannot fit in a single context window. Even with a 1 million token window, models systematically lose information in the middle.&lt;/p&gt;

&lt;p&gt;The more skills you add, the worse it gets. Attention scatters. Response times increase. Accuracy drops with every turn.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Rethinking the architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The root cause of all three problems is the same: skills live inside the agent's shared context window, running on your local machine with full system privileges.&lt;/p&gt;

&lt;p&gt;What if skills lived outside the context window entirely?&lt;/p&gt;

&lt;p&gt;That's what we built. &lt;strong&gt;Skill Function&lt;/strong&gt; — a cloud-native Skill-as-a-Service platform.&lt;/p&gt;

&lt;p&gt;Instead of loading a skill into the agent's context, Skill Function moves each skill to the cloud as an independent callable service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;POST /skills/saas-pricing HTTP/1.1
Host: api.inferx.net
Content-Type: application/json

{
  "input": "B2B SaaS, $50 ACV, PLG motion, 3 tiers"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;→ Expert output. 195ms. Instructions never leave the platform.&lt;/p&gt;
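
&lt;p&gt;For readers calling the endpoint from code, here is a minimal Python sketch of the same request. It only builds the request and does not send it. The HTTPS scheme and the JSON content type are assumptions, and authentication and the response format are not described in this post, so they are left out.&lt;/p&gt;

```python
import json
import urllib.request

# Assumption: the endpoint is served over HTTPS at the host and path shown above.
SKILL_URL = "https://api.inferx.net/skills/saas-pricing"


def build_skill_request(skill_input, url=SKILL_URL):
    """Build (but do not send) a POST request carrying the skill input as JSON."""
    body = json.dumps({"input": skill_input}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


request = build_skill_request("B2B SaaS, $50 ACV, PLG motion, 3 tiers")
print(request.get_method(), request.full_url)

# Sending is left commented out on purpose: credentials and the response
# schema are not specified in this article.
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```

&lt;p&gt;The agent only ever sees the input it sends and the text it gets back, which is the point of moving the skill behind a service boundary.&lt;/p&gt;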






&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right model for each skill&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every Skill Function is bound to a pre-selected model chosen by the skill author. For example, a simple classification skill might use a 7B model, while a complex code-review skill might use a 70B model.&lt;/p&gt;

&lt;p&gt;When your agent calls the Skill Function, it no longer forces every task through your expensive flagship model. Each task gets the model it actually needs — no more, no less.&lt;/p&gt;

&lt;p&gt;Result: 70-90% lower inference cost for mixed workloads, depending on how much of the traffic can be served by smaller models.&lt;/p&gt;
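
&lt;p&gt;To see where a figure in that range can come from, here is a small worked calculation in Python. The per-token prices and the workload mix are illustrative assumptions made up for this sketch, not InferX pricing or measured data.&lt;/p&gt;

```python
import math

# Assumption: illustrative prices per million tokens, in arbitrary currency units.
PRICE_PER_MILLION_TOKENS = {"flagship": 10.00, "70b": 1.00, "7b": 0.20}


def blended_cost(mix, prices=PRICE_PER_MILLION_TOKENS):
    """Average price per million tokens when each traffic share uses its own model."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("workload shares must sum to 1")
    return sum(share * prices[model] for model, share in mix.items())


def saving_vs_flagship(mix, prices=PRICE_PER_MILLION_TOKENS):
    """Fractional saving compared with sending all traffic to the flagship model."""
    return 1.0 - blended_cost(mix, prices) / prices["flagship"]


# Assumption: 60% simple tasks, 30% medium tasks, 10% tasks that need the flagship.
mix = {"7b": 0.6, "70b": 0.3, "flagship": 0.1}
print(f"blended price: {blended_cost(mix):.2f}")
print(f"saving: {saving_vs_flagship(mix):.1%}")
```

&lt;p&gt;Under these assumed numbers the blended price is 1.42 per million tokens against 10.00 for the flagship alone, a saving of about 86%. A workload that genuinely needs the flagship for most tasks would save far less.&lt;/p&gt;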

&lt;p&gt;&lt;strong&gt;Dedicated clean context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each Skill Function handles one task at a time. Input is simple: the user's request plus a short summary of relevant history. No unrelated tool definitions. No other skills' prompts. No accumulated conversation history.&lt;/p&gt;

&lt;p&gt;The skill runs in its own isolated context window — clean, focused, free of cross-talk. Performance does not degrade with every turn.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skills call other skills&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One Skill Function can call other Skill Functions, just like traditional function calls in software. Complex workflows decompose into a directed graph of skill calls.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;orchestrator → legal-reviewer      (if legal task)
orchestrator → pricing-strategist  (if pricing task)
orchestrator → code-reviewer       (if code task)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
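
&lt;p&gt;The routing in the diagram above can be sketched as a plain dispatch table. The functions below are hypothetical in-process stand-ins for remote Skill Functions. In a real deployment each call would be a network request, and how the orchestrator classifies a task is not specified here.&lt;/p&gt;

```python
# Hypothetical stand-ins for remote Skill Functions; names mirror the diagram.
def legal_reviewer(text):
    return f"[legal-reviewer] {text}"


def pricing_strategist(text):
    return f"[pricing-strategist] {text}"


def code_reviewer(text):
    return f"[code-reviewer] {text}"


# One edge of the call graph per task type.
SKILLS = {
    "legal": legal_reviewer,
    "pricing": pricing_strategist,
    "code": code_reviewer,
}


def orchestrator(task_type, text):
    """Route one task to the single skill registered for its type."""
    try:
        skill = SKILLS[task_type]
    except KeyError:
        raise ValueError(f"no skill registered for task type {task_type!r}") from None
    return skill(text)


print(orchestrator("pricing", "3 tiers for a PLG product"))
```

&lt;p&gt;Because each skill receives only the text passed to it, the orchestrator's own context never has to hold the instructions of the skills it calls.&lt;/p&gt;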



&lt;p&gt;This breaks the context barrier entirely. Not fragmentation — composition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero local execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Skill Function is a pure knowledge skill. It cannot call tools directly — no curl, no bash, no local file access, no network egress.&lt;/p&gt;

&lt;p&gt;This eliminates the entire local attack surface. Even a successful prompt injection can only influence the skill's output text. It cannot trigger system-level actions.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;MCP-native discovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skill Function exposes a standard MCP tool-calling interface. When you subscribe to a cloud skill, it automatically appears in your local agent through MCP tool discovery — just like a locally installed tool.&lt;/p&gt;

&lt;p&gt;No skill files to download. No environment variables to set. No local deployment. The agent simply sees a new tool and calls it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The result&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Local Skills&lt;/th&gt;
&lt;th&gt;Skill Function&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;One flagship for everything&lt;/td&gt;
&lt;td&gt;Right model per task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;Shared, fills up&lt;/td&gt;
&lt;td&gt;Isolated per skill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Full local privileges&lt;/td&gt;
&lt;td&gt;Zero local execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composition&lt;/td&gt;
&lt;td&gt;File references&lt;/td&gt;
&lt;td&gt;Function calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discovery&lt;/td&gt;
&lt;td&gt;Manual install&lt;/td&gt;
&lt;td&gt;MCP auto-discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;50+ Skill Functions available today across marketing, design, engineering, finance, and research. Or import your own SKILL.md and run it as a protected callable endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;inferx.net&lt;/strong&gt; — free to start.&lt;/p&gt;

&lt;p&gt;Full technical white paper: &lt;a href="https://inferx.net/skill-function-whitepaper" rel="noopener noreferrer"&gt;https://inferx.net/skill-function-whitepaper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;For questions: &lt;a href="mailto:prashanth@inferx.net"&gt;prashanth@inferx.net&lt;/a&gt; · @InferXai&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>agentskills</category>
      <category>agents</category>
      <category>langchain</category>
    </item>
    <item>
      <title>RAG became the default answer for private knowledge access. We asked a different question: what if context didn’t need to be repeatedly retrieved at all? Persistent KV cache changed the economics completely.</title>
      <dc:creator>Prashanth Velidandi</dc:creator>
      <pubDate>Tue, 26 May 2026 12:21:50 +0000</pubDate>
      <link>https://dev.to/pmv_inferx/rag-became-the-default-answer-for-private-knowledge-access-we-asked-a-different-question-what-if-45m9</link>
      <guid>https://dev.to/pmv_inferx/rag-became-the-default-answer-for-private-knowledge-access-we-asked-a-different-question-what-if-45m9</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/pmv_inferx/we-replaced-our-rag-pipeline-with-persistent-kv-cache-heres-what-we-found-7cl"&gt;We Replaced Our RAG Pipeline With Persistent KV Cache. Here's What We Found.&lt;/a&gt; (May 23, 3 min read)&lt;/p&gt;

&lt;p&gt;Tags: #rag #serverless #machinelearning #webdev&lt;/p&gt;
</description>
    </item>
    <item>
      <title>We Replaced Our RAG Pipeline With Persistent KV Cache. Here's What We Found.</title>
      <dc:creator>Prashanth Velidandi</dc:creator>
      <pubDate>Sat, 23 May 2026 08:34:13 +0000</pubDate>
      <link>https://dev.to/pmv_inferx/we-replaced-our-rag-pipeline-with-persistent-kv-cache-heres-what-we-found-7cl</link>
      <guid>https://dev.to/pmv_inferx/we-replaced-our-rag-pipeline-with-persistent-kv-cache-heres-what-we-found-7cl</guid>
      <description>&lt;p&gt;RAG has become the default answer for giving LLMs access to private knowledge. And for good reason — it works. But after running it in production we kept hitting the same wall. Not retrieval accuracy. The operational tax.&lt;/p&gt;

&lt;p&gt;Re-embedding on data changes. Chunking drift. Retrieval misses on edge cases. Pipeline failures at 2am. The vector database that needs babysitting.&lt;/p&gt;

&lt;p&gt;So we ran an experiment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hypothesis&lt;/strong&gt;&lt;br&gt;
What if instead of chunking, embedding, and retrieving — we just loaded the full document into the LLM context, cached the KV state persistently, and reused it across every query?&lt;/p&gt;

&lt;p&gt;No retrieval step. No embedding pipeline. No vector database. Just the model with full document context, warm and ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;br&gt;
The core idea is simple. When an LLM processes a prompt it generates a key-value attention cache — the internal representation of everything it has read. Normally this cache is transient. It lives in VRAM during the request and disappears after.&lt;br&gt;
We persist it.&lt;br&gt;
The initialization prompt — your document — gets processed once. The resulting KV cache gets stored externally and indexed to that document. Every subsequent query retrieves that cached state and appends the user query. The model never recomputes the document. Ever.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The math&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Once per document:
KV_init = LLM.prefill(document)
KV_store[document_id] = KV_init

# On every query:
KV_full = KV_store[document_id] + LLM.prefill(query)
output = LLM.decode(KV_full)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
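
&lt;p&gt;To make the reuse pattern concrete, here is a toy Python sketch of the pseudocode above. &lt;code&gt;ToyLLM&lt;/code&gt; and &lt;code&gt;PersistentKVStore&lt;/code&gt; are hypothetical names, and the "KV cache" is just a tuple of tokens. A real system would store attention tensors, not words. The sketch only shows that the document is prefilled once while each query prefills only its own tokens.&lt;/p&gt;

```python
class ToyLLM:
    """Stand-in for a real model: the 'KV cache' is the tuple of tokens seen so far."""

    def __init__(self):
        self.tokens_prefilled = 0  # running count of prefill work done

    def prefill(self, text, past=()):
        tokens = text.split()
        self.tokens_prefilled += len(tokens)
        # Return a new tuple so a stored cache is never mutated by a query.
        return tuple(past) + tuple(tokens)

    def decode(self, kv):
        return f"answer conditioned on {len(kv)} cached tokens"


class PersistentKVStore:
    """Keeps one prefilled cache per document and reuses it for every query."""

    def __init__(self, llm):
        self.llm = llm
        self._store = {}

    def add_document(self, document_id, document):
        self._store[document_id] = self.llm.prefill(document)

    def query(self, document_id, question):
        kv_full = self.llm.prefill(question, past=self._store[document_id])
        return self.llm.decode(kv_full)


llm = ToyLLM()
store = PersistentKVStore(llm)
store.add_document("contract", "the contract term is two years")
first = store.query("contract", "what is term")
second = store.query("contract", "how many years")
print(first, "|", llm.tokens_prefilled, "tokens prefilled in total")
```

&lt;p&gt;In this toy run the 6-token document and two 3-token queries cost 12 tokens of prefill in total. Re-reading the document on every query would have cost 18.&lt;/p&gt;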

&lt;p&gt;&lt;strong&gt;What We Found&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer quality improved.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No retrieval misses are possible when the full document is in context. The model has read everything. It doesn't guess which chunks are relevant; it has the whole document. For complex multi-part questions that span different sections this is a significant improvement over chunked retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Updates became trivial.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Document changes? Re-run the prefill, store the new KV cache. Minutes, not hours. No re-embedding pipeline. No re-indexing. No retrieval regression testing. Just regenerate and deploy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational complexity dropped.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No embedding model to maintain. No vector database to monitor. No chunking strategy to tune. No retrieval quality metrics to track. The surface area for things to break quietly got dramatically smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latency on warm cache is effectively instant.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the KV state is already loaded the query just appends and generates. No retrieval hop, no context injection latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Honest Tradeoffs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window is the ceiling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Current limit is around 120k tokens — roughly 200-300 pages. Works well for focused documents. For large corpora you need a routing layer to select the right cache per query. You've pushed the retrieval problem up one level — instead of retrieving chunks you're selecting a cache. Simpler problem but not zero.&lt;/p&gt;
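
&lt;p&gt;As a rough illustration of such a routing layer, the Python sketch below picks a cache by word overlap between the query and a short description of each cached document. The function name, the descriptions, and the overlap heuristic are all assumptions for this example. A production router would more likely use a classifier or embeddings.&lt;/p&gt;

```python
def route_to_cache(query, cache_descriptions):
    """Pick the cache whose description shares the most words with the query.

    Returns None when no description shares any word with the query.
    """
    query_words = set(query.lower().split())

    def overlap(item):
        _, description = item
        return len(query_words.intersection(description.lower().split()))

    best_id, best_description = max(cache_descriptions.items(), key=overlap)
    if overlap((best_id, best_description)) == 0:
        return None
    return best_id


# Hypothetical cache ids and descriptions.
caches = {
    "msa-contract": "master services agreement contract termination liability",
    "security-policy": "security compliance policy password access",
}
print(route_to_cache("what is the termination clause in the contract", caches))
```

&lt;p&gt;The router only has to choose among a handful of documents, which is the sense in which this is a simpler problem than chunk retrieval.&lt;/p&gt;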

&lt;p&gt;&lt;strong&gt;Cold cache restore adds latency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The first query after a cache restore pays a latency cost. For strict SLA requirements this matters. Warm cache is instant. Cold restore depends on your infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial prefill costs more than embedding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Running a full forward pass on a large document costs more compute than embedding it. The economics work when query volume is high enough to amortize that cost. With low query volume and high update frequency, RAG still wins.&lt;/p&gt;
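
&lt;p&gt;The amortization argument can be written as a break-even calculation. The Python sketch below assumes that a cached query is cheaper than a RAG query, which is the premise of the paragraph above. All cost figures are made-up integer units chosen for illustration, not measurements.&lt;/p&gt;

```python
import math


def break_even_queries(prefill_cost, embedding_cost, rag_query_cost, cached_query_cost):
    """Queries per document version after which the one-off prefill has paid for itself.

    Returns None if a cached query is not cheaper than a RAG query,
    because the extra upfront cost is then never recovered.
    """
    extra_upfront = prefill_cost - embedding_cost
    saving_per_query = rag_query_cost - cached_query_cost
    if not saving_per_query > 0:
        return None
    return max(0, math.ceil(extra_upfront / saving_per_query))


# Assumed costs in arbitrary integer units: prefill 2000, embedding 200,
# RAG query 10, cached query 4.
print(break_even_queries(2000, 200, 10, 4))
```

&lt;p&gt;With these assumed costs the cache pays for itself after 300 queries per document version. If the document changes before it has served that many queries, the embedding pipeline is the cheaper option.&lt;/p&gt;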

&lt;p&gt;&lt;strong&gt;Where This Wins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach is clearly better when:&lt;/p&gt;

&lt;p&gt;You have a focused, structured document — legal contract, compliance policy, product manual, technical spec&lt;br&gt;
Query volume is high relative to update frequency&lt;br&gt;
Full context comprehension matters more than breadth&lt;br&gt;
You want to eliminate pipeline maintenance entirely&lt;br&gt;
Privacy matters — no document chunks sent to embedding APIs&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where RAG Still Wins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Very large document collections where context limits apply&lt;br&gt;
Highly dynamic data that changes multiple times per day&lt;br&gt;
When you genuinely don't know which document is relevant at query time&lt;br&gt;
Low query volume where prefill cost doesn't amortize&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What We're Building&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We've been running this in production at InferX as part of our Sovereign Endpoints™ infrastructure. The persistent KV cache layer sits on top of our GPU snapshotting architecture — which is what makes the cold cache restore fast enough to be practical.&lt;br&gt;
We're now opening a limited beta for teams who want to test this on real workloads. Particularly interested in legal, compliance, finance, and developer tooling use cases.&lt;br&gt;
If you're running RAG in production and want to run a head-to-head comparison — we'd love to work with you.&lt;/p&gt;

&lt;p&gt;🎬 Demo dropping in 2 days. Follow to see it first.&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://inferx.net/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;inferx.net&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>rag</category>
      <category>serverless</category>
      <category>machinelearning</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
