<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: NexusFeed</title>
    <description>The latest articles on DEV Community by NexusFeed (@nexusfeed).</description>
    <link>https://dev.to/nexusfeed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873684%2F18c0d814-ad2e-4c1c-ac59-93e43e6ab875.png</url>
      <title>DEV Community: NexusFeed</title>
      <link>https://dev.to/nexusfeed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nexusfeed"/>
    <language>en</language>
    <item>
      <title>The data every AI agent needs but nobody sells cleanly — and what you can build on top of it</title>
      <dc:creator>NexusFeed</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:53:21 +0000</pubDate>
      <link>https://dev.to/nexusfeed/the-data-every-ai-agent-needs-but-nobody-sells-cleanly-and-what-you-can-build-on-top-of-it-1dia</link>
      <guid>https://dev.to/nexusfeed/the-data-every-ai-agent-needs-but-nobody-sells-cleanly-and-what-you-can-build-on-top-of-it-1dia</guid>
      <description>&lt;p&gt;Freight audit shops charge their customers &lt;strong&gt;3–8% of recovered overcharges&lt;/strong&gt; to reconcile invoices against published carrier rates. The data input to that business — current LTL fuel surcharge percentages from each carrier, refreshed weekly — costs &lt;strong&gt;$0.03 per lookup&lt;/strong&gt; if you know where to get it.&lt;/p&gt;

&lt;p&gt;That gap is the thing I want to talk about. Not the scraping. The &lt;em&gt;gap&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I just shipped &lt;strong&gt;NexusFeed&lt;/strong&gt;, a JSON API that returns two kinds of data the rest of the web sells badly or not at all: LTL fuel surcharges for 10 freight carriers, and liquor-license compliance records for 5 US states. Both are public information. Both are locked behind portals that range from &lt;em&gt;annoying&lt;/em&gt; to &lt;em&gt;actively hostile&lt;/em&gt;. And both are the kind of data that used to require either a sales call to a legacy data vendor or a wet-signed NDA with a compliance firm.&lt;/p&gt;

&lt;p&gt;This post is not about how I got the data. It is about what &lt;em&gt;you&lt;/em&gt; can build on top of it.&lt;/p&gt;

&lt;h2&gt;Why this data is worth your attention&lt;/h2&gt;

&lt;p&gt;Here's the thing that surprised me six months into building this: the hard part was never the scraping.&lt;/p&gt;

&lt;p&gt;The hard part is that &lt;strong&gt;agent-native data doesn't exist yet&lt;/strong&gt; in most B2B verticals. If you want to wire a language model into a freight-audit workflow or a three-tier alcohol compliance check, you have two options. One, pay a legacy vendor $3–15k/month for a CSV drop and a login to their dashboard. Two, scrape it yourself — and discover that five of the fifteen sources I currently cover require non-trivial anti-bot handling, weekly regression testing, and a confidence model so your agent doesn't hallucinate a rate that silently 404'd that morning.&lt;/p&gt;

&lt;p&gt;That second path is what I did, which is why I can now charge &lt;strong&gt;$0.03 per LTL lookup&lt;/strong&gt; and &lt;strong&gt;$0.05 per ABC lookup&lt;/strong&gt; with a &lt;code&gt;_verifiability&lt;/code&gt; block on every response. Both numbers are roughly two orders of magnitude cheaper than the legacy dashboard subscriptions, because the cost structure is marginal per-call instead of seat-based.&lt;/p&gt;

&lt;p&gt;Which means there is a window open right now for someone to build the &lt;em&gt;thin agent layer&lt;/em&gt; on top, and keep most of the margin.&lt;/p&gt;

&lt;h2&gt;Build #1: the freight-audit agent&lt;/h2&gt;

&lt;p&gt;Pick any mid-market 3PL or shipper doing $50M+ in annual freight spend. Industry studies consistently find 3–8% of LTL invoices contain overcharges — wrong fuel surcharge, wrong accessorial, wrong class. Freight-audit firms recover those overcharges and keep a cut, typically 30–50% of the recovered amount.&lt;/p&gt;

&lt;p&gt;The data they need boils down to one question per invoice: what &lt;em&gt;should&lt;/em&gt; the fuel surcharge have been on the day the shipment moved, per that carrier's published rate? Historically that answer lived in a dozen carrier PDFs and a rates analyst's memory.&lt;/p&gt;

&lt;p&gt;With NexusFeed's LTL endpoints it's one HTTP call per carrier-week. Pipe a shipper's invoice CSV into an agent, have the agent pull the correct rate for each row, flag discrepancies above a threshold, draft a dispute letter. &lt;strong&gt;Cost to run the data layer on 10,000 invoices a month: about $300.&lt;/strong&gt; Revenue share on recovered overcharges on that volume: five figures. That's the entire unit economics.&lt;/p&gt;
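&lt;p&gt;The flagging step in that loop reduces to a pure comparison once the published rate is in hand. A minimal sketch, assuming illustrative invoice fields and a one-percentage-point threshold (neither is part of the API):&lt;/p&gt;

```python
def flag_overcharges(invoices, published_pct, threshold_pct=1.0):
    """Return invoice rows whose billed fuel surcharge exceeds the
    carrier's published percentage by more than threshold_pct points.

    invoices: list of dicts with "invoice_id", "carrier", "billed_fsc_pct"
    published_pct: dict mapping carrier name to its published FSC percent
    """
    flagged = []
    for row in invoices:
        expected = published_pct.get(row["carrier"])
        if expected is None:
            continue  # no published rate on file; skip rather than guess
        delta = row["billed_fsc_pct"] - expected
        if delta > threshold_pct:
            flagged.append({**row, "expected_fsc_pct": expected,
                            "delta_pct": round(delta, 2)})
    return flagged
```

&lt;p&gt;Everything downstream, the dispute letter and the customer dashboard, hangs off that flagged list.&lt;/p&gt;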

&lt;p&gt;If you build this, your moat is &lt;em&gt;not&lt;/em&gt; the data — you're renting that. Your moat is the workflow: the invoice parser, the dispute-letter prompt chain, the customer dashboard. Those are the expensive, differentiated pieces.&lt;/p&gt;

&lt;h2&gt;The &lt;code&gt;_verifiability&lt;/code&gt; contract (and why your agent needs it)&lt;/h2&gt;

&lt;p&gt;This is the design decision I am proudest of, and the one I wish more scraping APIs made. It is also what makes "build an agent on top of this" a real proposal instead of a pipe dream.&lt;/p&gt;

&lt;p&gt;Every response from NexusFeed carries a &lt;code&gt;_verifiability&lt;/code&gt; block as a first-class field, not a footnote:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Verifiability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;source_timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;extraction_confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;raw_data_evidence_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;extraction_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ExtractionMethod&lt;/span&gt;
    &lt;span class="n"&gt;data_freshness_ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;extraction_method&lt;/code&gt; is an enum with five members: &lt;code&gt;api_mirror&lt;/code&gt;, &lt;code&gt;playwright_dom&lt;/code&gt;, &lt;code&gt;structured_parse&lt;/code&gt;, &lt;code&gt;scraper_api&lt;/code&gt;, &lt;code&gt;scraper_api_fallback&lt;/code&gt;. An agent reading the response can see immediately whether the number came from a clean JSON API mirror or from a regex over a rendered portal, and make its own call about how much to trust the answer.&lt;/p&gt;
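&lt;p&gt;In code, that enum is a few lines. The member list below matches the five values above; the comments paraphrase the trust spectrum just described rather than quoting the docs:&lt;/p&gt;

```python
from enum import Enum

class ExtractionMethod(str, Enum):
    # Ordered roughly from most to least trustworthy.
    api_mirror = "api_mirror"                      # clean JSON API mirror
    playwright_dom = "playwright_dom"              # rendered-portal DOM extraction
    structured_parse = "structured_parse"          # parse of a structured document
    scraper_api = "scraper_api"                    # third-party scraping service
    scraper_api_fallback = "scraper_api_fallback"  # last-resort fallback path
```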

&lt;p&gt;&lt;code&gt;extraction_confidence&lt;/code&gt; is never hardcoded. It is the output of one short function, called by every extractor:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;required_fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;found_fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;fallback_triggered&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fallback_triggered&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required_fields&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;found_fields&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;found&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;required_fields&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Six required fields, four found, fallback path did not trigger? Confidence is 0.67. Primary extractor failed and the HTML fallback ran? Confidence is 0.0 and the router returns 503 instead of a half-answer.&lt;/p&gt;

&lt;p&gt;Why this matters for &lt;em&gt;you&lt;/em&gt; specifically: an agent making compliance or financial decisions on top of scraped data needs a programmatic honesty signal. Without it, you get silent drift — the source site quietly changes a column, your agent keeps returning plausible-looking answers, and your customer eventually notices. With &lt;code&gt;_verifiability&lt;/code&gt;, you can gate agent actions on &lt;code&gt;confidence &amp;gt;= 0.9&lt;/code&gt;, log the evidence URL for audit, and get paged the moment a source degrades. It's the difference between a demo and a production system.&lt;/p&gt;
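&lt;p&gt;That gate is a few lines of client code. A sketch: the response shape follows the &lt;code&gt;_verifiability&lt;/code&gt; schema above, but the threshold and the exception type are placeholders for your own policy:&lt;/p&gt;

```python
class LowConfidenceData(Exception):
    """Raised when a response is not trustworthy enough to act on."""

def gate(response, min_confidence=0.9):
    """Return the response only if its _verifiability block clears the
    bar; otherwise raise, so the agent escalates instead of guessing."""
    v = response["_verifiability"]
    if v["extraction_confidence"] >= min_confidence:
        return response
    raise LowConfidenceData(
        f"confidence {v['extraction_confidence']} via {v['extraction_method']}; "
        f"evidence: {v['raw_data_evidence_url']}"
    )
```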

&lt;h2&gt;Builds #2 and #3&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The three-tier compliance dashboard.&lt;/strong&gt; Every alcohol brand selling into the US has to track which of their distributors' licenses are current, in which states, with which privileges. Today that work is done by paralegals re-typing data out of five different state portals once a quarter. With NexusFeed's ABC endpoints (CA, TX, NY, FL, IL) it's a nightly cron. The buyer is any beverage-alcohol brand with a compliance or legal-ops function. Pricing anchor: Park Street Compliance charges four figures a month for the human-driven version.&lt;/p&gt;
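&lt;p&gt;That nightly cron is mostly a filter over license records. A sketch, assuming illustrative field names (&lt;code&gt;status&lt;/code&gt;, &lt;code&gt;expires&lt;/code&gt;) rather than the API's actual schema:&lt;/p&gt;

```python
from datetime import timedelta

def licenses_needing_attention(records, today, horizon_days=30):
    """Split distributor license records into hard failures (not active)
    and soft warnings (expiring within the horizon)."""
    failures, warnings = [], []
    cutoff = today + timedelta(days=horizon_days)
    for rec in records:
        if rec["status"] != "active":
            failures.append(rec)
        elif cutoff >= rec["expires"]:
            warnings.append(rec)
    return failures, warnings
```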

&lt;p&gt;&lt;strong&gt;The AI freight broker assistant.&lt;/strong&gt; Brokers spend a large portion of their day quoting shippers, and the quote depends on current fuel surcharge plus accessorials per carrier. An agent that watches a broker's inbox, parses RFQs, pulls current rates for every carrier in their network, and drafts a priced response saves the broker hours per day. The buyer is any brokerage with 5–50 agents — a market with thousands of firms in the US alone. NexusFeed's LTL endpoints are the input layer.&lt;/p&gt;
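&lt;p&gt;The pricing math inside that assistant is the easy part once the surcharge is current. A sketch of the usual LTL structure (linehaul plus a percentage fuel surcharge plus flat accessorials); the formula is industry convention, not something the API prescribes:&lt;/p&gt;

```python
def quote_total(linehaul, fsc_pct, accessorials=()):
    """Price a shipment: linehaul, plus fuel surcharge as a percentage
    of linehaul, plus any flat accessorial charges."""
    fuel = linehaul * fsc_pct / 100.0
    return round(linehaul + fuel + sum(accessorials), 2)
```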

&lt;p&gt;Neither of these is a hypothetical. They're calls I had before I shipped. Both buyers have budget and neither of them wants to be the company that scrapes ODFL's website themselves.&lt;/p&gt;

&lt;h2&gt;What's shipped, and what it costs&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10 LTL carriers:&lt;/strong&gt; Old Dominion, Saia, Estes, ABF, R+L, TForce, XPO, Southeastern, Averitt, FedEx&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 ABC states:&lt;/strong&gt; California, Texas, New York, Florida, Illinois&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;230 passing tests&lt;/strong&gt;, FastAPI on Railway, Redis cache-aside (7-day LTL TTL, 24-hour ABC TTL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stripe metered billing&lt;/strong&gt; per product — &lt;code&gt;ltl_request&lt;/code&gt; at $0.03/call, &lt;code&gt;abc_request&lt;/code&gt; at $0.05/call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; — install NexusFeed as a tool in Claude Desktop or Cline with one command, and your agent can call every endpoint as a first-class function&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;Docs, OpenAPI spec, and the MCP install command are at &lt;strong&gt;&lt;a href="https://docs.nexusfeed.dev" rel="noopener noreferrer"&gt;docs.nexusfeed.dev&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you want a key, the two products are on RapidAPI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://rapidapi.com/ladourv/api/ltl-fuel-surcharge-api" rel="noopener noreferrer"&gt;LTL Fuel Surcharge API&lt;/a&gt;&lt;/strong&gt; — $0.03/request&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://rapidapi.com/ladourv/api/abc-license-compliance-api" rel="noopener noreferrer"&gt;ABC License Compliance API&lt;/a&gt;&lt;/strong&gt; — $0.05/request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What would you build on top of this? I gave you three starting points. If you have a fourth — or you're already building one of these three — drop it in the comments. I'll answer every one, and the best use-case gets a free month on both products to ship it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>api</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
