<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: K.SLADE</title>
    <description>The latest articles on DEV Community by K.SLADE (@matchuplabs).</description>
    <link>https://dev.to/matchuplabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861463%2F18e2ae7e-fde4-4c48-959d-3f3c397baed7.jpeg</url>
      <title>DEV Community: K.SLADE</title>
      <link>https://dev.to/matchuplabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matchuplabs"/>
    <language>en</language>
    <item>
      <title>How I Structured NYC's Open Data for AI Agents Using MCP</title>
      <dc:creator>K.SLADE</dc:creator>
      <pubDate>Sat, 04 Apr 2026 21:37:03 +0000</pubDate>
      <link>https://dev.to/matchuplabs/how-i-structured-nycs-open-data-for-ai-agents-using-mcp-4f76</link>
      <guid>https://dev.to/matchuplabs/how-i-structured-nycs-open-data-for-ai-agents-using-mcp-4f76</guid>
      <description>&lt;p&gt;NYC gives away some of the best public data in the world. Property ownership, building violations, restaurant health inspections, tax assessments, complaints — all free, all public.&lt;/p&gt;

&lt;p&gt;The problem? In its raw form, it's nearly unusable for AI agents.&lt;/p&gt;

&lt;h2&gt;The Data Fragmentation Problem&lt;/h2&gt;

&lt;p&gt;NYC's data is scattered across at least six agencies, each with its own API, schema, and query patterns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agency&lt;/th&gt;
&lt;th&gt;What They Publish&lt;/th&gt;
&lt;th&gt;Access Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DOF (Finance)&lt;/td&gt;
&lt;td&gt;Ownership, tax, sales via ACRIS&lt;/td&gt;
&lt;td&gt;SODA + web scrape&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DOB (Buildings)&lt;/td&gt;
&lt;td&gt;Permits, new building applications&lt;/td&gt;
&lt;td&gt;SODA + DOB NOW / BIS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HPD (Housing)&lt;/td&gt;
&lt;td&gt;Violations, complaints, registrations&lt;/td&gt;
&lt;td&gt;SODA + web&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OATH/ECB&lt;/td&gt;
&lt;td&gt;Administrative penalties&lt;/td&gt;
&lt;td&gt;SODA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DOHMH (Health)&lt;/td&gt;
&lt;td&gt;Restaurant inspections, grades&lt;/td&gt;
&lt;td&gt;SODA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HCR&lt;/td&gt;
&lt;td&gt;Rent stabilization status&lt;/td&gt;
&lt;td&gt;FOIL requests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An AI agent trying to answer a simple question like "is this a bad landlord?" would need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Normalize the address (NYC addresses are notoriously inconsistent)&lt;/li&gt;
&lt;li&gt;Resolve it to a BBL (Borough-Block-Lot) and BIN (Building Identification Number)&lt;/li&gt;
&lt;li&gt;Query DOF for ownership records&lt;/li&gt;
&lt;li&gt;Query HPD for housing violations&lt;/li&gt;
&lt;li&gt;Query DOB for building violations&lt;/li&gt;
&lt;li&gt;Query OATH/ECB for penalty history&lt;/li&gt;
&lt;li&gt;Merge all of that into a coherent response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No agent does this today. The friction is too high.&lt;/p&gt;

&lt;h2&gt;The Solution: One MCP Server, Four Tools&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://nycapi.app" rel="noopener noreferrer"&gt;NYC API&lt;/a&gt; — an MCP server that aggregates NYC's public data into four tools that any AI agent can call:&lt;/p&gt;

&lt;h3&gt;&lt;code&gt;resolve_property_identifier&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Normalize any NYC address, BBL, or BIN to a canonical form. This is the critical first step — NYC addresses come in dozens of formats ("123 Main St", "123 MAIN STREET", "123 Main St.", "123 Main Street Apt 4B") and you need a canonical identifier to query anything downstream.&lt;/p&gt;
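&lt;p&gt;To make the canonical form concrete: a BBL is a 10-digit key made of a 1-digit borough code, a 5-digit block, and a 4-digit lot. The sketch below is not the server's actual resolver; &lt;code&gt;parseBbl&lt;/code&gt; and &lt;code&gt;formatBbl&lt;/code&gt; are hypothetical helper names, and the accepted input shapes are an assumption.&lt;/p&gt;

```typescript
// Hypothetical helpers, not the NYC API implementation.
interface Bbl {
  borough: number; // 1 Manhattan, 2 Bronx, 3 Brooklyn, 4 Queens, 5 Staten Island
  block: number;
  lot: number;
}

// Accepts the compact 10-digit form ("1000010001") or a delimited form
// ("1-1-1", "1/00001/0001"). Returns null when the input fits neither.
function parseBbl(input: string): Bbl | null {
  const trimmed = input.trim();
  const compact = /^([1-5])(\d{5})(\d{4})$/.exec(trimmed);
  const delimited = /^([1-5])[-\/ ](\d{1,5})[-\/ ](\d{1,4})$/.exec(trimmed);
  const m = compact ?? delimited;
  if (m === null) return null;
  return { borough: Number(m[1]), block: Number(m[2]), lot: Number(m[3]) };
}

// Zero-pads block and lot to rebuild the canonical 10-digit string.
function formatBbl(bbl: Bbl): string {
  const block = String(bbl.block).padStart(5, "0");
  const lot = String(bbl.lot).padStart(4, "0");
  return `${bbl.borough}${block}${lot}`;
}
```

&lt;p&gt;With these helpers, parsing &lt;code&gt;4-1234-56&lt;/code&gt; and formatting the result gives &lt;code&gt;4012340056&lt;/code&gt;, which can then serve as a join key across datasets that carry a BBL column.&lt;/p&gt;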

&lt;h3&gt;&lt;code&gt;get_property_intelligence&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Given a resolved identifier, returns ownership records, zoning classification, tax class, assessed values, sales history, liens, and rent stabilization status. One call replaces what used to be 3-4 separate agency lookups.&lt;/p&gt;

&lt;h3&gt;&lt;code&gt;get_building_violations&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Returns DOB, HPD, and OATH/ECB violations with severity scoring and risk indicators. An agent can immediately tell whether a building has serious open violations or a clean record.&lt;/p&gt;

&lt;h3&gt;&lt;code&gt;get_restaurant_venue_intel&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;Returns DOHMH health grades, inspection history, violation codes, and permit status. Useful for restaurant discovery agents, food safety research, and mixed-use property analysis.&lt;/p&gt;

&lt;h2&gt;Architecture Decisions&lt;/h2&gt;

&lt;h3&gt;Why MCP (Model Context Protocol)?&lt;/h3&gt;

&lt;p&gt;MCP is becoming the standard way AI agents discover and invoke external tools. Because the server speaks MCP rather than exposing only a REST API, any MCP-compatible client (Claude Desktop, LangChain, CrewAI, OpenAI Assistants) can plug in with just a URL and API key. No SDK installation, no wrapper code.&lt;/p&gt;

&lt;h3&gt;Why Streamable HTTP Instead of SSE?&lt;/h3&gt;

&lt;p&gt;The MCP spec defines two standard transports: stdio for local processes and Streamable HTTP for remote servers, the latter replacing the older HTTP+SSE (Server-Sent Events) transport. I went with Streamable HTTP because the server is deployed on Vercel, which is serverless. The SSE transport requires a persistent connection and server-side session state, neither of which works well on serverless.&lt;/p&gt;

&lt;p&gt;The implementation is stateless by design:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;validateApiKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;PRODUCT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;authResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;authResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createServer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebStandardStreamableHTTPServerTransport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;sessionIdGenerator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// stateless — required for serverless&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt; &lt;span class="nx"&gt;satisfies&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each request creates a fresh server + transport pair. No cross-invocation state, no session management, no cleanup. It just works on Vercel's serverless functions.&lt;/p&gt;

&lt;h3&gt;Why Resources in Addition to Tools?&lt;/h3&gt;

&lt;p&gt;The MCP spec defines both tools (actions the agent can call) and resources (reference data the agent can read). I added five resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capability guide&lt;/strong&gt; — tells the agent what the server can and can't do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input formatting&lt;/strong&gt; — explains address formats, BBL structure, BIN ranges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema examples&lt;/strong&gt; — sample responses so the agent knows what to expect&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coverage notes&lt;/strong&gt; — which boroughs and data sources are available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credit policy&lt;/strong&gt; — how credits are consumed per tool call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These resources help agents make better tool-calling decisions without wasting credits on invalid queries.&lt;/p&gt;

&lt;h2&gt;The Data Pipeline&lt;/h2&gt;

&lt;p&gt;Most data flows through NYC's Socrata Open Data API (SODA). The pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inbound query&lt;/strong&gt; — agent sends an address or identifier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Address normalization&lt;/strong&gt; — shared middleware canonicalizes the input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel SODA queries&lt;/strong&gt; — multiple datasets are queried simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response assembly&lt;/strong&gt; — results are merged, scored, and structured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credit deduction&lt;/strong&gt; — usage is tracked in Supabase&lt;/li&gt;
&lt;/ol&gt;
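&lt;p&gt;Each of the parallel queries in step 3 comes down to a SODA request: a dataset endpoint plus SoQL parameters such as &lt;code&gt;$where&lt;/code&gt; and &lt;code&gt;$limit&lt;/code&gt;. Here is a minimal sketch of building one. The function name &lt;code&gt;buildSodaUrl&lt;/code&gt; is hypothetical, and the dataset ID and column name are placeholders the caller supplies, not values taken from the real server.&lt;/p&gt;

```typescript
// Sketch only: datasetId and column are caller-supplied placeholders.
function buildSodaUrl(
  datasetId: string,
  column: string,
  value: string,
  limit = 100,
): string {
  const url = new URL(`https://data.cityofnewyork.us/resource/${datasetId}.json`);
  // SoQL string literals use single quotes; double any embedded quote to escape it.
  const escaped = value.replace(/'/g, "''");
  url.searchParams.set("$where", `${column} = '${escaped}'`);
  url.searchParams.set("$limit", String(limit));
  return url.toString();
}

// Hypothetical dataset ID, shown only to illustrate the call shape.
const exampleUrl = buildSodaUrl("abcd-1234", "bbl", "1000010001", 5);
console.log(exampleUrl);
```

&lt;p&gt;Building one URL per dataset and passing the resulting fetches to &lt;code&gt;Promise.all&lt;/code&gt; is one way to get the simultaneous queries the pipeline describes.&lt;/p&gt;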

&lt;p&gt;For data sources not available via SODA (ACRIS property sales, rent stabilization status), I use supplementary lookups with appropriate caching.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Address normalization is harder than you think.&lt;/strong&gt; NYC addresses have edge cases that will break any naive parser — hyphenated Queens addresses (42-15 Crescent St), lettered avenues (Avenue A vs Ave A), and buildings with multiple valid addresses. I spent more time on the normalizer than any other component.&lt;/p&gt;
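&lt;p&gt;To give a feel for those edge cases, here is a deliberately small normalizer sketch. It is not the production normalizer: &lt;code&gt;normalizeAddress&lt;/code&gt; is a hypothetical name, the abbreviation table is a tiny illustrative subset, and buildings with multiple valid addresses are out of scope because they need a lookup, not string rules.&lt;/p&gt;

```typescript
// Illustrative subset; a real normalizer would need a much longer table.
const SUFFIXES: { [abbr: string]: string } = {
  AVE: "AVENUE",
  BLVD: "BOULEVARD",
  RD: "ROAD",
  PL: "PLACE",
};

function normalizeAddress(raw: string): string {
  const tokens = raw
    .toUpperCase()
    .replace(/[.,]/g, " ") // drop punctuation but keep hyphens (Queens house numbers)
    .trim()
    .replace(/\s+(?:(?:APT|UNIT)\s+|#\s*)\S+$/, "") // strip a trailing unit designator
    .split(/\s+/);

  return tokens
    .map((token, i) => {
      // "ST" is ambiguous (Street vs Saint), so expand it only as the final token.
      if (token === "ST") return i === tokens.length - 1 ? "STREET" : token;
      return SUFFIXES[token] ?? token;
    })
    .join(" ");
}
```

&lt;p&gt;Under these rules "123 Main St.", "123 MAIN STREET", and "123 Main Street Apt 4B" all collapse to &lt;code&gt;123 MAIN STREET&lt;/code&gt;, "42-15 Crescent St" keeps its hyphen as &lt;code&gt;42-15 CRESCENT STREET&lt;/code&gt;, and "Ave A" becomes &lt;code&gt;AVENUE A&lt;/code&gt;. The special case for "ST" hints at why simple lookup tables are not enough.&lt;/p&gt;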

&lt;p&gt;&lt;strong&gt;Start with fewer tools.&lt;/strong&gt; Four tools is manageable, but I could have launched with just &lt;code&gt;resolve_property_identifier&lt;/code&gt; + &lt;code&gt;get_property_intelligence&lt;/code&gt; and validated demand before building the rest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credit-based pricing works for agents.&lt;/strong&gt; Agents are bursty — they might make 50 calls in a minute during a due diligence workflow, then nothing for days. Per-credit pricing maps to this usage pattern better than flat monthly rates.&lt;/p&gt;
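&lt;p&gt;As a quick sanity check on what per-credit pricing means in practice, the effective price per credit can be worked out from the paid tiers listed under Try It. The snippet is plain arithmetic on those published numbers; how many credits a given tool call consumes is not assumed here.&lt;/p&gt;

```typescript
// Tier prices and credit counts as listed in the "Try It" section.
const tiers = [
  { name: "Starter", priceUsd: 29, credits: 1000 },
  { name: "Growth", priceUsd: 99, credits: 5000 },
  { name: "Scale", priceUsd: 249, credits: 15000 },
];

function costPerCredit(priceUsd: number, credits: number): number {
  return priceUsd / credits;
}

for (const tier of tiers) {
  const perCredit = costPerCredit(tier.priceUsd, tier.credits).toFixed(4);
  console.log(`${tier.name}: $${perCredit} per credit`);
}
```

&lt;p&gt;That works out to roughly $0.0290, $0.0198, and $0.0166 per credit for Starter, Growth, and Scale respectively, so a bursty workflow pays only for the calls it actually makes.&lt;/p&gt;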

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server URL:&lt;/strong&gt; &lt;code&gt;https://nycapi.app/api/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; Bearer token (get a free API key at &lt;a href="https://nycapi.app" rel="noopener noreferrer"&gt;nycapi.app&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 50 credits, no card required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paid tiers:&lt;/strong&gt; Starter ($29/1K credits), Growth ($99/5K), Scale ($249/15K)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building agents that need to reason about physical locations in NYC — real estate, compliance, tenant advocacy, restaurant discovery — I'd love your feedback on the tool design.&lt;/p&gt;

&lt;p&gt;What data would you want to see added?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://nycapi.app" rel="noopener noreferrer"&gt;Matchup Labs&lt;/a&gt;. Stack: Next.js, TypeScript, Vercel, Supabase, Stripe.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>data</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
