<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: New Way Capital Advisory</title>
    <description>The latest articles on DEV Community by New Way Capital Advisory (@tianninglab).</description>
    <link>https://dev.to/tianninglab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840375%2Fb7b1f3b9-d3fb-400e-8de5-d6f1ca7a8d4c.png</url>
      <title>DEV Community: New Way Capital Advisory</title>
      <link>https://dev.to/tianninglab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tianninglab"/>
    <language>en</language>
    <item>
      <title>Google indexed 2 of our 9,200 pages. ChatGPT answered questions about 3 of them the same day</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:28:18 +0000</pubDate>
      <link>https://dev.to/tianninglab/google-indexed-2-of-our-9200-pages-chatgpt-answered-questions-about-3-of-them-the-same-day-10j2</link>
      <guid>https://dev.to/tianninglab/google-indexed-2-of-our-9200-pages-chatgpt-answered-questions-about-3-of-them-the-same-day-10j2</guid>
      <description>&lt;p&gt;Last week I ran site:&lt;a href="https://nwc-advisory.com" rel="noopener noreferrer"&gt;https://nwc-advisory.com&lt;/a&gt; on Google. It returned 2 results.&lt;/p&gt;

&lt;p&gt;The same day, our nginx logs showed ChatGPT fetching 3 of our pages in real time to answer user questions. One of them was a Burnham-on-Sea postcode. I don't know who asked.&lt;/p&gt;

&lt;p&gt;Here is what 15 hours of crawler traffic looked like across our 12 domains:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Hits&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Googlebot&lt;/td&gt;
&lt;td&gt;298&lt;/td&gt;
&lt;td&gt;Crawling aggressively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bingbot&lt;/td&gt;
&lt;td&gt;217&lt;/td&gt;
&lt;td&gt;Crawling aggressively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OAI-SearchBot&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Building ChatGPT's search index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPTBot&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;OpenAI training data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ClaudeBot&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Anthropic crawler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT-User&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Real users getting our data via ChatGPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PerplexityBot&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applebot&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Google crawled us 298 times in 15 hours and has indexed 2 of our 9,152 canonical pages in the six weeks since we launched. OpenAI's three bots together crawled us 37 times and served 3 live answers from our pages the same day. Anthropic's ClaudeBot added another 10. Perplexity and Apple did not show up at all.&lt;/p&gt;

&lt;p&gt;For a small SaaS with no domain authority and few backlinks, these are not the same distribution channel anymore.&lt;/p&gt;

&lt;p&gt;The receipts:&lt;/p&gt;

&lt;p&gt;Three log lines (I have removed the client IPs but kept everything else):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;[10/Apr/2026:08:03:43 +0000] "nwc-advisory.com"&lt;br&gt;
"GET / HTTP/2.0" 200 38145 "-"&lt;br&gt;
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);&lt;br&gt;
compatible; ChatGPT-User/1.0; +&lt;a href="https://openai.com/bot" rel="noopener noreferrer"&gt;https://openai.com/bot&lt;/a&gt;"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[10/Apr/2026:09:06:21 +0000] "property.nwc-advisory.com"&lt;br&gt;
"GET /prices/sg7-5aa HTTP/2.0" 200 6831 "-"&lt;br&gt;
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);&lt;br&gt;
compatible; ChatGPT-User/1.0; +&lt;a href="https://openai.com/bot" rel="noopener noreferrer"&gt;https://openai.com/bot&lt;/a&gt;"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;[10/Apr/2026:09:46:47 +0000] "property.nwc-advisory.com"&lt;br&gt;
"GET /prices/ta8-1aa HTTP/2.0" 200 6816 "-"&lt;br&gt;
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko);&lt;br&gt;
compatible; ChatGPT-User/1.0; +&lt;a href="https://openai.com/bot" rel="noopener noreferrer"&gt;https://openai.com/bot&lt;/a&gt;"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI operates three distinct crawlers and our logs show all three in the same window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPTBot: training data collection&lt;/li&gt;
&lt;li&gt;OAI-SearchBot: building the search index used inside ChatGPT&lt;/li&gt;
&lt;li&gt;ChatGPT-User: real-time fetches triggered when a user asks ChatGPT a question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first two are indexers, like Googlebot or Bingbot. The third is different. A ChatGPT-User hit means someone asked ChatGPT a question, and ChatGPT decided to fetch your page to answer them.&lt;/p&gt;

&lt;p&gt;The three pages in our logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Homepage (&lt;a href="https://property.nwc-advisory.com" rel="noopener noreferrer"&gt;https://property.nwc-advisory.com&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;/prices/sg7-5aa -- Baldock, Hertfordshire. 134 transactions last year, median £348,750.&lt;/li&gt;
&lt;li&gt;/prices/ta8-1aa -- Burnham-on-Sea, Somerset. 223 transactions last year, median £275,000.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Someone asked ChatGPT about UK house prices in Baldock. Someone else asked about Burnham-on-Sea. &lt;/p&gt;

&lt;p&gt;They got numbers. They never visited our site.&lt;/p&gt;

&lt;p&gt;If you want to see the raw pages ChatGPT fetched, they are here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://property.nwc-advisory.com/prices/sg7-5aa" rel="noopener noreferrer"&gt;https://property.nwc-advisory.com/prices/sg7-5aa&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://property.nwc-advisory.com/prices/ta8-1aa" rel="noopener noreferrer"&gt;https://property.nwc-advisory.com/prices/ta8-1aa&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why are our pages AI-readable?&lt;/p&gt;

&lt;p&gt;Nothing exotic. Three boring things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;robots.txt whitelists AI bots&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most of the "how to block ChatGPT" posts got this backwards. We do the opposite:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The API endpoints are protected. The landing pages are wide open. Block what is monetized, allow what you want distributed.&lt;/p&gt;
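&lt;p&gt;You can sanity-check a policy like this before deploying it with Python's stdlib &lt;code&gt;urllib.robotparser&lt;/code&gt; (the paths here are examples):&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# The policy from above: AI bots allowed everywhere,
# API endpoints disallowed for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "/prices/sg7-5aa"))  # True  - landing page open
print(rp.can_fetch("SomeScraper", "/v1/stats"))   # False - API protected
```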

&lt;ol&gt;
&lt;li&gt;Server-rendered HTML, no SPA&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every landing page is a complete HTML document at request time. No React, no Vue, no client-side fetch() for the data. ChatGPT-User does not execute JavaScript -- if your data is populated client-side, it is invisible to the fetch. We use a Python script that renders Jinja2 templates into static HTML files at build time. 9,152 pages across 12 domains, all pre-rendered, all around 6KB each.&lt;/p&gt;
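&lt;p&gt;A minimal sketch of that build step, with hypothetical data and output paths. The real pipeline renders Jinja2 templates into complete HTML documents; a plain-text template stands in here so the sketch needs only the stdlib:&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical template and rows; the real pipeline renders Jinja2
# templates into full HTML documents.
Path("templates").mkdir(exist_ok=True)
Path("templates/price_page.txt").write_text(
    "Property prices in {postcode}: median {median}\n", encoding="utf-8"
)

template = Path("templates/price_page.txt").read_text(encoding="utf-8")
rows = [
    {"postcode": "sg7-5aa", "median": "348,750"},
    {"postcode": "ta8-1aa", "median": "275,000"},
]

# One static file per postcode, fully populated at build time, so a
# crawler that never executes JavaScript still sees every number.
outdir = Path("dist/prices")
outdir.mkdir(parents=True, exist_ok=True)
for row in rows:
    page = template.format(**row)
    outdir.joinpath(row["postcode"] + ".html").write_text(page, encoding="utf-8")
```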

&lt;ol&gt;
&lt;li&gt;IndexNow, not just Google Search Console&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;IndexNow is a free push protocol that notifies Bing, Yandex, and a few others instantly when you publish a URL. Bing's index directly feeds ChatGPT Search, Grok, and Copilot. We submit every new page via IndexNow within seconds of deploying it. Waiting for Googlebot to re-crawl is slower by orders of magnitude, and for a low-authority domain, Google may never index the page at all.&lt;/p&gt;
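&lt;p&gt;The submission itself is one JSON POST to &lt;code&gt;api.indexnow.org/indexnow&lt;/code&gt;. A sketch of the payload we build on deploy (the key and URL are placeholders, and the actual network call is left out):&lt;/p&gt;

```python
import json

# IndexNow: one JSON POST to https://api.indexnow.org/indexnow with the
# host, your verification key, and every URL that changed. The key below
# is a placeholder; the real one must be served at the keyLocation URL.
def indexnow_payload(host, key, urls):
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }

payload = indexnow_payload(
    "property.nwc-advisory.com",
    "0123456789abcdef",
    ["https://property.nwc-advisory.com/prices/sg7-5aa"],
)
body = json.dumps(payload)
# POST body with header Content-Type: application/json; charset=utf-8
# (the request itself is omitted to keep this sketch offline)
```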

&lt;p&gt;Here is the Python snippet we use to classify bot traffic when we parse our nginx logs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;AI_BOTS = {
    'ChatGPT-User':   'Live query - user asking ChatGPT now',
    'OAI-SearchBot':  'ChatGPT search index',
    'GPTBot':         'OpenAI training data',
    'ClaudeBot':      'Anthropic crawler',
    'PerplexityBot':  'Perplexity search',
    'Bytespider':     'ByteDance / TikTok',
    'Applebot':       'Apple / Siri',
}

SEARCH_BOTS = {
    'Googlebot':    'Google Search',
    'bingbot':      'Bing Search',
    'DuckDuckBot':  'DuckDuckGo',
}

def classify(user_agent: str) -&amp;gt; str:
    ua = user_agent.lower()
    for bot, purpose in AI_BOTS.items():
        if bot.lower() in ua:
            return f"AI: {purpose}"
    for bot, purpose in SEARCH_BOTS.items():
        if bot.lower() in ua:
            return f"Search: {purpose}"
    return "Human"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Run that across your access log and you will see AI traffic showing up in categories you do not have dashboards for yet.&lt;/p&gt;
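&lt;p&gt;Here is one way to drive that kind of classification over raw log lines. The sketch is self-contained: it inlines two sample nginx combined-log lines and uses a simplified matcher, so point it at your real access log in practice:&lt;/p&gt;

```python
import re
from collections import Counter

# In the nginx "combined" format, the user agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

BOTS = ["ChatGPT-User", "OAI-SearchBot", "GPTBot", "ClaudeBot",
        "PerplexityBot", "Googlebot", "bingbot"]

def bot_name(line):
    # Return the matching bot name, or None for human / unknown traffic.
    m = UA_RE.search(line)
    if not m:
        return None
    ua = m.group(1).lower()
    for bot in BOTS:
        if bot.lower() in ua:
            return bot
    return None

# Two sample lines; in practice iterate over /var/log/nginx/access.log.
sample = [
    '1.2.3.4 - - [10/Apr/2026:08:03:43 +0000] "GET / HTTP/2.0" 200 38145 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'ChatGPT-User/1.0; +https://openai.com/bot"',
    '5.6.7.8 - - [10/Apr/2026:08:05:01 +0000] "GET /robots.txt HTTP/1.1" 200 210 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = Counter(filter(None, (bot_name(line) for line in sample)))
print(counts)
```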

&lt;p&gt;What we built that Google ignored&lt;/p&gt;

&lt;p&gt;Because I want to be precise about this: we did the SEO work. Not a rushed job. Over six weeks we shipped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9,152 canonical tags across 12 domains&lt;/li&gt;
&lt;li&gt;hreflang tags linking en to fr on the bilingual property apps&lt;/li&gt;
&lt;li&gt;Sitemaps on all 12 domains, 9,157 URLs total&lt;/li&gt;
&lt;li&gt;IndexNow batch submissions, 9,157 URLs pushed to Bing and Yandex&lt;/li&gt;
&lt;li&gt;Google Search Console verified on every domain, sitemaps submitted&lt;/li&gt;
&lt;li&gt;Disallow: /v1/ and Disallow: /api/ on all property apps so crawlers do not waste their budget on API endpoints&lt;/li&gt;
&lt;li&gt;FAQPage JSON-LD on every landing page for rich snippets&lt;/li&gt;
&lt;li&gt;Open Graph and Twitter Card meta on every domain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result after six weeks: 2 URLs indexed on Google. The homepage (nwc-advisory.com/) and one NYC property page (property-nyc.nwc-advisory.com/comps/11225). That is the entire Google index for our brand.&lt;/p&gt;

&lt;p&gt;This is not a Google bug. It is a domain authority floor -- Google rate-limits indexing for new domains without backlinks. The old playbook was: wait three to six months, build backlinks, eventually get indexed. The new playbook is: while you are waiting for Google, OpenAI and Microsoft are already crawling you and serving your data to their users.&lt;/p&gt;

&lt;p&gt;Six takeaways if you are building something data-heavy&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Allow AI bots in robots.txt. Block sensitive paths, not the crawlers themselves.&lt;/li&gt;
&lt;li&gt;Server-render every data page. ChatGPT-User does not execute JavaScript. If your data is in a fetch() call, it is invisible.&lt;/li&gt;
&lt;li&gt;Submit to IndexNow. Bing feeds ChatGPT Search, Grok, and Copilot. Waiting for Google is slower and less likely to work for new domains.&lt;/li&gt;
&lt;li&gt;Write long-tail pages for long-tail queries. ChatGPT's live fetches in our logs were for specific UK postcodes. One page per specific question.&lt;/li&gt;
&lt;li&gt;Parse your access logs weekly. Grep for ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot. You will see AI distribution working before it shows up in any dashboard.&lt;/li&gt;
&lt;li&gt;Keep doing SEO for Google anyway. Do both. But calibrate your expectations for the first six months - if your domain is new, Google is not a near-term channel.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>seo</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>We Added Rental Yield to a Free Property Comps API — Here's How It Works</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Mon, 06 Apr 2026 14:29:18 +0000</pubDate>
      <link>https://dev.to/tianninglab/we-added-rental-yield-to-a-free-property-comps-api-heres-how-it-works-34be</link>
      <guid>https://dev.to/tianninglab/we-added-rental-yield-to-a-free-property-comps-api-heres-how-it-works-34be</guid>
      <description>&lt;p&gt;Last month I shared a &lt;a href="https://api.nwc-advisory.com/docs" rel="noopener noreferrer"&gt;free property comparable sales API&lt;/a&gt; covering 11 markets.&lt;br&gt;
Since then, we shipped two updates worth talking about: a new US market (Washington DC) and rental yield data baked into the stats endpoint.&lt;/p&gt;

&lt;p&gt;The Problem&lt;/p&gt;

&lt;p&gt;If you're building real estate tools, getting comparable sales data is step one. But investors always ask the same follow-up: "What's the yield?"&lt;/p&gt;

&lt;p&gt;Answering that requires a second data source — rental data — and a calculation layer. Most developers either hardcode national averages (inaccurate) or pay for expensive rental APIs (overkill for a yield estimate).&lt;/p&gt;

&lt;p&gt;What We Built&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;/v1/stats&lt;/code&gt; endpoint now returns gross rental yield for all 6 US markets, derived from two federal data sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Census ACS 5-Year Survey (primary) — median rents by ZIP and bedroom count&lt;/li&gt;
&lt;li&gt;HUD Small Area Fair Market Rents (SAFMR) FY2026 (fallback) — published annually per ZIP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No extra API call. No separate subscription. Yield comes back in the same stats response you're already using.&lt;/p&gt;
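&lt;p&gt;The arithmetic behind the yield field is easy to reproduce on your side. A sketch using the DC figures (the function names are ours, not part of the API):&lt;/p&gt;

```python
def rent_estimate(acs_rent, safmr_rent):
    # Census ACS is the primary rent source; HUD SAFMR is the fallback
    # (this mirrors the "yield_source" field in the response).
    if acs_rent is not None:
        return acs_rent, "census"
    return safmr_rent, "safmr"

def gross_yield_pct(monthly_rent, median_sale_price):
    # Gross yield: annual rent as a share of the median sale price,
    # before expenses, vacancy, or taxes.
    return round(monthly_rent * 12 / median_sale_price * 100, 1)

rent, source = rent_estimate(None, 3167)      # ACS bracket missing for this ZIP
print(source, gross_yield_pct(rent, 760000))  # safmr 5.0
```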

&lt;p&gt;Example&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
  curl "https://api.nwc-advisory.com/v1/stats?market=dc&amp;amp;zip_code=20001&amp;amp;months=12" \
    -H "X-RapidAPI-Key: YOUR_KEY"

Response:

  {
    "area": "20001",
    "period": "12 months",
    "stats": {
      "median": 760000,
      "count": 745
    },
    "monthly_rent_estimate": 3167,
    "gross_yield_pct": 5.0,
    "yield_source": "safmr",
    "beds_used_for_yield": 2,
    "median_price_per_sqft": 639,
    "by_type": {
      "Row House": { "median": 913000, "count": 178 },
      "Condo": { "median": 635000, "count": 456 }
    }
  }

One call gives you: median price, transaction count, price per sqft, breakdown by property type — plus monthly rent estimate, gross yield percentage, and the data source.

Bedroom-Specific Yields

Pass min_beds to refine the yield to a specific segment:

  3-bedroom yield in DC
  curl "https://api.nwc-advisory.com/v1/stats?market=dc&amp;amp;zip_code=20001&amp;amp;min_beds=3&amp;amp;months=12"

This does two things:
  1. Filters comparable sales to 3+ bedroom properties
  2. Uses the 3-bedroom rent bracket from SAFMR/Census instead of the default 2-bedroom

The response includes beds_used_for_yield: 3 so you know which bracket was applied.

How the Yield Is Calculated

gross_yield = (monthly_rent × 12) / median_sale_price × 100

  - monthly_rent comes from Census ACS (preferred) or HUD SAFMR (fallback), matched by ZIP + bedroom count
  - median_sale_price comes from actual government-recorded transactions in the search radius
  - This is gross yield — before expenses, vacancy, or taxes

The formula is intentionally simple. Investors apply their own expense ratios on top.

New Market: Washington DC

We added DC as our 12th market (6th US market). The data comes from the DC Office of Tax and Revenue CAMA system:

  - 256,000+ transactions (1979–2026)
  - Full property detail: beds, baths, sqft, year built, neighborhood, ward
  - 86% of records have complete bedroom/bathroom data
  - Updated from DC government's ArcGIS REST API

All 12 Markets

  ┌────────────────┬──────┬──────────────┬────────────────┐
  │     Market     │ Code │ Transactions │ Location Param │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ United Kingdom │ uk   │ 31M+         │ postcode       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ France         │ fr   │ 8.3M+        │ code_postal    │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Singapore      │ sg   │ 970K+        │ postal_code    │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ New York City  │ nyc  │ 500K+        │ zip_code       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Chicago        │ chi  │ 580K+        │ zip_code       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Miami-Dade     │ mia  │ 417K+        │ zip_code       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Philadelphia   │ phl  │ 290K+        │ zip_code       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Connecticut    │ ct   │ 210K+        │ town           │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Washington DC  │ dc   │ 256K+        │ zip_code       │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Dubai          │ dxb  │ 1.3M+        │ area_name      │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Ireland        │ ie   │ 230K+        │ county         │
  ├────────────────┼──────┼──────────────┼────────────────┤
  │ Taiwan         │ tw   │ 150K+        │ city_en        │
  └────────────────┴──────┴──────────────┴────────────────┘

Rental yield is available on the 6 US markets (NYC, CHI, MIA, PHL, CT, DC). Yield data for UK, France, and Dubai is available in PDF valuation reports — API exposure for international markets is on the roadmap. 

Data Sources

Every transaction comes from an official government registry — not MLS, not scraped listings, not estimates. The key difference: these are actual recorded sale prices, not asking prices or algorithmic valuations.

Yield data comes from federal sources (Census Bureau + HUD) that are updated annually and publicly available. We cache and serve them per-ZIP so you don't have to parse XLSX files from government portals.

Try It

  - Free tier: 50 requests/month, no credit card
  - Docs: https://api.nwc-advisory.com/docs
  - RapidAPI: https://rapidapi.com/NWCA/api/property-comparable-sales

We're adding more US markets — if you're building something that needs specific coverage, let us know in the comments.

All data sourced from government registries. Yield estimates are informational — not financial advice.

● Live Apps

What to see the API/data in action ?? — free property search with PDF reports

US locations:

- Chicago: https://property-chi.nwc-advisory.com
- New York City: https://property-nyc.nwc-advisory.com
- Miami-Dade:  https://property-miami.nwc-advisory.com
- Philadelphia: https://property-phl.nwc-advisory.com
- Connecticut:  https://property-ct.nwc-advisory.com
- Washington DC: https://property-dc.nwc-advisory.com

International locations:

- United Kingdom: https://property.nwc-advisory.com
- France: https://property-fr.nwc-advisory.com
- Singapore: https://property-sg.nwc-advisory.com
- Dubai: https://property-dxb.nwc-advisory.com
- Ireland: https://property-ie.nwc-advisory.com
- Taiwan: https://property-tw.nwc-advisory.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>api</category>
      <category>realestate</category>
      <category>webdev</category>
      <category>data</category>
    </item>
    <item>
      <title>We caught ChatGPT answering property questions with our data -- here's the nginx log proof</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Mon, 30 Mar 2026 21:43:07 +0000</pubDate>
      <link>https://dev.to/tianninglab/we-caught-chatgpt-answering-property-questions-with-our-data-heres-the-nginx-log-proof-3oo8</link>
      <guid>https://dev.to/tianninglab/we-caught-chatgpt-answering-property-questions-with-our-data-heres-the-nginx-log-proof-3oo8</guid>
      <description>&lt;h2&gt;
  
  
  The moment we noticed
&lt;/h2&gt;

&lt;p&gt;Yesterday, while running our routine nginx log analysis, we spotted something unusual:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight apache"&gt;&lt;code&gt;51.107.70.192 - [30/Mar/2026:19:42:35 +0000] "property.nwc-advisory.com"
&lt;span class="err"&gt;"&lt;/span&gt;GET /prices/sk10-1ae HTTP/2.0" 200 4623 "-"
&lt;span class="err"&gt;"&lt;/span&gt;Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible;
ChatGPT-User/1.0; +https://openai.com/bot"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;ChatGPT-User&lt;/code&gt; bot fetched one of our UK property landing pages. Not GPTBot (the training crawler). Not OAI-SearchBot (the indexer). The &lt;strong&gt;ChatGPT-User&lt;/strong&gt; agent -- the one that fetches pages in real time to answer a user's question.&lt;/p&gt;

&lt;p&gt;Someone asked ChatGPT about house prices near Macclesfield (SK10 postcode), and ChatGPT pulled the answer from our page.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's the difference between OpenAI's bots?
&lt;/h2&gt;

&lt;p&gt;OpenAI operates three distinct crawlers. In the same 2-hour window, we saw all three:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bot&lt;/th&gt;
&lt;th&gt;User-Agent&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Our logs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPTBot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GPTBot/1.3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Training data collection&lt;/td&gt;
&lt;td&gt;Crawling sitemaps, homepages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OAI-SearchBot&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OAI-SearchBot/1.3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search index building&lt;/td&gt;
&lt;td&gt;Crawling UK landing pages (SK postcodes), robots.txt across all 11 domains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ChatGPT-User&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ChatGPT-User/1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Live query answering&lt;/td&gt;
&lt;td&gt;Fetched &lt;code&gt;/prices/sk10-1ae&lt;/code&gt; for a real user&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The ChatGPT-User hit is the interesting one. It means our data is being &lt;strong&gt;served to end users in real time&lt;/strong&gt; through ChatGPT's search feature. The user never visits our site -- they get the answer inside their chat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;In a 2-hour window on March 30, 2026:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI bots:  22 requests (GPTBot + OAI-SearchBot + ChatGPT-User)
Googlebot:     4 requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenAI is crawling our pages &lt;strong&gt;5x more frequently&lt;/strong&gt; than Google.&lt;/p&gt;

&lt;p&gt;Here's what OAI-SearchBot was systematically crawling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;property.nwc-advisory.com       GET /prices/sk1-1aa
property.nwc-advisory.com       GET /prices/sk10-1ae
property.nwc-advisory.com       GET /prices/sk11-0aa
property.nwc-advisory.com       GET /prices/sk12-1aa
property.nwc-advisory.com       GET /prices/sk13-0aa
property.nwc-advisory.com       GET /prices/sk14-1aa
property.nwc-advisory.com       GET /sitemap.xml
property.nwc-advisory.com       GET /robots.txt
property-chi.nwc-advisory.com   GET /robots.txt
property-dxb.nwc-advisory.com   GET /robots.txt
property-ie.nwc-advisory.com    GET /robots.txt
property-miami.nwc-advisory.com GET /robots.txt
property-phl.nwc-advisory.com   GET /robots.txt
property-sg.nwc-advisory.com    GET /robots.txt
property-tw.nwc-advisory.com    GET /robots.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's reading the sitemaps, then crawling individual landing pages. Systematically.&lt;/p&gt;
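&lt;p&gt;A sitemap like the one it is reading can be generated straight from the page list at build time. A sketch with example slugs, using Python's stdlib &lt;code&gt;xml.etree&lt;/code&gt;:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Build sitemap.xml from the list of pre-rendered landing pages so
# crawlers can discover every page. The slugs here are examples.
BASE = "https://property.nwc-advisory.com/prices/"
slugs = ["sk1-1aa", "sk10-1ae", "sk11-0aa"]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for slug in slugs:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = BASE + slug

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```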

&lt;h2&gt;
  
  
  What we built (and why AI can read it)
&lt;/h2&gt;

&lt;p&gt;We operate &lt;a href="https://property.nwc-advisory.com" rel="noopener noreferrer"&gt;property comparable sales apps&lt;/a&gt; across 11 markets (UK, France, Singapore, NYC, Chicago, Miami, Philadelphia, Connecticut, Dubai, Ireland, Taiwan), backed by 35M+ government-source transactions.&lt;/p&gt;

&lt;p&gt;For SEO, we generated &lt;strong&gt;9,100+ static landing pages&lt;/strong&gt; -- one per postcode/ZIP/area:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/prices/sw1a-1aa    → London Westminster
/prices/sk10-1ae    → Macclesfield
/prix/75001         → Paris 1er
/comps/10001        → Midtown Manhattan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each page contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Median price, average price, price range&lt;/li&gt;
&lt;li&gt;Price per sqft/m2 statistics&lt;/li&gt;
&lt;li&gt;Recent comparable sales with addresses&lt;/li&gt;
&lt;li&gt;Stamp duty / notary fee calculators&lt;/li&gt;
&lt;li&gt;FAQ schema (JSON-LD)&lt;/li&gt;
&lt;li&gt;Nearby area links&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key design decision: &lt;strong&gt;no login walls, no JavaScript-rendered content, no gated data&lt;/strong&gt;. Every landing page is server-rendered HTML with structured data that any crawler can parse.&lt;/p&gt;

&lt;h3&gt;
  
  
  The page structure that AI loves
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Clean semantic HTML --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;Property Prices in SK10 1AE, Macclesfield&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"stats-grid"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"stat"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"label"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Median Price&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"value"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;£285,000&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"stat"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"label"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Price per sqft&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"value"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;£198&lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- JSON-LD structured data --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"application/ld+json"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@context&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://schema.org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WebApplication&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Property Comparable Sales - SK10&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Recent property sales near SK10 1AE, Macclesfield&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://property.nwc-advisory.com/prices/sk10-1ae&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- FAQ schema for rich snippets --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script &lt;/span&gt;&lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"application/ld+json"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FAQPage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mainEntity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Question&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What is the average house price in SK10?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;acceptedAnswer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Answer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The average house price in SK10 is £312,450...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The SEO stack
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nginx (SSL, bot-block with search engine whitelist)
  │
  ├── /prices/{postcode}  →  Static HTML (2,308 UK pages)
  ├── /prix/{code_postal} →  Static HTML (5,851 FR pages)
  ├── /comps/{zip}        →  Static HTML (per US market)
  │
  ├── sitemap.xml          →  All page URLs
  ├── robots.txt           →  Allow everything except /v1/ and /api/
  └── IndexNow key         →  Instant Bing/Yandex notification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key technical decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. robots.txt allows AI bots&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;User-agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GPTBot&lt;/span&gt;
&lt;span class="py"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;

&lt;span class="py"&gt;User-agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ChatGPT-User&lt;/span&gt;
&lt;span class="py"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;

&lt;span class="py"&gt;User-agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;*&lt;/span&gt;
&lt;span class="py"&gt;Disallow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1/&lt;/span&gt;
&lt;span class="py"&gt;Disallow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/api/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many sites block GPTBot. We deliberately allow it. The API endpoints are protected (disallowed), but the landing pages are open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Server-rendered, not SPA&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each landing page is a complete HTML document. No client-side rendering, no JavaScript required to see the data. ChatGPT-User fetches the HTML and can immediately extract the statistics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Canonical URLs and sitemaps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every page has &lt;code&gt;&amp;lt;link rel="canonical"&amp;gt;&lt;/code&gt; and is listed in the sitemap. This tells crawlers exactly which pages exist and which URL is authoritative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Structured data (JSON-LD)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FAQPage schema means the questions and answers are machine-readable. When ChatGPT needs "What is the average house price in SK10?", it can extract the answer directly from the structured data.&lt;/p&gt;
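&lt;p&gt;Generating that JSON-LD at page-build time is a few lines of code. A minimal sketch (the question and answer strings are the placeholders from the example above; &lt;code&gt;faq_jsonld&lt;/code&gt; is a hypothetical helper, not part of any library):&lt;/p&gt;

```python
import json

def faq_jsonld(question: str, answer: str) -> str:
    """Build a schema.org FAQPage JSON-LD script block for one Q&A pair."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2, ensure_ascii=False)
        + "\n</script>"
    )

block = faq_jsonld(
    "What is the average house price in SK10?",
    "The average house price in SK10 is £312,450...",
)
print(block)
```

Embedding the output in each static page is what makes the answer machine-readable without any HTML parsing.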

&lt;h2&gt;
  
  
  How we detect AI bot traffic
&lt;/h2&gt;

&lt;p&gt;Our visitor analysis pipeline (Python, runs against nginx logs) classifies every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Bot classification
&lt;/span&gt;&lt;span class="n"&gt;AI_BOTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ChatGPT-User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Live query - someone asked ChatGPT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GPTBot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Training data collection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OAI-SearchBot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Search index building&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ClaudeBot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Anthropic training&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PerplexityBot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Perplexity search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bytespider&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TikTok/ByteDance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_BOTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Googlebot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Google Search indexing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bingbot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bing Search indexing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Applebot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Apple/Siri search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DuckDuckBot&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DuckDuckGo indexing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;ua_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purpose&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AI_BOTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ua_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Bot: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purpose&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;SEARCH_BOTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ua_lower&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Human visitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We run this daily against our nginx logs. The trend over the past few weeks: AI bot traffic is growing faster than search bot traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for developers
&lt;/h2&gt;

&lt;p&gt;If you're building a data-heavy application, the old playbook was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build app → SEO → Rank on Google → Users find you → They visit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new playbook is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build app → Structured data → AI crawls you → Users get your data via ChatGPT
                                              (they may never visit your site)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is both exciting and unsettling. Exciting because your data reaches users through a completely new channel. Unsettling because those users never see your UI, your brand, or your conversion funnel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpe21nbenfwpqk3gxzf9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpe21nbenfwpqk3gxzf9e.png" alt=" "&gt;&lt;/a&gt;Practical takeaways&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want AI to use your data:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't block AI bots&lt;/strong&gt; in robots.txt (unless you have a reason to)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-render your pages&lt;/strong&gt; -- ChatGPT-User can't execute JavaScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use structured data&lt;/strong&gt; (JSON-LD, schema.org) -- makes extraction trivial&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate landing pages for long-tail queries&lt;/strong&gt; -- AI searches the same way humans do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep data ungated&lt;/strong&gt; -- if it's behind a login, AI can't reach it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain sitemaps&lt;/strong&gt; -- AI bots follow them just like Googlebot&lt;/li&gt;
&lt;/ol&gt;
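&lt;p&gt;Point 6 is cheap to automate. A minimal sketch that renders a sitemap from a list of page URLs using only the standard library (the example URL is a placeholder):&lt;/p&gt;

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Render a minimal sitemap.xml document from absolute page URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(root, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(root, encoding="unicode")

xml = build_sitemap(["https://example.com/prices/SK10"])
print(xml)
```

Regenerate it whenever the static pages change, so crawlers always see the full URL inventory.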

&lt;p&gt;&lt;strong&gt;If you want to track it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse your nginx/access logs for &lt;code&gt;ChatGPT-User&lt;/code&gt;, &lt;code&gt;GPTBot&lt;/code&gt;, &lt;code&gt;OAI-SearchBot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ChatGPT-User&lt;/code&gt; = your data is being served to real users right now&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;OAI-SearchBot&lt;/code&gt; = your pages are being indexed for future queries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GPTBot&lt;/code&gt; = your content may be used for model training&lt;/li&gt;
&lt;/ul&gt;
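&lt;p&gt;A quick way to run that check is to tally user agents straight from the access log. A hedged sketch, assuming the default nginx &lt;code&gt;combined&lt;/code&gt; log format (the sample lines are made up for illustration):&lt;/p&gt;

```python
from collections import Counter

WATCHED = ("ChatGPT-User", "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

def tally_bots(log_lines):
    """Count hits per watched bot by substring match on each log line."""
    counts = Counter()
    for line in log_lines:
        for bot in WATCHED:
            if bot in line:
                counts[bot] += 1
                break  # one classification per request
    return counts

# Made-up sample lines in nginx 'combined' format
sample = [
    '1.2.3.4 - - [10/Apr/2026:16:00:01 +0000] "GET /prices/SK10 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0; compatible; ChatGPT-User/1.0; +https://openai.com/bot"',
    '5.6.7.8 - - [10/Apr/2026:16:00:02 +0000] "GET /sitemap.xml HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
]
print(tally_bots(sample))
```

Pointed at your real access log (one line per request), the same loop gives you a daily AI-vs-search traffic breakdown.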

&lt;h2&gt;
  
  
  The broader picture
&lt;/h2&gt;

&lt;p&gt;Google is still the primary search engine. But in our 2-hour sample, OpenAI's crawlers outnumbered Googlebot 5:1. And while Google has been crawling our 12 domains for weeks without indexing most pages (domain authority problem), ChatGPT is already serving our data to users.&lt;/p&gt;

&lt;p&gt;For a small startup with no domain authority, no backlinks, and a brand-new domain, this is significant. The traditional SEO path (build authority -&amp;gt; get indexed -&amp;gt; rank -&amp;gt; get traffic) takes months to years. The AI search path (structured data -&amp;gt; get crawled -&amp;gt; get cited) can happen in weeks.&lt;/p&gt;

&lt;p&gt;We're not saying Google doesn't matter. We're saying there's a new, parallel distribution channel, and it favors clean data over domain authority.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Our stack:&lt;/strong&gt; Python (FastAPI) + SQLite + nginx + static HTML generators. 11 property markets, 35M+ transactions from government open data sources. Free to search at &lt;a href="https://property.nwc-advisory.com" rel="noopener noreferrer"&gt;property.nwc-advisory.com&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The API is also available on &lt;a href="https://rapidapi.com/nwcadvisory/api/property-comparable-sales" rel="noopener noreferrer"&gt;RapidAPI&lt;/a&gt; and as an &lt;a href="https://github.com/Tianning-lab/property-comps-mcp-server" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; for AI agents.&lt;/p&gt;




&lt;p&gt;What AI bot traffic are you seeing in your logs? Have you caught ChatGPT-User fetching your pages? Drop a comment -- I'm curious whether this is a broader trend or we're just early.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>devops</category>
      <category>api</category>
    </item>
    <item>
      <title>I Built a Free Property Comparable Sales API Covering 11 Global Markets (44M+ Transactions)</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Sat, 28 Mar 2026 15:15:49 +0000</pubDate>
      <link>https://dev.to/tianninglab/i-built-a-free-property-comparable-sales-api-covering-11-global-markets-44m-transactions-546</link>
      <guid>https://dev.to/tianninglab/i-built-a-free-property-comparable-sales-api-covering-11-global-markets-44m-transactions-546</guid>
      <description>&lt;p&gt;Every property valuation starts with the same question: what did similar properties sell for nearby? The data exists — government&lt;br&gt;
land registries publish it — but it is scattered across different formats, languages, and access methods. UK Land Registry uses CSV&lt;br&gt;
dumps. France publishes DVF files. Singapore has a government portal. New York City releases rolling Excel files. None of them&lt;br&gt;
talk to each other.&lt;/p&gt;

&lt;p&gt;I built a unified API that normalises all of it into a single interface. One endpoint, one schema, eleven markets.&lt;/p&gt;

&lt;h2&gt;
  What It Covers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Market&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Transactions&lt;/th&gt;
&lt;th&gt;Location Param&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;UK (England &amp;amp; Wales)&lt;/td&gt;
&lt;td&gt;HM Land Registry&lt;/td&gt;
&lt;td&gt;31M+&lt;/td&gt;
&lt;td&gt;postcode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;France&lt;/td&gt;
&lt;td&gt;DVF&lt;/td&gt;
&lt;td&gt;8.3M+&lt;/td&gt;
&lt;td&gt;code_postal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;HDB Resale Data&lt;/td&gt;
&lt;td&gt;973K+&lt;/td&gt;
&lt;td&gt;postal_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New York City&lt;/td&gt;
&lt;td&gt;Dept of Finance&lt;/td&gt;
&lt;td&gt;51K+&lt;/td&gt;
&lt;td&gt;zip_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chicago&lt;/td&gt;
&lt;td&gt;Cook County&lt;/td&gt;
&lt;td&gt;480K+&lt;/td&gt;
&lt;td&gt;zip_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dubai&lt;/td&gt;
&lt;td&gt;DLD Transactions&lt;/td&gt;
&lt;td&gt;1.3M+&lt;/td&gt;
&lt;td&gt;area_name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Miami&lt;/td&gt;
&lt;td&gt;Miami-Dade County&lt;/td&gt;
&lt;td&gt;190K+&lt;/td&gt;
&lt;td&gt;zip_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Philadelphia&lt;/td&gt;
&lt;td&gt;OPA Sales Data&lt;/td&gt;
&lt;td&gt;350K+&lt;/td&gt;
&lt;td&gt;zip_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connecticut&lt;/td&gt;
&lt;td&gt;Town Clerk Records&lt;/td&gt;
&lt;td&gt;280K+&lt;/td&gt;
&lt;td&gt;town&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ireland&lt;/td&gt;
&lt;td&gt;Property Price Register&lt;/td&gt;
&lt;td&gt;450K+&lt;/td&gt;
&lt;td&gt;county&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taiwan&lt;/td&gt;
&lt;td&gt;Ministry of the Interior&lt;/td&gt;
&lt;td&gt;180K+&lt;/td&gt;
&lt;td&gt;city_en&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All data is government-sourced. No scraped listings. No estimates. Actual recorded sale prices.&lt;/p&gt;

&lt;h2&gt;
  Three Endpoints
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Comparable Sales&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /v1/comps?market=nyc&amp;amp;zip_code=10001&amp;amp;months=12&amp;amp;limit=20
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Returns recent sales near a location: address, sale price, date, property type, size, and price per square foot/metre. Filter by property type, bedroom count, and date range.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Area Statistics&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /v1/stats?market=uk&amp;amp;postcode=SW1A1DA
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Median price, price per area unit, transaction volume, price trends, property type breakdown. The numbers you need to understand a micro-market at a glance.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Monthly Trends&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /v1/trends?market=fr&amp;amp;code_postal=75001&amp;amp;months=24
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Month-by-month median prices and volumes. Useful for spotting whether a market is heating up or cooling down.&lt;/p&gt;
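&lt;p&gt;Calling any of the three endpoints is just a matter of building the query string. A sketch using only the standard library (the host and parameters are the ones shown above; &lt;code&gt;build_url&lt;/code&gt; is a hypothetical helper — pass the result to &lt;code&gt;urllib.request&lt;/code&gt; or &lt;code&gt;requests&lt;/code&gt; to actually fetch):&lt;/p&gt;

```python
from urllib.parse import urlencode

BASE = "https://api.nwc-advisory.com/v1"

def build_url(endpoint: str, **params) -> str:
    """Build a request URL for the comps/stats/trends endpoints."""
    return f"{BASE}/{endpoint}?{urlencode(params)}"

url = build_url("comps", market="nyc", zip_code="10001", months=12, limit=20)
print(url)
# https://api.nwc-advisory.com/v1/comps?market=nyc&zip_code=10001&months=12&limit=20
```

The same helper covers all eleven markets, since only the market parameter and the location key change.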

&lt;h2&gt;
  How It Works
&lt;/h2&gt;

&lt;p&gt;Each market has its own backend API (FastAPI + SQLite) running on a dedicated port. A unified gateway at api.nwc-advisory.com routes requests by the market parameter to the correct backend, normalises the response schema, and returns consistent JSON regardless of which country you are querying.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client request
  → api.nwc-advisory.com (gateway, port 8080)
      → market backend (ports 8060-8070)
          → SQLite database (government data, indexed)
      ← normalised JSON response
  ← consistent schema
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The SQLite databases are built from government bulk downloads. Each has compound indexes on location + property type + date for fast radius searches. The densest queries (central London postcodes covering thousands of transactions) return in under 5 seconds.&lt;/p&gt;

&lt;h2&gt;
  The Hardest Parts
&lt;/h2&gt;

&lt;p&gt;Address normalisation across 11 countries. UK postcodes, French postal codes, Singapore 6-digit codes, US ZIP codes, Dubai area names, Irish counties, Taiwanese districts in both English and Traditional Chinese. Each market has its own geocoding logic and search radius behaviour.&lt;/p&gt;

&lt;p&gt;Currency and unit consistency. Price per square foot in the US, price per square metre in Europe and Asia. GBP, EUR, USD, SGD, AED, NTD. The API returns values in each market's native currency and unit — no forced conversions that would confuse local users.&lt;/p&gt;

&lt;p&gt;Data freshness. Each market publishes on a different schedule. UK Land Registry is monthly with a 6-week delay. France DVF is semi-annual. Singapore HDB is monthly with minimal delay. The update pipeline checks each source, downloads new data, ingests it into SQLite, deploys to the server, and verifies the health endpoint — all scripted.&lt;/p&gt;

&lt;h2&gt;
  Free Tier
&lt;/h2&gt;

&lt;p&gt;The API is available on RapidAPI with a free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free: 50 requests/month, 10 results per search, 3 markets (UK, NYC, Singapore)&lt;/li&gt;
&lt;li&gt;Pro ($29/mo): 1,000 requests, 100 results, all 11 markets&lt;/li&gt;
&lt;li&gt;Ultra ($99/mo): 10,000 requests, 500 results, all 11 markets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interactive docs at api.nwc-advisory.com/docs (Swagger UI). You can test queries directly in the browser.&lt;/p&gt;

&lt;h2&gt;
  MCP Server (For AI Agents)
&lt;/h2&gt;

&lt;p&gt;If you are building with Claude, GPT, or any LLM agent framework, there is an MCP server that wraps the API into three tools: search_property_comps, get_area_stats, and list_markets. The repo is open source on GitHub: Tianning-lab/property-comps-mcp-server.&lt;/p&gt;

&lt;p&gt;This means an AI agent can answer questions like "What did 3-bedroom flats sell for near London Victoria in the last year?" by calling the tool directly — no prompt engineering around raw API calls needed.&lt;/p&gt;

&lt;h2&gt;
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;Start with fewer markets. I launched UK and France first, then added nine more. Each new market took 2-3 days of data pipeline work plus frontend localisation (French, Traditional Chinese, Spanish for NYC). If I did it again, I would ship three markets and validate demand before building the rest.&lt;/p&gt;

&lt;p&gt;Billing from day one. The free property apps get traffic but almost nobody signs up for paid tiers. Building the RapidAPI listing earlier and focusing on API-first distribution (developers, agents, fintech integrations) would have been a better monetisation path than consumer web apps.&lt;/p&gt;

&lt;h2&gt;
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Backend: Python 3.13, FastAPI, SQLite (one DB per market)&lt;/li&gt;
&lt;li&gt;Gateway: FastAPI proxy with OpenAPI 3.1 spec&lt;/li&gt;
&lt;li&gt;Frontend: Static HTML/CSS/JS (11 localised versions)&lt;/li&gt;
&lt;li&gt;Server: Single Ubuntu box, nginx, Let's Encrypt&lt;/li&gt;
&lt;li&gt;Data pipeline: Python ingest scripts per market, cron-scheduled checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No cloud services, no managed databases, no Kubernetes. One server handles all 11 markets comfortably.&lt;/p&gt;

&lt;p&gt;The API docs are at api.nwc-advisory.com/docs. The MCP server is at github.com/Tianning-lab/property-comps-mcp-server. If you are building anything in proptech or real estate analytics, I would like to hear what data you wish existed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljfoijf7dfjj1j60zwjb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljfoijf7dfjj1j60zwjb.png" alt=" " width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>realestate</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How I made a 31M-row SQLite query go from 168 seconds to 5 seconds</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Tue, 24 Mar 2026 14:24:20 +0000</pubDate>
      <link>https://dev.to/tianninglab/how-i-made-a-31m-row-sqlite-query-go-from-168-seconds-to-5-seconds-h6m</link>
      <guid>https://dev.to/tianninglab/how-i-made-a-31m-row-sqlite-query-go-from-168-seconds-to-5-seconds-h6m</guid>
      <description>&lt;p&gt;My property comparison app was working great in development. Then a user in Norwich searched for nearby sales and my API took &lt;strong&gt;168 seconds&lt;/strong&gt; to respond. nginx gave up at 60 seconds and returned a 504.&lt;/p&gt;

&lt;p&gt;Here's how I fixed it -- and what I learned about SQLite performance at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I built a property comparables API that serves recent sales data from government open datasets. The UK version uses &lt;a href="https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads" rel="noopener noreferrer"&gt;HM Land Registry Price Paid Data&lt;/a&gt; -- every residential property transaction in England and Wales since 1995.&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;31 million rows&lt;/strong&gt; in a single SQLite database. 8.7GB on disk.&lt;/p&gt;

&lt;p&gt;The API is simple: give it a postcode, it finds nearby sales within a radius, filtered by property type and date range. The stack is Python + FastAPI + SQLite, running on a single $20/month VPS.&lt;/p&gt;

&lt;p&gt;For most postcodes, it was fine. London postcodes with a few hundred results came back in under a second. But Norwich postcode NR1 has &lt;strong&gt;1,941 postcodes in its search radius&lt;/strong&gt; -- and that's where everything broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Query
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;postcode&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- up to 1,941 postcodes&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;property_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing exotic. But SQLite was doing a full table scan on 31M rows for each query.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wrong Fix I Tried First
&lt;/h2&gt;

&lt;p&gt;My instinct was to add an index on &lt;code&gt;postcode&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_postcode&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postcode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helped -- queries dropped to about 40 seconds. Still unusable. The problem is that after finding rows by postcode, SQLite still had to scan all matching rows to filter by &lt;code&gt;property_type&lt;/code&gt; and &lt;code&gt;date&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Fix: Compound Indexes
&lt;/h2&gt;

&lt;p&gt;The key insight is that SQLite can only use &lt;strong&gt;one index per table per query&lt;/strong&gt;. If your WHERE clause filters on three columns, a single-column index only helps with the first filter. The rest are scanned.&lt;/p&gt;

&lt;p&gt;The fix is a compound index that matches your query pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_comp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postcode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;property_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The column order matters -- it must match the query's filter order. SQLite walks the B-tree left to right: first it narrows by postcode, then by property type, then by date range. All three filters are served by the same index lookup.&lt;/p&gt;

&lt;p&gt;After creating the index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- update the query planner's statistics&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result: NR1 went from 168 seconds to 4.8 seconds.&lt;/strong&gt; Every other postcode dropped to under 1 second.&lt;/p&gt;
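&lt;p&gt;You can confirm the planner actually picked up the compound index with &lt;code&gt;EXPLAIN QUERY PLAN&lt;/code&gt;. A small self-contained demo against an in-memory table with the same shape (the column names match the article; the postcodes are toy values):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (postcode TEXT, property_type TEXT, date TEXT, price INTEGER)"
)
conn.execute("CREATE INDEX idx_comp ON transactions(postcode, property_type, date DESC)")
conn.execute("ANALYZE")  # refresh the planner's statistics

# Ask the planner how it would execute the comps query
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM transactions "
    "WHERE postcode IN ('NR1 1AA', 'NR1 1AB') "
    "AND property_type = 'D' AND date >= '2024-01-01' "
    "ORDER BY date DESC LIMIT 100"
).fetchall()

for row in plan:
    print(row[-1])  # the human-readable plan detail
```

If the detail column mentions idx_comp, all three filters are being served by the one index lookup; if it says SCAN, you are back to the full table scan.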

&lt;h2&gt;
  
  
  The Same Fix, Different Country
&lt;/h2&gt;

&lt;p&gt;I had the same problem with the France version -- 8.3 million transactions from the &lt;a href="https://www.data.gouv.fr/fr/datasets/demandes-de-valeurs-foncieres-geolocalisees/" rel="noopener noreferrer"&gt;DVF open dataset&lt;/a&gt;. Same fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_comp&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_postal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;type_local&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date_mutation&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All major French cities went to under 100ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Else I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;ANALYZE&lt;/code&gt; is not optional.&lt;/strong&gt; Without it, SQLite's query planner doesn't know about the index distribution and may choose a suboptimal plan. I spent an hour wondering why my new index wasn't helping before realizing I'd skipped this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. nginx timeout matters.&lt;/strong&gt; Default &lt;code&gt;proxy_read_timeout&lt;/code&gt; is 60 seconds. Dense queries that take 5-10 seconds are fine, but if you ever regress, nginx will kill the connection before the user sees an error. I bumped it to 300 seconds as a safety net:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/v1/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://127.0.0.1:8060&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. SQLite handles 31M rows better than you'd think.&lt;/strong&gt; There's a bias in the industry that SQLite is "for small stuff." With proper indexing, it serves production traffic on a single VPS without connection pooling, replication, or any of the Postgres/MySQL infrastructure. My entire hosting cost is $20/month for 11 country databases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Test with your densest data.&lt;/strong&gt; I had been testing with London postcodes (small radius, few matches). The bug only appeared in Norwich (huge radius, 1,941 postcodes). Now I always test with the worst case after any database or query change.&lt;/p&gt;
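&lt;p&gt;A tiny harness makes that habit cheap. A sketch that times a worst-case-style query (the table, index, and toy rows are stand-ins for the real 31M-row database):&lt;/p&gt;

```python
import sqlite3
import time

def time_query(conn, sql, params):
    """Return (elapsed_seconds, rows) for a single query execution."""
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - t0, rows

# Stand-in database: same schema and compound index as the article
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (postcode TEXT, property_type TEXT, date TEXT)")
conn.execute("CREATE INDEX idx_comp ON transactions(postcode, property_type, date DESC)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("NR1 1AA", "D", f"2024-01-{d:02d}") for d in range(1, 29)],
)

elapsed, rows = time_query(
    conn,
    "SELECT * FROM transactions WHERE postcode IN (?, ?) "
    "AND property_type = ? AND date >= ? ORDER BY date DESC LIMIT 100",
    ("NR1 1AA", "NR1 1AB", "D", "2024-01-01"),
)
print(f"{elapsed:.4f}s, {len(rows)} rows")
```

Run the same harness against the production database with your densest postcode list after every schema or query change, and regressions show up before users do.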

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NR1 (worst case)&lt;/td&gt;
&lt;td&gt;168s&lt;/td&gt;
&lt;td&gt;4.8s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SW1A (London)&lt;/td&gt;
&lt;td&gt;12s&lt;/td&gt;
&lt;td&gt;0.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NR1 via HTTPS&lt;/td&gt;
&lt;td&gt;504 timeout&lt;/td&gt;
&lt;td&gt;5.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index size overhead&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;td&gt;+1.2GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total DB size&lt;/td&gt;
&lt;td&gt;8.7GB&lt;/td&gt;
&lt;td&gt;9.9GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 1.2GB of index overhead is a good trade for a 35x speedup on the worst-case query (and 40x on London queries).&lt;/p&gt;

&lt;h2&gt;
  
  
  Live Demo
&lt;/h2&gt;

&lt;p&gt;The API serves comparable sales across 11 countries, all using the same SQLite + compound index pattern. If you want to see the result: &lt;a href="https://property.nwc-advisory.com" rel="noopener noreferrer"&gt;property.nwc-advisory.com&lt;/a&gt; for UK data.&lt;/p&gt;

&lt;p&gt;The API is also available on &lt;a href="https://rapidapi.com" rel="noopener noreferrer"&gt;RapidAPI&lt;/a&gt; if you want to integrate property comps into your own tools, and there's an &lt;a href="https://github.com/Tianning-lab/property-comps-mcp-server" rel="noopener noreferrer"&gt;open-source MCP server&lt;/a&gt; if you use Claude or other AI agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5l4qjexr46xos044er4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5l4qjexr46xos044er4.png" alt="RapidAPI listing" width="709" height="640"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you hit similar SQLite performance walls? I'm curious what scale other people are running SQLite at in production.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>python</category>
      <category>database</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a free Morningstar X-Ray replacement for European/US ETF investors</title>
      <dc:creator>New Way Capital Advisory</dc:creator>
      <pubDate>Mon, 23 Mar 2026 15:33:22 +0000</pubDate>
      <link>https://dev.to/tianninglab/i-built-a-free-morningstar-x-ray-replacement-for-europeanus-etf-investors-5f52</link>
      <guid>https://dev.to/tianninglab/i-built-a-free-morningstar-x-ray-replacement-for-europeanus-etf-investors-5f52</guid>
      <description>

&lt;p&gt;When Morningstar paywalled their Portfolio X-Ray tool in 2025 ($249/year), European investors got hit the hardest. Most alternatives are US-ticker-only. If you hold iShares, Vanguard, Xtrackers, or Synchrony funds identified by ISINs, your options are basically zero.&lt;/p&gt;

&lt;p&gt;So I built a free replacement: &lt;a href="https://nwc-advisory.com/xray" rel="noopener noreferrer"&gt;nwc-advisory.com/xray&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Enter your fund ISINs and amounts (no login, no account). It fetches public factsheet data, decomposes each fund into underlying holdings, and shows 6 analysis views:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Asset Allocation&lt;/strong&gt; -- before and after look-through&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geographic Exposure&lt;/strong&gt; -- economic exposure, not fund domicile (your IE-domiciled S&amp;amp;P 500 ETF shows as US, not Ireland)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sector Breakdown&lt;/strong&gt; -- GICS sectors across your entire portfolio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stock Overlap&lt;/strong&gt; -- stocks appearing in 2+ funds, with fund similarity percentages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fee Analysis&lt;/strong&gt; -- weighted TER vs cheapest passive alternative, with annual savings in your currency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top Holdings&lt;/strong&gt; -- your 30 largest underlying positions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: Python/FastAPI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt;: 13 async factsheet fetchers (iShares CSV exports, JustETF, FT.com, Gerifonds PDF parsing, and 9 more)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache&lt;/strong&gt;: 24h disk-backed JSON + in-memory LRU (500 entries). Popular funds return in under 100ms, cold starts take 15-30s&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDF&lt;/strong&gt;: ReportLab for 6-page PDF reports (email-gated)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Vanilla HTML/CSS/JS, no framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting&lt;/strong&gt;: Single Ubuntu server, systemd service, nginx reverse proxy&lt;/li&gt;
&lt;/ul&gt;
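
&lt;p&gt;A rough sketch of the two-tier cache from the stack list above (the function name, the &lt;code&gt;fetch&lt;/code&gt; callable, and the cache path are placeholders, not the actual implementation):&lt;/p&gt;

```python
import json
import time
from collections import OrderedDict
from pathlib import Path

CACHE_DIR = Path("/tmp/factsheet-cache")  # placeholder path
TTL = 24 * 3600           # 24h disk TTL, as described in the post
MAX_MEMORY_ENTRIES = 500  # in-memory LRU capacity

_memory = OrderedDict()   # isin -> parsed factsheet dict

def get_factsheet(isin, fetch):
    """Memory LRU first, then disk JSON, then the slow fetcher."""
    if isin in _memory:
        _memory.move_to_end(isin)  # mark as most recently used
        return _memory[isin]

    disk = CACHE_DIR / f"{isin}.json"
    if disk.exists() and time.time() - disk.stat().st_mtime < TTL:
        data = json.loads(disk.read_text())
    else:
        data = fetch(isin)  # cold path: the 15-30s case
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        disk.write_text(json.dumps(data))

    _memory[isin] = data
    if len(_memory) > MAX_MEMORY_ENTRIES:
        _memory.popitem(last=False)  # evict the least recently used entry
    return data
```

&lt;p&gt;The disk layer survives restarts; the LRU keeps the popular funds hot without unbounded memory growth.&lt;/p&gt;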

&lt;h2&gt;
  
  
  The interesting technical bits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Name normalization&lt;/strong&gt; was harder than expected. The same company appears as "Nestle SA", "Nestle AG", "NESTLE S.A.", and "Nestlé" across different data sources. I strip accents, remove corporate suffixes (AG, SA, Ltd, Inc, Corp, PLC, Group, Holdings), and normalize whitespace before deduplicating.&lt;/p&gt;
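
&lt;p&gt;That normalization pipeline might look something like this (the suffix list is abridged for illustration):&lt;/p&gt;

```python
import re
import unicodedata

# Abridged suffix list; the real one presumably covers more variants.
SUFFIXES = {"ag", "sa", "ltd", "inc", "corp", "plc", "group", "holdings"}

def normalize_name(name):
    # Strip accents: "Nestlé" becomes "Nestle"
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Lowercase and drop punctuation so "S.A." matches "sa"
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    # Peel corporate suffixes off the tail before deduplicating
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

&lt;p&gt;All four spellings from the example collapse to the same key, so the holdings deduplicate cleanly.&lt;/p&gt;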

&lt;p&gt;&lt;strong&gt;Fund-pair overlap&lt;/strong&gt; uses a Jaccard-like calculation on the intersection of underlying holdings, weighted by position size. This tells you "your MSCI World and S&amp;amp;P 500 ETFs are 78% identical" -- meaning you're paying two sets of fees for nearly the same exposure.&lt;/p&gt;
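
&lt;p&gt;The post doesn't spell out the exact formula, but a common min-of-weights variant of weighted overlap looks like this (the fund compositions below are made up and heavily simplified):&lt;/p&gt;

```python
def fund_overlap(a, b):
    """Fraction of portfolio weight duplicated between two funds.

    a and b map normalized holding names to portfolio weights (0-1).
    For each holding both funds share, we count the smaller of the
    two weights, so the result is the duplicated exposure.
    """
    return sum(min(a[name], b[name]) for name in a.keys() & b.keys())

# Hypothetical compositions, truncated to a few names:
msci_world = {"apple": 0.05, "microsoft": 0.04, "nestle": 0.02, "asml": 0.01}
sp500 = {"apple": 0.07, "microsoft": 0.06, "nvidia": 0.05}

print(f"{fund_overlap(msci_world, sp500):.0%}")  # shared weight in Apple + Microsoft
```

&lt;p&gt;With full holdings lists, a broad index pair like MSCI World vs S&amp;amp;P 500 lands much higher, which is exactly the "paying two sets of fees for the same exposure" warning.&lt;/p&gt;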

&lt;p&gt;&lt;strong&gt;ISIN-based routing&lt;/strong&gt; determines which fetchers to try first. CH-prefix ISINs go to Swiss sources (Gerifonds, SwissFundData), IE/LU go to JustETF and iShares, and everything falls back through 9 legacy fetchers.&lt;/p&gt;
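
&lt;p&gt;The routing itself can be as simple as a prefix check; the fetcher names here are placeholders standing in for the real async fetchers:&lt;/p&gt;

```python
# Placeholder fetcher names; the real project has 13 async fetchers.
SWISS_FETCHERS = ["gerifonds", "swissfunddata"]
ETF_FETCHERS = ["justetf", "ishares_csv"]
LEGACY_FETCHERS = ["ft", "generic_factsheet"]  # abridged: 9 in the post

def fetcher_order(isin):
    """Order fetchers by ISIN country prefix; everything still falls
    back through the legacy chain if the preferred sources miss."""
    prefix = isin[:2].upper()
    if prefix == "CH":
        preferred = SWISS_FETCHERS
    elif prefix in ("IE", "LU"):
        preferred = ETF_FETCHERS
    else:
        preferred = []
    return preferred + [f for f in LEGACY_FETCHERS if f not in preferred]
```

&lt;p&gt;Trying the likeliest source first is what keeps cold starts bounded: most CH/IE/LU lookups resolve on the first or second fetcher instead of walking all thirteen.&lt;/p&gt;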

&lt;h2&gt;
  
  
  What I couldn't replicate
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Morningstar's style box (3x3 value/growth/blend grid) -- proprietary classification&lt;/li&gt;
&lt;li&gt;P/E and P/B ratios -- would need a real-time data feed&lt;/li&gt;
&lt;li&gt;Performance tracking -- out of scope for a free tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Also: Property Comps API
&lt;/h2&gt;

&lt;p&gt;Separately, I built an API serving 4.2M+ government-recorded property transactions across 11 markets (UK, France, Dubai, Singapore, NYC, and 6 more). If you're building anything in proptech:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Swagger docs: &lt;a href="https://api.nwc-advisory.com/docs" rel="noopener noreferrer"&gt;api.nwc-advisory.com/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Free tier available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to answer questions about the architecture or data sources.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>python</category>
      <category>showdev</category>
      <category>sideprojects</category>
    </item>
  </channel>
</rss>
