<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zee</title>
    <description>The latest articles on DEV Community by Zee (@zee_builds).</description>
    <link>https://dev.to/zee_builds</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875644%2F87032ac1-a71e-4e82-9eb9-db42127d93d2.png</url>
      <title>DEV Community: Zee</title>
      <link>https://dev.to/zee_builds</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zee_builds"/>
    <language>en</language>
    <item>
      <title>The One Lesson I Learned Building a Web Extraction API in 2026</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Fri, 08 May 2026 04:16:13 +0000</pubDate>
      <link>https://dev.to/zee_builds/the-one-lesson-i-learned-building-a-web-extraction-api-in-2026-44f5</link>
      <guid>https://dev.to/zee_builds/the-one-lesson-i-learned-building-a-web-extraction-api-in-2026-44f5</guid>
      <description>&lt;p&gt;I spent the last few months building a web extraction API. Here's what surprised me most: &lt;strong&gt;developers don't need another scraper. They need extraction that stops breaking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every web scraping thread I read has the same arc:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a BeautifulSoup/Scrapy scraper&lt;/li&gt;
&lt;li&gt;It works for two weeks&lt;/li&gt;
&lt;li&gt;The target site changes one div&lt;/li&gt;
&lt;li&gt;Scraper breaks at 2am&lt;/li&gt;
&lt;li&gt;Dev swears, rewrites selectors&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The alternative everyone reaches for next: "I'll use Playwright. No, I'll use Puppeteer. No, a headless browser with proxy rotation. No..."&lt;/p&gt;

&lt;p&gt;But here's the thing most people miss: &lt;strong&gt;the problem isn't fetching. It's parsing.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;The extraction-first approach&lt;/h3&gt;

&lt;p&gt;At Haunt API (which I built), we flipped the model. Instead of fetch-then-parse, the user describes what they want in plain English: "Extract product name, price, and stock status from this page."&lt;/p&gt;

&lt;p&gt;The AI reads the page like a human would — it understands context, not CSS selectors. When the site changes layout next week, the extraction still works because the prompt targets meaning, not markup.&lt;/p&gt;

&lt;h3&gt;What matters in 2026&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare bypass&lt;/strong&gt; is table stakes now. If your extraction service can't handle Cloudflare-protected sites, it's a hobby project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured JSON output&lt;/strong&gt; matters more than markdown. LLMs consume JSON; humans debug with it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failed extractions shouldn't cost anything.&lt;/strong&gt; You shouldn't pay for "the page loaded but I couldn't find what you asked for."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural language prompts &amp;gt; CSS selectors.&lt;/strong&gt; Site maintainers change divs. They don't change meaning.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;A practical example&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hauntapi.com/v1/extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://books.toscrape.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract all book titles and their prices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# =&amp;gt; [{"title": "A Light in the Attic", "price": "£51.77"}, ...]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's three lines. No selectors. No Playwright. No parsing.&lt;/p&gt;

&lt;h3&gt;The real lesson&lt;/h3&gt;

&lt;p&gt;Building the tool taught me that the web extraction market in 2026 is consolidating around two poles: &lt;strong&gt;platforms&lt;/strong&gt; (Apify, with thousands of pre-built scrapers and scheduling) and &lt;strong&gt;extraction APIs&lt;/strong&gt; (tools that focus on making one extraction call reliable).&lt;/p&gt;

&lt;p&gt;If you're building a product that needs web data, pick the right pole. If you need scheduled crawls across many known sites, a platform with pre-built scrapers fits. If you need reliable extraction of specific data points, an extraction-first API will save you more time than another headless browser setup.&lt;/p&gt;

&lt;p&gt;Disclosure: I built Haunt API. Free tier is 100 requests/month if you want to try it: &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;https://hauntapi.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>api</category>
    </item>
    <item>
      <title>I’m looking for ugly URLs that break normal scrapers</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Fri, 01 May 2026 21:24:30 +0000</pubDate>
      <link>https://dev.to/zee_builds/im-looking-for-ugly-urls-that-break-normal-scrapers-19o4</link>
      <guid>https://dev.to/zee_builds/im-looking-for-ugly-urls-that-break-normal-scrapers-19o4</guid>
      <description>&lt;p&gt;Most scraper demos use friendly pages.&lt;/p&gt;

&lt;p&gt;A blog post.&lt;br&gt;
A docs page.&lt;br&gt;
A fake ecommerce product.&lt;br&gt;
Something clean enough that BeautifulSoup could probably manage it after a coffee.&lt;/p&gt;

&lt;p&gt;That is not where web extraction gets annoying.&lt;/p&gt;

&lt;p&gt;The annoying cases are the ugly ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JavaScript-rendered pages&lt;/li&gt;
&lt;li&gt;pages with no stable CSS selectors&lt;/li&gt;
&lt;li&gt;pages where the useful data is mixed into layout sludge&lt;/li&gt;
&lt;li&gt;Cloudflare / bot-wall weirdness&lt;/li&gt;
&lt;li&gt;vendor pages where the table changes every week&lt;/li&gt;
&lt;li&gt;docs pages where the answer is spread across several sections&lt;/li&gt;
&lt;li&gt;pages that look simple in a browser but return nonsense to curl&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are the URLs I actually care about.&lt;/p&gt;
&lt;h2&gt;The useful test&lt;/h2&gt;

&lt;p&gt;The test is not:&lt;/p&gt;

&lt;p&gt;“Can this tool scrape example.com?”&lt;/p&gt;

&lt;p&gt;The test is:&lt;/p&gt;

&lt;p&gt;“Can I send it a real page and ask for the specific thing I need, without writing a custom parser?”&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://hauntapi.com/v1/extract &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-API-Key: YOUR_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "url": "https://example.com/some-awful-page",
    "prompt": "Extract product names, prices, availability, and the source URL as JSON"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the shape I built Haunt API around:&lt;/p&gt;

&lt;p&gt;URL in.&lt;br&gt;
Natural-language extraction prompt in.&lt;br&gt;
Structured JSON out.&lt;/p&gt;

&lt;p&gt;No selector map.&lt;br&gt;
No one-off parser.&lt;br&gt;
No “the site changed one div class and now everything is dead” ritual sacrifice.&lt;/p&gt;

&lt;h2&gt;What I want to test next&lt;/h2&gt;

&lt;p&gt;I’m collecting awkward public URLs that normal scrapers struggle with.&lt;/p&gt;

&lt;p&gt;Not private data.&lt;br&gt;
Not login-only pages.&lt;br&gt;
Not anything illegal or creepy.&lt;/p&gt;

&lt;p&gt;Just the normal developer pain pile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public product pages&lt;/li&gt;
&lt;li&gt;public directories&lt;/li&gt;
&lt;li&gt;public docs&lt;/li&gt;
&lt;li&gt;public event listings&lt;/li&gt;
&lt;li&gt;public price pages&lt;/li&gt;
&lt;li&gt;public content pages with messy markup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have one of those “this page should be easy but somehow isn’t” URLs, send it over.&lt;/p&gt;

&lt;p&gt;I’ll try to turn it into clean JSON or Markdown and share what worked / what failed.&lt;/p&gt;

&lt;p&gt;The live docs are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hauntapi.com/docs" rel="noopener noreferrer"&gt;https://hauntapi.com/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the hard-URL proof flow is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hauntapi.com/services" rel="noopener noreferrer"&gt;https://hauntapi.com/services&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m mainly interested in the failures. Friendly demos are cheap. Broken real pages are where the bodies are buried.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>api</category>
      <category>llm</category>
      <category>automation</category>
    </item>
    <item>
      <title>Your SaaS cancellation page is where retention goes to die</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Fri, 01 May 2026 21:21:43 +0000</pubDate>
      <link>https://dev.to/zee_builds/your-saas-cancellation-page-is-where-retention-goes-to-die-3k95</link>
      <guid>https://dev.to/zee_builds/your-saas-cancellation-page-is-where-retention-goes-to-die-3k95</guid>
      <description>&lt;p&gt;Most SaaS teams treat churn like a dashboard problem.&lt;/p&gt;

&lt;p&gt;They connect Stripe, stare at monthly churn, maybe add a chart, then wonder why nothing changes.&lt;/p&gt;

&lt;p&gt;That is post-mortem work.&lt;/p&gt;

&lt;p&gt;The customer has already left. The money is already gone. The dashboard is just reading the gravestone.&lt;/p&gt;

&lt;p&gt;The useful moment is earlier: the cancellation page.&lt;/p&gt;

&lt;p&gt;That is the one place where the customer is still present, still logged in, still telling you they are about to leave, and still possibly recoverable.&lt;/p&gt;

&lt;p&gt;Here is the simple teardown I use when looking at a SaaS cancellation flow.&lt;/p&gt;

&lt;h2&gt;1. Do you know why they are leaving?&lt;/h2&gt;

&lt;p&gt;If the page only has a red "cancel subscription" button, you are throwing away the most useful data in the business.&lt;/p&gt;

&lt;p&gt;At minimum, ask for one reason:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too expensive&lt;/li&gt;
&lt;li&gt;missing feature&lt;/li&gt;
&lt;li&gt;not using it enough&lt;/li&gt;
&lt;li&gt;switched to another tool&lt;/li&gt;
&lt;li&gt;temporary pause&lt;/li&gt;
&lt;li&gt;support/product issue&lt;/li&gt;
&lt;li&gt;other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not make it a 20-field survey. That is not research, that is punishment.&lt;/p&gt;

&lt;p&gt;One click is enough.&lt;/p&gt;

&lt;h2&gt;2. Does the save offer match the reason?&lt;/h2&gt;

&lt;p&gt;This is where most flows go stupid.&lt;/p&gt;

&lt;p&gt;If someone says "too expensive", offer a discount or downgrade.&lt;/p&gt;

&lt;p&gt;If someone says "not using it enough", offer a pause or reminder.&lt;/p&gt;

&lt;p&gt;If someone says "missing feature", show the closest workaround or ask if they want to be told when it ships.&lt;/p&gt;

&lt;p&gt;If someone says "temporary pause", do not beg. Give them a clean pause option.&lt;/p&gt;

&lt;p&gt;A generic "20% off if you stay" offer is better than nothing, but it is still lazy.&lt;/p&gt;

&lt;h2&gt;3. Are you saving the subscription or just annoying them?&lt;/h2&gt;

&lt;p&gt;Dark pattern cancellation flows might reduce churn for five minutes and increase hatred forever.&lt;/p&gt;

&lt;p&gt;Do not hide the cancel button.&lt;br&gt;
Do not add five fake confirmation screens.&lt;br&gt;
Do not make them email support.&lt;br&gt;
Do not trap them.&lt;/p&gt;

&lt;p&gt;A good save flow is clear:&lt;/p&gt;

&lt;p&gt;"You can cancel now, but here is the one relevant option that might fit better."&lt;/p&gt;

&lt;p&gt;That is retention. Not hostage-taking.&lt;/p&gt;

&lt;h2&gt;4. Are failed payments mixed up with voluntary churn?&lt;/h2&gt;

&lt;p&gt;These are different problems.&lt;/p&gt;

&lt;p&gt;A failed card is not the same as someone choosing to leave.&lt;/p&gt;

&lt;p&gt;Failed payment recovery needs dunning, retries, backup payment methods, and clear billing emails.&lt;/p&gt;

&lt;p&gt;Voluntary churn needs reason capture, matching offers, and product feedback loops.&lt;/p&gt;

&lt;p&gt;If your churn dashboard lumps them together, your action plan will be mud.&lt;/p&gt;

&lt;h2&gt;5. Can you see what happens after the save attempt?&lt;/h2&gt;

&lt;p&gt;Track the basics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cancellation started&lt;/li&gt;
&lt;li&gt;reason selected&lt;/li&gt;
&lt;li&gt;offer shown&lt;/li&gt;
&lt;li&gt;offer accepted&lt;/li&gt;
&lt;li&gt;cancellation completed&lt;/li&gt;
&lt;li&gt;saved revenue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot see these steps, you cannot improve the flow. You are guessing in expensive darkness.&lt;/p&gt;
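&lt;p&gt;"Seeing the steps" can be as plain as counting event records. An illustrative sketch, with event names mirroring the list above (the record shape is made up):&lt;/p&gt;

```python
# Illustrative funnel math over plain event records.
events = [
    {"type": "cancellation_started", "customer": "a"},
    {"type": "offer_shown", "customer": "a"},
    {"type": "offer_accepted", "customer": "a"},
    {"type": "cancellation_started", "customer": "b"},
    {"type": "cancellation_completed", "customer": "b"},
]

def count(kind):
    return sum(1 for e in events if e["type"] == kind)

started = count("cancellation_started")
accepted = count("offer_accepted")
save_rate = accepted / started if started else 0.0
```

&lt;p&gt;Once those numbers exist, "did the new offer help?" becomes a comparison instead of a feeling.&lt;/p&gt;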

&lt;h2&gt;Tiny useful audit&lt;/h2&gt;

&lt;p&gt;Look at your cancellation page and ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What reason would a customer give here?&lt;/li&gt;
&lt;li&gt;What offer would they see next?&lt;/li&gt;
&lt;li&gt;Would that offer actually match the reason?&lt;/li&gt;
&lt;li&gt;Would I personally find this flow fair?&lt;/li&gt;
&lt;li&gt;Can I measure whether it saved anything?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the answers are mostly "no", the fix is probably not another dashboard.&lt;/p&gt;

&lt;p&gt;It is a better cancellation moment.&lt;/p&gt;

&lt;p&gt;I built SaveMyChurn around this exact idea: catch the customer while they are still in the cancellation flow, ask why they are leaving, and show the right recovery offer instead of just reporting churn after the fact.&lt;/p&gt;

&lt;p&gt;If you want to sanity-check your own flow, the low-friction page is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://savemychurn.com/cancellation-audit" rel="noopener noreferrer"&gt;https://savemychurn.com/cancellation-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No Stripe key needed for the first look. Just use it as a teardown lens before you start handing tools access to billing data.&lt;/p&gt;

&lt;p&gt;And if you do nothing else, add the one-question reason step. Boring, cheap, and annoyingly effective.&lt;/p&gt;

</description>
      <category>product</category>
      <category>saas</category>
      <category>startup</category>
      <category>marketing</category>
    </item>
    <item>
      <title>Most SaaS churn dashboards are post-mortems</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Fri, 01 May 2026 06:51:24 +0000</pubDate>
      <link>https://dev.to/zee_builds/most-saas-churn-dashboards-are-post-mortems-5f9k</link>
      <guid>https://dev.to/zee_builds/most-saas-churn-dashboards-are-post-mortems-5f9k</guid>
      <description>&lt;p&gt;If your churn dashboard only tells you that someone left, it is not a recovery system. It is a gravestone with charts.&lt;/p&gt;

&lt;p&gt;The useful question is not just “what is our churn rate?”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who is likely to cancel?&lt;/li&gt;
&lt;li&gt;why are they cancelling?&lt;/li&gt;
&lt;li&gt;what save path should they see before the hard exit?&lt;/li&gt;
&lt;li&gt;what failed payments are quietly sitting in Stripe?&lt;/li&gt;
&lt;li&gt;what is a 5% retention improvement worth in actual MRR?&lt;/li&gt;
&lt;/ul&gt;
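&lt;p&gt;That last question is just arithmetic, and worth writing out once. Every number below is invented for illustration:&lt;/p&gt;

```python
# Back-of-envelope MRR impact of a retention improvement.
# All inputs are made-up example numbers.
customers = 400
arpu = 49.0            # average revenue per customer, per month
monthly_churn = 0.05   # 5% of customers cancel each month

churned_mrr = customers * monthly_churn * arpu   # revenue walking out each month
improvement = 0.01                               # winning back 1 point of churn
recovered_mrr = customers * improvement * arpu   # what that point is worth
```

&lt;p&gt;With these example numbers, churn costs about 980 in MRR each month, and a single recovered point is worth roughly 196. Small percentages, real money.&lt;/p&gt;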

&lt;p&gt;A lot of small SaaS teams already have the raw ingredients:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stripe subscriptions&lt;/li&gt;
&lt;li&gt;cancellation reasons, if they ask for them&lt;/li&gt;
&lt;li&gt;plan and price data&lt;/li&gt;
&lt;li&gt;retry events&lt;/li&gt;
&lt;li&gt;customer usage signals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the cancellation flow is usually written like a legal form:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are you sure you want to cancel?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not retention. That is a trapdoor.&lt;/p&gt;

&lt;p&gt;A better cancellation flow should branch.&lt;/p&gt;

&lt;p&gt;If the reason is price, offer a downgrade or pause.&lt;/p&gt;

&lt;p&gt;If the reason is temporary budget, offer a timed pause.&lt;/p&gt;

&lt;p&gt;If the reason is missing functionality, capture the feature gap and trigger follow-up.&lt;/p&gt;

&lt;p&gt;If the problem is failed payment, do not treat it like voluntary churn.&lt;/p&gt;
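&lt;p&gt;The branching above is a small lookup, not a project. A sketch, with reason codes and path names invented for illustration:&lt;/p&gt;

```python
# Hypothetical reason-to-path mapping mirroring the branches above.
RECOVERY_PATHS = {
    "price": "offer_downgrade_or_pause",
    "temporary_budget": "offer_timed_pause",
    "missing_feature": "capture_gap_and_follow_up",
    "failed_payment": "run_dunning_and_retries",
}

def recovery_path(reason):
    # Anything unmapped falls through to a clean, honest cancellation.
    return RECOVERY_PATHS.get(reason, "confirm_cancellation")
```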

&lt;p&gt;None of this requires a giant customer success department. It needs a simple loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;capture the reason&lt;/li&gt;
&lt;li&gt;match the reason to a recovery path&lt;/li&gt;
&lt;li&gt;measure recovered revenue, not vanity clicks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I built SaveMyChurn around that idea: connect Stripe, detect churn and failed-payment leaks, and trigger personalised retention offers.&lt;/p&gt;

&lt;p&gt;There is a free cancellation audit here if you want to see the rough shape before connecting anything:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://savemychurn.com/cancellation-audit" rel="noopener noreferrer"&gt;https://savemychurn.com/cancellation-audit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And a churn calculator if you just want to see what a few retention points are worth:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://savemychurn.com/churn-rate-calculator" rel="noopener noreferrer"&gt;https://savemychurn.com/churn-rate-calculator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The short version: if churn is only a number on your dashboard, you are already too late.&lt;/p&gt;

</description>
      <category>stripe</category>
    </item>
    <item>
      <title>We Built a Custom Playwright Rendering Pipeline for Our MCP Server</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Fri, 24 Apr 2026 07:10:26 +0000</pubDate>
      <link>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-5bdo</link>
      <guid>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-5bdo</guid>
      <description>&lt;h1&gt;
  
  
  We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here's What We Learned
&lt;/h1&gt;

&lt;p&gt;At Haunt API, we build web extraction tools for AI agents. Our MCP server lets Claude and other AI assistants extract structured data from any URL. Simple enough on paper — fetch a page, parse the HTML, return JSON.&lt;/p&gt;

&lt;p&gt;The problem? Half the internet doesn't want to be fetched.&lt;/p&gt;

&lt;h2&gt;The Problem With "Just Use Playwright"&lt;/h2&gt;

&lt;p&gt;Most web scraping tutorials go something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that works! For a demo. For a product that real users depend on, it falls apart fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sites detect headless browsers&lt;/strong&gt; and serve captchas or empty pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SPA pages need time to render&lt;/strong&gt; — how long do you wait? 2 seconds? 5? 10?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're burning resources&lt;/strong&gt; loading images, fonts, and CSS when you only need text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every render costs the same&lt;/strong&gt; — no caching, no intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We went through all of these. Here's how we solved each one.&lt;/p&gt;

&lt;h2&gt;Lesson 1: Don't Use One Tool For Everything&lt;/h2&gt;

&lt;p&gt;Our pipeline has three tiers, and most requests never hit Playwright:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Direct HTTP&lt;/strong&gt; — Works for ~80% of the web. Fast, cheap, no browser needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FlareSolverr&lt;/strong&gt; — Handles Cloudflare challenges and basic JS rendering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; — Full browser rendering for JS-heavy SPAs that return empty skeletons.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: we detect &lt;em&gt;skeleton pages&lt;/em&gt; — HTML that has a &lt;code&gt;&amp;lt;div id="root"&amp;gt;&amp;lt;/div&amp;gt;&lt;/code&gt; but no actual content — and only spin up the browser when we need to. Most pages don't need it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_skeleton_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Detect if HTML is an unrendered JS skeleton.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="c1"&gt;# Strip scripts/styles and check for visible text
&lt;/span&gt;    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;strip_tags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="c1"&gt;# Common SPA markers
&lt;/span&gt;    &lt;span class="n"&gt;skeleton_markers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;div id=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;div id=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__next&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You need to enable JavaScript&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;skeleton_markers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



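&lt;p&gt;Putting the tiers together, the escalation loop has roughly this shape. A sketch only: the fetch functions are passed in as placeholders, and the emptiness predicate is something like &lt;code&gt;is_skeleton_html&lt;/code&gt; above:&lt;/p&gt;

```python
def fetch_with_escalation(url, tiers, looks_empty):
    """Try each fetch tier in order, cheapest first, until one returns
    real content. `tiers` is an ordered list of fetch callables;
    `looks_empty` is a predicate like is_skeleton_html."""
    html = ""
    for fetch in tiers:
        html = fetch(url)
        if not looks_empty(html):
            return html
    return html  # best effort: result of the last, most expensive tier
```

&lt;p&gt;The design point is that the expensive tier is a fallback, never the default, so most requests pay the cheap-HTTP price.&lt;/p&gt;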
&lt;h2&gt;Lesson 2: Smart Wait Strategies Beat Fixed Timers&lt;/h2&gt;

&lt;p&gt;The worst thing about browser automation is the waiting. &lt;code&gt;time.sleep(5)&lt;/code&gt; is either too short (page hasn't loaded) or too long (wasting time on pages that loaded instantly).&lt;/p&gt;

&lt;p&gt;We built three concurrent wait strategies. First one to trigger wins:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Stability&lt;/strong&gt; — Poll the page's visible text every 200ms. If it hasn't changed for 1 second, the content has loaded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network Idle&lt;/strong&gt; — Wait for no new network requests for 500ms. Good for pages that make API calls after initial load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meaningful Content&lt;/strong&gt; — Wait until the page has at least 500 characters of visible text. Catches pages that load something but aren't done yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wait_for_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Smart wait — detect when content has actually loaded.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;wait_for_content_stability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;wait_for_network_idle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;wait_for_meaningful_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_when&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FIRST_COMPLETED&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;done&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;done&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strategy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cut our average render time from 6 seconds to under 3.&lt;/p&gt;

&lt;h2&gt;Lesson 3: Fingerprint Rotation Matters&lt;/h2&gt;

&lt;p&gt;Headless Chromium has tells. Sites check for them. If every request comes from the same user agent with the same viewport on the same timezone, you get blocked.&lt;/p&gt;

&lt;p&gt;We rotate fingerprints per-URL — same site sees a consistent browser (so cookies and sessions work), but different sites see different browsers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FINGERPRINTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ua&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chrome/120.0 Windows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1920&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1080&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ua&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chrome/119.0 macOS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1440&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-GB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ua&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chrome/120.0 Linux&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1366&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 10 total variants
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_fingerprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Deterministic per-URL fingerprint selection.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FINGERPRINTS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;FINGERPRINTS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lesson 4: Block What You Don't Need
&lt;/h2&gt;

&lt;p&gt;When you're extracting text data, images and fonts are dead weight. We block them at the network level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;BLOCKED_RESOURCES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;font&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;media&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;texttrack&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;beacon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csp_report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eventsource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;BLOCKED_DOMAINS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google-analytics.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;facebook.net&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doubleclick.net&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hotjar.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mixpanel.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;segment.io&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 20+ tracking domains
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resource_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;BLOCKED_RESOURCES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;BLOCKED_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;continue_&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cuts transferred page weight by 40-60% on most pages, which means faster renders and less RAM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: Cache Renders, Not Requests
&lt;/h2&gt;

&lt;p&gt;If two users extract data from the same URL within 5 minutes, the page probably hasn't changed. We cache the rendered HTML with a TTL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RenderCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default_ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;default_ttl&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cached_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;
            &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
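&lt;p&gt;The &lt;code&gt;get&lt;/code&gt; method above covers the read path; the write path is where the LRU bound gets enforced. Here is a minimal sketch of a matching &lt;code&gt;set&lt;/code&gt; method (the exact eviction policy shown, dropping the oldest entry, is an assumption for illustration, not necessarily our implementation):&lt;/p&gt;

```python
import time
from collections import OrderedDict


class RenderCache:
    """Write side of the render cache; the read side mirrors the `get` above."""

    def __init__(self, max_size=50, default_ttl=300):
        self.cache = OrderedDict()
        self.max_size = max_size
        self.default_ttl = default_ttl

    def set(self, url, html, ttl=None):
        # Refresh position if the URL is already cached
        if url in self.cache:
            self.cache.move_to_end(url)
        # At the size bound, evict the oldest entry before inserting
        elif len(self.cache) >= self.max_size:
            self.cache.popitem(last=False)
        self.cache[url] = {
            "html": html,
            "cached_at": time.time(),
            "ttl": ttl if ttl is not None else self.default_ttl,
        }
```

&lt;p&gt;&lt;code&gt;OrderedDict&lt;/code&gt; keeps this trivial: &lt;code&gt;move_to_end&lt;/code&gt; and &lt;code&gt;popitem(last=False)&lt;/code&gt; do the LRU bookkeeping for free.&lt;/p&gt;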



&lt;p&gt;Cache hits return in 0ms. For an API that charges per request, this saves users money &lt;em&gt;and&lt;/em&gt; makes responses instant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Final structure — 6 modules, each with a single job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;playwright-service/
├── server.py          # FastAPI orchestration, browser lifecycle
├── fingerprint.py     # UA/viewport/locale rotation
├── smart_wait.py      # Content stability + network idle detection
├── site_detect.py     # Static vs SPA classification
├── cache.py           # LRU render cache with TTL
└── stealth.py         # Resource blocking + headless detection evasion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each module is ~100 lines. Easy to test, easy to modify, easy to explain to new contributors.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't reach for the browser first.&lt;/strong&gt; Most pages are server-rendered. Direct HTTP is 10x faster and 100x cheaper.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Wait smarter, not longer.&lt;/strong&gt; Detecting when content has actually loaded saves seconds per request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Be a moving target.&lt;/strong&gt; Rotating fingerprints and blocking trackers keeps you under the radar.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache aggressively.&lt;/strong&gt; Web pages don't change every second. A 5-minute render cache saves users money and makes your API feel fast.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build modules, not monoliths.&lt;/strong&gt; Each piece of the pipeline has its own concerns. Keep them separate.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Playwright browser engine is the oven. Everything around it — the routing, the waiting, the caching, the stealth — is the recipe. That's where the actual engineering lives.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We're &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt; — web extraction built for AI agents. If you're building with Claude, Cursor, or any AI assistant, our &lt;a href="https://hauntapi.com#signup" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; gives your agent the ability to extract data from any URL in one line.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>showdev</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here is What We Learned</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:23:07 +0000</pubDate>
      <link>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-here-is-what-we-learned-38d9</link>
      <guid>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-here-is-what-we-learned-38d9</guid>
      <description>&lt;h1&gt;
  
  
  We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here's What We Learned
&lt;/h1&gt;

&lt;p&gt;At &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt;, we build web extraction tools for AI agents. Our MCP server lets Claude and other AI assistants extract structured data from any URL. Simple enough on paper — fetch a page, parse the HTML, return JSON.&lt;/p&gt;

&lt;p&gt;The problem? Half the internet doesn't want to be fetched.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With "Just Use Playwright"
&lt;/h2&gt;

&lt;p&gt;Most web scraping tutorials go something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that works! For a demo. For a product that real users depend on, it falls apart fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sites detect headless browsers&lt;/strong&gt; and serve captchas or empty pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SPA pages need time to render&lt;/strong&gt; — how long do you wait? 2 seconds? 5? 10?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are burning resources&lt;/strong&gt; loading images, fonts, and CSS when you only need text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every render costs the same&lt;/strong&gt; — no caching, no intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We went through all of these. Here is how we solved each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: Do Not Use One Tool For Everything
&lt;/h2&gt;

&lt;p&gt;Our pipeline has three tiers, and most requests never hit Playwright:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Direct HTTP&lt;/strong&gt; — Works for approximately 80% of the web. Fast, cheap, no browser needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FlareSolverr&lt;/strong&gt; — Handles Cloudflare challenges and basic JS rendering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; — Full browser rendering for JS-heavy SPAs that return empty skeletons.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: we detect skeleton pages — HTML that ships a root div with no actual content inside — and only spin up the browser when we need to.&lt;/p&gt;
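&lt;p&gt;A skeleton check can be as simple as comparing visible text against the markup. A rough stdlib-only sketch (the threshold and helper names here are illustrative assumptions, not our exact heuristic):&lt;/p&gt;

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style contents."""

    def __init__(self):
        super().__init__()
        self.skip = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)


def is_skeleton(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: a JS skeleton ships plenty of markup but almost no text."""
    parser = _TextExtractor()
    parser.feed(html)
    visible = " ".join("".join(parser.chunks).split())
    return len(visible) < min_text_chars
```

&lt;p&gt;If &lt;code&gt;is_skeleton&lt;/code&gt; fires on the direct-HTTP response, the request escalates to the next tier.&lt;/p&gt;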

&lt;h2&gt;
  
  
  Lesson 2: Smart Wait Strategies Beat Fixed Timers
&lt;/h2&gt;

&lt;p&gt;The worst thing about browser automation is the waiting. A fixed sleep is either too short or too long. We built three concurrent wait strategies — first one to trigger wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content Stability&lt;/strong&gt; — Poll visible text every 200ms. If unchanged for 1 second, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Idle&lt;/strong&gt; — Wait for no new requests for 500ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meaningful Content&lt;/strong&gt; — Wait until 500+ chars of visible text exist.&lt;/li&gt;
&lt;/ul&gt;
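&lt;p&gt;The first strategy, content stability, can be sketched as a generic polling loop. With Playwright, the &lt;code&gt;get_text&lt;/code&gt; callable would be something like &lt;code&gt;lambda: page.inner_text("body")&lt;/code&gt;; the function name and timeout default below are assumptions for illustration:&lt;/p&gt;

```python
import asyncio
import time


async def wait_for_stable_text(get_text, poll_ms=200, stable_ms=1000, timeout_s=10.0):
    """Resolve once the visible text stops changing for `stable_ms`.

    `get_text` is any async callable returning the page's visible text,
    e.g. lambda: page.inner_text("body") with Playwright's async API.
    """
    deadline = time.monotonic() + timeout_s
    last = await get_text()
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        await asyncio.sleep(poll_ms / 1000)
        current = await get_text()
        if current != last:
            # Text changed: restart the stability clock
            last, stable_since = current, time.monotonic()
        elif (time.monotonic() - stable_since) * 1000 >= stable_ms:
            return current
    return last  # timed out; return best effort
```

&lt;p&gt;Racing this against the network-idle and meaningful-content checks with &lt;code&gt;asyncio.wait(..., return_when=FIRST_COMPLETED)&lt;/code&gt; is what makes the fixed sleep unnecessary.&lt;/p&gt;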

&lt;p&gt;This cut our average render time from 6 seconds to under 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Fingerprint Rotation Matters
&lt;/h2&gt;

&lt;p&gt;Headless Chromium has tells. We rotate fingerprints per-URL — same site sees a consistent browser, different sites see different browsers. 10 viewport variants across Windows, macOS, and Linux UAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: Block What You Do Not Need
&lt;/h2&gt;

&lt;p&gt;When extracting text data, images and fonts are dead weight. We block them at the network level, along with 20+ tracking domains. This cuts transferred page weight by 40-60%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: Cache Renders, Not Requests
&lt;/h2&gt;

&lt;p&gt;If two users extract data from the same URL within 5 minutes, the page probably has not changed. Cache hits return in 0ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Six modules, each with a single job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;server.py&lt;/strong&gt; — FastAPI orchestration, browser lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fingerprint.py&lt;/strong&gt; — UA/viewport/locale rotation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;smart_wait.py&lt;/strong&gt; — Content stability + network idle detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;site_detect.py&lt;/strong&gt; — Static vs SPA classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cache.py&lt;/strong&gt; — LRU render cache with TTL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;stealth.py&lt;/strong&gt; — Resource blocking + headless detection evasion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each module is approximately 100 lines. Easy to test, easy to modify.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Do not reach for the browser first. Most pages are server-rendered.&lt;/li&gt;
&lt;li&gt;Wait smarter, not longer.&lt;/li&gt;
&lt;li&gt;Be a moving target with fingerprint rotation.&lt;/li&gt;
&lt;li&gt;Cache aggressively.&lt;/li&gt;
&lt;li&gt;Build modules, not monoliths.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Playwright browser engine is the oven. Everything around it — the routing, the waiting, the caching, the stealth — is the recipe. That is where the actual engineering lives.&lt;/p&gt;




&lt;p&gt;We are &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt; — web extraction built for AI agents. If you are building with Claude, Cursor, or any AI assistant, our &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; gives your agent the ability to extract data from any URL.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>scraping</category>
      <category>playwright</category>
    </item>
    <item>
      <title>I Built an AI That Talks People Out of Cancelling Their Subscriptions</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:35:05 +0000</pubDate>
      <link>https://dev.to/zee_builds/i-built-an-ai-that-talks-people-out-of-cancelling-their-subscriptions-2bm8</link>
      <guid>https://dev.to/zee_builds/i-built-an-ai-that-talks-people-out-of-cancelling-their-subscriptions-2bm8</guid>
      <description>&lt;p&gt;Here's the thing about churn: by the time someone clicks "Cancel Subscription", they've already decided. Your generic "Would you like 20% off?" popup is too late and too weak.&lt;/p&gt;

&lt;p&gt;I spent the last month building &lt;a href="https://savemychurn.com" rel="noopener noreferrer"&gt;SaveMyChurn&lt;/a&gt; — an AI-powered churn recovery tool for Stripe SaaS founders. This is how it works, what I learned building it, and why I think most cancellation flows are doing it wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I was looking at my own Stripe dashboard one day and noticed something: the cancellation flow was the most ignored piece of the entire subscription experience. People pour weeks into onboarding, feature development, marketing — and then the cancel button just... ends things. No conversation. No understanding of why.&lt;/p&gt;

&lt;p&gt;For bootstrapped SaaS founders running £5K-50K MRR, every subscription matters. Losing 5% of your customers a month isn't a statistic — it's the difference between growing and dying.&lt;/p&gt;

&lt;p&gt;The existing tools didn't fit. Churnkey starts at $250/month — that's a significant chunk of revenue when you're small. The cheaper options are just form builders with a discount code at the end. Nobody was actually &lt;em&gt;talking&lt;/em&gt; to the customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;SaveMyChurn does three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Listens to Stripe in real time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a customer hits cancel, Stripe fires a &lt;code&gt;customer.subscription.deleted&lt;/code&gt; webhook. SaveMyChurn catches it instantly, pulls the subscription metadata, payment history, and plan details, and builds a profile of who's leaving and why.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The webhook handler — this is where it starts
&lt;/span&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhooks/stripe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stripe_webhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;construct_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stripe-signature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;webhook_secret&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer.subscription.deleted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;subscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Build subscriber profile from Stripe data
&lt;/span&gt;        &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;build_subscriber_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Generate AI retention strategy
&lt;/span&gt;        &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate_retention_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Send personalised recovery email
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send_retention_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Generates a unique retention strategy per subscriber&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the part I'm most proud of. Instead of a static "here's 20% off" flow, an AI strategist analyses the subscriber's behaviour — how long they've been a customer, what plan they're on, their payment history, any support tickets — and creates a genuinely personalised retention offer.&lt;/p&gt;

&lt;p&gt;Someone cancelling after 2 months gets a different approach than someone who's been around for a year. Someone on a basic plan gets a different offer than someone on enterprise. The AI adjusts tone, offer type, discount level, and follow-up timing based on the full context.&lt;/p&gt;
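&lt;p&gt;The heart of this step is the context you hand the model. A sketch of how the strategist prompt might be assembled (the profile schema and output keys here are illustrative assumptions, not the production format):&lt;/p&gt;

```python
def build_strategy_prompt(profile: dict) -> str:
    """Assemble the context the AI strategist reasons over.

    The profile fields mirror what gets pulled from Stripe metadata and
    payment history; the exact schema is a sketch.
    """
    return (
        "You are a retention strategist for a SaaS product.\n"
        f"Customer tenure: {profile['months_active']} months\n"
        f"Plan: {profile['plan']} at {profile['mrr']}/month\n"
        f"Payment history: {profile['failed_payments']} failed payments\n"
        f"Stated cancel reason: {profile.get('cancel_reason', 'unknown')}\n"
        "Propose ONE retention offer (discount, downgrade, pause, or feature "
        "bundle), a tone for the email, and a follow-up schedule. "
        "Return JSON with keys: offer_type, offer_details, tone, follow_up_days."
    )
```

&lt;p&gt;Constraining the output to a small JSON schema is what keeps the downstream email step deterministic even though the strategy itself is generated.&lt;/p&gt;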

&lt;p&gt;&lt;strong&gt;3. Follows up automatically&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One email rarely saves a cancellation. SaveMyChurn runs a multi-step sequence — initial offer, follow-up with adjusted terms, final value reminder — spaced over a few days. Each step is informed by whether they opened the previous email, clicked anything, or went silent.&lt;/p&gt;
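&lt;p&gt;The sequencing logic is small. A sketch of the step selection (the day offsets, template names, and stop-on-click rule are assumptions for illustration):&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class SequenceStep:
    day: int       # days after the cancellation event
    template: str  # email template name (illustrative)


# Three-step sequence mirroring the flow described above
SEQUENCE = [
    SequenceStep(day=0, template="initial_offer"),
    SequenceStep(day=2, template="adjusted_terms"),
    SequenceStep(day=5, template="final_value_reminder"),
]


def next_step(steps_sent: int, clicked_last: bool):
    """Return the next email to send, or None to stop the sequence.

    A click means the subscriber re-engaged with the offer, so we stop
    emailing and let the offer page take over.
    """
    if clicked_last or steps_sent >= len(SEQUENCE):
        return None
    return SEQUENCE[steps_sent]
```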

&lt;h2&gt;
  
  
  The tech stack
&lt;/h2&gt;

&lt;p&gt;Keeping it simple and cheap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; backend — async Python, handles webhooks fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB&lt;/strong&gt; for subscriber profiles and strategy storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; for caching and rate limiting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM via API&lt;/strong&gt; for strategy generation — the AI strategist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resend&lt;/strong&gt; for transactional emails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; on a single VPS — the whole thing runs on one machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM cost per strategy generation is under a penny. When your competitor charges $250/month, that's a ridiculous margin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing model (and why it matters)
&lt;/h2&gt;

&lt;p&gt;I went with a commission model. Monthly fee + a percentage of recovered revenue. The idea is simple: if I don't save you money, I don't make money.&lt;/p&gt;

&lt;p&gt;This was a deliberate choice. Flat-fee tools have an incentive to get you signed up and keep you paying, regardless of results. Commission pricing means I'm motivated to actually recover subscriptions, not just ship a dashboard.&lt;/p&gt;

&lt;p&gt;For founders at the £5K-50K MRR stage, this aligns incentives in a way that $250/month flat fees don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Webhook reliability is everything.&lt;/strong&gt; If you miss a &lt;code&gt;customer.subscription.deleted&lt;/code&gt; event, you miss the entire recovery window. I ended up implementing retry queues and idempotency keys before anything else.&lt;/p&gt;
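&lt;p&gt;The idempotency idea in miniature, with an in-memory set standing in for Redis (an assumed shape, not the production code):&lt;/p&gt;

```python
# In-memory sketch of webhook idempotency. In production this set would be
# Redis (SET key NX EX ttl) so redelivered events stay no-ops across restarts.
processed_events: set = set()

def handle_webhook(event: dict) -> str:
    event_id = event["id"]  # Stripe event IDs are globally unique
    if event_id in processed_events:
        return "duplicate"  # already handled; safe to ACK again
    processed_events.add(event_id)
    if event["type"] == "customer.subscription.deleted":
        pass  # enqueue the recovery sequence here
    return "processed"

print(handle_webhook({"id": "evt_1", "type": "customer.subscription.deleted"}))  # processed
print(handle_webhook({"id": "evt_1", "type": "customer.subscription.deleted"}))  # duplicate
```

&lt;p&gt;Because Stripe retries undelivered webhooks, handlers have to treat redelivery as the normal case, not the exception.&lt;/p&gt;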

&lt;p&gt;&lt;strong&gt;AI strategy &amp;gt; rules engine.&lt;/strong&gt; I initially built a simple rule-based system (if cancel reason = "price" → offer discount). It was okay. The AI strategist that replaced it generates strategies I wouldn't have thought of — bundling features differently, offering plan downgrades instead of discounts, timing follow-ups based on engagement patterns.&lt;/p&gt;
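&lt;p&gt;For contrast, a rule-based version like the one described is essentially a lookup table (names here are illustrative), which is exactly why it plateaus:&lt;/p&gt;

```python
# Illustrative reconstruction of a rule-based v1: a static lookup that
# can't invent new offer shapes or adjust timing per subscriber.
RULES = {
    "price": "offer_discount",
    "missing_feature": "share_roadmap",
    "not_using_it": "offer_plan_pause",
}

def rule_strategy(cancel_reason: str) -> str:
    return RULES.get(cancel_reason, "ask_for_feedback")

print(rule_strategy("price"))  # offer_discount
```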

&lt;p&gt;&lt;strong&gt;One email is never enough.&lt;/strong&gt; The first recovery email has maybe a 15-20% open rate. The follow-up catches another chunk. The third one gets the people who were "going to get around to it." Multi-step sequences doubled recovery rates compared to single emails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it's at
&lt;/h2&gt;

&lt;p&gt;SaveMyChurn is live and in production. It works end-to-end: Stripe webhook → AI strategy → personalised email sequence → dashboard showing what was saved.&lt;/p&gt;

&lt;p&gt;If you're a bootstrapped SaaS founder on Stripe watching subscriptions slip away, &lt;a href="https://savemychurn.com" rel="noopener noreferrer"&gt;give it a look&lt;/a&gt;. There's a free trial — no credit card required.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>stripe</category>
      <category>ai</category>
      <category>retention</category>
    </item>
    <item>
      <title>Your AI Agent Can't Scrape That Page. Here's How to Fix It.</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:16:08 +0000</pubDate>
      <link>https://dev.to/zee_builds/your-ai-agent-cant-scrape-that-page-heres-how-to-fix-it-2om7</link>
      <guid>https://dev.to/zee_builds/your-ai-agent-cant-scrape-that-page-heres-how-to-fix-it-2om7</guid>
      <description>&lt;h1&gt;
  
  
  Your AI Agent Can't Scrape That Page. Here's How to Fix It.
&lt;/h1&gt;

&lt;p&gt;You built an AI agent that needs real-time web data. Product prices, news articles, competitor info — whatever it is, you need clean HTML or JSON from a URL.&lt;/p&gt;

&lt;p&gt;So you fire off a &lt;code&gt;requests.get()&lt;/code&gt; and... &lt;strong&gt;403 Forbidden&lt;/strong&gt;. Cloudflare says no.&lt;/p&gt;

&lt;p&gt;Or you get a page, but it's empty — the content loads via JavaScript after the page renders, and your HTTP client never sees it.&lt;/p&gt;

&lt;p&gt;Sound familiar? Let's break down what's happening and how to actually solve it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your Scraping Fails
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. JavaScript Rendering
&lt;/h3&gt;

&lt;p&gt;Many modern sites are single-page apps (SPAs). The HTML you get from a raw HTTP request is a shell — the actual content is loaded by JavaScript after the page mounts. &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;axios&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt; — none of them execute JS.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cloudflare and Bot Detection
&lt;/h3&gt;

&lt;p&gt;Cloudflare fingerprints your connection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS fingerprint (does your HTTP client look like a browser?)&lt;/li&gt;
&lt;li&gt;HTTP/2 fingerprint&lt;/li&gt;
&lt;li&gt;Browser behavior (mouse movements, JS execution patterns)&lt;/li&gt;
&lt;li&gt;IP reputation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regular HTTP clients fail all of these checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Complex Layouts
&lt;/h3&gt;

&lt;p&gt;Even when you get the HTML, extracting structured data from it is painful. You write brittle CSS selectors that break on every layout change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solutions (From Worst to Best)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Selenium/Playwright Headless Browsers
&lt;/h3&gt;

&lt;p&gt;They work... sometimes. But Cloudflare detects headless Chrome. You'll spend more time maintaining anti-detection patches than building your actual product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rotating Proxies + Custom Headers
&lt;/h3&gt;

&lt;p&gt;Expensive, slow, and fragile. You're playing whack-a-mole with detection rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use an API That Handles Everything
&lt;/h3&gt;

&lt;p&gt;This is where tools like &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt; come in. It's a web extraction API built specifically for AI agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hauntapi.com/v1/extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/product/123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the product name, price, and availability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# {
#   "product_name": "Wireless Headphones Pro",
#   "price": "$79.99",
#   "availability": "In Stock"
# }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. One API call. Cloudflare bypassed, JavaScript rendered, structured data extracted.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works Under the Hood
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Smart fetching&lt;/strong&gt; — tries direct HTTP first, falls back to headless browser with anti-fingerprinting for Cloudflare-protected sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JavaScript executes&lt;/strong&gt; — SPA content becomes available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI extracts&lt;/strong&gt; the data you described in your natural language prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean JSON&lt;/strong&gt; returned to your application&lt;/li&gt;
&lt;/ol&gt;
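&lt;p&gt;Step 1's escalation can be sketched as a simple fallback, with fake transports standing in for the network (this illustrates the pattern, not Haunt's internals):&lt;/p&gt;

```python
# Illustration of the fetch-then-fallback pattern (not Haunt's actual code):
# try cheap direct HTTP first, escalate to a headless browser only when the
# response looks blocked or like an empty SPA shell.
def looks_blocked(status: int, body: str) -> bool:
    return status in (403, 429, 503) or len(body.strip()) == 0

def fetch(url: str, http_get, browser_render) -> str:
    status, body = http_get(url)
    if looks_blocked(status, body):
        return browser_render(url)  # slower, but executes JS
    return body

# Fake transports make the control flow visible without a network:
print(fetch("https://example.com",
            http_get=lambda u: (403, ""),
            browser_render=lambda u: "rendered page"))  # rendered page
```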

&lt;h3&gt;
  
  
  MCP Server for Claude and Cursor
&lt;/h3&gt;

&lt;p&gt;If you're building with AI agents, Haunt also has an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"haunt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@hauntapi/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"HAUNT_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add that to your Claude Desktop or Cursor config and your AI agent can pull data from web pages directly, with no extra code on your side.&lt;/p&gt;

&lt;h3&gt;
  
  
  REST API (No SDK Needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://hauntapi.com/v1/extract &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"x-api-key: your-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "url": "https://news.ycombinator.com",
    "prompt": "Get the top 5 stories with titles, points, and URLs"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Free Tier
&lt;/h2&gt;

&lt;p&gt;100 extractions/month for free. No credit card required. Perfect for prototyping your AI agent before scaling up.&lt;/p&gt;

&lt;p&gt;Paid plans start at £19/mo for 1,000 requests with authenticated scraping and priority support.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;th&gt;Setup Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw requests&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Low (~30%)&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selenium + proxies&lt;/td&gt;
&lt;td&gt;$$$&lt;/td&gt;
&lt;td&gt;Medium (~60%)&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haunt API&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;High (95%+)&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If your AI agent needs web data and you're tired of fighting bot detection, try &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt;. It handles Cloudflare, JavaScript rendering, and data extraction in a single API call.&lt;/p&gt;

&lt;p&gt;Free to start, built for AI agents and RAG pipelines.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I built Haunt API because I was tired of writing the same scraping infrastructure for every project.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>scraping</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
