<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zee</title>
    <description>The latest articles on DEV Community by Zee (@zee_builds).</description>
    <link>https://dev.to/zee_builds</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875644%2F87032ac1-a71e-4e82-9eb9-db42127d93d2.png</url>
      <title>DEV Community: Zee</title>
      <link>https://dev.to/zee_builds</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zee_builds"/>
    <language>en</language>
    <item>
      <title>We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here is What We Learned</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 19:23:07 +0000</pubDate>
      <link>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-here-is-what-we-learned-38d9</link>
      <guid>https://dev.to/zee_builds/we-built-a-custom-playwright-rendering-pipeline-for-our-mcp-server-here-is-what-we-learned-38d9</guid>
      <description>&lt;h1&gt;
  
  
  We Built a Custom Playwright Rendering Pipeline for Our MCP Server — Here is What We Learned
&lt;/h1&gt;

&lt;p&gt;At &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt;, we build web extraction tools for AI agents. Our MCP server lets Claude and other AI assistants extract structured data from any URL. Simple enough on paper — fetch a page, parse the HTML, return JSON.&lt;/p&gt;

&lt;p&gt;The problem? Half the internet doesn't want to be fetched.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With "Just Use Playwright"
&lt;/h2&gt;

&lt;p&gt;Most web scraping tutorials go something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that works! For a demo. For a product that real users depend on, it falls apart fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sites detect headless browsers&lt;/strong&gt; and serve captchas or empty pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SPA pages need time to render&lt;/strong&gt; — how long do you wait? 2 seconds? 5? 10?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are burning resources&lt;/strong&gt; loading images, fonts, and CSS when you only need text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Every render costs the same&lt;/strong&gt; — no caching, no intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We went through all of these. Here is how we solved each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: Do Not Use One Tool For Everything
&lt;/h2&gt;

&lt;p&gt;Our pipeline has three tiers, and most requests never hit Playwright:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Direct HTTP&lt;/strong&gt; — Works for approximately 80% of the web. Fast, cheap, no browser needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FlareSolverr&lt;/strong&gt; — Handles Cloudflare challenges and basic JS rendering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; — Full browser rendering for JS-heavy SPAs that return empty skeletons.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight: we detect skeleton pages — HTML that parses fine but ships an empty root div and no actual content — and only spin up the browser when we need to.&lt;/p&gt;
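The check itself is simple. Here is a minimal sketch of the idea (function names and the threshold are illustrative, not our exact module API):

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects visible text, ignoring script/style/noscript bodies."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "noscript"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style", "noscript"):
            self._skip = max(0, self._skip - 1)
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def visible_text(html: str) -> str:
    p = _TextCollector()
    p.feed(html)
    return " ".join("".join(p.parts).split())

def is_skeleton(html: str, threshold: int = 200) -> bool:
    """A page is a skeleton when it parses but carries almost no text."""
    return len(visible_text(html)) < threshold

shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(is_skeleton(shell))  # True -> escalate to the browser tier
```

When `is_skeleton` fires on a direct-HTTP result, we retry the same URL one tier up instead of returning an empty document to the caller.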

&lt;h2&gt;
  
  
  Lesson 2: Smart Wait Strategies Beat Fixed Timers
&lt;/h2&gt;

&lt;p&gt;The worst thing about browser automation is the waiting. A fixed sleep is either too short or too long. We built three concurrent wait strategies — first one to trigger wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content Stability&lt;/strong&gt; — Poll visible text every 200ms. If unchanged for 1 second, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Idle&lt;/strong&gt; — Wait for no new requests for 500ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meaningful Content&lt;/strong&gt; — Wait until 500+ chars of visible text exist.&lt;/li&gt;
&lt;/ul&gt;
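The pattern is a race: launch all three strategies and take whichever settles first. A simplified sketch with stand-in strategies (the real ones poll the live Playwright page object):

```python
import asyncio

# Stand-in wait strategies; each resolves when its condition would be met.
async def wait_for_content_stability():
    await asyncio.sleep(0.05)  # pretend visible text stopped changing
    return "content_stable"

async def wait_for_network_idle():
    await asyncio.sleep(0.2)   # pretend no requests for 500ms
    return "network_idle"

async def wait_for_meaningful_content():
    await asyncio.sleep(0.1)   # pretend 500+ chars of text appeared
    return "meaningful_content"

async def smart_wait(timeout=10.0):
    tasks = [asyncio.create_task(c()) for c in (
        wait_for_content_stability,
        wait_for_network_idle,
        wait_for_meaningful_content,
    )]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    for t in pending:
        t.cancel()  # losers stop polling as soon as one strategy wins
    return done.pop().result() if done else "timeout"

print(asyncio.run(smart_wait()))  # content_stable
```

The `timeout` is the only fixed number left, and it is a ceiling rather than a sleep: fast pages finish in tens of milliseconds, and only pathological ones ever hit it.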

&lt;p&gt;This cut our average render time from 6 seconds to under 3.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Fingerprint Rotation Matters
&lt;/h2&gt;

&lt;p&gt;Headless Chromium has tells. We rotate fingerprints per-URL — same site sees a consistent browser, different sites see different browsers. 10 viewport variants across Windows, macOS, and Linux UAs.&lt;/p&gt;
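Per-site consistency falls out of hashing the hostname into a fixed pool. A sketch, with a deliberately tiny illustrative pool (the UA strings are truncated placeholders):

```python
import hashlib
from urllib.parse import urlsplit

# Illustrative pool; the real rotation covers 10 viewport variants plus
# locale and timezone across Windows, macOS, and Linux UAs.
FINGERPRINTS = [
    {"ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...", "viewport": (1920, 1080)},
    {"ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...", "viewport": (1440, 900)},
    {"ua": "Mozilla/5.0 (X11; Linux x86_64) ...", "viewport": (1366, 768)},
]

def fingerprint_for(url: str) -> dict:
    """Same site -> same fingerprint; different sites -> (usually) different."""
    host = urlsplit(url).netloc
    digest = hashlib.sha256(host.encode()).digest()
    return FINGERPRINTS[digest[0] % len(FINGERPRINTS)]

# Every page on a given site sees one consistent browser identity:
print(fingerprint_for("https://a.com/x") == fingerprint_for("https://a.com/y"))
```

Hashing instead of random choice is the point: a site that sees the same visitor flip between Windows and macOS mid-session is a site that blocks you.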

&lt;h2&gt;
  
  
  Lesson 4: Block What You Do Not Need
&lt;/h2&gt;

&lt;p&gt;When extracting text data, images and fonts are dead weight. We block them at the network level, along with 20+ known tracking domains. This cuts the data transferred per render by 40-60%.&lt;/p&gt;
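With Playwright this is a route handler; the decision logic can live in a pure function (the blocklists below are short stand-ins for the real ones):

```python
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}
# A few stand-in tracker hosts; the production list has 20+ entries.
BLOCKED_HOSTS = {"google-analytics.com", "doubleclick.net", "facebook.net"}

def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request is dead weight for text extraction."""
    if resource_type in BLOCKED_TYPES:
        return True
    return any(host in url for host in BLOCKED_HOSTS)

# Wired into Playwright (sketch -- requires a live page object):
#
#   async def router(route):
#       req = route.request
#       if should_block(req.resource_type, req.url):
#           await route.abort()
#       else:
#           await route.continue_()
#
#   await page.route("**/*", router)
print(should_block("image", "https://example.com/hero.png"))  # True
```

Keeping the predicate pure means it can be unit-tested without launching a browser, which is where most of our blocklist iteration happened.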

&lt;h2&gt;
  
  
  Lesson 5: Cache Renders, Not Requests
&lt;/h2&gt;

&lt;p&gt;If two users extract data from the same URL within 5 minutes, the page probably has not changed. Cache hits return in 0ms.&lt;/p&gt;
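A minimal version of that cache is an OrderedDict with expiry timestamps. A sketch of the cache.py idea (sizes and TTL here are illustrative):

```python
import time
from collections import OrderedDict

class RenderCache:
    """LRU cache with a TTL: recent renders are served without a browser."""
    def __init__(self, max_entries=512, ttl=300.0):  # 5-minute TTL
        self.max_entries, self.ttl = max_entries, ttl
        self._store = OrderedDict()  # url -> (expires_at, html)

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        expires_at, html = entry
        if time.monotonic() > expires_at:
            del self._store[url]          # expired, render fresh
            return None
        self._store.move_to_end(url)      # mark as recently used
        return html

    def put(self, url, html):
        self._store[url] = (time.monotonic() + self.ttl, html)
        self._store.move_to_end(url)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used

cache = RenderCache()
cache.put("https://example.com", "<html>...</html>")
print(cache.get("https://example.com") is not None)  # True
```

Caching the rendered HTML rather than the raw response is what makes this pay off: a hit skips the fetch, the browser, and the wait strategies all at once.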

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Six modules, each with a single job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;server.py&lt;/strong&gt; — FastAPI orchestration, browser lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fingerprint.py&lt;/strong&gt; — UA/viewport/locale rotation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;smart_wait.py&lt;/strong&gt; — Content stability + network idle detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;site_detect.py&lt;/strong&gt; — Static vs SPA classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cache.py&lt;/strong&gt; — LRU render cache with TTL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;stealth.py&lt;/strong&gt; — Resource blocking + headless detection evasion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each module is approximately 100 lines. Easy to test, easy to modify.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Do not reach for the browser first. Most pages are server-rendered.&lt;/li&gt;
&lt;li&gt;Wait smarter, not longer.&lt;/li&gt;
&lt;li&gt;Be a moving target with fingerprint rotation.&lt;/li&gt;
&lt;li&gt;Cache aggressively.&lt;/li&gt;
&lt;li&gt;Build modules, not monoliths.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Playwright browser engine is the oven. Everything around it — the routing, the waiting, the caching, the stealth — is the recipe. That is where the actual engineering lives.&lt;/p&gt;




&lt;p&gt;We are &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt; — web extraction built for AI agents. If you are building with Claude, Cursor, or any AI assistant, our &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; gives your agent the ability to extract data from any URL.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>scraping</category>
      <category>playwright</category>
    </item>
    <item>
      <title>I Built an AI That Talks People Out of Cancelling Their Subscriptions</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:35:05 +0000</pubDate>
      <link>https://dev.to/zee_builds/i-built-an-ai-that-talks-people-out-of-cancelling-their-subscriptions-2bm8</link>
      <guid>https://dev.to/zee_builds/i-built-an-ai-that-talks-people-out-of-cancelling-their-subscriptions-2bm8</guid>
      <description>&lt;p&gt;Here's the thing about churn: by the time someone clicks "Cancel Subscription", they've already decided. Your generic "Would you like 20% off?" popup is too late and too weak.&lt;/p&gt;

&lt;p&gt;I spent the last month building &lt;a href="https://savemychurn.com" rel="noopener noreferrer"&gt;SaveMyChurn&lt;/a&gt; — an AI-powered churn recovery tool for Stripe SaaS founders. This is how it works, what I learned building it, and why I think most cancellation flows are doing it wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I was looking at my own Stripe dashboard one day and noticed something: the cancellation flow was the most ignored piece of the entire subscription experience. People pour weeks into onboarding, feature development, marketing — and then the cancel button just... ends things. No conversation. No understanding of why.&lt;/p&gt;

&lt;p&gt;For bootstrapped SaaS founders running £5K-50K MRR, every subscription matters. Losing 5% of your customers a month isn't a statistic — it's the difference between growing and dying.&lt;/p&gt;

&lt;p&gt;The existing tools didn't fit. Churnkey starts at $250/month — that's a significant chunk of revenue when you're small. The cheaper options are just form builders with a discount code at the end. Nobody was actually &lt;em&gt;talking&lt;/em&gt; to the customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;SaveMyChurn does three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Listens to Stripe in real time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a customer hits cancel, Stripe fires a &lt;code&gt;customer.subscription.deleted&lt;/code&gt; webhook. SaveMyChurn catches it instantly, pulls the subscription metadata, payment history, and plan details, and builds a profile of who's leaving and why.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The webhook handler — this is where it starts
&lt;/span&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhooks/stripe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stripe_webhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;construct_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stripe-signature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;webhook_secret&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer.subscription.deleted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;subscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Build subscriber profile from Stripe data
&lt;/span&gt;        &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;build_subscriber_profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Generate AI retention strategy
&lt;/span&gt;        &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate_retention_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Send personalised recovery email
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send_retention_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Generates a unique retention strategy per subscriber&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the part I'm most proud of. Instead of a static "here's 20% off" flow, an AI strategist analyses the subscriber's behaviour — how long they've been a customer, what plan they're on, their payment history, any support tickets — and creates a genuinely personalised retention offer.&lt;/p&gt;

&lt;p&gt;Someone cancelling after 2 months gets a different approach than someone who's been around for a year. Someone on a basic plan gets a different offer than someone on enterprise. The AI adjusts tone, offer type, discount level, and follow-up timing based on the full context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Follows up automatically&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One email rarely saves a cancellation. SaveMyChurn runs a multi-step sequence — initial offer, follow-up with adjusted terms, final value reminder — spaced over a few days. Each step is informed by whether they opened the previous email, clicked anything, or went silent.&lt;/p&gt;
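The branching can be sketched as a small decision over sent steps and engagement signals (step names, delays, and rules here are illustrative, not the production sequence):

```python
# Illustrative three-step sequence with day offsets from cancellation.
SEQUENCE = [
    {"step": "initial_offer",  "delay_days": 0},
    {"step": "adjusted_terms", "delay_days": 2},
    {"step": "value_reminder", "delay_days": 5},
]

def next_step(sent_steps, engagement):
    """Pick the next email given what was sent and how the last one landed."""
    if engagement.get("clicked"):
        return None  # they're in the recovery flow; stop emailing
    remaining = [s for s in SEQUENCE if s["step"] not in sent_steps]
    if not remaining:
        return None  # sequence exhausted
    step = dict(remaining[0])
    if not engagement.get("opened"):
        step["subject_style"] = "new_angle"  # unopened -> change the hook
    return step

# After an ignored first email, the follow-up gets a fresh subject line:
print(next_step(["initial_offer"], {"opened": False, "clicked": False}))
```

The production version hands this context to the AI strategist instead of hard-coding the branches, but the shape of the decision is the same.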

&lt;h2&gt;
  
  
  The tech stack
&lt;/h2&gt;

&lt;p&gt;Keeping it simple and cheap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; backend — async Python, handles webhooks fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MongoDB&lt;/strong&gt; for subscriber profiles and strategy storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis&lt;/strong&gt; for caching and rate limiting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM via API&lt;/strong&gt; for strategy generation — the AI strategist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resend&lt;/strong&gt; for transactional emails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; on a single VPS — the whole thing runs on one machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The LLM cost per strategy generation is under a penny. When your competitor charges $250/month, that's a ridiculous margin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing model (and why it matters)
&lt;/h2&gt;

&lt;p&gt;I went with a commission model. Monthly fee + a percentage of recovered revenue. The idea is simple: if I don't save you money, I don't make money.&lt;/p&gt;

&lt;p&gt;This was a deliberate choice. Flat-fee tools have an incentive to get you signed up and keep you paying, regardless of results. Commission pricing means I'm motivated to actually recover subscriptions, not just ship a dashboard.&lt;/p&gt;

&lt;p&gt;For founders at the £5K-50K MRR stage, this aligns incentives in a way that $250/month flat fees don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Webhook reliability is everything.&lt;/strong&gt; If you miss a &lt;code&gt;customer.subscription.deleted&lt;/code&gt; event, you miss the entire recovery window. I ended up implementing retry queues and idempotency keys before anything else.&lt;/p&gt;
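Idempotency mostly means keying on the Stripe event id, since Stripe retries delivery until it gets a 2xx. A minimal sketch (in production the seen-set lives in Redis with a TTL, not in process memory):

```python
# Event ids already handled; Stripe may deliver the same event more than once.
_processed_events = set()

def handle_once(event):
    """Process each Stripe event id exactly once, even across retries."""
    event_id = event["id"]
    if event_id in _processed_events:
        return "duplicate_ignored"
    _processed_events.add(event_id)
    # ... kick off profile building and the recovery flow here ...
    return "processed"

evt = {"id": "evt_123", "type": "customer.subscription.deleted"}
print(handle_once(evt))  # processed
print(handle_once(evt))  # duplicate_ignored
```

Without this guard, a retried webhook means a customer gets the same recovery email twice, which is worse than no email at all.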

&lt;p&gt;&lt;strong&gt;AI strategy &amp;gt; rules engine.&lt;/strong&gt; I initially built a simple rule-based system (if cancel reason = "price" → offer discount). It was okay. The AI strategist that replaced it generates strategies I wouldn't have thought of — bundling features differently, offering plan downgrades instead of discounts, timing follow-ups based on engagement patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One email is never enough.&lt;/strong&gt; The first recovery email has maybe a 15-20% open rate. The follow-up catches another chunk. The third one gets the people who were "going to get around to it." Multi-step sequences doubled recovery rates compared to single emails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where it's at
&lt;/h2&gt;

&lt;p&gt;SaveMyChurn is live and in production. It works end-to-end: Stripe webhook → AI strategy → personalised email sequence → dashboard showing what was saved.&lt;/p&gt;

&lt;p&gt;If you're a bootstrapped SaaS founder on Stripe watching subscriptions slip away, &lt;a href="https://savemychurn.com" rel="noopener noreferrer"&gt;give it a look&lt;/a&gt;. There's a free trial — no credit card required.&lt;/p&gt;

</description>
      <category>saas</category>
      <category>stripe</category>
      <category>ai</category>
      <category>retention</category>
    </item>
    <item>
      <title>Your AI Agent Can't Scrape That Page. Here's How to Fix It.</title>
      <dc:creator>Zee</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:16:08 +0000</pubDate>
      <link>https://dev.to/zee_builds/your-ai-agent-cant-scrape-that-page-heres-how-to-fix-it-2om7</link>
      <guid>https://dev.to/zee_builds/your-ai-agent-cant-scrape-that-page-heres-how-to-fix-it-2om7</guid>
      <description>&lt;h1&gt;
  
  
  Your AI Agent Can't Scrape That Page. Here's How to Fix It.
&lt;/h1&gt;

&lt;p&gt;You built an AI agent that needs real-time web data. Product prices, news articles, competitor info — whatever it is, you need clean HTML or JSON from a URL.&lt;/p&gt;

&lt;p&gt;So you fire off a &lt;code&gt;requests.get()&lt;/code&gt; and... &lt;strong&gt;403 Forbidden&lt;/strong&gt;. Cloudflare says no.&lt;/p&gt;

&lt;p&gt;Or you get a page, but it's empty — the content loads via JavaScript after the page renders, and your HTTP client never sees it.&lt;/p&gt;

&lt;p&gt;Sound familiar? Let's break down what's happening and how to actually solve it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your Scraping Fails
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. JavaScript Rendering
&lt;/h3&gt;

&lt;p&gt;Modern sites are SPAs. The HTML you get from a raw HTTP request is a shell — the actual content is loaded by JavaScript after the page mounts. &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;axios&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt; — none of them execute JS.&lt;/p&gt;
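A hypothetical before/after makes the gap concrete: the shell the HTTP client receives versus the payload the page's JavaScript fetches later.

```python
import json

# Hypothetical shell page and its JS-fetched payload, for illustration only.
SHELL = '<div id="root"></div><script src="/bundle.js"></script>'
API_RESPONSE = json.dumps({"name": "Wireless Headphones Pro", "price": "$79.99"})

# An HTTP client only ever sees SHELL -- the data simply is not there:
print("price" in SHELL)         # False
# A real browser runs bundle.js, which fetches and injects the payload:
print("price" in API_RESPONSE)  # True
```

No amount of parsing effort recovers data that was never in the response.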

&lt;h3&gt;
  
  
  2. Cloudflare and Bot Detection
&lt;/h3&gt;

&lt;p&gt;Cloudflare fingerprints your connection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS fingerprint (does your HTTP client look like a browser?)&lt;/li&gt;
&lt;li&gt;HTTP/2 fingerprint&lt;/li&gt;
&lt;li&gt;Browser behavior (mouse movements, JS execution patterns)&lt;/li&gt;
&lt;li&gt;IP reputation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regular HTTP clients fail all of these checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Complex Layouts
&lt;/h3&gt;

&lt;p&gt;Even when you get the HTML, extracting structured data from it is painful. You write brittle CSS selectors that break on every layout change.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solutions (From Worst to Best)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Selenium/Playwright Headless Browsers
&lt;/h3&gt;

&lt;p&gt;They work... sometimes. But Cloudflare detects headless Chrome. You'll spend more time maintaining anti-detection patches than building your actual product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rotating Proxies + Custom Headers
&lt;/h3&gt;

&lt;p&gt;Expensive, slow, and fragile. You're playing whack-a-mole with detection rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use an API That Handles Everything
&lt;/h3&gt;

&lt;p&gt;This is where tools like &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt; come in. It's a web extraction API built specifically for AI agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hauntapi.com/v1/extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/product/123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get the product name, price, and availability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# {
#   "product_name": "Wireless Headphones Pro",
#   "price": "$79.99",
#   "availability": "In Stock"
# }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. One API call. Cloudflare bypassed, JavaScript rendered, structured data extracted.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works Under the Hood
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Smart fetching&lt;/strong&gt; — tries direct HTTP first, falls back to a headless browser with anti-fingerprinting for Cloudflare-protected sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JavaScript executes&lt;/strong&gt; — SPA content becomes available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI extracts&lt;/strong&gt; the data you described in your natural language prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean JSON&lt;/strong&gt; returned to your application&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  MCP Server for Claude and Cursor
&lt;/h3&gt;

&lt;p&gt;If you're building with AI agents, Haunt also has an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"haunt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@hauntapi/mcp-server"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"HAUNT_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add that to your Claude Desktop or Cursor config and your AI agent can extract data from any website natively. Zero code.&lt;/p&gt;

&lt;h3&gt;
  
  
  REST API (No SDK Needed)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://hauntapi.com/v1/extract &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"x-api-key: your-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "url": "https://news.ycombinator.com",
    "prompt": "Get the top 5 stories with titles, points, and URLs"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Free Tier
&lt;/h2&gt;

&lt;p&gt;100 extractions/month for free. No credit card required. Perfect for prototyping your AI agent before scaling up.&lt;/p&gt;

&lt;p&gt;Paid plans start at £19/mo for 1,000 requests with authenticated scraping and priority support.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use What
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Reliability&lt;/th&gt;
&lt;th&gt;Setup Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw requests&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Low (30%)&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selenium + proxies&lt;/td&gt;
&lt;td&gt;$$$&lt;/td&gt;
&lt;td&gt;Medium (60%)&lt;/td&gt;
&lt;td&gt;Hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Haunt API&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;High (95%+)&lt;/td&gt;
&lt;td&gt;5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If your AI agent needs web data and you're tired of fighting bot detection, try &lt;a href="https://hauntapi.com" rel="noopener noreferrer"&gt;Haunt API&lt;/a&gt;. It handles Cloudflare, JavaScript rendering, and data extraction in a single API call.&lt;/p&gt;

&lt;p&gt;Free to start, built for AI agents and RAG pipelines.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: I built Haunt API because I was tired of writing the same scraping infrastructure for every project.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>scraping</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
