<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tinyfishie</title>
    <description>The latest articles on DEV Community by Tinyfishie (@tinyfishie).</description>
    <link>https://dev.to/tinyfishie</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3933533%2F78c3da8e-8c9c-4dce-9b99-452a1dd0750a.png</url>
      <title>DEV Community: Tinyfishie</title>
      <link>https://dev.to/tinyfishie</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tinyfishie"/>
    <language>en</language>
    <item>
      <title>How to Extract Structured Data from A Website</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 16:14:36 +0000</pubDate>
      <link>https://dev.to/tinyfishie/how-to-extract-structured-data-from-a-website-20ng</link>
      <guid>https://dev.to/tinyfishie/how-to-extract-structured-data-from-a-website-20ng</guid>
      <description>&lt;p&gt;TinyFish agents are cloud-based browser sessions that navigate any website — static, JavaScript-rendered, or requiring multi-step workflows — and return machine-readable structured data without requiring you to manage browser infrastructure, proxy rotation, or session handling.&lt;/p&gt;

&lt;p&gt;The right tool for structured data extraction depends entirely on what kind of website you're dealing with. This guide presents a four-tier framework: most sites fall into Tier 1 or 2, where you don't need TinyFish at all. Tier 3 and 4 are where managed infrastructure earns its cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites for the code examples:&lt;/strong&gt; Python 3.8+. Install: &lt;code&gt;pip install requests feedparser playwright tinyfish&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Responsible use note:&lt;/strong&gt; Always review a site's terms of service before automated data extraction. Public-facing data for research, competitive intelligence, and non-commercial analysis is generally low-risk. Extraction at scale for data resale or targeting platform users requires careful legal review.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Counts as "Structured Data Extraction"?
&lt;/h2&gt;

&lt;p&gt;The goal is machine-readable output — JSON, CSV, a typed object — rather than raw HTML or a PDF dump. A scraper that returns &lt;code&gt;&amp;lt;div class="price"&amp;gt;$29.99&amp;lt;/div&amp;gt;&lt;/code&gt; isn't done; a scraper that returns &lt;code&gt;{ "price": 29.99, "currency": "USD" }&lt;/code&gt; is.&lt;/p&gt;

&lt;p&gt;The right tool depends on what kind of site you're dealing with. Tier 1 sites are trivially easy. Tier 4 requires the most capability. Most real-world extraction projects span multiple tiers — a single pipeline might hit Tier 1 sources for some data and Tier 3 sources for others.&lt;/p&gt;




&lt;h2&gt;
  
  
  Four Tiers of Website Complexity
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Site type&lt;/th&gt;
&lt;th&gt;What makes it hard&lt;/th&gt;
&lt;th&gt;Right tool&lt;/th&gt;
&lt;th&gt;Code complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Has an API or RSS feed&lt;/td&gt;
&lt;td&gt;Nothing — use the API&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;requests&lt;/code&gt; + JSON&lt;/td&gt;
&lt;td&gt;Trivial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;JS-rendered, no API&lt;/td&gt;
&lt;td&gt;Content loads after JS executes&lt;/td&gt;
&lt;td&gt;Playwright (local) or TinyFish Fetch (managed)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Strict automation requirements at scale&lt;/td&gt;
&lt;td&gt;Infrastructure gaps cause failures in production&lt;/td&gt;
&lt;td&gt;TinyFish Fetch with browser&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Authenticated or multi-step workflow&lt;/td&gt;
&lt;td&gt;Session state, conditional navigation, decisions&lt;/td&gt;
&lt;td&gt;TinyFish Web Agent&lt;/td&gt;
&lt;td&gt;Handled for you&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tiers 3 and 4 aren't edge cases — they're where production data pipelines typically land once you move beyond toy examples.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmhgkj6eukba6azx0ktb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmhgkj6eukba6azx0ktb.png" alt="Four-tier web extraction decision framework: API feeds, JS-rendered, strict automation, authenticated workflows" width="340" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 1 — Sites with APIs or RSS Feeds
&lt;/h2&gt;

&lt;p&gt;Always check for an official API before writing a scraper. Many sites that look like scraping targets have well-documented APIs that return exactly the structured data you need.&lt;/p&gt;

&lt;p&gt;Where to look: &lt;code&gt;/robots.txt&lt;/code&gt;, the site footer ("Developers" / "API" links), documentation subdomains, RapidAPI, or just searching &lt;code&gt;"[site name] API documentation"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Example: a site that returns JSON at a predictable endpoint
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://data-source.example.com/api/v1/products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;electronics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# {'id': '123', 'name': 'Widget A', 'price': 29.99}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RSS feeds are Tier 1 for content monitoring:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;feedparser&lt;/span&gt;

&lt;span class="n"&gt;feed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;feedparser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://news-source.example.com/feed.xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;link&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;published&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;feed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the best case. It's fast, free, respects the site's intended access pattern, and won't break when page layouts change.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tier 2 — JavaScript-Rendered Pages
&lt;/h2&gt;

&lt;p&gt;When a site loads content after JavaScript executes, &lt;code&gt;requests&lt;/code&gt; alone returns an empty or incomplete page. You need a browser that runs JavaScript.&lt;/p&gt;

&lt;p&gt;Two equally valid approaches, each with a clear use case:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Playwright&lt;/strong&gt; — the right choice when you're running locally, need a controlled environment, or are integrating into an existing test suite. Free to use, open-source, excellent documentation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.sync_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_playwright&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://js-rendered-site.example.com/catalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# wait for JS to render
&lt;/span&gt;
    &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        () =&amp;gt; Array.from(document.querySelectorAll(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.product-card&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)).map(el =&amp;gt; ({
            name: el.querySelector(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).innerText,
            price: el.querySelector(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).innerText,
        }))
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TinyFish Fetch API&lt;/strong&gt; — the right choice when you want zero infrastructure overhead: no browser process to manage, no selector maintenance when the site updates, no proxy setup. Returns markdown or structured text from the live rendered page.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.fetch.tinyfish.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://js-rendered-site.example.com/catalog&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;  &lt;span class="c1"&gt;# full page content after JS renders
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For simple JS-rendered pages where content is fully visible after load, either approach works. Playwright gives you more control; TinyFish Fetch eliminates infrastructure management.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tier 3 — Sites with Strict Automation Requirements
&lt;/h2&gt;

&lt;p&gt;Playwright requires significant infrastructure work to run reliably on sites with strict automation requirements at scale. The gaps compound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Proxy management&lt;/strong&gt; — single-IP requests fail rate limits; rotating proxies require sourcing, billing, and maintenance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session handling&lt;/strong&gt; — requests with stale or suspicious session fingerprints get challenged more frequently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency&lt;/strong&gt; — running 50 Playwright instances locally saturates memory; running them in cloud containers requires container orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TinyFish Fetch API handles the infrastructure layer — proxy routing, session management, and request handling at the infrastructure level. Your extraction code stays the same.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.fetch.tinyfish.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://strict-requirements-site.example.com/listings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check for errors — failed URLs don't consume credits
&lt;/span&gt;&lt;span class="n"&gt;result_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fetch incomplete. Errors:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Fetch API is free on all TinyFish plans — zero credits consumed per request. You pay only for Web Agent steps (Tier 4). For Tier 3 use cases, this means running extraction at scale without per-request credit costs.&lt;/p&gt;

&lt;p&gt;For running extraction across many URLs in parallel, see &lt;a href="https://tinyfish.ai/docs?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;how to run 1,000 parallel web requests with the TinyFish Fetch API&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tier 4 — Authenticated or Multi-Step Workflows
&lt;/h2&gt;

&lt;p&gt;When extraction requires navigating through multiple pages, making decisions based on page content, or maintaining session state across requests — this is where a goal-directed web agent is the right abstraction.&lt;/p&gt;

&lt;p&gt;Framing note: Tier 4 workflows are appropriate when you're accessing accounts and portals you are authorized to use — your own organization's tools, accounts you operate, or systems where you have explicit access rights.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# reads TINYFISH_API_KEY from env
&lt;/span&gt;
&lt;span class="c1"&gt;# Example: extract a structured report from a multi-step portal
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://portal.example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Navigate to the Analytics section and extract the monthly summary.
    Return JSON with: { month, revenue, units_sold, top_products: string[] }
    Use the authorized credentials already stored in Vault.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# COMPLETED means the session ran — not that the goal succeeded.
# Check both infrastructure-level and goal-level results:
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAILED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Infrastructure error:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent could not complete goal:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extracted:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key distinction from Tier 3: the Web Agent doesn't just fetch a URL — it understands a goal, navigates through steps to reach it, and returns structured output. Session state, conditional navigation ("if the report isn't available yet, check back in 10 minutes"), and multi-page workflows are handled by the agent, not by your code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Choosing the Right Approach
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Does the site have an API or RSS feed?
  → Yes: Use the API (Tier 1). No further tooling needed.
  → No: Is the content JavaScript-rendered?
       → No (static HTML): requests + BeautifulSoup for simple cases.
       → Yes: Is this a local/controlled environment?
              → Yes: Playwright (Tier 2a).
              → No, or you need managed scale: TinyFish Fetch API (Tier 2b or 3).
                   → Does the workflow require multi-step navigation or session state?
                          → Yes: TinyFish Web Agent (Tier 4).
                          → No: TinyFish Fetch API (Tier 3).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Best tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Site has a public API&lt;/td&gt;
&lt;td&gt;Use the API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static HTML, simple extraction&lt;/td&gt;
&lt;td&gt;requests + BeautifulSoup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JS-rendered, local or test environment&lt;/td&gt;
&lt;td&gt;Playwright&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JS-rendered, production at scale&lt;/td&gt;
&lt;td&gt;TinyFish Fetch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strict automation requirements at scale&lt;/td&gt;
&lt;td&gt;TinyFish Fetch (browser: true)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step workflow or session state&lt;/td&gt;
&lt;td&gt;TinyFish Web Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most production extraction projects combine multiple tiers. An e-commerce monitoring pipeline might use a site's API for product catalog (Tier 1), TinyFish Fetch for pricing pages (Tier 3), and TinyFish Web Agent for authenticated inventory portals (Tier 4).&lt;/p&gt;




&lt;p&gt;The right extraction tool is the one that matches the complexity of what you're extracting. For Tier 1 and 2, the answer is almost always free open-source tools. For Tier 3 and 4, managed infrastructure pays for itself in engineering time avoided. Start with the simplest tool that works, and upgrade when the complexity demands it.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the simplest way to extract structured data from a website?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The simplest approach is checking whether the site has a public API or RSS feed first — if it does, use that directly and skip browser automation entirely. For sites without APIs, Python's &lt;code&gt;requests&lt;/code&gt; library works for static HTML. JavaScript-rendered sites require a browser; Playwright for local use, TinyFish Fetch API for managed-scale extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When does Playwright break at scale?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Playwright is reliable for local and small-scale use. In production environments running hundreds or thousands of daily extractions, the infrastructure requirements compound: proxy sourcing and rotation, browser process memory limits, session fingerprint freshness, and container orchestration for concurrency. TinyFish Fetch handles this infrastructure layer, keeping your extraction code simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the TinyFish Fetch API and how is it different from requests?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TinyFish Fetch API runs a full browser session and returns the page content after JavaScript execution — similar to what Playwright returns, but without the infrastructure overhead. Unlike Python's &lt;code&gt;requests&lt;/code&gt;, which only performs an HTTP GET and returns the raw server response, TinyFish Fetch renders the complete DOM including dynamically loaded content. The Fetch API is free on all TinyFish plans — zero credits per successful request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When do I need the Web Agent instead of the Fetch API?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the Web Agent when your extraction involves decisions, navigation across multiple pages, or session state that must persist across steps. If you're fetching a single URL and parsing the response, Fetch is sufficient. If you need to "navigate to the quarterly report, download the CSV, and extract the revenue line" — that's a multi-step goal requiring the Web Agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is extracting publicly visible data legal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Public-facing data extraction for research, competitive intelligence, and non-commercial purposes is generally low-risk in most jurisdictions. Terms of service are contractual (not legal) restrictions and their enforceability varies by context. Data that requires authentication — content behind login pages — is in a different category: only access portals and accounts you are authorized to use. Consult legal counsel for commercial data products or any extraction at significant scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I extract structured data from a site that requires login?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use TinyFish Web Agent with Vault, which stores credentials securely and injects them into the browser session. The agent navigates the authentication flow using credentials for accounts you are authorized to access, then continues to the target content. This is appropriate for your own organizational tools, SaaS dashboards, and portals where you hold the account.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related Reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/live-web-vs-cached-web-why-your-ai-agent-needs-real-time-data?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Live Web vs Cached Web: Why Your AI Agent Needs Real-Time Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/find-real-world-code-examples-ai-agents?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Find Real-World Code Examples with AI Agents on GitHub and Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>dataextraction</category>
      <category>tutorial</category>
      <category>playwright</category>
    </item>
    <item>
      <title>Fetching Data from a Large URL List: The Complete Decision Guide</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 11:13:31 +0000</pubDate>
      <link>https://dev.to/tinyfishie/fetching-data-from-a-large-url-list-the-complete-decision-guide-2kcc</link>
      <guid>https://dev.to/tinyfishie/fetching-data-from-a-large-url-list-the-complete-decision-guide-2kcc</guid>
      <description>&lt;p&gt;You have a list of 500 URLs — competitor product pages, supplier portals, job listings, or real estate listings. You need the data from each one.&lt;/p&gt;

&lt;p&gt;The answer to "which tool fetches this data reliably" depends on what's in that list — not on how many URLs there are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's in your list → which tool:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All static HTML, no strict automation requirements → &lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;httpx&lt;/code&gt; (fastest, cheapest)&lt;/li&gt;
&lt;li&gt;JavaScript-rendered content, no strict automation requirements → Playwright or Crawlee&lt;/li&gt;
&lt;li&gt;Mixed list with some protected sites → Playwright + proxy rotation&lt;/li&gt;
&lt;li&gt;Protected or authenticated URLs at scale → &lt;a href="https://tinyfish.ai?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;TinyFish Web Agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Massive volume (100K+) of public pages → Scrapy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;(Not sure what's in your list? The URL classification probe in the Mixed List section below runs in seconds before you commit to a full crawl.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tool That Fits the List
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Static HTML at Volume: requests + asyncio
&lt;/h3&gt;

&lt;p&gt;If your URLs are documentation pages, blog posts, static product catalogs, or any content that loads fully in the initial HTML response, Python's &lt;code&gt;requests&lt;/code&gt; library with async execution is the fastest and cheapest option—often by a large margin.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follow_redirects&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;batch_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urls.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;crawl_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our testing, this handles 1,000 static URLs in under a minute on a standard laptop. For 100K+ URLs, Scrapy's built-in scheduler, downloader middleware, and item pipeline make more sense—it handles deduplication, retry logic, and output formatting at Scrapy's architecture level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where this breaks down:&lt;/strong&gt; Any URL that requires JavaScript execution. If the page shows a loading spinner and populates content after load, &lt;code&gt;requests&lt;/code&gt; returns the spinner HTML, not the content.&lt;/p&gt;

&lt;h3&gt;
  
  
  JavaScript Content: Playwright with Batching
&lt;/h3&gt;

&lt;p&gt;For lists where content loads via JavaScript—React SPAs, infinite scroll, dynamic filtering, price tables that render after an API call—you need a real browser.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_js&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl_js_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;batch_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="nf"&gt;fetch_js&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep concurrency low (3–8 pages) when running locally—each headless Chromium instance consumes 100–300MB. For larger lists, cloud browser infrastructure (Browserless, Browserbase) handles the browser pool so you're not resource-limited on your machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where this breaks down:&lt;/strong&gt; Sites with strict automation requirements at the network and behavioral level. JavaScript-level automation handling helps at low volume; at scale, sites with enterprise-grade access requirements become harder to handle reliably.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sites with Strict Requirements or Authenticated Access: TinyFish
&lt;/h3&gt;

&lt;p&gt;This is where simple HTTP requests stop being sufficient. Your list includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product pages that return different content to automation than to browsers&lt;/li&gt;
&lt;li&gt;Pricing pages that require login using your own authorized account&lt;/li&gt;
&lt;li&gt;Sites with strict automation requirements that affect reliability at scale&lt;/li&gt;
&lt;li&gt;Authenticated portals where each URL requires an authorized session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these, maintaining a Playwright-based crawler means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing automation configuration that needs ongoing updates as site requirements evolve&lt;/li&gt;
&lt;li&gt;Building session management for authenticated URLs&lt;/li&gt;
&lt;li&gt;Handling multi-step login flows and session state&lt;/li&gt;
&lt;li&gt;Debugging failures that change based on site configurations you don't control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI web agents handle this at the infrastructure level. You pass a URL and a goal; the agent handles rendering, infrastructure-level request handling, and authentication for sites where you have authorized access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agent.tinyfish.ai/v1/automation/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTP_ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# "COMPLETED" means the run finished — not that the goal succeeded.
&lt;/span&gt;        &lt;span class="c1"&gt;# Check for TASK_FAILED / SITE_BLOCKED / TIMEOUT before using result.
&lt;/span&gt;        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPLETED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl_protected_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;batch_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nf"&gt;crawl_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://protected-site.com/product/1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://protected-site.com/product/2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the product name, current price, and availability status. Return as JSON.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;crawl_protected_list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The concurrency limit is determined by your plan—10 concurrent agents on Starter, 50 on Pro. For a 1,000-URL list on Pro, that's 20 sequential batches of 50.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When the math shifts:&lt;/strong&gt; requests and Playwright are cheaper per-URL on cooperative, stable sites. The cost calculation changes when you include the full stack: cloud browser pools, proxy subscriptions, and ongoing maintenance. In our experience, production scrapers for sites that actively update their access requirements need patches every few weeks — that maintenance overhead accumulates faster than per-URL cost alone suggests. For URL lists with meaningful protected or authenticated content, the total operational cost typically exceeds TinyFish's per-step pricing before you reach production scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling the Mixed List
&lt;/h2&gt;

&lt;p&gt;Real URL lists are rarely uniform. A supplier monitoring list might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;60% static pricing pages (requests would work)&lt;/li&gt;
&lt;li&gt;30% JavaScript-rendered product tables (Playwright needed)&lt;/li&gt;
&lt;li&gt;10% authenticated portals with strict automation requirements (agents needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical approach: categorize your list before you crawl it. A quick HEAD request or a sample run reveals which URLs respond to simple HTTP requests vs. which require rendering vs. which block automation. Route each category to the appropriate tool. The 10% that requires agents is where reliability actually matters — authentication failures and automation blocks are what stall production workflows, not the cooperative pages.&lt;/p&gt;

&lt;p&gt;To classify URLs before routing them, a quick probe is faster than a full crawl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;static&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; based on a quick probe.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;follow_redirects&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                      &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="n"&gt;js_signals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                          &lt;span class="c1"&gt;# near-empty response
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;div id=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;root&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# React
&lt;/span&gt;            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;div id=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="c1"&gt;# Vue
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ng-version=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;# Angular
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window.__NUXT__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                &lt;span class="c1"&gt;# Nuxt
&lt;/span&gt;            &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# minimal real content
&lt;/span&gt;        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;js&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;js_signals&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;static&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Sample 10% before committing to the full crawl
&lt;/span&gt;&lt;span class="n"&gt;sample&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;static&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;js&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sample&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;classify_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Static: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;static&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, JS: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;js&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Blocked: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;blocked&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Route the full list to the matching tool based on these proportions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;429&lt;/code&gt; response means rate-limited — retry with backoff before escalating. A &lt;code&gt;403&lt;/code&gt; indicates access is blocked or restricted; retrying with the same tool won't help. A near-empty response or JS framework marker means JS rendering is needed. Clean HTML with visible &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; tags is static.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scale Considerations
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;List size&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Rough time (10 concurrent)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100–1,000 static&lt;/td&gt;
&lt;td&gt;requests/httpx&lt;/td&gt;
&lt;td&gt;1–5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100–1,000 JS&lt;/td&gt;
&lt;td&gt;Playwright&lt;/td&gt;
&lt;td&gt;5–20 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100–1,000 protected&lt;/td&gt;
&lt;td&gt;TinyFish agents&lt;/td&gt;
&lt;td&gt;10–30 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000+ static&lt;/td&gt;
&lt;td&gt;Scrapy&lt;/td&gt;
&lt;td&gt;Hours, distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000+ JS or protected&lt;/td&gt;
&lt;td&gt;Infrastructure + agents&lt;/td&gt;
&lt;td&gt;Plan accordingly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For very large lists (100K+), distributed architecture matters regardless of tool—whether that's Scrapy's built-in scheduler, a task queue like Celery, or submitting batches to an async agent API and polling for results.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the fastest way to fetch data from a large URL list in Python?
&lt;/h3&gt;

&lt;p&gt;For static HTML content, &lt;code&gt;httpx&lt;/code&gt; with &lt;code&gt;asyncio&lt;/code&gt; is the fastest approach—you can process 20–50 URLs simultaneously with a single machine and finish 1,000 URLs in under a minute. The key is async execution: sequential requests would take 10–15x longer for the same list. For JavaScript-rendered content, Playwright in async mode with 5–10 concurrent browser pages is the practical ceiling before memory constraints become a factor on standard hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I improve reliability when fetching data from many URLs?
&lt;/h3&gt;

&lt;p&gt;Rate limiting is the first line: 1–2 requests per second per domain for most sites, slower for aggressive protection. Rotate user agents across requests. For moderate protection, &lt;code&gt;requests&lt;/code&gt; with a realistic user agent and reasonable delays works. For sites with enterprise-grade automation detection, JavaScript-level automation plugins help at low volume but degrade at scale — TinyFish provides infrastructure-level browser sessions that are more reliable for protected sites at production scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use Scrapy or Playwright for a large URL list?
&lt;/h3&gt;

&lt;p&gt;Scrapy if your URLs return static HTML and you need high volume (10K+) with built-in scheduling, retry logic, and output pipelines. Playwright if URLs require JavaScript execution. The two aren't mutually exclusive—Scrapy has a Playwright middleware (&lt;code&gt;scrapy-playwright&lt;/code&gt;) that handles JS rendering within Scrapy's architecture. For lists with mixed content types, start with Scrapy for the static subset and use a separate Playwright job for the JS-heavy URLs.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I deduplicate URLs before crawling?
&lt;/h3&gt;

&lt;p&gt;Normalize URLs first: lowercase the scheme and domain, sort query parameters alphabetically, strip tracking parameters (&lt;code&gt;utm_*&lt;/code&gt;, &lt;code&gt;ref=&lt;/code&gt;, &lt;code&gt;fbclid=&lt;/code&gt;), and resolve relative URLs to absolute. Python's &lt;code&gt;urllib.parse.urlparse&lt;/code&gt; plus a set for deduplication handles most cases. For large lists with near-duplicate URLs (same page, different session IDs), a URL fingerprinting library like &lt;code&gt;w3lib.url.canonicalize_url&lt;/code&gt; gives more aggressive deduplication.&lt;/p&gt;

&lt;h3&gt;
  
  
  When does crawling a URL list require authentication?
&lt;/h3&gt;

&lt;p&gt;When the target pages are behind login walls that your team has authorized access to—supplier pricing portals, internal tools, subscription content, or any page that redirects to a login page for unauthenticated requests. Signs your list needs auth: all results return the same HTML (the login page), response sizes are suspiciously uniform, or you see redirect chains ending at &lt;code&gt;/login&lt;/code&gt;. For authenticated list crawling at scale, session management becomes the primary complexity—handling login flows, session expiry, and re-authentication across many concurrent workers. TinyFish handles session management and multi-step login flows for sites where you have authorized account access — you provide credentials, the agent handles the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try TinyFish Free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;500 free steps, no credit card.&lt;/strong&gt; The fastest way to test whether TinyFish fits your workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/the-best-web-scraping-tools-in-2026?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;The Best Web Scraping Tools in 2026&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/how-to-monitor-1000-websites-in-parallel-tinyfish-api?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Monitor 1,000 Websites in Parallel with the TinyFish API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Scraping Dynamic Websites: When Playwright Is the Right Tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/why-stealth-plugins-fail-bot-detection?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Why Your Stealth Plugin Isn't Working (And What Actually Does)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datafetching</category>
      <category>urllist</category>
      <category>webautomation</category>
      <category>python</category>
    </item>
    <item>
      <title>Web Agents for Sales Intelligence: Conference Lead Extraction at Scale</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 10:06:52 +0000</pubDate>
      <link>https://dev.to/tinyfishie/web-agents-for-sales-intelligence-conference-lead-extraction-at-scale-3kb2</link>
      <guid>https://dev.to/tinyfishie/web-agents-for-sales-intelligence-conference-lead-extraction-at-scale-3kb2</guid>
      <description>&lt;p&gt;Conference websites are a signal-dense environment for sales teams. Exhibitors have budget. Sponsors have intent. Speakers have influence. Attendees self-select into a category of people who care enough about a topic to show up in person. For a well-targeted go-to-market motion, a single conference list can be worth more than months of cold outreach to a generic ICP database.&lt;/p&gt;

&lt;p&gt;The problem is extraction. &lt;em&gt;Most conference websites are built differently.&lt;/em&gt; Exhibitor lists appear in different formats — some as HTML tables, some as filterable grids, some as PDFs, some as paginated directories with inconsistent naming conventions. Speaker pages vary. Sponsor tiers are labeled differently. And most importantly: the information you need is rarely in a format your CRM can ingest directly.&lt;/p&gt;

&lt;p&gt;Traditional automation — extraction with CSS selectors — breaks the moment a conference organizer redesigns their website, switches platforms, or adds a new filter layer. Which they do, regularly, because conference websites are rebuilt for each event. An extraction that worked for a 2025 conference often fails completely for the same conference in 2026, because the organizer switched from a custom site to Hopin or Swapcard or their own new platform.&lt;/p&gt;

&lt;p&gt;Web agents handle this because they read the page rather than pattern-match against it. This article covers how that works in practice, with working code for each pattern.&lt;/p&gt;

&lt;p&gt;Web agents for sales intelligence are automated programs that navigate live web pages — conference exhibitor directories, sponsor listings, speaker profiles — on behalf of a sales team, extracting structured prospect data without requiring custom code per site. Unlike traditional scrapers that depend on fixed CSS selectors, a web agent interprets page content by goal: given the instruction "find all exhibitors and return their company name, website, and category," the agent reads the page as a browser would, handles pagination and dynamic rendering, and returns clean JSON regardless of how the underlying HTML is structured. The result is a scalable data pipeline that monitors dozens of events monthly and feeds enriched leads directly into a CRM.&lt;/p&gt;




&lt;h2&gt;
  
  
  When does a web agent apply to sales intelligence?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Your lead sources are conference and event websites&lt;/strong&gt; — exhibitor lists, speaker directories, sponsor pages&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Most event sites have different structures&lt;/em&gt; — making traditional extraction brittle&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're covering dozens of events monthly&lt;/strong&gt; — the scale makes manual extraction untenable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You need structured output into your CRM&lt;/strong&gt; — not PDFs or copy-paste, but clean data your tools can process&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time-to-data matters&lt;/strong&gt; — conference lead data has a shelf life; the sooner it's in your pipeline, the more valuable it is&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The operational problem at scale
&lt;/h2&gt;

&lt;p&gt;For sales teams tracking dozens of conferences per month, each conference website is different. Every time a site updates its layout or changes platforms, platform-specific scrapers break. The engineering overhead of maintaining those scrapers across that many events is disproportionate to the value of any individual conference.&lt;/p&gt;

&lt;p&gt;Web agents navigate conference websites by reading page content and structure directly — adapting to each site's layout without requiring custom code per event. When a site changes layouts, the agent adapts. When a new conference is added, it requires a goal prompt, not a new scraper.&lt;/p&gt;

&lt;p&gt;The operational outcome: dozens of events monitored monthly, structured lead data delivered automatically, zero custom code required per event.&lt;/p&gt;




&lt;h2&gt;
  
  
  The technical pattern: structured lead extraction from event sites
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;📸 IMAGE — Diagram showing web agents extracting structured lead data from diverse conference websites and delivering to CRM&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Conference sites present a range of extraction challenges — paginated exhibitor lists, filterable sponsor grids, speaker bio pages. The agent handles all of these with the same underlying approach: describe the goal, let the agent navigate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;tinyfish
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-tinyfish-&lt;span class="k"&gt;*****&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exhibitor and sponsor extraction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserProfile&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;CONFERENCE_EVENTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# URLs below are illustrative — replace with verified event page URLs before running
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SaaStr Annual 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://tech-conference-example.com/exhibitors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dreamforce 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://enterprise-summit-example.com/sponsors/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sponsor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RevOps Summit 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://revenue-summit-example.com/speakers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_event_leads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Extract all &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s listed on this page.

    For each &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, extract:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;website&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URL or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;industry or booth category if shown, else null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sponsor tier (Gold/Silver/Bronze/etc.) or null if not applicable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contact_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primary contact name if shown, else null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contact_title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job title if shown, else null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one-sentence company description if shown, else null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    }}

    If the list is paginated, click through all pages before returning.
    If there is a &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Load More&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; button, click it until all items are loaded.
    Scroll down to ensure all visible items are captured.

    Return JSON:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: number,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [ ...array of lead objects above... ]
    }}

    If the list requires login to your authorized account to view, return total_count as 0 and leads as [].
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Diagram&lt;/span&gt; &lt;span class="n"&gt;showing&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="n"&gt;extracting&lt;/span&gt; &lt;span class="n"&gt;structured&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diverse&lt;/span&gt; &lt;span class="n"&gt;conference&lt;/span&gt; &lt;span class="n"&gt;websites&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;delivering&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;CRM&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;cdn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sanity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nhc04xln&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;production&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mf"&gt;33e59&lt;/span&gt;&lt;span class="n"&gt;ca62cc7afbfb67a6125992ee957b1abf1b6&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="n"&gt;x1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;png&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# (continued)
&lt;/span&gt;    &lt;span class="n"&gt;Do&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt; &lt;span class="nb"&gt;any&lt;/span&gt; &lt;span class="n"&gt;registration&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;payment&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;

    response = await client.agent.run(
        url=event[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
        goal=goal,
        browser_profile=BrowserProfile.MANAGED,
    )

    # For debugging: response.streaming_url has a live browser replay (valid 24h)
    # response.result is shaped by the goal — leads array is the key output
    result = response.result or {}

    if result.get(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;) == &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:
        return {
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: event[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: target,
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [],
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: result.get(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;),
            &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extracted_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: datetime.now(timezone.utc).isoformat(),
        }

    return {
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: event[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: target,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: result.get(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, 0),
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: result.get(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;leads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, []),
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extracted_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: datetime.now(timezone.utc).isoformat(),
    }

async def main():
    tasks = [extract_event_leads(event) for event in CONFERENCE_EVENTS]
    results = await asyncio.gather(*tasks)

    total_leads = sum(r[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;] for r in results)
    print(f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total leads extracted: {total_leads} across {len(results)} events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)
    print(json.dumps(results, indent=2))

asyncio.run(main())
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Output schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SaaStr Annual 2026"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"target_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exhibitor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;312&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"leads"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"company_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Acme Software"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"website"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vendor-a.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CRM &amp;amp; Sales Tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"contact_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"contact_title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI-powered CRM for sales teams"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"company_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BuildIt Inc."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"website"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://vendor-b.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DevTools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Gold Sponsor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"contact_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sarah Chen"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"contact_title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VP of Sales"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"extracted_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-27T09:00:01Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Enriching leads with company-level data
&lt;/h2&gt;

&lt;p&gt;Raw exhibitor lists give you company names. What your CRM needs is enriched records — employee count, funding stage, tech stack, recent news. A second agent pass can pull this from each company's website or LinkedIn company page.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enrich_company&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Extract company information for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Website: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;website&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="n"&gt;company&lt;/span&gt; &lt;span class="n"&gt;website&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Extract:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;employee_count_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1-10 / 11-50 / 51-200 / 201-500 / 500+ — estimate from About or LinkedIn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;funding_stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bootstrapped / Seed / Series A / Series B / Series C+ / Public / Unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hq_location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City, Country or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;industry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primary industry category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;founded_year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: number or null,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tagline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one-line description from homepage or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    }}

    Navigate to the website&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s About page or homepage.
    Do not fill in any contact forms.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;website&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Provide the company website URL directly (LinkedIn or company site)
&lt;/span&gt;        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;browser_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BrowserProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MANAGED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;company_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="c1"&gt;# Enrich all leads from an extraction run
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enrich_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;leads&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;enrich_company&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;website&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;leads&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The two-pass pattern — extract from event site, then enrich from company sites — separates concerns cleanly. The first pass is optimized for navigating conference-specific layouts. The second pass is optimized for company data extraction, which follows more consistent patterns across company websites.&lt;/p&gt;




&lt;h2&gt;
  
  
  Handling the common edge cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Paginated exhibitor lists&lt;/strong&gt; — The goal prompt includes "click through all pages before returning." For very long lists (500+ exhibitors), break into page-by-page runs and aggregate results, rather than running one very long session that risks timeout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filterable grids&lt;/strong&gt; — Some conference sites show exhibitors in a filterable grid by category or tier. Instruct the agent to clear all filters first, or specify which filter to apply: "Select the 'Gold Sponsor' filter before extracting."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PDF exhibitor lists&lt;/strong&gt; — Some conferences publish exhibitor lists as PDFs linked from the main page. The agent can navigate to the link, open the PDF, and extract text content. Include in the goal: "If the exhibitor list is a linked PDF, open it and extract the company names."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Login-gated lists&lt;/em&gt; — Some premium conference platforms require registration with your own account to view the full attendee list. The goal above returns an empty array in this case. &lt;em&gt;For events where you have authorized credentials, add login steps to the goal for your own accounts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sites that load content after scroll&lt;/strong&gt; — The goal includes "Scroll down to ensure all visible items are captured." For infinite-scroll lists, add: "Continue scrolling until no new items appear."&lt;/p&gt;




&lt;h2&gt;
  
  
  Scaling across your conference calendar
&lt;/h2&gt;

&lt;p&gt;For a sales team tracking a conference calendar, the right architecture runs this as a scheduled job: extract leads from each event site as it goes live, enrich them, and push to your CRM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;CONFERENCE_CALENDAR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SaaStr Annual&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-09-15&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dreamforce&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exhibitors_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-09-17&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# add your full event calendar
&lt;/span&gt;    &lt;span class="c1"&gt;# URLs below are illustrative — replace with verified URLs before running
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Cost estimate (PAYG at $0.015/credit):
# Extraction: 1 agent run per conference, ~10-20 steps (navigation + pagination)
#   → ~$0.15-$0.30 per conference
# Enrichment: 1 agent run per company, ~4 steps each (navigate + find About + extract)
#   → 300 companies × 4 credits × $0.015 = ~$18 per conference
# Total per conference: ~$18-20 for a fully enriched lead list
#
# Note: run steps vary with site complexity. Starter plan ($15/mo) covers ~1,650 credits —
# enough for roughly 4 full extraction + enrichment runs per month.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost breakdown (PAYG at $0.015/credit): extraction runs as one agent session per conference — ~10-20 steps for a paginated list, roughly $0.15-$0.30. The enrichment pass runs one agent session per company: at ~4 steps each, 300 companies cost approximately $18. Total: roughly $18-20 for a fully structured, CRM-ready lead list. Manual research at equivalent depth costs hours of analyst time per event.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build vs. buy: when extraction is still the right answer
&lt;/h2&gt;

&lt;p&gt;If your conference intelligence needs are narrow — one or two annual events, stable sites, static HTML lists — a simple extraction is cheaper and faster to build. Use it.&lt;/p&gt;

&lt;p&gt;The case for web agents compounds as the number of events scales, as event site diversity increases, and as the maintenance cost of broken scrapers accumulates. At dozens of events monthly, each with a different site structure, maintaining platform-specific scrapers is a full-time job that produces inconsistent results. A goal-based agent is a one-time investment that generalizes across events. &lt;/p&gt;

&lt;p&gt;The other factor is time-to-data. A extraction that breaks on Monday morning when the conference site updates means your team gets stale data for however long it takes engineering to fix it. An agent that reads the page as a human does degrades more gracefully — it might miss some edge cases, but it doesn't break silently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;The free tier gives you 500 credits — enough to extract a full exhibitor list from one or two conference sites and see structured output before committing to production volume.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=web-agents-sales-intelligence&amp;amp;utm_content=end_cta_sales_intelligence" rel="noopener noreferrer"&gt;Start free — 500 credits, no credit card required&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For sales teams or platforms tracking dozens of events monthly, &lt;a href="https://tinyfish.ai/contact?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;contact our enterprise team&lt;/a&gt; for volume pricing.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How does the agent handle conference sites that change every year?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because the agent reads the page based on intent ("find the exhibitor list") rather than fixed selectors, it adapts to annual redesigns without requiring code changes. The goal prompt may need minor tuning for major structural changes, but it doesn't break the way a selector-based extraction does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can this extract attendee lists, not just exhibitors?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Attendee lists are usually login-gated on your own account or not published publicly for privacy reasons. Exhibitor, sponsor, and speaker lists are typically public. &lt;em&gt;For gated attendee data in accounts you are authorized to access, include login steps in the goal with your authorized credentials.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does &lt;code&gt;COMPLETED&lt;/code&gt; status mean?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Infrastructure success — the browser ran. Not data extraction success. Always check &lt;code&gt;result.leads&lt;/code&gt; — an empty array with no error may mean the list genuinely has no entries, or it may mean the page structure wasn't handled. Use the &lt;code&gt;streaming_url&lt;/code&gt; to debug ambiguous cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we push the output to our CRM?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The JSON output maps directly to standard CRM fields. Most CRMs (Salesforce, HubSpot, Pipedrive) have APIs or CSV import that accepts this structure. The enrichment pass adds the firmographic fields most CRMs expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a limit on how many events we can run concurrently?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Per-plan concurrency limits apply. When exceeded, requests queue automatically — no errors, but later extractions take longer. For a large conference calendar running simultaneously, size your batches to your plan's concurrency limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can this handle conference sites in languages other than English?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The agent reads the page as it appears — if the site is in German or Japanese, the agent navigates the German or Japanese interface. Include &lt;code&gt;proxy_config&lt;/code&gt; with the appropriate country code for geo-restricted sites.&lt;/p&gt;




&lt;h2&gt;
  
  
  See It in Action
&lt;/h2&gt;

&lt;p&gt;The free tier includes &lt;strong&gt;500 steps&lt;/strong&gt; — enough to run a complete sales intelligence workflow against real data before committing to a plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free, no credit card →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/ai-web-agents-real-world-use-cases?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Can AI Web Agents Actually Do? 10 Real-World Use Cases&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/how-to-monitor-1000-websites-in-parallel-tinyfish-api?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Monitor 1,000 Websites in Parallel with the TinyFish API&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/web-agents-procurement?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Web Agents for Procurement: Multi-Vendor Portal Automation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webagents</category>
      <category>enterprise</category>
      <category>enterprisewebagents</category>
    </item>
    <item>
      <title>Web Agents for Insurance Operations: Prior Auth, Portal Monitoring, and Regulatory Tracking</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 10:02:49 +0000</pubDate>
      <link>https://dev.to/tinyfishie/web-agents-for-insurance-operations-prior-auth-portal-monitoring-and-regulatory-tracking-3i8g</link>
      <guid>https://dev.to/tinyfishie/web-agents-for-insurance-operations-prior-auth-portal-monitoring-and-regulatory-tracking-3i8g</guid>
      <description>&lt;h1&gt;
  
  
  Web Agents for Insurance Operations: Prior Auth, Portal Monitoring, and Regulatory Tracking
&lt;/h1&gt;




&lt;p&gt;If you work in insurance operations, you already know the portal problem. Every payer has one. Each one is different. Some require two-factor login to your own account. Some time out after 15 minutes of inactivity. Some show prior auth status on page three of a workflow that starts on a dashboard that loads differently depending on which browser you're using.&lt;/p&gt;

&lt;p&gt;Multiply that across 50 payers, and you have a problem that doesn't scale with headcount. Hiring more people to click through more portals isn't a solution — it's a deferral. The portals don't get simpler. The volume doesn't go down.&lt;/p&gt;

&lt;p&gt;Web agents don't replace the portals. They handle the repetitive navigation tasks that humans currently perform manually — logging in, extracting status, and returning structured data that your systems can actually use. This article covers three use cases where that matters most: prior authorization tracking, multi-pharmacy price monitoring, and real-time regulatory document monitoring.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Scope note:&lt;/strong&gt; This article covers insurance operations automation patterns using web agents. The use cases below represent technically valid workflows validated through internal testing, not published case studies from specific TinyFish insurance customers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  When does a web agent apply to insurance operations?
&lt;/h2&gt;

&lt;p&gt;Before the use cases: if you're evaluating whether this fits your team's situation, here's the honest framework.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You're checking the same portals repeatedly&lt;/strong&gt; — prior auth status, claim status, eligibility — on a schedule your staff maintains manually&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Portal interfaces vary across payers&lt;/strong&gt; — no two are identical, making traditional RPA brittle&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;You need structured output, not screenshots&lt;/strong&gt; — downstream systems need data, not PDFs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Volume exceeds what staff can cover&lt;/strong&gt; — 50+ portals, daily or more frequent checks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Login walls to your authorized payer portals are blocking automation&lt;/strong&gt; — most traditional scrapers stop at the login screen. Web agents using your authorized credentials navigate login flows as a normal part of the workflow&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your team checks fewer than 10 portals and the data isn't time-sensitive, manual workflows may be cheaper. The case for automation compounds as portal count and check frequency increase.&lt;/p&gt;




&lt;h2&gt;
  
  
  The portal problem: why insurance operations are stuck in 2005
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;📸 IMAGE — Architecture diagram showing TinyFish web agents checking 50 payer portals in parallel returning structured JSON&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fifty payer portals sounds like a technology problem. It's actually a coordination problem that technology made worse.&lt;/p&gt;

&lt;p&gt;Each payer built their portal independently, on their own timeline, for their own internal reasons. The result is a landscape where every portal has a different login flow, a different session timeout, a different way of presenting the same data. Prior auth status might be on the dashboard for one payer, buried in a sub-menu for another, and only visible after clicking through a three-step workflow for a third.&lt;/p&gt;

&lt;p&gt;Traditional RPA — the previous generation's answer to this — was built around CSS selectors and fixed screen coordinates. When a payer updates their portal layout, the RPA script breaks. Someone has to find it, diagnose it, and fix it before the monitoring gap compounds. In a 50-portal environment, that's a near-continuous maintenance cycle.&lt;/p&gt;

&lt;p&gt;Web agents approach portals through standard browser sessions: reading the page, understanding what's on it, and navigating based on intent rather than coordinates. When a portal layout changes, the goal — "find the prior auth status for claim ID X" — doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark:&lt;/strong&gt; In TinyFish's internal testing, 50 payer portals were checked in 2 minutes and 14 seconds. A staff member doing the same task manually takes 45 minutes or more. At daily frequency, that's roughly 180 staff-hours per month on portal checks alone.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbxk4l81wzbuciho7day.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbxk4l81wzbuciho7day.png" alt="Architecture diagram showing TinyFish web agents checking 50 payer portals in parallel returning structured JSON" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use case 1: Prior authorization status tracking across 50 portals
&lt;/h2&gt;

&lt;p&gt;Prior auth is time-sensitive in both directions. A pending auth that gets approved needs to trigger the next step in the care workflow. A denial needs to trigger the appeals process. Both require someone — or something — checking portal status on a regular cadence.&lt;/p&gt;

&lt;h3&gt;
  
  
  The code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserProfile&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;PRIOR_AUTH_CHECKS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aetna&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;portal_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://payer-portal-a.example.com/...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PA-2026-00441&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UnitedHealthcare&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;portal_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://payer-portal-b.example.com/...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PA-2026-00892&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# scale to 50+ payers
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_auth_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;portal_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Find the prior authorization status for auth ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

    Log in if prompted using your authorized portal credentials. Navigate to the prior authorization or claims section.
    Locate the record matching auth ID &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

    Return JSON with this exact structure:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denied&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YYYY-MM-DD or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expiry_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YYYY-MM-DD or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denial_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    }}

    If you cannot find the auth ID, return status as &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.
    Do not click any Submit or transaction buttons.  # prevents accidental form submissions or transactions during portal navigation
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;portal_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;browser_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BrowserProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MANAGED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# For debugging: response.streaming_url contains a live browser replay (valid 24h)
&lt;/span&gt;    &lt;span class="c1"&gt;# Layer 1: infrastructure failure — result is None
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infrastructure_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# Layer 2: goal failure — agent ran but task didn't succeed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (continued)
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;payer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;auth_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expiry_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expiry_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denial_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;denial_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;check_auth_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;portal_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;PRIOR_AUTH_CHECKS&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on result handling:&lt;/strong&gt; &lt;code&gt;status: "COMPLETED"&lt;/code&gt; means the browser ran — not that the auth ID was found. A portal that returned a "not found" page will also return &lt;code&gt;COMPLETED&lt;/code&gt;, but &lt;code&gt;result.status&lt;/code&gt; will be &lt;code&gt;"not_found"&lt;/code&gt; as instructed in the goal. Always check &lt;code&gt;result&lt;/code&gt;, not just the run status.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on login walls:&lt;/strong&gt; This example assumes portal URLs for portals you are authorized to access that accept authenticated sessions or have login steps navigable by the agent. For portals with 2FA or SSO flows, additional goal steps are needed. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Output schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"payer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Aetna"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"auth_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PA-2026-00441"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"approved"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"effective_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"expiry_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-08-31"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"denial_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"checked_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-27T09:00:01Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"payer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"UnitedHealthcare"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"auth_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PA-2026-00892"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pending"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"effective_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"expiry_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"denial_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"checked_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-27T09:00:04Z"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output feeds directly into your care coordination workflow. Approved auths trigger the next step. Denials trigger the appeals queue. Pending items re-enter the next day's check cycle. The agent produces the data; your existing systems handle the routing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Use case 2: Drug price monitoring across state transparency portals
&lt;/h2&gt;

&lt;p&gt;Most states now publish drug pricing data through public transparency portals — mandated by law, updated regularly, and freely accessible without a contract relationship. For pharmacy directors, formulary analysts, and benefit consultants, these portals are a legitimate and underused data source. The problem is coverage: 40+ states with different portal designs, different update schedules, and no unified feed.&lt;/p&gt;

&lt;p&gt;Monitoring them manually is the same problem as payer portal monitoring — the data exists, accessing it at scale requires either staff time or automation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserProfile&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# State drug price transparency portals — publicly accessible, no contract required
&lt;/span&gt;&lt;span class="n"&gt;STATE_DRUG_PORTALS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.dhcs.ca.gov/provgovpart/pharmacy/Pages/drug-pricing.aspx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metformin 500mg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.health.ny.gov/health_care/medicaid/program/pharmacy/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metformin 500mg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hhs.texas.gov/providers/medicaid-prescription-drug-program&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metformin 500mg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# add remaining states
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_drug_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;drug_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Find the current published price or reimbursement rate for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;drug_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

    Navigate to the drug pricing or formulary section.
    Look for a price table, search tool, or downloadable listing.

    Return JSON:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;drug_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: number or null,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reimbursement rate / retail price / AWP / MAC — whichever applies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YYYY-MM-DD or null&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label or table header as shown on page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    }}

    If no price is found, set published_price to null.
    Do not submit any forms or create any accounts.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;browser_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BrowserProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LITE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# state gov sites typically don't need stealth
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# For debugging: response.streaming_url has a live browser replay (valid 24h)
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (continued)
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;drug_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;drug_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;effective_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;check_drug_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;drug_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;STATE_DRUG_PORTALS&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;State transparency portals publish reimbursement rates, MAC pricing, and AWP data — the reference prices that feed formulary decisions and benefit design. Aggregating this across 40+ states concurrently takes minutes instead of days, and the output is structured data rather than a stack of PDFs to parse manually.&lt;/p&gt;




&lt;h2&gt;
  
  
  Use case 3: Real-time regulatory document monitoring
&lt;/h2&gt;

&lt;p&gt;State insurance filings change. Rate approvals, form filings, regulatory bulletins — the information is public, posted on state insurance department websites, and directly relevant to compliance and underwriting decisions. The problem is that 50 states publish on different schedules, in different formats, with no unified feed.&lt;/p&gt;

&lt;p&gt;Manual monitoring at this scale means either missing changes or dedicating staff to daily checks across 50 state websites. Neither is sustainable.&lt;/p&gt;

&lt;p&gt;The agent approach: run a daily scan across state insurance department filing pages, extract new documents since the last check, and return structured metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BrowserProfile&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncTinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;STATE_FILING_PAGES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.insurance.ca.gov/0400-news/0200-bulletins/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.dfs.ny.gov/industry_guidance/circular_letters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TX&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.tdi.texas.gov/rule/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# add all 50 states
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;LAST_CHECK_DATE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-03-26&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# replace with your stored value
&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_state_filings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Extract all regulatory documents or bulletins published after &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;LAST_CHECK_DATE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

    For each document found, return:
    {{
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;published_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YYYY-MM-DD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
        &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bulletin / circular / rate filing / form filing / other&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    }}

    Return as JSON array. If no new documents found, return an empty array [].
    Do not click any login or subscription prompts.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;browser_profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BrowserProfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LITE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# state gov sites typically don't need stealth
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for goal-level failure before checking result type
&lt;/span&gt;    &lt;span class="c1"&gt;# Without this, a failure dict would pass the isinstance check as non-list and silently return []
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# (continued)
&lt;/span&gt;    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;check_state_filings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;STATE_FILING_PAGES&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Filter to states with new documents
&lt;/span&gt;    &lt;span class="n"&gt;updates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new_documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output feeds a compliance dashboard, triggers alerts for specific document categories, or routes to the relevant underwriting or legal team. The agent doesn't interpret the documents — it surfaces them. Interpretation stays with your team.&lt;/p&gt;




&lt;h2&gt;
  
  
  The compliance question: what web agents can and can't do in regulated environments
&lt;/h2&gt;

&lt;p&gt;This is the question that comes up in every insurance technology conversation, so let's address it directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What web agents do:&lt;/strong&gt; navigate browser sessions, extract data from pages, return structured output. They interact with portals through standard browser sessions — navigating to a page, reading what's on it, and returning the relevant data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What web agents don't do:&lt;/strong&gt; store protected health information, transmit data across systems, or make clinical or coverage decisions. The agent is a data extraction layer. Where that data goes next — and what compliance obligations apply to that storage and transmission — is determined by your system architecture, not the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On HIPAA:&lt;/strong&gt; A web agent accessing a payer portal doesn't inherently create new HIPAA obligations beyond those that already exist when a staff member accesses the same portal through the same browser. The agent performs authenticated browser access — reading page content through an authenticated session and returning structured data. The compliance obligations arise from what you do with the extracted data: how it's stored, who can access it, and how it's transmitted. That's your data pipeline's responsibility, not the extraction layer's.&lt;/p&gt;

&lt;p&gt;This doesn't mean HIPAA review isn't necessary — it means the review should focus on the full data flow, not just the agent step. Any healthcare or insurance deployment should involve a legal review of the complete architecture. The agent layer is usually not where the compliance complexity lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build vs. buy: when to use TinyFish vs. building your own portal integration
&lt;/h2&gt;

&lt;p&gt;The alternative to a web agent for payer portal access is a direct API integration — but most payer portals don't have APIs. The ones that do (typically large national payers via HL7 FHIR) are worth pursuing first. Where APIs exist, use them.&lt;/p&gt;

&lt;p&gt;Web agents become the right tool when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The portal has no API and no roadmap to add one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The portal interface is stable enough to automate but too variable for traditional RPA&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The volume of checks justifies automation over manual workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your team can't maintain custom automation scripts as portals evolve&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The build-vs-buy question for the agent infrastructure itself: building a raw HTTP automation layer that handles login walls, session management, 2FA, and concurrent execution is months of engineering work. The free tier — &lt;strong&gt;500 credits, no credit card&lt;/strong&gt; — is enough to run a proof of concept against your real target portals before that decision needs to be made.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;The free tier gives you 500 credits — enough to run a prior auth status check across 50 portals and see real output before committing to anything.&lt;/p&gt;

&lt;p&gt;For healthcare and insurance teams with compliance requirements or enterprise-scale portal monitoring needs, &lt;a href="https://tinyfish.ai/contact?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;contact our enterprise team&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can web agents handle your own payer portals that require 2FA or SSO login?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents can navigate multi-step authentication workflows, including TOTP-based MFA and standard SSO redirects, for portals you're authorized to access. Include the authentication steps for your authorized portal in the goal prompt, or use TinyFish's credential vault for secure credential management. SSO with federated identity is more complex and depends on the specific implementation. Include explicit authentication steps for your authorized portal in your goal prompt and use the credential vault for secure credential storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this work for all 50 payer portals or just the major ones?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Any publicly accessible portal — including smaller regional payers — is reachable. Success rate depends on the portal's complexity and technical protection level, not its size. The agent navigates based on what it sees on screen, so portal size is not a limiting factor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when a payer portal updates its layout?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The goal-based approach is more resilient to layout changes than selector-based RPA. When a portal redesigns, the goal — "find prior auth status for ID X" — remains valid. In practice, goal prompts may need minor tuning after major redesigns, but the maintenance burden is substantially lower than maintaining CSS selectors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the extracted data HIPAA-compliant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HIPAA obligations arise from how extracted data is stored, accessed, and transmitted — not from the extraction mechanism itself. A web agent accessing a payer portal creates the same data handling obligations as a staff member accessing the same portal. The compliance review should cover your full data pipeline, not just the agent step. Any healthcare deployment should involve legal review of the complete architecture before going to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I handle portals that time out mid-session?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Include session management instructions in the goal: "if a timeout or re-login prompt appears on your authorized portal, log in again with the provided credentials and resume." The agent follows standard re-authentication flows natively.&lt;/p&gt;




&lt;h2&gt;
  
  
  See It in Action
&lt;/h2&gt;

&lt;p&gt;The free tier includes &lt;strong&gt;500 steps&lt;/strong&gt; — enough to run a complete insurance data extraction workflow against real data before committing to a plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free, no credit card →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/ai-web-agents-real-world-use-cases?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Can AI Web Agents Actually Do? 10 Real-World Use Cases&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/how-to-monitor-1000-websites-in-parallel-tinyfish-api?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Monitor 1,000 Websites in Parallel with the TinyFish API&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When Web Agents Fail: Debugging Goal-Based Automation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/web-agents-procurement?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Web Agents for Procurement: Multi-Vendor Portal Automation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>insuranceautomation</category>
      <category>webagents</category>
      <category>priorauthorization</category>
      <category>healthcare</category>
    </item>
    <item>
      <title>When Web Agents Fail: Debugging Goal-Based Automation</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 10:01:42 +0000</pubDate>
      <link>https://dev.to/tinyfishie/when-web-agents-fail-debugging-goal-based-automation-307b</link>
      <guid>https://dev.to/tinyfishie/when-web-agents-fail-debugging-goal-based-automation-307b</guid>
      <description>&lt;p&gt;Web agent debugging separates infrastructure failures (the run didn't complete) from goal failures (the run completed but returned wrong data) — two categories that require completely different fixes.&lt;/p&gt;

&lt;p&gt;The agent ran. It returned &lt;code&gt;COMPLETED&lt;/code&gt;. But the price field is empty — or it pulled the wrong product, or it returned results from only the first page of a multi-page portal.&lt;/p&gt;

&lt;p&gt;That's the problem with agent debugging: the failure isn't explicit. A Playwright script throws a timeout or a selector error. An agent returns a status that says it finished — and the problem only surfaces when you check the data downstream.&lt;/p&gt;

&lt;p&gt;Understanding why agents fail differently from scripts is the foundation of effective web agent debugging. This guide covers both failure layers and a systematic approach to each.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agent Failures Feel Different from Script Failures
&lt;/h2&gt;

&lt;p&gt;A scripted automation fails predictably. The selector didn't match, the element wasn't found, the HTTP request timed out — there's a specific line where execution stopped and an error you can read.&lt;/p&gt;

&lt;p&gt;A goal-directed agent has no single execution path. It plans, navigates, adapts. That flexibility is what makes it useful on dynamic pages, but it also means there's no breakpoint. The agent chose a sequence of steps you didn't write, and somewhere in that sequence it went wrong — but the run still completed.&lt;/p&gt;

&lt;p&gt;This is a structural difference, not a reliability problem. Agents trade determinism for adaptability. The tradeoff is that debugging requires a different mental model.&lt;/p&gt;

&lt;p&gt;The good news: most agent failures cluster into two categories — a pattern TinyFish has observed consistently across production runs at different scales and page types. Each category has a clear diagnostic path. Once you know which type you have, the fix is usually straightforward.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://tinyfish.ai/blog/what-is-an-agentic-browser?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Is an Agentic Browser?&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Distinct Failure Types (And Why They Need Different Fixes)
&lt;/h2&gt;

&lt;p&gt;The most important diagnostic question isn't "what went wrong" — it's "did the run complete at all?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Infrastructure failure:&lt;/strong&gt; The run didn't complete. The result is &lt;code&gt;None&lt;/code&gt;, the status indicates an error, or you got no response at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2 — Goal failure:&lt;/strong&gt; The run completed (status: &lt;code&gt;COMPLETED&lt;/code&gt;), but the result is wrong, incomplete, or inconsistent with what you asked for.&lt;/p&gt;

&lt;p&gt;These two failures have completely different root causes and require different debugging approaches. Treating a goal failure as an infrastructure problem sends you in the wrong direction immediately.&lt;/p&gt;

&lt;p&gt;The check pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agent.tinyfish.ai/v1/automation/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the product name, price, and availability from this listing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://portal.example.com/products/item-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Decision&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="n"&gt;showing&lt;/span&gt; &lt;span class="n"&gt;two&lt;/span&gt; &lt;span class="n"&gt;web&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt; &lt;span class="n"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;infrastructure&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="n"&gt;failure&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;diagnostic&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;cdn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sanity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;nhc04xln&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;production&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;b17d16ff1e2e2de063ea8f961517ea4a9a51df86&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="n"&gt;x1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;png&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Layer 1: Did the run complete?
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Infrastructure failure: HTTP &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Debug infrastructure (see next section)
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPLETED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Infrastructure failure: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Layer 2: Did the goal succeed?
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goal failure: run completed but price missing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Debug goal specification (see goal failure section)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;COMPLETED&lt;/code&gt; is not the same as "goal succeeded." It means the infrastructure layer finished. The goal evaluation is always separate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Infrastructure Failures: The Run Didn't Complete
&lt;/h2&gt;

&lt;p&gt;Infrastructure failures mean the run itself didn't complete. The result status tells you what happened:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;First thing to check&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TASK_FAILED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The agent encountered a condition it couldn't continue from&lt;/td&gt;
&lt;td&gt;Check session state: does the page require authentication or specific cookies?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SITE_BLOCKED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The target site's configuration prevented this run from completing&lt;/td&gt;
&lt;td&gt;Review whether the site requires specific access patterns, session state, or authentication on an account you control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MAX_STEPS_EXCEEDED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The agent reached the step limit without completing&lt;/td&gt;
&lt;td&gt;The goal is too broad. Split it: instead of "Extract all product data and submit the order form," use two separate calls — one to extract, one to submit. Each sub-task should complete in fewer than 20 steps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TIMEOUT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The run exceeded the time limit&lt;/td&gt;
&lt;td&gt;Check if the page loads slowly; try increasing timeout in your request parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;None&lt;/code&gt; response&lt;/td&gt;
&lt;td&gt;Network or API issue&lt;/td&gt;
&lt;td&gt;Check your API key and network connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;On &lt;code&gt;SITE_BLOCKED&lt;/code&gt;:&lt;/strong&gt; This status means the target site's configuration — session requirements, access patterns, or content policies — prevented the run from completing. Check what the site actually requires: does it need authentication on an account you control? Does it require specific session cookies? Retry after addressing the underlying access requirement, not the symptom.&lt;/p&gt;

&lt;p&gt;Infrastructure failures generally aren't retryable without changing something. A &lt;code&gt;TIMEOUT&lt;/code&gt; might resolve on retry; a &lt;code&gt;SITE_BLOCKED&lt;/code&gt; on a page that requires authentication won't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Goal Failures: COMPLETED Doesn't Mean Done
&lt;/h2&gt;

&lt;p&gt;Goal failures are subtler: the run completed, status is &lt;code&gt;COMPLETED&lt;/code&gt;, but the data is wrong or missing. Three common patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 1: Missing fields&lt;/strong&gt;&lt;br&gt;
The agent returned a result but specific fields are empty or &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptom:&lt;/em&gt; &lt;code&gt;result.data.get("price")&lt;/code&gt; is &lt;code&gt;None&lt;/code&gt;, but the page clearly shows a price.&lt;br&gt;
&lt;em&gt;Cause:&lt;/em&gt; The goal didn't specify the field explicitly enough. The agent extracted what it interpreted as relevant — just not the right thing.&lt;br&gt;
&lt;em&gt;Fix:&lt;/em&gt; Name the fields explicitly in the goal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 2: Wrong element&lt;/strong&gt;&lt;br&gt;
The agent extracted data — but from the wrong element or section of the page.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptom:&lt;/em&gt; The data looks plausible but doesn't match the specific item you targeted.&lt;br&gt;
&lt;em&gt;Cause:&lt;/em&gt; The page has multiple similar-looking elements and the goal didn't specify which one.&lt;br&gt;
&lt;em&gt;Fix:&lt;/em&gt; Add context about the element's location or distinguishing characteristic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern 3: Inconsistent results across runs&lt;/strong&gt;&lt;br&gt;
The same goal produces different results when run multiple times on the same URL.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Symptom:&lt;/em&gt; Run 1 extracts &lt;code&gt;$29.99&lt;/code&gt;; Run 2 extracts &lt;code&gt;$24.99&lt;/code&gt;; the page shows both a regular and sale price.&lt;br&gt;
&lt;em&gt;Cause:&lt;/em&gt; The goal doesn't specify which value to use when multiple options exist.&lt;br&gt;
&lt;em&gt;Fix:&lt;/em&gt; Add specificity: "Extract the current sale price, not the original price."&lt;/p&gt;

&lt;p&gt;To see which pattern you have, use streaming to watch the agent's steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agent.tinyfish.ai/v1/automation/run-sse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/event-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the product name, price, and availability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://portal.example.com/products/item-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;  &lt;span class="c1"&gt;# Watch what the agent navigates and extracts
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The step stream shows what the agent actually saw and acted on — which element it identified as the price, which page state it was in when it extracted. This is the closest equivalent to adding &lt;code&gt;console.log&lt;/code&gt; to a script.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using the Browser API for Step-Level Debugging
&lt;/h2&gt;

&lt;p&gt;When the step stream shows the agent going wrong but you're not sure why, the next step is to inspect the actual page state the agent encountered.&lt;/p&gt;

&lt;p&gt;The Browser API gives you a raw CDP connection to inspect the exact DOM state the web agent encountered — useful when step-stream output is ambiguous. It's a separate TinyFish product from the Web Agent and requires its own session.&lt;/p&gt;

&lt;p&gt;Identify the failing step from the stream, then use the Browser API to reproduce that step and inspect the DOM directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.sync_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_playwright&lt;/span&gt;

&lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Create a Browser API session
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.browser.tinyfish.ai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cdp_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cdp_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# WebSocket endpoint for CDP connection
&lt;/span&gt;
&lt;span class="c1"&gt;# Connect Playwright to the TinyFish cloud browser
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;sync_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_over_cdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cdp_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;pages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Navigate to where the agent went wrong
&lt;/span&gt;    &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://portal.example.com/products/item-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Inspect the actual DOM state
&lt;/span&gt;    &lt;span class="n"&gt;price_elements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_selector_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.price, [data-price], .product-price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;price_elements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inner_text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows you exactly what the agent was looking at: how many price elements were on the page, what loaded dynamically after the initial render. Once you understand the page structure the agent encountered, the goal fix usually becomes clear.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://tinyfish.ai/blog/tinyfish-vs-playwright-when-to-use-each?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;TinyFish vs Playwright: When to Use Each&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Fix Is Usually the Goal Text
&lt;/h2&gt;

&lt;p&gt;Most goal failures are fixable by improving the goal text. The agent is doing what it was asked — the ask just wasn't specific enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 1: Specify the output schema explicitly — start here first&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;❌ Vague:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Get the product information from this page
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Specific:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract exactly these fields:
- product_name: the main product title (not the category or brand)
- price: the current price in USD (use the sale price if shown, otherwise the regular price)
- availability: one of 'in_stock', 'out_of_stock', or 'limited'
Return null for any field not found on the page.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second version tells the agent what to call each field, where to look for it, how to handle ambiguity, and what to do when something is missing. The reason this works: explicit schemas reduce the agent's interpretation space. When the goal says "return null for any field not found," the agent no longer needs to decide between returning an empty string, a placeholder, or a label text — there's one valid answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 2: Scope the task tightly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;❌ Broad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Get all the prices from this site
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Scoped:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the price of the item currently shown on this product detail page.
Do not navigate to other pages. Stop after extracting the price.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explicit boundaries prevent the agent from interpreting a broad goal broadly. Without scope constraints, an agent that can navigate will navigate — not because something went wrong, but because the goal didn't say not to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 3: Add handling for missing data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If the price is not visible on the page, return null for the price field.
Do not return 'price unavailable' or similar strings — return null.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents improvise when data is missing unless you tell them not to. Without a "return null" instruction, an agent might reasonably return the label text, a related value, or an empty string — all of which look different in downstream code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix 4: Add page context when the layout is unusual&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The product price appears below the product image and above the Add to Cart button.
Extract only that price — ignore any prices shown in the 'You might also like' section.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because the agent uses your spatial description as a disambiguation rule. When it encounters two elements that both look like prices, "below the product image" narrows it to one.&lt;/p&gt;




&lt;h2&gt;
  
  
  When the Problem Is the Site, Not the Goal
&lt;/h2&gt;

&lt;p&gt;Sometimes the agent is following the goal correctly, but the page itself behaves inconsistently across runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signs it's a site problem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The goal text is specific and well-scoped&lt;/li&gt;
&lt;li&gt;Results vary across runs on the same URL with the same goal&lt;/li&gt;
&lt;li&gt;The step stream shows the agent reaching the right element but extracting different values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common site-side causes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Content loads asynchronously:&lt;/em&gt; The agent may extract a value before the final price loads, or after a different variant's price has already populated the element.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Conditional elements:&lt;/em&gt; Promotional banners, A/B test variants, or personalized content may appear in some sessions but not others, changing the page structure the agent encounters.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Session state:&lt;/em&gt; Prices, availability, or content may differ based on login state or session cookies. If the agent runs without the expected session state, it sees a different version of the page.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Authentication expiry:&lt;/em&gt; For portals that require authentication on accounts you control, the session may expire mid-run on longer tasks. The agent completes but sees the logged-out version of the page for later steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagnostic approach:&lt;/strong&gt; Run the same goal 3 times on the same URL and compare results. If results differ: site problem. If results are consistently wrong in the same way: goal problem.&lt;/p&gt;

&lt;p&gt;For content timing issues, add an explicit instruction to the goal: "Wait for the price element to fully load before extracting."&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Start debugging your web agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;500 free steps, no credit card required. Run the two-layer check pattern against your actual targets.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What does COMPLETED mean in web agent results?&lt;/strong&gt;&lt;br&gt;
COMPLETED means the infrastructure layer finished without an error — not that the goal succeeded. Always validate returned data: check that required fields are present, non-null, and match expected values. This is the single most common source of silent bugs in agent pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does my web agent return different results each time?&lt;/strong&gt;&lt;br&gt;
Run the same goal 3 times on the same URL. If results vary: site problem (dynamic content, A/B tests, session state). If results are consistently wrong in the same way: goal problem. The distinction matters because the fix is completely different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I retry automatically when an agent fails?&lt;/strong&gt;&lt;br&gt;
For &lt;code&gt;TIMEOUT&lt;/code&gt; and transient infrastructure errors: yes, a single retry is reasonable. For &lt;code&gt;SITE_BLOCKED&lt;/code&gt; or &lt;code&gt;TASK_FAILED&lt;/code&gt;: retrying without changing the goal or session state rarely helps. For &lt;code&gt;MAX_STEPS_EXCEEDED&lt;/code&gt;: the goal needs to be split before retrying — adding retries without splitting will hit the limit again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does SITE_BLOCKED mean?&lt;/strong&gt;&lt;br&gt;
The target site's configuration prevented the run from completing — typically a session, authentication, or access pattern requirement. Identify what the site actually requires for the content you're trying to access, address that requirement, then retry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I know when to stop tuning the goal and accept the result?&lt;/strong&gt;&lt;br&gt;
If you've applied Fix 1 (explicit schema) and the agent still returns inconsistent results after 3 runs, the problem is likely the page — not the goal. At that point, use the Browser API to inspect what the page actually serves to an automated session, rather than continuing to refine goal text.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/what-is-an-agentic-browser?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Is an Agentic Browser?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/tinyfish-vs-playwright-when-to-use-each?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;TinyFish vs Playwright: When to Use Each&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/headless-browser-vs-ai-agents-for-web-automation?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Headless Browser vs AI Agents for Web Automation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webagents</category>
      <category>debugging</category>
      <category>aiautomation</category>
      <category>goalbasedautomation</category>
    </item>
    <item>
      <title>How to Write Goals That Get Consistent Results from AI Agents</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 09:57:25 +0000</pubDate>
      <link>https://dev.to/tinyfishie/how-to-write-goals-that-get-consistent-results-from-ai-agents-1bk8</link>
      <guid>https://dev.to/tinyfishie/how-to-write-goals-that-get-consistent-results-from-ai-agents-1bk8</guid>
      <description>&lt;p&gt;When you write a prompt for a chatbot, the worst outcome is a bad answer. You try again.&lt;/p&gt;

&lt;p&gt;When you write a goal for a web agent, the worst outcome is wrong data that passes downstream validation, an extraction that returns half the results on every run, or — in a transactional workflow — an action you didn't intend. A bad goal doesn't fail loudly. It fails quietly, returning a result that looks plausible until someone checks it against the source.&lt;/p&gt;

&lt;p&gt;Goal writing for web agents has its own patterns. This guide covers the four elements every reliable goal needs, and a reference table of goal patterns for common use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Goal Writing Is Different from Prompt Engineering
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is about getting a model to respond well. Goal writing is about instructing an agent to act reliably.&lt;/p&gt;

&lt;p&gt;The distinction matters in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A prompt&lt;/strong&gt; gets evaluated once. The model reads it, generates a response, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A goal&lt;/strong&gt; gets interpreted at each decision point as the agent navigates. Every time the agent decides whether to click something, where to look next, or how to structure its output, it refers back to the goal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means a vague goal doesn't produce one bad response — it produces inconsistent decisions across every step of every run. Two agents running the same vague goal against the same page can take different paths and return different results.&lt;/p&gt;

&lt;p&gt;Prompt engineering advice — be specific, give examples, define the format — applies here too. But goal writing adds two things that chatbot prompting doesn't need: &lt;strong&gt;scope&lt;/strong&gt; (where on the page to operate) and &lt;strong&gt;fallback behavior&lt;/strong&gt; (what to return when expected elements aren't there).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Elements of a Reliable Goal
&lt;/h2&gt;

&lt;p&gt;Every goal that produces consistent results has these four properties. Missing any one of them is the most common source of extraction failures.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Element&lt;/th&gt;
&lt;th&gt;What it defines&lt;/th&gt;
&lt;th&gt;What happens without it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Which section/page to focus on&lt;/td&gt;
&lt;td&gt;Agent wanders, extracts from wrong sections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What specific action or extraction to perform&lt;/td&gt;
&lt;td&gt;Agent guesses what counts as completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The structure of the returned data&lt;/td&gt;
&lt;td&gt;Each run returns data shaped differently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fallback behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What to do when expected elements are missing&lt;/td&gt;
&lt;td&gt;Silent failures, partial results, null fields&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Weak goal:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Get the product information.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Strong goal:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the main product listing grid (not featured or sponsored products),
extract the name, current price, and availability status for each product.
Return as JSON array: [{"name": string, "price": string, "availability": "in_stock" | "out_of_stock" | "unknown"}].
If a field is not visible for a product, use null — do not navigate to individual product pages.
Limit to 20 products.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The weak goal has none of the four elements. The strong goal has all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Specifying Output Format
&lt;/h2&gt;

&lt;p&gt;Unspecified output format is the single most common cause of inconsistent agent results. When you don't define the structure, the agent makes its own decisions — and those decisions can vary between runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  From free-form to schema
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# ❌ No output format — agent decides the structure
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://marketplace.example.com/listings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the product listings.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Result varies: sometimes a list, sometimes a dict, sometimes nested
&lt;/span&gt;
&lt;span class="c1"&gt;# ✅ Explicit schema — consistent every time
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://marketplace.example.com/listings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract each product listing on this page. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Return as JSON array: [{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: string, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: string, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: number, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: string}]. &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If a field is not present, use null.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Choose the right schema style for your use case
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For structured extraction (most common):&lt;/strong&gt; Specify field names, types, and what null means.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return as JSON: {"title": string, "price": number, "in_stock": boolean}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For list extraction with many items:&lt;/strong&gt; Define the array shape and element schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return as JSON array, one object per result:
[{"rank": number, "url": string, "title": string, "snippet": string}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For multi-value fields:&lt;/strong&gt; Be explicit about how to handle them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If multiple prices are shown (sale price + original price), return both:
{"sale_price": number, "original_price": number, "currency": string}
If only one price is shown, use {"price": number, "currency": string}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For boolean fields:&lt;/strong&gt; Define what each state means.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"availability": true if "In Stock" or "Available" is shown, false if "Out of Stock"
or "Unavailable", null if no availability information is visible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Scoping Goals to Specific Page Sections
&lt;/h2&gt;

&lt;p&gt;Broad goals produce broad results. A page that has a main listing grid, a featured products sidebar, a recently viewed section, and a sponsored row will produce overlapping, duplicate data if your goal doesn't specify which section to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add location context
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ❌ Broad — agent extracts from all sections
"Extract all product prices on this page."

# ✅ Scoped — agent focuses on the right section
"In the main product listing grid (not the featured sidebar, sponsored section,
or recently viewed row), extract the price and name for each product."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scope patterns by page type
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;E-commerce listing page:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the main search results grid, extract... (not sponsored listings at the top)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pricing page with comparison table:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the pricing comparison table, extract each plan's name, monthly price,
and the features listed under 'Included'. Do not extract from the FAQ section.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Search results:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the organic search results (not ads, not 'People also ask', not featured snippets).
For each result: title, URL, and snippet text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Portal with sidebar navigation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the main content area (not the sidebar navigation), extract...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When in doubt, describe what NOT to include. Exclusions are often easier to specify than inclusions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing Fallback Instructions
&lt;/h2&gt;

&lt;p&gt;Silent failures happen when the agent completes a run, returns a result, but the expected element wasn't present — and the goal gave no instruction for that case. The agent fills the gap with its best guess.&lt;/p&gt;

&lt;p&gt;Explicit fallback instructions tell the agent exactly what to return when something is missing:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern: missing field
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the product's SKU from the product details section.
If no SKU is visible on this page, return null for that field.
Do not navigate to other pages to find it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern: multiple values found
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the current price. If both a sale price and a regular price are shown,
return both: {"sale_price": number, "original_price": number}.
If only one price is shown, return {"price": number}.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern: conditional content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract the inventory count if displayed. If inventory count is not shown
but the item is listed as available, return {"in_stock": true, "quantity": null}.
If the item shows "Out of Stock", return {"in_stock": false, "quantity": 0}.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pattern: pagination
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract listings from this page only. If the page shows fewer than 10 listings,
return what's available — do not navigate to the next page.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule: every state the page could be in should have an explicit instruction. If your goal covers the happy path but not the edge cases, you'll get inconsistent results whenever a page varies from the expected state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety Instructions for Transactional Workflows
&lt;/h2&gt;

&lt;p&gt;For goals that involve forms, account actions, or any workflow with write access, explicitly define what the agent should and should not do. Safety instructions don't limit what your agent can do — they scope the task to precisely the intended action.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://portal.example.com/orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Log in using the provided credentials. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Navigate to the Orders section and extract the 10 most recent orders. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For each order: order ID, date, status, and total amount. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return as JSON array. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMPORTANT: Do not click any buttons that modify order status. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do not submit any forms. Do not proceed to any checkout or payment screens. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If you reach a page that requires payment information, stop and return &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reached_payment_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The safety instruction block at the end of a transactional goal is the standard pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IMPORTANT: Do not [specific prohibited action]. Do not [second prohibited action].
If you encounter [edge case], stop and return {"error": "[description]"}.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authenticated workflows operate on accounts you're authorized to access — safety instructions define the scope of what the agent does within that access, giving you precise control over each run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckkwn3iouoq15y68uyti.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fckkwn3iouoq15y68uyti.png" alt="Diagram showing the four elements of a reliable agent goal: scope, task, output format, and fallback behavior" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Goal Patterns by Use Case
&lt;/h2&gt;

&lt;p&gt;Copy-pasteable patterns for common scenarios. Replace the bracketed placeholders with your specifics.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Goal pattern&lt;/th&gt;
&lt;th&gt;Key elements to customize&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Product price extraction&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"In the [section] on this page, extract the [name/price/availability] for each product. Return as JSON array: [{\"name\": string, \"price\": number, \"currency\": string}]. If price is not shown, use null."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Section name, field list, currency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search result extraction&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Extract the organic search results (not ads). For each: title, URL, snippet. Return as JSON array, max [N] results."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;N, any additional fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step form submission&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Fill the [form name] with: [field]: [value], [field]: [value]. Click [button]. IMPORTANT: Do not proceed past the confirmation screen. Return the confirmation message or order ID shown."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Form fields, stop condition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authenticated portal data&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Navigate to [section] using the provided credentials. Extract [data]. Return as JSON. Do not submit any forms or modify any data."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Section, data fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price comparison across pages&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Extract the current [plan name] price from the pricing page. Return: {\"plan\": string, \"monthly_price\": number, \"annual_price\": number, \"currency\": string}. Use null for any price not shown."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Plan name, price fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory monitoring&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Check the availability status of [product identifier] on this page. Return: {\"available\": boolean, \"quantity\": number or null, \"status_text\": string}. Capture the exact text shown for availability."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Product ID, status fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document/content extraction&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"Extract the [content type] from the [section] of this page. Do not extract navigation, footers, or sidebar content. Return as plain text preserving paragraph breaks."&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Content type, exclusions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Test these patterns on your actual target pages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TinyFish gives you 500 free credits to run goal-based agents against any URL — no credit card required. Start with one of the patterns above and see what consistent, structured output looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How long should an agent goal be?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As long as it needs to be to cover all four elements — scope, task, output format, and fallback behavior. A one-line goal that doesn't specify output format will produce inconsistent results. A five-sentence goal that covers all four elements will produce consistent results. Length isn't a signal of quality; completeness is. In practice, most reliable goals run 50–150 words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I put the output format at the beginning or end of the goal?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;End. Start with scope and task so the agent understands what it's doing before it encounters the format instruction. Ending with the format means the agent is already oriented when it reads the schema — it's not trying to simultaneously understand what it's extracting and how to structure it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use the same goal for multiple URLs?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — as long as the page structure is consistent across those URLs. If the target field (price, availability, etc.) appears in a different location or under a different element type on some pages, your results will vary. For large URL lists with structural variance, write a classification step first and route to different goals by page type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the difference between a goal and a script (Playwright, Selenium)?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A script defines every step explicitly: click this, wait for that, read this selector. A goal defines the desired outcome and lets the agent figure out the steps. Scripts are brittle when page structure changes; goals adapt. Goals require more careful output specification because there's no explicit selector telling the agent exactly which element to read — the agent infers from the goal text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I debug a goal that's producing inconsistent results?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use streaming to watch the agent's steps during a run — this shows you where it diverges from the expected path. Then check: (1) Is the scope ambiguous? The agent might be looking in the wrong section. (2) Is the output format specified? Unspecified format produces structural variation. (3) Is there a fallback for the failing case? A missing fallback produces silent failures. See &lt;a href="https://tinyfish.ai/blog/when-web-agents-fail-debugging?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;the full debugging guide&lt;/a&gt; for the complete diagnostic sequence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://tinyfish.ai/blog/when-web-agents-fail-debugging?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;When Web Agents Fail: Debugging Goal-Based Automation&lt;/a&gt; — diagnosing goal failures when the run completes but the output is wrong&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tinyfish.ai/blog/ai-web-agents-real-world-use-cases?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Can AI Web Agents Actually Do? 10 Real-World Use Cases&lt;/a&gt; — transactional and extraction use cases with result examples&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tinyfish.ai/blog/tinyfish-web-agent-getting-started-10-minutes?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Getting Started with TinyFish in 10 Minutes&lt;/a&gt; — first goal-directed agent run from scratch&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>howtowritegoalsforaiagents</category>
      <category>webagents</category>
      <category>goalbasedautomation</category>
      <category>aiprompting</category>
    </item>
    <item>
      <title>Build a Real-Time Medicine Price Comparison Tool with Parallel AI Agents</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 09:57:24 +0000</pubDate>
      <link>https://dev.to/tinyfishie/build-a-real-time-medicine-price-comparison-tool-with-parallel-ai-agents-53ah</link>
      <guid>https://dev.to/tinyfishie/build-a-real-time-medicine-price-comparison-tool-with-parallel-ai-agents-53ah</guid>
      <description>&lt;p&gt;TinyFish parallel agents are cloud-based browser sessions that run simultaneously across multiple pharmacy websites and return structured pricing data — product name, price, dosage form, and stock status — normalized into a consistent schema for side-by-side comparison.&lt;/p&gt;

&lt;p&gt;GoodRx works well for US consumers looking up drug prices at major chains. For developers building price transparency tools in other markets — or for any market where pharmacy pricing APIs don't exist — the only real-time data source is the pharmacy's own website. And pharmacy websites don't offer APIs.&lt;/p&gt;

&lt;p&gt;This tutorial builds a real-time medicine price comparison tool using parallel TinyFish agents, using Vietnamese pharmacy chains as the concrete example. The architecture generalizes to any market: replace the pharmacy URLs and update the currency normalization logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a medicine price comparison tool in 4 steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define your pharmacy list with search URLs&lt;/li&gt;
&lt;li&gt;Write a goal prompt that extracts consistent fields across different pharmacy sites&lt;/li&gt;
&lt;li&gt;Run agents in parallel with &lt;code&gt;Promise.allSettled&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Normalize price strings and rank by cost&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why Pharmacy Price Comparison Is Hard to Build
&lt;/h2&gt;

&lt;p&gt;Pharmacy pricing is inherently fragmented. In most markets, chains set their own prices independently — and they don't publish those prices through APIs.&lt;/p&gt;

&lt;p&gt;The engineering problems stack up quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No unified API&lt;/strong&gt; — unlike retail (Amazon Product Advertising API) or travel (OTA affiliate APIs), pharmacy chains don't have a developer ecosystem. Each chain is its own data island.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price variation is real and fast-moving&lt;/strong&gt; — prices change with promotions, supplier contracts, and regional stock. Data from last week is often wrong. Real-time extraction is the only reliable source.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential scraping multiplies wait time&lt;/strong&gt; — checking 5 pharmacy chains one after another means 5× the latency. A user waiting for a price comparison shouldn't wait 5× longer because the architecture is sequential.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parallel browser agents solve all three: no API required, results are real-time, and checking 5 chains takes the same wall-clock time as checking one.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Best when&lt;/th&gt;
&lt;th&gt;Breaks when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Parallel browser agents&lt;/td&gt;
&lt;td&gt;No API exists, multi-market, real-time data needed&lt;/td&gt;
&lt;td&gt;High-frequency automated runs (100+/hour)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sequential scraping&lt;/td&gt;
&lt;td&gt;Single pharmacy, low request volume&lt;/td&gt;
&lt;td&gt;Latency is 5× longer per chain at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party API (GoodRx, etc.)&lt;/td&gt;
&lt;td&gt;US market, high-volume production use&lt;/td&gt;
&lt;td&gt;API unavailable in your market or too expensive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Access scope:&lt;/strong&gt; This tool reads only public product search pages — the same pricing data any consumer sees when visiting a pharmacy website. It does not access prescription systems, patient portals, or any authenticated pharmacy resources.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For healthcare developers, this pattern applies to any pharmacy market. The example below uses five Vietnamese chains — Long Chau, Pharmacity, An Khang, Guardian, and Medicare — because Vietnam has no pharmacy price API and a fragmented pricing landscape. The same code runs for pharmacy chains in any country.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Choosing the right TinyFish API for this project:&lt;/strong&gt; TinyFish offers two tools with different cost profiles. The Fetch API (&lt;code&gt;api.fetch.tinyfish.ai&lt;/code&gt;) retrieves a page at a known URL — 1 credit = 15 pages. The Agent API (&lt;code&gt;agent.tinyfish.ai&lt;/code&gt;) handles pages requiring a search query or navigation — 1 credit per step. Most pharmacy price lookups require a search step before prices appear, so this tutorial uses Agent API. If a product has a stable direct URL on a given pharmacy's site, Fetch API is significantly cheaper for that call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Parallel Agent Architecture
&lt;/h2&gt;

&lt;p&gt;One agent per pharmacy, all running simultaneously. Each agent navigates to the pharmacy's search page, finds the medicine, and returns structured data. &lt;code&gt;Promise.allSettled&lt;/code&gt; ensures that a slow or temporarily unavailable pharmacy doesn't delay results from the others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Node.js 18+, TypeScript, and a TinyFish API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @tiny-fish/sdk
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; ts-node typescript
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Place &lt;code&gt;index.ts&lt;/code&gt; and &lt;code&gt;normalize.ts&lt;/code&gt; in the same directory, then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx ts-node index.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TinyFish&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tiny-fish/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;normalizePharmacyResult&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./normalize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// reads TINYFISH_API_KEY from env&lt;/span&gt;

&lt;span class="c1"&gt;// URLs verified May 2026 — verify before deploying, as pharmacy search paths change&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PHARMACIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Long Chau&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pharmacy-a.example.com/tim-kiem?key=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Pharmacity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pharmacy-b.example.com/search?q=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;An Khang&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pharmacy-c.example.com/catalogsearch/result?q=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Guardian&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pharmacy-d.example.com/search?query=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Medicare&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://pharmacy-e.example.com/search?s=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buildGoal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`
  Search for "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;".
  Find the medicine in the search results and return a JSON object with:
  - productName: string (exact name as displayed)
  - price: string (price exactly as shown, including currency symbols and separators)
  - currency: "VND"
  - dosageForm: string (tablet, capsule, liquid — in Vietnamese or English as shown)
  - stockStatus: string (in stock / out of stock — in Vietnamese or English as shown)
  - url: string (direct product page link)

  Return the single most relevant result for "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;".
  If no match is found, return null.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;compareMedicinePrices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PHARMACIES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildGoal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;medicineName&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// response.result is the parsed JavaScript value returned by the agent.&lt;/span&gt;
        &lt;span class="c1"&gt;// null is a valid result — the medicine was not found at this pharmacy.&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;productName&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;price&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;dosageForm&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;stockStatus&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;url&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;normalizePharmacyResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;settled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allSettled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;settled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PHARMACIES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// error: true = infrastructure failure (timeout/network), distinct from null result (not found)&lt;/span&gt;
    &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fulfilled&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="c1"&gt;// Sort found results by price (ascending), unfound results last&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;priceVnd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;priceVnd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceVnd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceVnd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compareMedicinePrices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Paracetamol 500mg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceVnd&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toLocaleString&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;₫ — &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stockStatus&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pharmacy&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: not found&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; (error)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Concurrency note:&lt;/strong&gt; The Free plan supports 2 concurrent agent runs — 5 pharmacies run in approximately 3 batches. The Starter plan (10 concurrent) runs all five simultaneously. Each agent step consumes 1 credit — a typical product search runs 3–6 steps.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F831cyia7jzgxdilnyco7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F831cyia7jzgxdilnyco7.png" alt="Parallel TinyFish agents searching five pharmacy chains simultaneously and returning normalized price data" width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Price Normalization — The Real Engineering Challenge
&lt;/h2&gt;

&lt;p&gt;The parallel agents are the easy part. Price strings are the hard part.&lt;/p&gt;

&lt;p&gt;Five pharmacy websites display the same price in five different formats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Long Chau:   "125,000₫"
Pharmacity:  "125.000 VNĐ"
An Khang:    "125.000đ"
Guardian:    "125,000"
Medicare:    "125000 đồng"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All five mean the same thing: 125,000 Vietnamese dong. But if you put these strings side by side without normalization, they're incomparable. The sort function breaks. The low-price highlight breaks. The entire comparison breaks.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;normalize.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// normalize.ts&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseVndPrice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Remove all currency labels and symbols&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stripped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/₫|VNĐ|VND|đồng|dong|đ/gi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// VND uses either comma or dot as the thousands separator.&lt;/span&gt;
  &lt;span class="c1"&gt;// Key insight: if there's no decimal component after the separator,&lt;/span&gt;
  &lt;span class="c1"&gt;// it's a thousands separator — not a decimal point.&lt;/span&gt;
  &lt;span class="c1"&gt;// "125,000" → 125000. "125.000" → 125000. "125000" → 125000.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;digits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;stripped&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;0-9&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;digits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;num&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DOSAGE_MAP&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;viên nén&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tablet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;viên nang&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;capsule&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;viên&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tablet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ống&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;gói&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sachet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;chai&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bottle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;hộp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;box&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizeDosageForm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;en&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;DOSAGE_MAP&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;en&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizeStockStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;in_stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;out_of_stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;limited&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;còn hàng&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;in stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;in_stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hết hàng&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;out of stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;out_of_stock&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sắp hết&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;limited&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;limited&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizePharmacyResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;productName&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;price&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;dosageForm&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;stockStatus&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;url&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;productName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;productName&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;priceVnd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parseVndPrice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;VND&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;dosageForm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dosageForm&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;normalizeDosageForm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dosageForm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stockStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stockStatus&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;normalizeStockStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stockStatus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why capture price as a string first.&lt;/strong&gt; Asking the agent to return the price as a number risks losing the formatting context needed to parse it correctly. "125.000" is ambiguous: is that 125,000 VND or 125.00 USD? Capturing the raw display string — "125.000 VNĐ" — preserves the currency label and makes the parsing logic unambiguous. Parse on your side, where you know the market context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generalizing to other currencies.&lt;/strong&gt; Update &lt;code&gt;parseVndPrice&lt;/code&gt; for your local format. European currencies often use period as thousands separator and comma as decimal (€1.250,50). US/UK use comma as thousands separator. The normalization pattern is the same — strip labels, identify the separator convention for your market, parse to integer or float.&lt;/p&gt;




&lt;h2&gt;
  
  
  Adapting the Pattern to Your Pharmacy Market
&lt;/h2&gt;

&lt;p&gt;To adapt for a different country:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replace &lt;code&gt;PHARMACIES&lt;/code&gt; URLs&lt;/strong&gt; with local chain search pages — any URL that accepts a medicine name query parameter works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update the goal prompt&lt;/strong&gt; if your market uses different field terminology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update &lt;code&gt;parseVndPrice&lt;/code&gt;&lt;/strong&gt; for local currency format and separators&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update &lt;code&gt;DOSAGE_MAP&lt;/code&gt;&lt;/strong&gt; if the language differs (the agent extracts in whatever language the site uses)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Extension: price alert emails.&lt;/strong&gt; Run the comparison on a schedule and compare with the previous day's output. When a price drops below a threshold, send an email via nodemailer or a transactional email service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension: brand vs. generic comparison.&lt;/strong&gt; Add a &lt;code&gt;brandType&lt;/code&gt; field to the goal prompt: &lt;code&gt;"brandType: 'brand' | 'generic' | 'unknown' (based on whether the product name contains the generic INN or is a brand name)"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension: trend tracking.&lt;/strong&gt; Save each run's output to a database table keyed by &lt;code&gt;(pharmacy, productName, date)&lt;/code&gt;. Querying across dates gives you a price trend line per pharmacy — useful for transparency reports or consumer price alerts.&lt;/p&gt;

&lt;p&gt;For the same parallel extraction pattern applied to other multi-site scenarios, see &lt;a href="https://tinyfish.ai/blog/build-event-lodging-search-tool-ai-agents?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;how this pattern works for hotel price comparison across booking platforms&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Pharmacy pricing is one of the most fragmented data problems in healthcare — hundreds of chains, no unified API, prices that change daily. Parallel agents solve the fragmentation: five chains, one function call, real-time results. The normalization layer solves the format problem: five different price strings, one comparable number. Together, they're the foundation of any serious price transparency tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can this build a medicine price comparison tool without pharmacy APIs?&lt;/strong&gt;&lt;br&gt;
Yes. This tool reads public product search pages — the same pricing data any consumer sees when visiting the pharmacy's website without logging in. No pharmacy API partnership, data licensing agreement, or institutional approval is required. This is the practical solution for markets where pharmacy price APIs don't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the price normalization handle different currency formats?&lt;/strong&gt;&lt;br&gt;
Pharmacy websites display prices in different formats even within the same country. The &lt;code&gt;parseVndPrice&lt;/code&gt; function strips currency labels and symbols, then removes all non-digit characters to extract the numeric value. For VND, this handles "125,000₫", "125.000 VNĐ", and "125000 đồng" identically. For other currencies, update the label strip list and handle the local decimal/thousands separator convention in your parsing function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why capture the price as a string from the agent rather than asking for a number?&lt;/strong&gt;&lt;br&gt;
Asking the agent to return a number risks losing the formatting context needed to parse correctly. "125.000" is ambiguous — it's 125,000 VND in Vietnam but 125.0 in USD notation. Capturing the raw display string ("125.000 VNĐ") preserves the currency label and makes your parsing logic unambiguous. Parse on your side where you know the market context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when a pharmacy doesn't carry the medicine?&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;null&lt;/code&gt; is a valid agent result — the medicine wasn't found. This is distinct from &lt;code&gt;error: true&lt;/code&gt;, which indicates an infrastructure failure (network timeout, session error). Null results appear at the bottom of sorted output, allowing found results to float to the top regardless of which pharmacies carry the medicine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I add more pharmacy chains?&lt;/strong&gt;&lt;br&gt;
Add entries to the &lt;code&gt;PHARMACIES&lt;/code&gt; array with the chain name and its search URL. The goal prompt's natural language description adapts to different site structures — you may need to adjust field names in the goal if a chain uses unusual terminology for stock status or dosage form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I schedule this for daily price tracking?&lt;/strong&gt;&lt;br&gt;
Yes. Run the comparison with a cron job or cloud scheduler and save results to a database table keyed by pharmacy, product name, and date. Comparing today's output with yesterday's automatically surfaces price changes. Set a threshold (e.g., any price drop &amp;gt; 10%) to trigger an email alert via a transactional email service.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related Reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/build-event-lodging-search-tool-ai-agents?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Build a Hotel Price Comparison Tool for Events and Conventions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/build-video-game-price-comparison-10-platforms?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Build a Video Game Price Comparison Tool Across 10 Platforms&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploy This with a Free Account
&lt;/h2&gt;

&lt;p&gt;The complete workflow above runs on TinyFish's free tier. &lt;strong&gt;500 free steps, no credit card&lt;/strong&gt; — enough to deploy this project and validate it against real data before choosing a plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Get your free API key →&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webagents</category>
      <category>healthtech</category>
      <category>tutorial</category>
      <category>parallelagents</category>
    </item>
    <item>
      <title>Build a Video Game Price Comparison Tool Across 10 Platforms</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 09:31:43 +0000</pubDate>
      <link>https://dev.to/tinyfishie/build-a-video-game-price-comparison-tool-across-10-platforms-3lld</link>
      <guid>https://dev.to/tinyfishie/build-a-video-game-price-comparison-tool-across-10-platforms-3lld</guid>
      <description>&lt;p&gt;TinyFish parallel agents are cloud-based browser sessions that query multiple game storefronts simultaneously and return structured pricing data — game title, current price, discount percentage, and direct purchase URL — normalized across currencies and formats into a single ranked list.&lt;/p&gt;

&lt;p&gt;Game prices vary wildly across storefronts and change constantly during sales. IsThereAnyDeal and CheapShark aggregate some of this, but neither covers every platform — itch.io is out, regional stores are out, newer storefronts take months to appear. When a game isn't in the database, you're back to checking each store manually.&lt;/p&gt;

&lt;p&gt;This tutorial builds a multi-platform game price comparison tool using TinyFish agents. Your configured storefronts are checked simultaneously — major PC storefronts, indie platforms, regional stores, or any public game store with a search page. You get a ranked price list in the time it takes to manually open two tabs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a video game price comparison tool in 4 steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define your platform list with store search URLs&lt;/li&gt;
&lt;li&gt;Write a goal prompt that extracts price, discount, and link consistently&lt;/li&gt;
&lt;li&gt;Run all 10 agents in parallel with &lt;code&gt;Promise.allSettled&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Normalize currencies and rank by discounted price&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Why Building Beats Using CheapShark or IsThereAnyDeal
&lt;/h2&gt;

&lt;p&gt;CheapShark and IsThereAnyDeal are good starting points. They cover major PC storefronts reliably — and their APIs are free. For straightforward PC game price lookup, they're often sufficient.&lt;/p&gt;

&lt;p&gt;Browser agents make sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your target platform isn't indexed&lt;/strong&gt; — regional stores, newer indie platforms, and console storefronts have partial or no CheapShark coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need real-time data&lt;/strong&gt; — API-backed aggregators cache prices, sometimes aggressively. During major sale events, cached prices can lag the actual storefront by hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're building beyond price lookup&lt;/strong&gt; — availability status, regional pricing differences, bundle detection, and DLC pricing require reading the actual store page rather than an API summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The engineering tradeoff is straightforward:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CheapShark / IsThereAnyDeal API&lt;/td&gt;
&lt;td&gt;Quick PC storefront lookup, free, no setup&lt;/td&gt;
&lt;td&gt;Doesn't cover all storefronts; cached data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser agents (this tutorial)&lt;/td&gt;
&lt;td&gt;All 10 platforms, real-time, fully customizable&lt;/td&gt;
&lt;td&gt;Credit cost per query; rate-limited at Free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;Choosing the right TinyFish API for this project:&lt;/strong&gt; TinyFish offers two tools with different access patterns. The &lt;a href="https://api.fetch.tinyfish.ai" rel="noopener noreferrer"&gt;Fetch API&lt;/a&gt; (&lt;code&gt;api.fetch.tinyfish.ai&lt;/code&gt;) retrieves a page at a known URL — free (0 credits). The &lt;a href="https://agent.tinyfish.ai" rel="noopener noreferrer"&gt;Agent API&lt;/a&gt; (&lt;code&gt;agent.tinyfish.ai&lt;/code&gt;) handles storefronts requiring a search step — 1 credit per step. For storefronts with stable product page URLs, Fetch API can retrieve the price directly (free, 0 credits). For storefronts that require a search query, Agent API is required (1 credit per step). Where you have direct product URLs, Fetch API is the right choice — free on all plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Game Price Comparison: Running Parallel Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt; Node.js 18+, TypeScript, and a TinyFish API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @tiny-fish/sdk
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; ts-node typescript
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TinyFish&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tiny-fish/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// reads TINYFISH_API_KEY from env&lt;/span&gt;

&lt;span class="c1"&gt;// Replace these with the storefronts you want to compare.&lt;/span&gt;
&lt;span class="c1"&gt;// Before adding any storefront, check its Terms of Service regarding automated access.&lt;/span&gt;
&lt;span class="c1"&gt;// Many storefronts have public APIs (e.g. CheapShark, IsThereAnyDeal) that are&lt;/span&gt;
&lt;span class="c1"&gt;// the preferred integration method — use browser agents only for stores without APIs.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PLATFORMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Storefront A&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example-store-a.com/search?q=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Storefront B&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example-store-b.com/games/search?query=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Storefront C&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example-store-c.com/search?term=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="c1"&gt;// Add more storefronts here — any public search URL works&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buildGoal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;platformName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`
  Search for the game "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" on this storefront.
  Return a JSON object with:
  - gameTitle: string (exact title as shown on this store)
  - price: string (current price exactly as displayed, including currency symbol)
  - originalPrice: string or null (if there is an active discount, the original/crossed-out price)
  - discountPercent: number or null (e.g. 40 for 40% off, null if not discounted)
  - currency: string (3-letter code: "USD", "EUR", "GBP", etc.)
  - platform: "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;platformName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;" (e.g. "Storefront A" or "Storefront B")
  - url: string (direct link to this game's store page)
  - available: boolean (true if the game is purchasable; false if unavailable, region-locked, or subscription-only)

  If the game is not found on this storefront, return null.
  Return only the most relevant match for "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;".
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;compareGamePrices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PLATFORMS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildGoal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// response.result is the parsed JavaScript object returned by the agent.&lt;/span&gt;
        &lt;span class="c1"&gt;// null = game not found. Check for goal failure vs legitimate null:&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failure&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;settled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allSettled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;settled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PLATFORMS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// error: true = network/timeout failure, distinct from null (game not found)&lt;/span&gt;
    &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fulfilled&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;Promise.allSettled&lt;/code&gt; not &lt;code&gt;Promise.all&lt;/code&gt;:&lt;/strong&gt; A single storefront timing out or returning an error shouldn't cancel the nine successful results. &lt;code&gt;allSettled&lt;/code&gt; ensures every agent completes and reports independently.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Concurrency note:&lt;/strong&gt; The Free plan supports 2 concurrent agent runs — 10 platforms run in approximately 5 batches. The Starter plan (10 concurrent) runs all 10 simultaneously. Each agent step consumes 1 credit. Querying all 10 platforms for one game title typically costs 30–60 credits depending on site complexity.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct79rg17dhepmmgyx4x5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fct79rg17dhepmmgyx4x5.png" alt="Ten TinyFish agents querying Steam, GOG, Epic, Xbox, PlayStation, and five other game storefronts simultaneously, returning normalized USD prices" width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Normalizing Prices Across 10 Currencies and Formats
&lt;/h2&gt;

&lt;p&gt;Ten storefronts return ten different price formats: &lt;code&gt;"$29.99"&lt;/code&gt;, &lt;code&gt;"€24,99"&lt;/code&gt;, &lt;code&gt;"£19.99"&lt;/code&gt;, &lt;code&gt;"AU$39.95"&lt;/code&gt;, &lt;code&gt;"Free"&lt;/code&gt;, &lt;code&gt;"N/A (subscription only)"&lt;/code&gt;. Without normalization, sorting by price is impossible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// normalize.ts&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CURRENCY_TO_USD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;USD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;EUR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.09&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// update these rates from an exchange rate API in production&lt;/span&gt;
  &lt;span class="na"&gt;GBP&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.27&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;AUD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;CAD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.73&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parsePriceToUsd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;free&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;n/a&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;subscription&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Strip currency symbols and labels, normalize decimal separator&lt;/span&gt;
  &lt;span class="c1"&gt;// Remove all non-numeric chars except dot; handle comma as thousands separator&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[^&lt;/span&gt;&lt;span class="sr"&gt;0-9.,&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/,/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;CURRENCY_TO_USD&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// round to cents&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;calcSavings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;currentUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;originalUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;savingsUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;discountPct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentUsd&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;savingsUsd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;currentUsd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;discountPct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;savingsUsd&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;savingsUsd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;savingsUsd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;discountPct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizeGameResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;price&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;originalPrice&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;discountPercent&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;available&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currency&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;USD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;priceUsd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parsePriceToUsd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;price&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;originalPrice&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parsePriceToUsd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;originalPrice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;savings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;calcSavings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;originalUsd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;priceUsd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;originalUsd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;effectiveDiscountPct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;discountPercent&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;savings&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;discountPct&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;savingsUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;savings&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;savingsUsd&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three normalization decisions worth explaining:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Currency conversion at query time.&lt;/strong&gt; Build the lookup table once, apply it at normalization — don't call an exchange rate API per result. In production, fetch rates daily and cache them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;"Free"&lt;/code&gt; returns 0.&lt;/strong&gt; Sorting by price puts free games at the top, which is almost always the right behavior for a deals tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;null&lt;/code&gt; for subscription/unavailable.&lt;/strong&gt; A subscription-only game shouldn't appear as &lt;code&gt;$0&lt;/code&gt; — it's not a direct purchase price. &lt;code&gt;null&lt;/code&gt; price pushes it to the bottom of sorted output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Putting it together:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;normalizeGameResult&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./normalize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compareGamePrices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Elden Ring&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rawResults&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;normalizeGameResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aPrice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bPrice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;aPrice&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;bPrice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;disc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;effectiveDiscountPct&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;` (&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;effectiveDiscountPct&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;% off)`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}${&lt;/span&gt;&lt;span class="nx"&gt;disc&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; → &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with &lt;code&gt;npx ts-node index.ts&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a Deal Alert Bot
&lt;/h2&gt;

&lt;p&gt;The comparison tool becomes a deal monitor with a scheduled run and a threshold check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;WebhookClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;discord.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DEAL_THRESHOLD_USD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// alert when any platform drops below this&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;webhook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;WebhookClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;checkForDeals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compareGamePrices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;normalizeGameResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;Infinity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;DEAL_THRESHOLD_USD&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deals&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`**&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;**: $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceUsd&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;effectiveDiscountPct&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`(-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;effectiveDiscountPct&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%)`&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; — &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`🎮 Deal alert for **&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;gameTitle&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;**:\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Run daily via cron or GitHub Actions&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkForDeals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Elden Ring&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkForDeals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Baldur's Gate 3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Extension: Sale season monitor.&lt;/strong&gt; Run the comparison daily during major sale periods (typically mid-June and late November for PC storefronts). When any game in your watchlist drops more than 50%, trigger the webhook immediately. Track prices over time to see which stores run simultaneous sales and which don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extension: multi-game watchlist.&lt;/strong&gt; Pass an array of game titles to &lt;code&gt;compareGamePrices&lt;/code&gt; in parallel via &lt;code&gt;Promise.all&lt;/code&gt; — 5 games × 10 platforms = 50 concurrent agent runs on a Pro plan.&lt;/p&gt;

&lt;p&gt;For the same 10-platform parallel pattern applied to retail products, see &lt;a href="https://tinyfish.ai/blog/build-medicine-price-comparison-parallel-agents?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;how parallel agents work for medicine price comparison across pharmacy chains&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Game stores update prices hourly during sales. Checking 10 platforms manually for a single title takes long enough to miss a flash deal. Parallel agents collapse that to a single function call — all 10 checked simultaneously, prices normalized across currencies, ranked by actual cost. The same architecture that powers commercial deal aggregators, without the platform partnership requirements.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can this build a video game price comparison tool that checks all major storefronts?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — browser agents navigate public storefront pages and require no per-platform API registration. Add any storefront with a public search page by appending an entry to the &lt;code&gt;PLATFORMS&lt;/code&gt; array — including regional stores, indie platforms, and storefronts not covered by existing aggregator APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When should I use browser agents instead of CheapShark API?&lt;/strong&gt; Use this decision framework:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If you need&lt;/th&gt;
&lt;th&gt;→ Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storefronts not covered by aggregator APIs&lt;/td&gt;
&lt;td&gt;→ Browser agents (configure your own platform list)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time pricing during an active sale&lt;/td&gt;
&lt;td&gt;→ Browser agents (aggregators cache)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PC storefronts only, fast prototype&lt;/td&gt;
&lt;td&gt;→ CheapShark API (free, no credits needed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Availability, bundle status, or regional pricing&lt;/td&gt;
&lt;td&gt;→ Browser agents (read the actual store page)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practice: start with CheapShark for the platforms it covers, add browser agents for the gaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why do existing APIs like CheapShark miss some platforms?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CheapShark and IsThereAnyDeal aggregate prices from storefronts that maintain API partnerships or accept data feeds. Platforms without those partnerships — itch.io, console storefronts, newer indie platforms — don't appear. Browser agents have no such dependency: they read the public storefront pages that any customer visits, regardless of whether the platform has an API program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the price normalization handle "Free" or subscription-only titles?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;parsePriceToUsd&lt;/code&gt; function returns &lt;code&gt;0&lt;/code&gt; for "Free" (putting free games first in sorted output) and &lt;code&gt;null&lt;/code&gt; for titles that are only available through a subscription service (not a direct purchase price). &lt;code&gt;null&lt;/code&gt; prices sort to the bottom — a subscription title isn't a direct purchase price and shouldn't displace paid options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when a storefront doesn't carry the game?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;null&lt;/code&gt; is the correct result — the game isn't listed on that platform. This is distinct from &lt;code&gt;error: true&lt;/code&gt; (network failure or timeout). The comparison output shows &lt;code&gt;null&lt;/code&gt;-result platforms as "not found" in the display layer, so users can see at a glance which stores carry the title.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many platforms can run simultaneously?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Free plan (PAYG, 500 credits to start) supports 2 concurrent agent runs — 10 platforms run in 5 batches. The Starter plan ($15/mo, 10 concurrent) runs all 10 at once. Each agent step consumes 1 credit; a typical 10-platform query costs 30–60 credits total depending on site complexity and JavaScript rendering requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I extend this to track prices over time?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Save each run's output to a database keyed by game title, platform, and timestamp. Plotting &lt;code&gt;priceUsd&lt;/code&gt; over time reveals sale patterns — which storefronts discount simultaneously, how long post-launch discounts take to appear across different store types, and which platforms hold full price longest. Add a cron job to run daily checks for your watchlist.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deploy This with a Free Account
&lt;/h2&gt;

&lt;p&gt;The complete workflow above runs on TinyFish's free tier. &lt;strong&gt;500 free steps, no credit card&lt;/strong&gt; — enough to deploy this project and validate it against real data before choosing a plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Get your free API key →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Related Reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/build-medicine-price-comparison-parallel-agents?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Build a Medicine Price Comparison Tool with Parallel Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/build-vehicle-rental-price-comparison-tool?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Build a Vehicle Rental Price Comparison Tool with Parallel AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webagents</category>
      <category>gaming</category>
      <category>tutorial</category>
      <category>parallelagents</category>
    </item>
    <item>
      <title>TinyFish vs Playwright: When to Use Each for Web Automation</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 08:59:35 +0000</pubDate>
      <link>https://dev.to/tinyfishie/tinyfish-vs-playwright-when-to-use-each-for-web-automation-4pd1</link>
      <guid>https://dev.to/tinyfishie/tinyfish-vs-playwright-when-to-use-each-for-web-automation-4pd1</guid>
      <description>&lt;p&gt;Playwright is running fine. Then you scale to 500 concurrent sessions. Or you add sites with strict automation requirements and reliability drops. Or you realize your team spends more time managing browser infrastructure than building the product.&lt;/p&gt;

&lt;p&gt;That's usually when the search starts.&lt;/p&gt;

&lt;p&gt;This article is for developers who already use Playwright and want to understand exactly where TinyFish fits — and where it doesn't. Not a takedown. A map.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Playwright Does (And Why It's Excellent)
&lt;/h2&gt;

&lt;p&gt;Playwright is one of the best browser automation tools available. It controls real Chromium, Firefox, and WebKit browsers with a clean, well-designed API. The ecosystem is mature: thorough documentation, active maintenance from the Microsoft team, and broad community support.&lt;/p&gt;

&lt;p&gt;For developers, the appeal is direct. You write the code. You control everything. Intercept requests, manage cookies, record HAR files, take screenshots, assert DOM elements. The degree of control Playwright gives you over browser behavior is genuinely hard to replicate elsewhere.&lt;/p&gt;

&lt;p&gt;The critical thing to understand about Playwright: it's a library. It doesn't provide infrastructure. &lt;strong&gt;You write the code, you run the infrastructure.&lt;/strong&gt; That's not a criticism — it's the design. Playwright does exactly what it was built to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Playwright Hits Limits at Scale
&lt;/h2&gt;

&lt;p&gt;The infrastructure model that makes Playwright flexible at small scale becomes operational overhead as you grow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser server management.&lt;/strong&gt; Running Playwright at scale means managing fleets of headless Chromium instances — 100–300MB memory per instance. At 50 concurrent sessions, you're making decisions about server scaling, spot instance reliability, and container orchestration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proxy configuration.&lt;/strong&gt; Sites with strict automation requirements often need residential IP rotation. Building proxy rotation into Playwright means integrating a proxy provider, handling session affinity, managing per-request IP logic, and debugging reliability when rotation fails or IP reputations degrade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintenance overhead.&lt;/strong&gt; When a target site changes its structure or tightens its automation requirements, your scraper breaks. Diagnosing and fixing these failures is real engineering work — work that doesn't ship product, but blocks pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability degrades with volume.&lt;/strong&gt; A Playwright script that works at 10 requests/hour can fail at 500/hour — not because the code changed, but because infrastructure factors create failure modes that weren't visible at low volume. Setting up reliable automation on sites with strict requirements takes significant infrastructure work with Playwright alone.&lt;/p&gt;

&lt;p&gt;If your targets are static pages on cooperative domains at modest volume, this overhead is manageable. If you're running production pipelines at scale across varied sites, the infrastructure becomes a full-time engineering problem.&lt;/p&gt;

&lt;p&gt;For more on where Playwright's reliability degrades by site type, see &lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Scraping Dynamic Websites: When Playwright Is the Right Tool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=tinyfish-vs-playwright&amp;amp;utm_content=inline_pain_solution_vs_playwright" rel="noopener noreferrer"&gt;Test TinyFish against your hardest URLs — 500 free credits, no credit card&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What TinyFish Adds
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;📸 IMAGE — Architecture comparison showing self-hosted Playwright stack versus TinyFish managed browser infrastructure&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your existing Playwright scripts work. The change is minimal: create a TinyFish browser session, then connect your Playwright code to it.&lt;/p&gt;

&lt;p&gt;TinyFish exposes a Browser API that returns a CDP WebSocket endpoint. Since Playwright's &lt;code&gt;connect_over_cdp()&lt;/code&gt; method already speaks CDP, connecting to TinyFish instead of launching a local browser requires two lines — create a session, then connect. Everything after that is unchanged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Before: local browser
&lt;/span&gt;    &lt;span class="c1"&gt;# browser = await p.chromium.launch()
&lt;/span&gt;
    &lt;span class="c1"&gt;# After: TinyFish managed browser — create a session, then connect
&lt;/span&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_over_cdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cdp_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything after the connection line — your selectors, page interactions, extraction logic — stays identical.&lt;/p&gt;

&lt;p&gt;What changes is the infrastructure layer. Browser servers managed. Residential proxy routing included. Cold starts under 250ms. Browser sessions run with full infrastructure support — proxy routing, session state, and browser configuration are handled server-side. TinyFish achieves up to 85% success rate on sites with strict automation requirements across standard commercial site categories (source: tinyfish.ai).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The biggest difference between TinyFish and Playwright isn't what the browser can do — it's what you no longer have to build. TinyFish removes the infrastructure layer entirely: browser servers, proxy management, reliability handling. Your Playwright code connects unchanged; the operational overhead disappears.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One distinction worth noting: TinyFish also offers a Web Agent layer for goal-based tasks, separate from the Browser API. If your use case is "navigate to X, extract Y, handle the flow autonomously," the Agent endpoint may serve you better than CDP. But for teams with existing Playwright code who want to remove browser infrastructure, the Browser API is the direct path.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw9zhbrkyxfehgc28oj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffw9zhbrkyxfehgc28oj3.png" alt="Architecture comparison showing self-hosted Playwright stack versus TinyFish managed browser infrastructure" width="799" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Playwright vs TinyFish: Side-by-Side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Playwright&lt;/th&gt;
&lt;th&gt;TinyFish Browser API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hosting model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-hosted (you manage servers)&lt;/td&gt;
&lt;td&gt;Cloud-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Library install; infra setup varies&lt;/td&gt;
&lt;td&gt;API key, SDK, CLI, or MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You own: server ops, proxy config, failure debugging&lt;/td&gt;
&lt;td&gt;Managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Browser cold start&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on your hardware&lt;/td&gt;
&lt;td&gt;&amp;lt;250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure reliability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on your implementation&lt;/td&gt;
&lt;td&gt;Managed, included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max concurrent (managed)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited by your servers&lt;/td&gt;
&lt;td&gt;10 (Starter) / 50 (Pro)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost at low volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low — library is free&lt;/td&gt;
&lt;td&gt;$0.015/credit PAYG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost at high volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server + proxy + engineering time&lt;/td&gt;
&lt;td&gt;$0.012/credit (Pro)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CDP compatible&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;td&gt;✅ — existing Playwright scripts work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full browser control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Complete&lt;/td&gt;
&lt;td&gt;✅ via CDP&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If Playwright wins a row, it wins. The table is accurate.&lt;/p&gt;

&lt;p&gt;One context the table doesn't capture: Playwright's "low" cost at low volume is the library cost. The real cost at scale includes server infrastructure, proxy subscriptions, and the engineering time spent on maintenance. That's what TinyFish replaces — and it's why the economics shift faster than the per-step price suggests.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Stick With Playwright
&lt;/h2&gt;

&lt;p&gt;Use Playwright here. It's the right tool.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UI and functional testing.&lt;/strong&gt; Playwright is purpose-built for testing — test runners, assertions, traces, screenshot comparison. TinyFish is a data extraction and automation platform, not a testing framework.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simple, stable targets.&lt;/strong&gt; Working with a small number of cooperative sites whose structure doesn't change? If you're comfortable managing the browser process and the maintenance load is genuinely zero, Playwright is the leaner choice. The threshold isn't about volume — it's about whether infrastructure has become a problem yet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local development and prototyping.&lt;/strong&gt; Building and debugging automation flows is faster with local Playwright — full visibility into browser state, easy debugging, no API latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When you need complete control.&lt;/strong&gt; Custom browser configurations, extension loading, low-level network manipulation — Playwright's architecture gives direct access to things the Browser API abstracts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Early-stage validation.&lt;/strong&gt; If you're running quick experiments to test whether a data source is useful before committing to a pipeline, Playwright locally is the fastest path to an answer.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When TinyFish Makes Sense
&lt;/h2&gt;

&lt;p&gt;TinyFish's value is clearest when the infrastructure problem becomes visible. That moment comes sooner than most teams expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production pipelines at scale.&lt;/strong&gt; When thousands of concurrent sessions make browser infrastructure an ops problem, managed infrastructure is worth the per-step cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teams that don't want to own browser ops.&lt;/strong&gt; If browser infrastructure isn't a core competency you want to build, outsourcing it is rational — your engineers ship product, not browser fleet maintenance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sites with strict automation requirements.&lt;/strong&gt; Sites that require more reliable infrastructure-level request handling — where JavaScript-layer plugins fail at scale — benefit from TinyFish's infrastructure-level approach. Infrastructure-level request handling is built in at the execution layer, not patched in afterward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal-based multi-step tasks.&lt;/strong&gt; When your automation requires reasoning about what to do next rather than a fixed selector sequence, the Web Agent layer handles tasks that would require complex conditional Playwright logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed complexity URL lists.&lt;/strong&gt; If your list includes simple pages alongside sites that require more sophisticated handling, routing to TinyFish selectively while keeping Playwright for simple cases is a valid tiered approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Can They Work Together?
&lt;/h2&gt;

&lt;p&gt;Yes. This is the most important point in this article: &lt;strong&gt;Playwright and TinyFish are not competing tools. They're different layers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Playwright is the automation library. TinyFish is the infrastructure layer. You can run them in parallel — Playwright for tests and simple sites, TinyFish for production-scale or complex extraction — without any conflict.&lt;/p&gt;

&lt;p&gt;The migration is genuinely minimal. Here's a complete working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Replace launch() with a TinyFish session — the rest of your code is unchanged
&lt;/span&gt;        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_over_cdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cdp_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;scrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your selectors, event handlers, extraction logic — unchanged. The browser runs in TinyFish's managed infrastructure instead of your server.&lt;/p&gt;

&lt;p&gt;For session configuration, concurrent session management, and authentication handling, the full documentation is at &lt;a href="https://docs.tinyfish.ai/browser-api?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;docs.tinyfish.ai/browser-api&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.tinyfish.ai/browser-api?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=tinyfish-vs-playwright&amp;amp;utm_content=inline_docs_bridge_vs_playwright" rel="noopener noreferrer"&gt;Read the Browser API docs — connection options, session config, concurrent setup&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to try it against your current Playwright scripts, the &lt;a href="https://tinyfish.ai/blog/tinyfish-web-agent-getting-started-10-minutes?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Getting Started guide&lt;/a&gt; walks through first connection in under 10 minutes.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=tinyfish-vs-playwright&amp;amp;utm_content=end_cta_vs_playwright" rel="noopener noreferrer"&gt;Try TinyFish free — 500 credits, no credit card required&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is TinyFish a Playwright alternative or a complement?
&lt;/h3&gt;

&lt;p&gt;Both, depending on what you need. For teams with existing Playwright code, TinyFish's Browser API is a complement — your scripts run unchanged, just on managed infrastructure instead of your server. As a fuller alternative, TinyFish makes sense when you need infrastructure-level reliability at scale or goal-based agent automation. Most teams end up using both: Playwright for testing and simple sites, TinyFish for production-scale or complex extraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use my existing Playwright scripts with TinyFish?
&lt;/h3&gt;

&lt;p&gt;Yes. TinyFish's Browser API returns a CDP WebSocket endpoint. Playwright's &lt;code&gt;connect_over_cdp()&lt;/code&gt; method connects to any CDP endpoint. The migration is minimal: create a TinyFish browser session, then pass &lt;code&gt;session.cdp_url&lt;/code&gt; to &lt;code&gt;playwright.chromium.connect_over_cdp()&lt;/code&gt;. Everything after that line — selectors, interactions, extraction logic — runs unchanged.&lt;/p&gt;

&lt;h3&gt;
  
  
  When does Playwright become impractical at scale?
&lt;/h3&gt;

&lt;p&gt;Common signals: you're spending more engineering time on browser infrastructure than on automation logic; reliability issues appear that aren't in your code; proxy management has become its own project; or target sites with strict requirements need ongoing maintenance as they evolve. If these apply at your current volume, the infrastructure overhead has exceeded the cost of a managed alternative.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does TinyFish handle sites with strict automation requirements?
&lt;/h3&gt;

&lt;p&gt;TinyFish achieves up to 85% success rate on sites with strict automation requirements across standard commercial site categories (source: tinyfish.ai). Sites that JavaScript-layer plugins can't reliably handle are within TinyFish's standard coverage. For sites outside this coverage, failures are transparent — you won't get silent incorrect results.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the cost difference between Playwright and TinyFish at scale?
&lt;/h3&gt;

&lt;p&gt;Playwright's library is free, but at scale the real costs are server infrastructure, proxy subscriptions, and engineering time for maintenance. TinyFish charges $0.012–$0.015 per credit depending on plan, with browser infrastructure, residential proxy, and LLM inference included. Playwright wins at low volume and simple sites; the economics shift toward TinyFish when infrastructure overhead becomes a material cost. The Pro plan ($150/mo) includes 16,500 credits with 50 concurrent agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try TinyFish Free
&lt;/h2&gt;

&lt;p&gt;If the selector maintenance and infrastructure overhead are the bottleneck — not the control — the free tier is the fastest way to compare against your current Playwright setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;500 free steps, no credit card required.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No browser to install, no selectors to write. Describe what you need, get back structured JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/the-best-web-scraping-tools-in-2026?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;The Best Web Scraping Tools in 2026&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Scraping Dynamic Websites: When Playwright Is the Right Tool&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/tinyfish-web-agent-getting-started-10-minutes?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Getting Started with TinyFish: Your First Web Agent in 10 Minutes&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Headless Browser vs AI Agents for Web Automation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>browserautomation</category>
      <category>webscraping</category>
      <category>tinyfishvsplaywright</category>
    </item>
    <item>
      <title>How to Choose a Web Automation Tool by Page Volume (With Real Cost Estimates)</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 08:59:07 +0000</pubDate>
      <link>https://dev.to/tinyfishie/how-to-choose-a-web-automation-tool-by-page-volume-with-real-cost-estimates-k35</link>
      <guid>https://dev.to/tinyfishie/how-to-choose-a-web-automation-tool-by-page-volume-with-real-cost-estimates-k35</guid>
      <description>&lt;p&gt;Most web automation tool comparisons treat page volume as a footnote. It isn't.&lt;/p&gt;

&lt;p&gt;The tool that handles 500 pages a day beautifully will silently degrade at 50,000. The infrastructure that's cost-effective at 10,000 pages becomes the most expensive option in the room at 500,000. And the free tier that feels like a reasonable starting point has a ceiling that catches most teams by surprise somewhere in the middle of a project.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Page volume and access requirement level are the two primary variables that determine your tool decision — more than AI capability, ease of use, or no-code vs. code.&lt;/em&gt; Get either wrong and you're either paying for infrastructure you don't need or running a pipeline that breaks under load exactly when it matters.&lt;/p&gt;

&lt;p&gt;This guide maps each volume tier to the tools that actually work at that scale, with real cost estimates at each level so you can make the comparison with numbers rather than intuition.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Estimate Your Page Volume
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Quick decision rules before the detail:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Under 1K pages/day&lt;/strong&gt; — free tiers work; pick for convenience, not capability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1K–10K pages/day&lt;/strong&gt; — managed tools beat self-hosted once you count setup time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;10K–100K pages/day&lt;/strong&gt; — engineering maintenance cost exceeds tool subscription cost; factor both.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;100K+ pages/day&lt;/strong&gt; — you're buying infrastructure, not a tool; build vs. buy is the real decision.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before matching tools to volume tiers, you need an accurate number. Teams consistently underestimate this, and the underestimate is what causes mid-project tool switches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Daily pages = (number of target URLs) × (crawl frequency per day) × (pages per URL path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few scenarios to calibrate against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Competitor price monitoring across 50 e-commerce sites, updated daily:&lt;/strong&gt; If each site has ~200 product pages, that's 10,000 pages/day. If you need hourly updates, that's 240,000 pages/day.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lead enrichment from 2,000 company profile pages, run once a week:&lt;/strong&gt; ~285 pages/day on average. Looks small — until you need it done in a 2-hour window, which effectively makes it ~1,000 pages/hour.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;News monitoring across 30 publications, 4x daily:&lt;/strong&gt; If each publication has ~50 new articles per cycle, that's 6,000 pages/day.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The number that matters for tool selection isn't the total — it's the &lt;strong&gt;peak load your pipeline needs to sustain&lt;/strong&gt;, and whether you need it done in a tight time window or can spread it across the day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Volume Tier 1: Under 1,000 Pages Per Day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What this looks like
&lt;/h3&gt;

&lt;p&gt;One-off research pulls. Small recurring monitors. Proof-of-concept scrapes before committing to a larger pipeline. A freelancer pulling a client's competitor catalog. A researcher collecting data from academic directories.&lt;/p&gt;

&lt;h3&gt;
  
  
  What works
&lt;/h3&gt;

&lt;p&gt;At this volume, almost any tool works. The decision is about convenience and your technical comfort level, not about infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Free options that are genuinely capable here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Web Scraper Chrome Extension&lt;/strong&gt; (free): Works for public, unprotected pages. No scheduling, no parallelism, but for a one-time pull of a few hundred rows it's the fastest path to a CSV.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ParseHub free tier&lt;/strong&gt;: 5 projects, up to 200 pages per run. If your target has fewer than 200 pages, this is a complete solution at zero cost.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Octoparse free tier&lt;/strong&gt;: &lt;em&gt;2 simultaneous scrapers, 10 tasks limit, up to 50K rows/month export. Better for recurring small-volume scrapes than ParseHub, but verify the task and row limits against your actual target before committing.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TinyFish free tier&lt;/strong&gt;: 500 credits, no credit card. The value here isn't the volume — it's that you get to test an AI agent against your actual target site, including any access requirements it enforces.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What to watch for:&lt;/strong&gt; Free tiers hide their ceilings. ParseHub's 200-page-per-run limit is the one most teams hit mid-project. If your target has 250 product pages, you're already over the limit. Verify the ceiling against your actual target page count before building a workflow around any free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real cost at this volume
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Monthly cost at ~500 pages/day&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web Scraper Extension&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;No scheduling, uses your IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ParseHub&lt;/td&gt;
&lt;td&gt;$0 (free tier)&lt;/td&gt;
&lt;td&gt;200 pages/run limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Octoparse&lt;/td&gt;
&lt;td&gt;$0 (free tier)&lt;/td&gt;
&lt;td&gt;Local runs only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyFish&lt;/td&gt;
&lt;td&gt;$0 (free tier)&lt;/td&gt;
&lt;td&gt;500 credits total, not per day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scrapy (self-hosted)&lt;/td&gt;
&lt;td&gt;$0 + server cost (~$5–10/mo VPS)&lt;/td&gt;
&lt;td&gt;Requires Python setup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Volume Tier 2: 1,000 to 10,000 Pages Per Day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What this looks like
&lt;/h3&gt;

&lt;p&gt;A small team's recurring data feed. Daily price monitoring across dozens of sites. A startup's competitive intelligence pipeline. Most "we scrape data to inform our product decisions" use cases live here.&lt;/p&gt;

&lt;h3&gt;
  
  
  What works
&lt;/h3&gt;

&lt;p&gt;This is where free tiers run out and you start paying for infrastructure. The key trade-off at this volume is between &lt;strong&gt;simplicity&lt;/strong&gt; (managed cloud tools) and &lt;strong&gt;cost efficiency&lt;/strong&gt; (self-hosted frameworks).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed cloud tools (simpler, higher per-page cost):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Apify&lt;/strong&gt;: Solid at this volume. A well-configured Actor running 5,000 pages/day typically costs $30–60/month in compute. The marketplace of pre-built Actors covers most common targets (Amazon, LinkedIn, Google Maps) and gets you to first data in under ten minutes without writing selectors. For targets outside the catalog, you're writing and maintaining custom Actors — factor that time in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TinyFish Browser API&lt;/strong&gt;: $15/month (Starter) for developers already using Playwright, Puppeteer, or Selenium. Connects via CDP over WebSocket — no SDK swap, you point your existing browser automation at TinyFish's endpoint instead. Sub-250ms cold start means parallelism scales cleanly without queuing delays. Best fit at this tier: developers who want &lt;em&gt;managed browser infrastructure&lt;/em&gt; without rebuilding their scraping stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;TinyFish Web Agent&lt;/strong&gt; (Starter, $15/month, 1,650 credits): Better fit when your target requires multi-step navigation or authentication rather than straightforward page extraction. A simple extraction runs 2–3 steps/page; an authenticated flow runs 8–10.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Browser API and Web Agent share the same credit pool, so you can mix both within one plan depending on what each target requires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted frameworks (more work, lower marginal cost):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scrapy&lt;/strong&gt;: Free to run, but you're paying for a server and your own time. A $20/month cloud instance handles this volume easily. The real cost is the 4–8 hours of setup time and ongoing maintenance when target sites change. If your targets are static HTML with no strict access requirements, this is the most cost-efficient option at this volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What to watch for:&lt;/strong&gt; At 1,000–10,000 pages/day, you're large enough that sites with strict access requirements become a more significant factor. A managed tool that includes proxy rotation (like TinyFish) absorbs that cost into the subscription. A self-hosted Scrapy setup needs a separate proxy budget — &lt;em&gt;residential proxies (e.g., Bright Data) run ~$8/GB PAYG at this tier&lt;/em&gt;, which adds $20–80/month depending on page weight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real cost at this volume
&lt;/h3&gt;

&lt;p&gt;Estimated monthly cost at 5,000 pages/day:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Base cost&lt;/th&gt;
&lt;th&gt;Proxy cost&lt;/th&gt;
&lt;th&gt;Estimated total/mo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scrapy (self-hosted)&lt;/td&gt;
&lt;td&gt;~$20 (server)&lt;/td&gt;
&lt;td&gt;$30–80 (if needed)&lt;/td&gt;
&lt;td&gt;$20–100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apify (pay-as-you-go)&lt;/td&gt;
&lt;td&gt;~$40–60 (compute)&lt;/td&gt;
&lt;td&gt;Separate&lt;/td&gt;
&lt;td&gt;$40–140&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyFish Starter&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyFish Pro&lt;/td&gt;
&lt;td&gt;$150&lt;/td&gt;
&lt;td&gt;Included&lt;/td&gt;
&lt;td&gt;$150&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: TinyFish pricing includes browsers, proxies, and AI inference. Apify and Scrapy costs are compute only — add proxy costs separately for protected sites.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Volume Tier 3: 10,000 to 100,000 Pages Per Day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What this looks like
&lt;/h3&gt;

&lt;p&gt;A mid-size company's market intelligence operation. An e-commerce brand monitoring pricing across hundreds of competitor sites. A SaaS product that needs fresh web data as a core feature. This is where scraping stops being a side project and becomes infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  What works
&lt;/h3&gt;

&lt;p&gt;At this volume, the hidden cost of scraping is no longer the tool subscription — it's engineering time. Selector-based scrapers break when target sites update. Proxy pools need management. Failure monitoring becomes a dedicated function. The teams that underestimate this end up with a part-time engineer whose primary job is keeping the scraping pipeline alive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed infrastructure wins on total cost here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TinyFish&lt;/strong&gt; *(Browser API, for teams migrating from Playwright/Puppeteer): $150/month for 16,500 credits, 50 concurrent sessions; pay-as-you-go at $0.015/credit. TinyFish billing depends on which API you use — the calculations below assume 10 seconds per page; actual costs vary with page load time and workflow complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser API (billed per time — 1 credit = 4 minutes, minimum 1 minute):&lt;/p&gt;

&lt;p&gt;10 sec/page → rounds up to 1 min → 0.25 credits/page&lt;/p&gt;

&lt;p&gt;50,000 pages/day × 0.25 credits × 30 days = 375,000 credits/month&lt;/p&gt;

&lt;p&gt;→ PAYG: ~$5,625/month | Pro plan (overage at $0.012/credit): ~$4,452/month &lt;/p&gt;

&lt;p&gt;Web Agent (billed per step — 1 credit = 1 step; for complex multi-step workflows):&lt;/p&gt;

&lt;p&gt;~3 steps/page × 50,000 pages/day × 30 days = 4,500,000 steps/month&lt;/p&gt;

&lt;p&gt;→ PAYG: ~$67,500/month | Not practical for bulk simple extraction at this volume.&lt;/p&gt;

&lt;p&gt;Most bulk extraction pipelines at this tier use the Browser API. Web Agent pricing is designed for complex authenticated workflows where the automation value per workflow justifies the cost.*&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Apify&lt;/strong&gt; &lt;em&gt;(Starter plan): Starts at $29/month; the Scale plan ($199/month) is typically required at 50,000 pages/day.&lt;/em&gt; At this volume, expect $200–500/month in compute, plus significant proxy costs for protected sites. Custom Actors require ongoing maintenance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bright Data&lt;/strong&gt;: At this volume, Bright Data's Scraping Browser becomes relevant — a fully managed Chrome instance with built-in proxy rotation. &lt;em&gt;Cost is primarily proxy bandwidth: residential proxies at ~$8/GB (PAYG; source: brightdata.com, April 2026). A 50,000-page/day operation scraping typical retail pages (~500KB each) uses roughly 25GB/day — approximately $6,000/month in proxy costs alone.&lt;/em&gt; Bright Data makes sense when geographic targeting or anti-detection reliability is the primary requirement, not as a general-purpose option.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted at this volume:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scrapy + infrastructure&lt;/strong&gt;: Technically possible, but at 50,000 pages/day you need distributed infrastructure — multiple servers, a job queue (Redis or Celery), a monitoring stack, and proxy management. A realistic infrastructure budget is $200–500/month, plus 20+ hours/month of engineering maintenance. Justified if you have a dedicated data engineering team and highly customized requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What to watch for:&lt;/strong&gt; This is the volume tier where silent failure becomes a serious business problem. A pipeline that silently returns empty results for three days at 50,000 pages/day is a data quality incident, not a minor inconvenience. Factor monitoring and alerting into your tool evaluation — not just happy-path performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real cost at this volume
&lt;/h3&gt;

&lt;p&gt;Estimated monthly cost at 50,000 pages/day, assuming a mixed target set of simple and JS-heavy sites requiring managed browser infrastructure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Estimated total/mo&lt;/th&gt;
&lt;th&gt;Selector maintenance&lt;/th&gt;
&lt;th&gt;Failure visibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scrapy + proxies&lt;/td&gt;
&lt;td&gt;$2,000–2,300 ⁽¹⁾&lt;/td&gt;
&lt;td&gt;High (you own it)&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apify (custom Actors)&lt;/td&gt;
&lt;td&gt;$500–900&lt;/td&gt;
&lt;td&gt;Medium (Actor updates)&lt;/td&gt;
&lt;td&gt;Dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bright Data (proxy infra)&lt;/td&gt;
&lt;td&gt;$4,500–6,000+ ⁽²⁾&lt;/td&gt;
&lt;td&gt;High (your scrapers)&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyFish Browser API (PAYG)&lt;/td&gt;
&lt;td&gt;~$5,625 ⁽³⁾&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyFish Browser API (Pro)&lt;/td&gt;
&lt;td&gt;~$4,452 ⁽³⁾&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;⁽¹⁾ Scrapy estimate: ~$200–500/month compute (industry estimate, no official source; based on 3–5 VPS instances + job queue) + ~$1,800/month residential proxy for ~30% protected pages (15,000 pages/day × 500KB × 30 days = 225GB × $8/GB). Compute only would be $200–500/month — proxy is the larger cost at this volume.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;⁽²⁾ Bright Data: residential proxy at $8/GB PAYG (source: brightdata.com, April 2026). 750GB/month for a mixed site set × $8 = $6,000/month.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;⁽³⁾ TinyFish: based on tinyfish.ai/pricing (April 2026) + assumed 10 sec/page (minimum 1 min billing = 0.25 credits/page). Actual costs vary with page load time. See calculation detail in the section above.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The TinyFish number looks higher than Scrapy until you add engineering time. At $150/hour for a developer, 20 hours/month of maintenance is $3,000 — not in the tool budget, but real cost.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Volume Tier 4: 100,000+ Pages Per Day
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What this looks like
&lt;/h3&gt;

&lt;p&gt;Enterprise-scale data operations. Google-scale inventory aggregation. A rideshare company collecting millions of pricing variables monthly. Financial services firms monitoring hundreds of regulatory portals in real time. This is not a side project.&lt;/p&gt;

&lt;h3&gt;
  
  
  What works
&lt;/h3&gt;

&lt;p&gt;At this volume, you're buying infrastructure, not tools. The question is whether you build it or buy it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build:&lt;/strong&gt; A custom distributed scraping stack — Scrapy or custom crawlers running on Kubernetes, Bright Data or a private proxy pool for IP management, a data pipeline for cleaning and delivery. Engineering cost to build: 3–6 months of a senior engineer's time. Ongoing maintenance: a dedicated team. Justified for organizations with highly specific data requirements, existing data engineering capacity, and volume that makes the economics work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buy:&lt;/strong&gt; TinyFish's enterprise tier is designed for this. At this tier, the economics shift from per-page cost to total infrastructure cost — the platform is running production workflows at this scale across multiple enterprise customers. The value proposition at this tier isn't the per-page cost — it's that you're buying a system that's already been hardened at that scale, with the reliability and compliance requirements enterprise operations need. Custom pricing at this tier; contact sales for specifics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch for:&lt;/strong&gt; At 100,000+ pages/day, the decision isn't really between tools — it's between building and buying. Both have merit depending on your engineering resources and how central web data collection is to your product. The right question isn't "which tool is cheapest per page?" It's "how much of our engineering capacity do we want this to consume?"&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Picture: Volume × Site Complexity
&lt;/h2&gt;

&lt;p&gt;Volume alone doesn't determine your tool. Site complexity — how much infrastructure the target requires — is the other axis. This matrix combines both:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Static / simple pages&lt;/th&gt;
&lt;th&gt;JS-heavy, requires managed browser&lt;/th&gt;
&lt;th&gt;Authenticated access (your own accounts)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 1K pages/day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free tools (ParseHub, Octoparse)&lt;/td&gt;
&lt;td&gt;TinyFish free tier&lt;/td&gt;
&lt;td&gt;TinyFish free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1K–10K pages/day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scrapy (self-hosted) or Apify&lt;/td&gt;
&lt;td&gt;Apify or TinyFish Starter&lt;/td&gt;
&lt;td&gt;TinyFish Starter/Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10K–100K pages/day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scrapy + infra, Apify, or TinyFish Pro&lt;/td&gt;
&lt;td&gt;Apify or TinyFish Pro&lt;/td&gt;
&lt;td&gt;TinyFish Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;100K+ pages/day&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom stack or TinyFish Enterprise&lt;/td&gt;
&lt;td&gt;TinyFish Enterprise&lt;/td&gt;
&lt;td&gt;TinyFish Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern: at low volume on simple sites, almost anything works and the cheapest option wins. As volume or site complexity increases, the tools that don't require ongoing maintenance become progressively more cost-effective when you count engineering time.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pifx7uvr3ifh62oqm9m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pifx7uvr3ifh62oqm9m.png" alt="Matrix showing web automation tool recommendations by page volume and bot protection level"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Calculation Most Teams Get Wrong
&lt;/h2&gt;

&lt;p&gt;Every tool comparison in this category focuses on subscription price. The number that actually determines total cost is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total cost = tool subscription + proxy costs + (engineering hours × hourly rate)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scrapy is free. But if a developer spends 15 hours/month keeping selectors current, that's $2,250/month at $150/hour — more expensive than any managed tool at comparable volume. The teams that make this mistake are the ones who calculate tool cost from the pricing page and engineering time from zero.&lt;/p&gt;

&lt;p&gt;The inversion point — where managed infrastructure becomes cheaper than self-hosted — happens somewhere between 5,000 and 20,000 pages/day for most teams, depending on target site complexity and how often sites update their frontend.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Estimate Your Starting Point
&lt;/h2&gt;

&lt;p&gt;If you're not sure where your project falls, start with the TinyFish free tier (500 credits, no credit card). Run it against your actual target site. The results tell you three things at once: whether AI-based extraction handles your target's structure, what your step-per-page ratio looks like for cost projection, and whether there's strict access requirements you didn't know about.&lt;/p&gt;

&lt;p&gt;That's a better calibration than any estimate you can make from a pricing page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=web-automation-tool-by-volume&amp;amp;utm_content=end_cta_volume" rel="noopener noreferrer"&gt;Start free — 500 credits, no credit card required&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How much does web scraping cost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on volume and tool choice, but the honest answer is that the subscription price is rarely the whole number. At under 1,000 pages/day, free tiers from ParseHub, Octoparse, and TinyFish cover most use cases at zero cost. At 5,000 pages/day, expect $15–100/month depending on whether you need strict access handling. At 50,000 pages/day, &lt;em&gt;total cost including infrastructure and proxy fees typically runs $2,000–5,600/month&lt;/em&gt; depending on tool and proxy requirements — and if you're on a self-hosted setup, add engineering maintenance time on top of that. The full formula is: tool subscription + proxy costs + (engineering hours × hourly rate). Teams that only look at the subscription line consistently underestimate real cost by 2–3x.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What counts as a "page" for automation tool pricing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on the tool. For Scrapy and most traditional scrapers, a page is one HTTP request. For AI web agents like TinyFish, the unit is a "step" — a discrete action (navigate, click, extract). A single page extraction might require 2–5 steps; a multi-step authenticated workflow might require 10–15. Always ask vendors for step-to-page ratios for your specific use case before committing to a plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is Scrapy actually free at high volume?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The software is open source, but the infrastructure isn't free. At 50,000 pages/day you need distributed computing, job queues, monitoring, and proxy pools. A realistic total infrastructure cost is $400–800/month, plus ongoing engineering time. Scrapy is the most cost-efficient option when you have the engineering capacity to run it — it's not free, it's a trade of money for engineering time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if I exceed my plan's page or step limit?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most managed tools handle this differently. Apify charges compute unit overages at the pay-as-you-go rate. &lt;em&gt;TinyFish offers pay-as-you-go at $0.015/credit as an alternative to the monthly plan (Pro plan overages bill at $0.012/credit). Note that 1 credit covers 1 agent step, 4 minutes of browser session, or 15 page fetches — the effective per-page cost depends on which API you use.&lt;/em&gt; Scrapy has no limit — your infrastructure is the ceiling. Plan for overages before you hit them; discovering them during a critical run is a bad time to learn the policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I know if my volume estimate is accurate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It usually isn't, in the direction of underestimation. The most common mistake: counting target URLs but not accounting for crawl frequency, or not including the pages you need to navigate through to reach the data (pagination, category pages, authentication flows). Add 30–50% to your estimate before selecting a plan tier.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related reading:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tinyfish.ai/blog/the-best-web-scraping-tools-in-2026?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;The Best Web Scraping Tools in 2026 — Ranked and Reviewed →&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📸 IMAGE — Matrix showing web automation tool recommendations by page volume and access requirement level&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Try TinyFish Free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;500 free steps, no credit card.&lt;/strong&gt; The fastest way to test whether TinyFish fits your workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free →&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webautomation</category>
      <category>toolselection</category>
      <category>pagevolume</category>
      <category>costanalysis</category>
    </item>
    <item>
      <title>Headless Browser vs AI Agents for Web Automation: How to Choose</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 08:58:40 +0000</pubDate>
      <link>https://dev.to/tinyfishie/headless-browser-vs-ai-agents-for-web-automation-how-to-choose-111d</link>
      <guid>https://dev.to/tinyfishie/headless-browser-vs-ai-agents-for-web-automation-how-to-choose-111d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;**CHANGES IN THIS REVISION — **Docs link anchor text corrected ("docs.tinyfish.ai/web-agent" → "docs.tinyfish.ai").&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Your Playwright script runs. The selector finds the element. The data comes back clean. Then the site ships a new layout, a cookie banner appears in a language your script doesn't expect, and a modal blocks the next step. You fix the script. The site ships another update. You fix the script again.&lt;/p&gt;

&lt;p&gt;At some point the maintenance math stops working. That's when developers start asking about AI agents.&lt;/p&gt;

&lt;p&gt;This article is for developers who already use Playwright, Puppeteer, or Selenium and want an honest answer to whether AI agents change their automation stack — and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Headless Browser?
&lt;/h2&gt;

&lt;p&gt;You've used it. Playwright, Puppeteer, Selenium.&lt;/p&gt;

&lt;p&gt;A headless browser is a browser engine without a graphical interface, controlled by code. It loads pages, executes JavaScript, renders content, and follows the instructions your script provides. It's a powerful, precise tool — and that precision is both its strength and its constraint.&lt;/p&gt;

&lt;p&gt;A headless browser does exactly what your script tells it to do. No more, no less. If your script says "click the button with ID &lt;code&gt;submit-btn&lt;/code&gt;," it clicks that button. If the button is now named &lt;code&gt;confirm-btn&lt;/code&gt;, the script fails.&lt;/p&gt;

&lt;p&gt;That determinism is valuable in the right context. It's also the root of the maintenance problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is an AI Web Agent?
&lt;/h2&gt;

&lt;p&gt;The key distinction isn't capability — it's direction.&lt;/p&gt;

&lt;p&gt;A headless browser is &lt;strong&gt;instruction-following&lt;/strong&gt;: you write the exact steps, the browser executes them. An AI web agent is &lt;strong&gt;goal-directed&lt;/strong&gt;: you describe the outcome you want, the agent works out the steps itself.&lt;/p&gt;

&lt;p&gt;With Playwright: &lt;code&gt;page.click('#submit-btn')&lt;/code&gt; — you specify the action.&lt;/p&gt;

&lt;p&gt;With a web agent: &lt;code&gt;"Submit the form and return the confirmation number"&lt;/code&gt; — you specify the goal.&lt;/p&gt;

&lt;p&gt;TinyFish's Web Agent takes a URL and a plain-language goal, navigates the page, makes decisions about what to do next, handles unexpected states, and returns a structured result. The agent handles the selector logic. You handle the goal definition.&lt;/p&gt;

&lt;p&gt;This comes with a real trade-off: agents are less predictable than scripts. If you need pixel-perfect determinism — test assertions against specific DOM states, for example — an agent introduces variability a script doesn't. For workflows where the goal matters more than the exact path, that trade-off favors agents.&lt;/p&gt;

&lt;p&gt;For a deeper introduction to what web agents can do in practice, see &lt;a href="https://tinyfish.ai/blog/what-is-an-ai-browser?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;What Is a Web Agent? The Complete Guide to AI Browser Agents in 2026&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Playwright / Puppeteer&lt;/th&gt;
&lt;th&gt;TinyFish Web Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How you control it&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write exact steps (click, fill, navigate)&lt;/td&gt;
&lt;td&gt;Describe the goal in plain language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Library install, you manage scripts&lt;/td&gt;
&lt;td&gt;API call with goal string&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance over time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High — selectors break when sites change&lt;/td&gt;
&lt;td&gt;Lower — agent adapts to layout changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Handles unexpected states&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No — script fails on unexpected conditions&lt;/td&gt;
&lt;td&gt;Yes — agent reasons about what to do next&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You manage browser servers, proxies&lt;/td&gt;
&lt;td&gt;Managed by TinyFish&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Determinism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High — predictable, exact&lt;/td&gt;
&lt;td&gt;Lower — path may vary between runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost at low volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Library free; infra + proxy + maintenance extra&lt;/td&gt;
&lt;td&gt;$0.015/credit PAYG — no infra overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost at high volume&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server fleet + proxy subscriptions + ongoing maintenance = significant ops cost&lt;/td&gt;
&lt;td&gt;$0.012/credit (Pro), all infra included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;UI testing, known stable sites, local dev&lt;/td&gt;
&lt;td&gt;Dynamic sites, goal-based tasks, production scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Where Playwright wins: control, predictability, local development, low-volume stable sites.&lt;/p&gt;

&lt;p&gt;Where TinyFish wins: managed infrastructure, goal flexibility, sites with frequent layout changes.&lt;/p&gt;

&lt;p&gt;If it's a draw on a row, it's a draw.&lt;/p&gt;

&lt;h2&gt;
  
  
  When a Headless Browser Is the Right Choice
&lt;/h2&gt;

&lt;p&gt;Playwright isn't going anywhere. For the majority of browser automation tasks, it's still the right answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI and functional testing.&lt;/strong&gt; Playwright's native test runner, assertion API, and trace viewer are purpose-built for this. You need deterministic behavior, specific element state assertions, and reproducible test runs. Agents aren't designed for this job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scraping a known site with a stable layout.&lt;/strong&gt; Documentation sites, static product catalogs, public APIs that require browser rendering — if the structure doesn't change, a Playwright script is cheaper and more predictable than an agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local development and prototyping.&lt;/strong&gt; Working locally with Playwright gives you full visibility into browser state, easy step-through debugging, and zero API latency. Starting with Playwright and migrating specific workflows to agents is a valid progression.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you need complete control.&lt;/strong&gt; Custom browser configurations, low-level network interception, cookie manipulation, specific extension behavior — Playwright's architecture gives you direct access to things an agent abstracts away.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-sensitive early stage.&lt;/strong&gt; At very low volume on cooperative, stable sites, Playwright's library cost is lower on paper. Once you add browser server infrastructure, proxy costs, and maintenance time, TinyFish's per-step pricing is typically comparable — and often cheaper than the full stack.&lt;/p&gt;

&lt;p&gt;If your use case is on this list, you don't need a managed platform. Use Playwright.&lt;/p&gt;

&lt;h2&gt;
  
  
  When an AI Agent Adds Value
&lt;/h2&gt;

&lt;p&gt;Here's the pattern that pushes teams toward agents: the script breaks. Not because of a bug — because the site changed.&lt;/p&gt;

&lt;p&gt;A cookie banner appeared in Dutch. A login modal on the test site loaded differently on mobile. A dynamic popup blocked the next click. A form added a new required field. A confirmation step now requires checking a box that wasn't there before.&lt;/p&gt;

&lt;p&gt;Each of these is a 20-minute fix. Multiplied across dozens of target sites, updated continuously, it becomes a full-time maintenance load that grows with the number of sites you're monitoring.&lt;/p&gt;

&lt;p&gt;AI agents absorb this maintenance overhead. The goal stays the same; the agent adapts to the changed path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where agents add clear value:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Workflows with conditional decisions mid-flow.&lt;/strong&gt; "Navigate to the product, check if it's in stock, and if so, extract the price and the lead time" — a fixed script needs to handle all possible states explicitly. An agent handles them implicitly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sites that update layouts frequently.&lt;/strong&gt; E-commerce, news, job boards, real estate listings — any site where the structure changes often.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Running at production scale without owning browser infrastructure.&lt;/strong&gt; Managing Playwright at scale means managing servers, proxies, and session handling. TinyFish removes that layer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tasks where defining the exact selector sequence is the bottleneck.&lt;/strong&gt; If writing the script takes longer than the task itself, agents change the economics.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For specifics on where Playwright scripts degrade in production environments, see &lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Scraping Dynamic Websites: When Playwright Breaks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=headless-browser-vs-ai-agents&amp;amp;utm_content=inline_pain_solution_headless_vs_agent" rel="noopener noreferrer"&gt;Test TinyFish against your hardest automation tasks — 500 free credits, no credit card&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Spectrum: Library → Cloud Browser → AI Agent
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;📸 IMAGE — Spectrum diagram from Playwright library to TinyFish Browser API to TinyFish Web Agent showing control vs maintenance trade-off&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This isn't a binary choice. There's a spectrum — and TinyFish spans the right half of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Library (Playwright/Puppeteer):&lt;/strong&gt; You write the script, you manage the infrastructure. Full control. Full maintenance responsibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cloud browser (TinyFish Browser API):&lt;/strong&gt; Same CDP interface, managed infrastructure. Your existing Playwright scripts connect with one line change. Browser servers, proxy routing, and reliability handling are managed. You keep the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Agent (TinyFish Web Agent):&lt;/strong&gt; Submit a goal, get a structured result. The agent handles navigation, decision-making, extraction. You define what you want, not how to get it.&lt;/p&gt;

&lt;p&gt;TinyFish spans the cloud browser and agent layers with one API key and one billing model. Use it as a CDP-compatible cloud browser — your existing Playwright code runs unchanged, you swap the launch endpoint. Or use it as a Web Agent — submit a goal, get a result. The choice is per-task, not per-account.&lt;/p&gt;

&lt;p&gt;You're not choosing between Playwright and TinyFish. You're choosing where on this spectrum your specific task sits — and whether the infrastructure overhead of running it yourself is worth it.&lt;/p&gt;

&lt;p&gt;The Web Agent API documentation is at &lt;a href="https://docs.tinyfish.ai?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;docs.tinyfish.ai&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;For a view of how teams move from Selenium to agents over time, see &lt;a href="https://tinyfish.ai/blog/from-selenium-to-ai-agents-migration?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;From Selenium to AI Agents: A Migration Guide for Web Automation Teams&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.tinyfish.ai?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=headless-browser-vs-ai-agents&amp;amp;utm_content=inline_docs_bridge_headless_vs_agent" rel="noopener noreferrer"&gt;Read the Web Agent docs — goal syntax, response handling, concurrent runs&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=headless-browser-vs-ai-agents&amp;amp;utm_content=end_cta_headless_vs_agent" rel="noopener noreferrer"&gt;Try TinyFish free — 500 credits, no credit card required&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7b9j4zgfyew0pngxk88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7b9j4zgfyew0pngxk88.png" alt="Spectrum diagram from Playwright library to TinyFish Browser API to TinyFish Web Agent showing control vs maintenance trade-off" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I use my existing Playwright code with TinyFish?
&lt;/h3&gt;

&lt;p&gt;Yes. TinyFish's Browser API exposes a CDP (Chrome DevTools Protocol) endpoint. Playwright's &lt;code&gt;connect_over_cdp()&lt;/code&gt; method connects to any CDP endpoint. Create a TinyFish browser session, then pass the returned &lt;code&gt;cdp_url&lt;/code&gt; to &lt;code&gt;playwright.chromium.connect_over_cdp(session.cdp_url)&lt;/code&gt;. Everything after that line — selectors, interactions, extraction logic — runs unchanged. You get managed infrastructure without rewriting your automation code.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between the TinyFish Browser API and Web Agent?
&lt;/h3&gt;

&lt;p&gt;The Browser API gives you a CDP-accessible managed browser — you write the Playwright or Puppeteer code that controls it. The Web Agent takes a goal in plain language and works out the steps itself, returning a structured result. Both use the same API key and billing pool. Browser API is for teams with existing Playwright code who want managed infrastructure. Web Agent is for goal-based tasks where writing the exact step sequence is the bottleneck.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are AI agents more reliable than Playwright for web scraping?
&lt;/h3&gt;

&lt;p&gt;AI agents are more reliable than Playwright scripts when sites change layouts frequently, because they navigate by page semantics rather than fixed selectors. If scripts fail because sites change layouts frequently, agents are more resilient — they adapt to changed paths. If scripts fail because of infrastructure issues (IP reputation, session handling, scale), moving to managed infrastructure helps regardless of whether you use scripts or agents. If your scripts work reliably, agents don't add reliability — they add flexibility for goal-based tasks at the cost of some determinism.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does TinyFish's Web Agent handle sites that require authentication?
&lt;/h3&gt;

&lt;p&gt;The agent handles multi-step workflows including authenticated sessions. For your own authenticated accounts, you describe the goal ("log in with these credentials and extract the invoice list from my account") and the agent manages the navigation.  For authenticated workflows on your own accounts, use &lt;code&gt;use_vault: true&lt;/code&gt; with stored credential items rather than passing credentials in the goal string.&lt;/p&gt;

&lt;h3&gt;
  
  
  What tasks should never use an AI agent?
&lt;/h3&gt;

&lt;p&gt;UI and functional testing — you need deterministic assertions against specific DOM states, and agent variability breaks this requirement. Low-volume automation against stable, cooperative sites — the economics don't favor agents. Any workflow where you need byte-for-byte reproducibility across runs. And tasks where you need to debug exact failure points in a selector sequence — agents abstract that information away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try TinyFish Free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;500 free steps, no credit card.&lt;/strong&gt; The fastest way to test whether TinyFish fits your workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/the-best-web-scraping-tools-in-2026?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;The Best Web Scraping Tools in 2026&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/tinyfish-vs-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;TinyFish vs Playwright: When to Use Each for Web Automation&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/from-selenium-to-ai-agents-migration?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;From Selenium to AI Agents: A Migration Guide for Web Automation Teams&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Scraping Dynamic Websites: When Playwright Breaks&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>headlessbrowser</category>
      <category>aiagents</category>
      <category>playwright</category>
      <category>webautomation</category>
    </item>
    <item>
      <title>Web Data Extraction: From Static Pages to AI Agents</title>
      <dc:creator>Tinyfishie</dc:creator>
      <pubDate>Tue, 19 May 2026 08:58:10 +0000</pubDate>
      <link>https://dev.to/tinyfishie/web-data-extraction-from-static-pages-to-ai-agents-2g6k</link>
      <guid>https://dev.to/tinyfishie/web-data-extraction-from-static-pages-to-ai-agents-2g6k</guid>
      <description>&lt;p&gt;You need data from a website in a usable format: JSON, CSV, a clean table. The site isn't an API. It's not static HTML. It might render content via JavaScript, require authentication, or return different results to automation than to a browser.&lt;/p&gt;

&lt;p&gt;Which tool you reach for depends entirely on which of those is true.&lt;/p&gt;

&lt;p&gt;This guide covers all four tiers of website complexity with working code and honest tool recommendations. Tiers 1 and 2 don't need TinyFish — the right answer for many sites is still &lt;code&gt;requests&lt;/code&gt; and &lt;code&gt;BeautifulSoup&lt;/code&gt;. The guide says so.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: data extraction should respect each site's terms of service and robots.txt. The tools and techniques here are for legitimate use cases: price monitoring, market research, internal data pipelines, and publicly available information.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Counts as "Structured Data Extraction"?
&lt;/h2&gt;

&lt;p&gt;Dumping raw HTML isn't extraction. Structured data extraction means getting machine-readable output — JSON, CSV, a clean table — where the fields map to the information you actually need: price, title, availability, review count.&lt;/p&gt;

&lt;p&gt;The right approach depends on one question: &lt;strong&gt;what kind of site is this?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer falls into four tiers, each requiring a different tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four Tiers of Website Complexity
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Site type&lt;/th&gt;
&lt;th&gt;What makes it hard&lt;/th&gt;
&lt;th&gt;Right tool&lt;/th&gt;
&lt;th&gt;Code complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Has an API or RSS feed&lt;/td&gt;
&lt;td&gt;Nothing — use the API&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;requests&lt;/code&gt; + JSON&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;JS-rendered, no strict requirements&lt;/td&gt;
&lt;td&gt;Content loads after JS runs&lt;/td&gt;
&lt;td&gt;Playwright or TinyFish Fetch&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Strict automation requirements at scale&lt;/td&gt;
&lt;td&gt;Infrastructure complexity at volume&lt;/td&gt;
&lt;td&gt;TinyFish Fetch API&lt;/td&gt;
&lt;td&gt;Low (API call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Authenticated or multi-step workflow&lt;/td&gt;
&lt;td&gt;Session state + conditional decisions&lt;/td&gt;
&lt;td&gt;TinyFish Web Agent&lt;/td&gt;
&lt;td&gt;Low (goal string)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tiers 1 and 2 don't require a managed platform. If you're on Tier 1 or 2 at low volume, use the simpler tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8th9t70dgbhv2y75ijeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8th9t70dgbhv2y75ijeq.png" alt="Four tiers of website complexity from static HTML pages with APIs to authenticated multi-step workflows requiring AI agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 1 — Sites with APIs or RSS Feeds
&lt;/h2&gt;

&lt;p&gt;Before writing a scraper, check for an API.&lt;/p&gt;

&lt;p&gt;Many sites that appear to require scraping have JSON endpoints. Browser devtools → Network tab → filter for &lt;code&gt;application/json&lt;/code&gt; requests. Product listing pages often load data from a &lt;code&gt;/api/products&lt;/code&gt; or &lt;code&gt;/v1/listings&lt;/code&gt; endpoint your scraper can call directly — faster, cleaner, and more stable than HTML parsing.&lt;/p&gt;

&lt;p&gt;For sites with RSS or Atom feeds (blogs, news, job boards), the feed is already structured data.&lt;/p&gt;

&lt;p&gt;For sites where the HTML itself is static and well-structured, &lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;BeautifulSoup&lt;/code&gt; is the right tool. It's fast, it's free, and there's nothing to manage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example-shop.com/products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles most documentation sites, static product catalogs, blog feeds, and public data sources. If it works, stop here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it breaks down:&lt;/strong&gt; Any site that loads content via JavaScript after the initial HTML response. You'll get the page shell with empty containers, not the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 2 — JavaScript-Rendered Pages
&lt;/h2&gt;

&lt;p&gt;Single-page apps, infinite scroll, dynamic tables, price widgets that load after the initial render — these require a real browser. The HTML you get from &lt;code&gt;requests&lt;/code&gt; is what the server sends before JavaScript runs; the content you need loads after.&lt;/p&gt;

&lt;p&gt;Two options, both valid:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Playwright (local, controlled environments)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_js_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inner_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error loading &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;extract_js_page&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example-spa.com/listings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Playwright is the right choice for local development, CI pipelines, and controlled environments. You manage the browser process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option B: TinyFish Browser API (managed infrastructure, same Playwright code)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the same JS-rendered page, connect to TinyFish's managed browser instead of launching one locally. Your selectors and extraction logic stay identical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_js_page_managed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_over_cdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cdp_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inner_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.product-list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error loading &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The automation logic is unchanged. The infrastructure — browser servers, proxy routing, reliability handling — is managed by TinyFish. Browser cold starts under 250ms (source: tinyfish.ai).&lt;/p&gt;

&lt;p&gt;For local development and direct browser control, Playwright. For production extraction without running browser infrastructure, TinyFish Browser API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 3 — Sites with Strict Automation Requirements
&lt;/h2&gt;

&lt;p&gt;The challenge at this tier isn't writing the code — it's owning the infrastructure that makes the code reliable at scale.&lt;/p&gt;

&lt;p&gt;Browser servers, proxy routing, session handling, failure recovery: these aren't one-time setup tasks. They're an ongoing operational surface that grows with your target list. TinyFish handles this infrastructure layer so your Python code stays focused on extraction logic — not ops. The same API call works whether you're running 10 requests or 10,000.&lt;/p&gt;

&lt;p&gt;TinyFish's Browser API handles the infrastructure layer. You use the same Playwright selectors; TinyFish manages browser servers, proxy routing, and reliability underneath.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tinyfish&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TinyFish&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;playwright.async_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;async_playwright&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_with_managed_browser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract page content using TinyFish managed browser infrastructure.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;async_playwright&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TinyFish&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect_over_cdp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cdp_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;content&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error loading &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;TinyFish achieves up to 85% success rate on sites with strict automation requirements (standard commercial sites; internal testing; source: tinyfish.ai) and a browser cold start under 250ms.&lt;/em&gt; For sites where these infrastructure concerns are the reliability bottleneck, the Fetch API removes that layer from your codebase.&lt;/p&gt;

&lt;p&gt;For running this type of extraction at scale across many URLs simultaneously, see &lt;a href="https://tinyfish.ai/blog/how-to-monitor-1000-websites-in-parallel-tinyfish-api?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Monitor 1,000 Websites in Parallel with the TinyFish API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=how-to-extract-structured-data&amp;amp;utm_content=inline_pain_solution_extract_data" rel="noopener noreferrer"&gt;Try TinyFish Fetch — 500 free credits, no credit card required&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tier 4 — Authenticated or Multi-Step Workflows
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Some extraction targets involve authenticated workflows on your own accounts&lt;/em&gt; — they require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Logging in before data is accessible&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigation through multiple pages with conditional logic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Filling a search form and extracting results from the response&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Handling state that changes between steps&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Writing and maintaining the exact navigation sequence for these workflows in Playwright is high-maintenance code. Every change to the login flow or form structure breaks the script.&lt;/p&gt;

&lt;p&gt;For goal-directed workflows, TinyFish's Web Agent takes a plain-language description of what you want and handles the navigation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_extraction_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run a goal-directed extraction agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://agent.tinyfish.ai/v1/automation/run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-API-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TINYFISH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# "COMPLETED" means the run finished — not necessarily that the goal succeeded
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COMPLETED&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Describe what you want — the agent handles the navigation
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_extraction_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://supplier-portal.com/pricing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Navigate to the pricing page, extract all product SKUs and their current prices. Return as JSON with fields: sku, name, price, currency.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent handles authentication, conditional navigation, and session state. You handle the goal definition.&lt;/p&gt;

&lt;p&gt;For complex workflows and authenticated access patterns, use TinyFish's credential vault (&lt;code&gt;use_vault: true&lt;/code&gt;) rather than passing credentials in the goal string.&lt;/p&gt;

&lt;p&gt;For more on goal-directed automation for complex extraction workflows, see &lt;a href="https://tinyfish.ai/blog/from-selenium-to-ai-agents-migration?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;From Selenium to AI Agents: A Migration Guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Approach
&lt;/h2&gt;

&lt;p&gt;Decision flowchart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Does the site have a documented API or RSS feed?
  → Yes: Use the API directly. Don't scrape what you can query.
  → No: Continue

Does the page content load via JavaScript?
  → No: requests + BeautifulSoup
  → Yes: Continue

Do you need managed infrastructure (scale, reliability, no server management)?
  → No: Playwright (local browser)
  → Yes: TinyFish Fetch API

Does the workflow require authentication, multi-step navigation, or conditional decisions?
  → No: TinyFish Fetch API
  → Yes: TinyFish Web Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with the simplest tool that works. When that tool's limits appear — JavaScript rendering, infrastructure overhead at scale, multi-step workflows — the next tier is ready. Tiers 3 and 4 aren't edge cases; they're where production data pipelines typically land.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://tinyfish.ai/signup?utm_source=official_blog_TT&amp;amp;utm_medium=blog&amp;amp;utm_campaign=how-to-extract-structured-data&amp;amp;utm_content=end_cta_extract_data" rel="noopener noreferrer"&gt;Try TinyFish free — 500 credits, no credit card required&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the best Python library for extracting structured data from websites?
&lt;/h3&gt;

&lt;p&gt;The right Python library depends on whether your target uses static HTML or JavaScript rendering. For static HTML, &lt;code&gt;requests&lt;/code&gt; + &lt;code&gt;BeautifulSoup&lt;/code&gt; or &lt;code&gt;lxml&lt;/code&gt; is the fastest and simplest. For JavaScript-rendered pages, Playwright gives you full browser control. For managed infrastructure at scale, TinyFish's Fetch API removes the server maintenance. There's no single best library — the right choice is determined by the site type, volume, and whether you want to manage browser infrastructure yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I extract JSON data that a website loads dynamically?
&lt;/h3&gt;

&lt;p&gt;Check the Network tab in browser devtools first. Filter for &lt;code&gt;application/json&lt;/code&gt; or &lt;code&gt;fetch&lt;/code&gt; requests. Many sites that appear to require scraping actually load their data from an API endpoint you can call directly with &lt;code&gt;requests&lt;/code&gt;. If the data isn't available via a direct API call, you'll need a real browser (Playwright or TinyFish Fetch with &lt;code&gt;browser: true&lt;/code&gt;) to wait for the JavaScript to execute and the data to load.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between web scraping and structured data extraction?
&lt;/h3&gt;

&lt;p&gt;Web scraping typically refers to the process of downloading web pages. Structured data extraction is what you do with those pages: parsing them to get machine-readable output in a defined schema. You can scrape without extracting (just downloading HTML) and extract without scraping (calling an API). For practical purposes, the distinction matters because it determines your tool choice: if you need raw page content, a fetcher is enough; if you need specific fields in a specific format, you also need a parsing step.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I handle pagination when extracting data?
&lt;/h3&gt;

&lt;p&gt;For static pagination (page 1, page 2, etc.), loop over page URLs and aggregate results. For infinite scroll or dynamic pagination, you need a real browser — Playwright can wait for the "Load More" button and click it programmatically, or TinyFish's Web Agent can handle pagination as part of a goal-directed workflow. For deep pagination at scale (thousands of pages), consider whether the site has an API endpoint for its paginated data — it's almost always faster than page-by-page extraction.&lt;/p&gt;

&lt;h3&gt;
  
  
  When does extracting structured data become a legal concern?
&lt;/h3&gt;

&lt;p&gt;The legal landscape varies by jurisdiction, site terms of service, and the type of data. In most cases, extracting publicly available data that doesn't contain personal information is legally defensible (see hiQ v. LinkedIn in the US context). Key boundaries: always respect robots.txt, don't extract and republish data in ways that compete with the source site's business, and be careful with personal data (GDPR applies even to publicly visible information about EU individuals). When in doubt, check the site's terms of service and consult legal counsel for commercial use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try TinyFish Free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;500 free steps, no credit card.&lt;/strong&gt; The fastest way to test whether TinyFish fits your workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agent.tinyfish.ai/sign-up?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;Start free →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pillar:&lt;/strong&gt; &lt;a href="https://tinyfish.ai/blog/the-best-web-scraping-tools-in-2026?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;The Best Web Scraping Tools in 2026&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/scrape-dynamic-websites-without-playwright?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Scrape Dynamic Websites Without Playwright&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/how-to-crawl-list-of-urls-at-scale?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;How to Crawl a List of URLs at Scale&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyfish.ai/blog/from-selenium-to-ai-agents-migration?utm_source=official_blog_TT" rel="noopener noreferrer"&gt;From Selenium to AI Agents: A Migration Guide&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Want to scrape the web without getting blocked? Try &lt;a href="https://tinyfish.ai?utm_source=devto" rel="noopener noreferrer"&gt;TinyFish&lt;/a&gt; — a browser API built for AI agents and developers.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
