<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Railse Xu</title>
    <description>The latest articles on DEV Community by Railse Xu (@railse_xu_7b4bacd2da4310b).</description>
    <link>https://dev.to/railse_xu_7b4bacd2da4310b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4003562%2F816f6af5-1cbc-4451-86b7-b600ecb699c6.jpg</url>
      <title>DEV Community: Railse Xu</title>
      <link>https://dev.to/railse_xu_7b4bacd2da4310b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/railse_xu_7b4bacd2da4310b"/>
    <language>en</language>
    <item>
      <title>I built two web-page APIs with Playwright — screenshots/PDF + clean article extraction</title>
      <dc:creator>Railse Xu</dc:creator>
      <pubDate>Fri, 26 Jun 2026 08:04:24 +0000</pubDate>
      <link>https://dev.to/railse_xu_7b4bacd2da4310b/i-built-two-web-page-apis-with-playwright-screenshotspdf-clean-article-extraction-19hj</link>
      <guid>https://dev.to/railse_xu_7b4bacd2da4310b/i-built-two-web-page-apis-with-playwright-screenshotspdf-clean-article-extraction-19hj</guid>
      <description>&lt;p&gt;I kept hitting two annoying needs in side projects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Turn a web page into a &lt;strong&gt;screenshot or PDF&lt;/strong&gt; (link previews, thumbnails, archiving, reports).&lt;/li&gt;
&lt;li&gt;Pull the &lt;strong&gt;clean article text&lt;/strong&gt; out of a page buried in ads and navigation (for LLMs/RAG, reader apps, content pipelines).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Existing services were either pricey or fiddly, so I built two small APIs with Playwright, bundled them as &lt;strong&gt;&lt;a href="https://renderly.rest" rel="noopener noreferrer"&gt;Renderly&lt;/a&gt;&lt;/strong&gt;, and listed them on RapidAPI. Here's the build, plus a few gotchas.&lt;/p&gt;

&lt;h2&gt;1. Screenshot &amp;amp; PDF API&lt;/h2&gt;

&lt;p&gt;Give it a URL, get a full-page screenshot (PNG/JPEG) or a PDF. The key is rendering in &lt;strong&gt;real Chromium&lt;/strong&gt;, so modern CSS, web fonts, and JS-rendered content all show up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--no-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--disable-dev-shm-usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;viewport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1280&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_page&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_until&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networkidle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;screenshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_page&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;print_background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;The differentiator: clean output for AI&lt;/h3&gt;

&lt;p&gt;Most cheap screenshot APIs choke on cookie banners and ads. In 2026 a lot of screenshots are fed to &lt;strong&gt;vision models&lt;/strong&gt;, where banners waste tokens and obscure the actual layout. So I added &lt;code&gt;block_cookie_banners&lt;/code&gt;, which hides common consent banners (OneTrust, Cookiebot, Quantcast…), ads, and chat widgets. You can also pass &lt;code&gt;hide_selectors&lt;/code&gt; to hide specific elements, and &lt;code&gt;cookies&lt;/code&gt;/&lt;code&gt;headers&lt;/code&gt; to capture pages behind a login.&lt;/p&gt;
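
&lt;p&gt;One way to implement that kind of hiding is to inject a stylesheet before capturing. The sketch below is an assumption about the approach, not Renderly's actual code: the &lt;code&gt;build_hide_css&lt;/code&gt; helper, the default selector list, and the example class names are all illustrative, and real banners vary by site and vendor version.&lt;/p&gt;

```python
# Illustrative selectors for common consent banners; an assumed starting list,
# not an exhaustive or authoritative one.
DEFAULT_BANNER_SELECTORS = [
    "#onetrust-consent-sdk",   # OneTrust
    "#CybotCookiebotDialog",   # Cookiebot
    ".qc-cmp2-container",      # Quantcast Choice
]


def build_hide_css(extra_selectors=(), block_cookie_banners=True):
    """Return one CSS rule hiding banner selectors plus caller-supplied ones."""
    selectors = list(DEFAULT_BANNER_SELECTORS) if block_cookie_banners else []
    for sel in extra_selectors:
        sel = sel.strip()
        # Skip blanks and duplicates; drop anything containing braces so a
        # user-supplied selector cannot break out of the rule.
        if sel and sel not in selectors and "{" not in sel and "}" not in sel:
            selectors.append(sel)
    if not selectors:
        return ""
    return ", ".join(selectors) + " { display: none !important; }"


# ".chat-widget" and ".ad-slot" are hypothetical hide_selectors values.
css = build_hide_css(extra_selectors=[".chat-widget", ".ad-slot"])
print(css)
# With a Playwright page in hand, the rule could be applied before capture:
#     await page.add_style_tag(content=css)
```

&lt;p&gt;Hiding with CSS rather than clicking "accept" keeps the capture deterministic, at the cost of missing banners that don't match any selector.&lt;/p&gt;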

&lt;h2&gt;2. Article Extraction API&lt;/h2&gt;

&lt;p&gt;Give it a URL, get the &lt;strong&gt;clean main content&lt;/strong&gt; (Markdown / text / HTML) plus title, author, and word count. It &lt;strong&gt;renders with Chromium first, then extracts&lt;/strong&gt; with &lt;code&gt;trafilatura&lt;/code&gt;, so JS-loaded content works. Clean Markdown can cut LLM token usage by roughly 70% in RAG pipelines.&lt;/p&gt;
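
&lt;p&gt;The extract step might look roughly like this. It is a sketch under assumptions, not the service's code: &lt;code&gt;extract_article&lt;/code&gt; and &lt;code&gt;word_count&lt;/code&gt; are hypothetical helper names, the HTML is assumed to come from Playwright's &lt;code&gt;page.content()&lt;/code&gt; after rendering, and Markdown output needs a reasonably recent &lt;code&gt;trafilatura&lt;/code&gt;. Title and author extraction are left out.&lt;/p&gt;

```python
def word_count(text):
    """Count whitespace-separated words in the extracted text."""
    return len(text.split())


def extract_article(html, url=None):
    """Extract the main content as Markdown from already-rendered HTML.

    Assumes trafilatura is installed and supports output_format="markdown".
    """
    # Third-party import, done lazily so word_count works without it.
    import trafilatura

    markdown = trafilatura.extract(html, url=url, output_format="markdown")
    if markdown is None:  # trafilatura returns None when it finds no main content
        return {"markdown": "", "word_count": 0}
    return {"markdown": markdown, "word_count": word_count(markdown)}
```

&lt;p&gt;Passing the rendered HTML in, instead of letting the extractor fetch the URL itself, is what lets JS-loaded content survive.&lt;/p&gt;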

&lt;h2&gt;A few gotchas&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Headless cold starts are slow&lt;/strong&gt; (~30s on Fly auto-stop) → keep one machine always on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright image + pip:&lt;/strong&gt; inside the Playwright Docker image, the &lt;code&gt;playwright&lt;/code&gt; Python package wasn't importable for me; list it explicitly in your requirements file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSRF:&lt;/strong&gt; reject private/loopback IPs from user-supplied URLs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't build billing yourself:&lt;/strong&gt; RapidAPI handles keys + billing; verify &lt;code&gt;X-RapidAPI-Proxy-Secret&lt;/code&gt; so only proxied requests get through.&lt;/li&gt;
&lt;/ol&gt;
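
&lt;p&gt;The SSRF and proxy-secret checks from the list above can be sketched with the standard library alone. The function names here are hypothetical and this is one possible shape, not the code behind the APIs; the &lt;code&gt;resolve&lt;/code&gt; parameter exists only so the check can be exercised without real DNS.&lt;/p&gt;

```python
import hmac
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(ip_text):
    """True only for globally routable addresses (not private, loopback, link-local)."""
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:  # not a parseable IP address at all
        return False


def is_safe_url(url, resolve=socket.getaddrinfo):
    """Reject non-HTTP schemes and hosts that resolve to any non-public address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443)
    except OSError:  # DNS failure: treat as unsafe
        return False
    # Every resolved address must be public; info[4][0] is the IP string.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)


def is_from_proxy(headers, expected_secret):
    """Compare the proxy secret header in constant time.

    `headers` is a plain dict here; web frameworks usually offer a
    case-insensitive mapping instead.
    """
    supplied = headers.get("X-RapidAPI-Proxy-Secret", "")
    return hmac.compare_digest(supplied.encode(), expected_secret.encode())
```

&lt;p&gt;A check like this runs before the browser navigates, so it does not by itself cover redirects or a hostname that resolves differently on the second lookup; those would need handling at the request level.&lt;/p&gt;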

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;p&gt;Both have a free tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Site: &lt;a href="https://renderly.rest" rel="noopener noreferrer"&gt;https://renderly.rest&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Screenshot &amp;amp; PDF API: &lt;a href="https://rapidapi.com/xufei547/api/screenshot-pdf-api2" rel="noopener noreferrer"&gt;https://rapidapi.com/xufei547/api/screenshot-pdf-api2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Article Extraction API: &lt;a href="https://rapidapi.com/xufei547/api/article-extraction-api" rel="noopener noreferrer"&gt;https://rapidapi.com/xufei547/api/article-extraction-api&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feedback welcome — residential-proxy support, mobile viewports, and dark mode are on my list.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>api</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
