<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Ainsworth</title>
    <description>The latest articles on DEV Community by Daniel Ainsworth (@danielainsworth).</description>
    <link>https://dev.to/danielainsworth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3954791%2F6273c262-b5af-4bf3-926e-312e4b32c362.png</url>
      <title>DEV Community: Daniel Ainsworth</title>
      <link>https://dev.to/danielainsworth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danielainsworth"/>
    <language>en</language>
    <item>
      <title>How I built a Bluesky scraper using the AT Protocol API (and published it on Apify)</title>
      <dc:creator>Daniel Ainsworth</dc:creator>
      <pubDate>Wed, 27 May 2026 18:37:19 +0000</pubDate>
      <link>https://dev.to/danielainsworth/how-i-built-a-bluesky-scraper-using-the-at-protocol-api-and-published-it-on-apify-5g26</link>
      <guid>https://dev.to/danielainsworth/how-i-built-a-bluesky-scraper-using-the-at-protocol-api-and-published-it-on-apify-5g26</guid>
      <description>&lt;p&gt;Bluesky hit 40 million users earlier this year, and unlike Twitter, it runs on an open protocol — the AT Protocol — where public data is genuinely public and machine-readable by design. No $5,000/month enterprise API tier. No rate limits you need a lawyer to understand. Just a clean REST API that anyone can query.&lt;/p&gt;

&lt;p&gt;I wanted to scrape it. Here's how I built a production-ready actor and what I learned along the way.&lt;/p&gt;

&lt;h2&gt;Why Bluesky is easy to scrape (legitimately)&lt;/h2&gt;

&lt;p&gt;Most social media scrapers are a fight against Cloudflare, rotating proxies, and terms-of-service grey areas. Bluesky is different. The AT Protocol was explicitly designed for third-party clients and data access. The public API at &lt;code&gt;public.api.bsky.app&lt;/code&gt; serves unauthenticated read requests. There's no fingerprinting, no CAPTCHA, no DOM parsing.&lt;/p&gt;

&lt;p&gt;The only wrinkle: the search endpoint (&lt;code&gt;app.bsky.feed.searchPosts&lt;/code&gt;) now requires authentication via a free App Password. Everything else — author feeds, threads, profiles — works without a token.&lt;/p&gt;

&lt;h2&gt;The three modes I built&lt;/h2&gt;

&lt;p&gt;I wanted one actor that covered the main B2B use cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search posts&lt;/strong&gt; — keyword and hashtag search with date range, language filter, and sort order. Uses &lt;code&gt;bsky.social/xrpc/app.bsky.feed.searchPosts&lt;/code&gt; with a Bearer token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Author feed&lt;/strong&gt; — pull all posts from one or more handles. No auth needed. Useful for competitor monitoring or auditing a creator's content history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thread&lt;/strong&gt; — fetch a full conversation tree from a post URL. The API returns a nested tree; I flatten it depth-first so you get a clean ordered list of posts.&lt;/p&gt;
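
&lt;p&gt;The depth-first flattening can be sketched roughly like this. The &lt;code&gt;ThreadNode&lt;/code&gt; and &lt;code&gt;FlatPost&lt;/code&gt; types and the &lt;code&gt;flattenThread&lt;/code&gt; name are illustrative, not the actor's real code, and the sketch assumes every node carries a &lt;code&gt;post&lt;/code&gt; (a real thread can also contain not-found or blocked placeholders that would need skipping).&lt;/p&gt;

```typescript
// Hypothetical, trimmed-down shape of one node in the nested thread tree.
// A real response carries many more fields on each post.
interface ThreadNode {
  post: { uri: string; record: { text: string } };
  replies?: ThreadNode[];
}

interface FlatPost {
  uri: string;
  text: string;
  depth: number; // 0 for the root post, 1 for direct replies, and so on
}

// Pre-order depth-first walk: emit the node itself, then each reply subtree in order.
function flattenThread(node: ThreadNode, depth = 0, out: FlatPost[] = []): FlatPost[] {
  out.push({ uri: node.post.uri, text: node.post.record.text, depth });
  for (const reply of node.replies ?? []) {
    flattenThread(reply, depth + 1, out);
  }
  return out;
}

// Made-up sample data, for illustration only.
const sample: ThreadNode = {
  post: { uri: "at://example/root", record: { text: "root" } },
  replies: [
    {
      post: { uri: "at://example/a", record: { text: "a" } },
      replies: [{ post: { uri: "at://example/a1", record: { text: "a1" } } }],
    },
    { post: { uri: "at://example/b", record: { text: "b" } } },
  ],
};

// Indent each post by its depth to show the resulting order.
console.log(
  flattenThread(sample)
    .map((p) => `${"  ".repeat(p.depth)}${p.text}`)
    .join("\n"),
);
```

&lt;p&gt;Keeping a &lt;code&gt;depth&lt;/code&gt; field on each record means the conversation structure can still be rebuilt from the flat list if it's needed later.&lt;/p&gt;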

&lt;h2&gt;The one gotcha: API routing&lt;/h2&gt;

&lt;p&gt;This burned me. I was sending authenticated requests (with a JWT) to &lt;code&gt;public.api.bsky.app&lt;/code&gt;. That endpoint is Cloudflare-fronted and returns 403 if you send auth tokens to it — it's for unauthenticated traffic only.&lt;/p&gt;

&lt;p&gt;The fix: authenticated calls go to &lt;code&gt;bsky.social&lt;/code&gt;. Unauthenticated reads go to &lt;code&gt;public.api.bsky.app&lt;/code&gt;. You auth against &lt;code&gt;bsky.social&lt;/code&gt;, get a JWT, then use that JWT only on subsequent &lt;code&gt;bsky.social&lt;/code&gt; calls.&lt;/p&gt;
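
&lt;p&gt;One way to make that rule hard to get wrong is to pick the host from whether a token is present. This is a minimal sketch: &lt;code&gt;buildXrpcRequest&lt;/code&gt; and &lt;code&gt;XrpcRequest&lt;/code&gt; are hypothetical names, and the query string is passed in pre-encoded to keep the example short.&lt;/p&gt;

```typescript
const PUBLIC_HOST = "https://public.api.bsky.app"; // unauthenticated reads only
const AUTH_HOST = "https://bsky.social"; // session creation and authenticated calls

interface XrpcRequest {
  url: string;
  headers: { [name: string]: string };
}

// Build a GET request for an XRPC method. The host follows from the token:
// no JWT means the public host, a JWT means bsky.social with a Bearer header.
function buildXrpcRequest(method: string, query: string, accessJwt?: string): XrpcRequest {
  const host = accessJwt ? AUTH_HOST : PUBLIC_HOST;
  const headers: { [name: string]: string } = {};
  if (accessJwt) {
    headers["Authorization"] = `Bearer ${accessJwt}`;
  }
  return { url: `${host}/xrpc/${method}?${query}`, headers };
}

// Author feed: no token, so it goes to the public host.
console.log(buildXrpcRequest("app.bsky.feed.getAuthorFeed", "actor=bsky.app").url);

// Search: a token is supplied (placeholder value here), so it goes to bsky.social.
console.log(buildXrpcRequest("app.bsky.feed.searchPosts", "q=typescript", "placeholder.jwt").url);
```

&lt;p&gt;Because the token never reaches a code path that targets &lt;code&gt;public.api.bsky.app&lt;/code&gt;, the 403 described above can't be triggered by accident.&lt;/p&gt;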

&lt;h2&gt;Monorepo deployment headache&lt;/h2&gt;

&lt;p&gt;I'm building a portfolio of Apify actors in a TypeScript monorepo with npm workspaces. The shared library (&lt;code&gt;@apify-actors/shared&lt;/code&gt;) contains pay-per-event (PPE) charging helpers and error classes. Locally, workspace resolution handles it cleanly. On Apify's build servers, there's no monorepo, just the uploaded actor folder.&lt;/p&gt;

&lt;p&gt;The solution: copy the shared source into &lt;code&gt;src/shared/&lt;/code&gt; inside each actor and use relative imports. tsup bundles it all into a single &lt;code&gt;dist/main.js&lt;/code&gt;. The shared code stays in one canonical place in the repo; each actor gets its own copy baked in at build time.&lt;/p&gt;

&lt;h2&gt;Output schema&lt;/h2&gt;

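&lt;p&gt;Every post comes back as a single JSON record. Most fields are flat; &lt;code&gt;images&lt;/code&gt; and &lt;code&gt;externalEmbed&lt;/code&gt; are nested:&lt;/p&gt;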

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://bsky.app/profile/user.bsky.social/post/3lhxxxxxxxxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Post content here"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorHandle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user.bsky.social"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorDisplayName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User Name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"likeCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;142&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repostCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"replyCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"thumb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"fullsize"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"alt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"externalEmbed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-11-15T10:30:00.000Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
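
&lt;p&gt;For a sense of how a record like that can be derived from an API post, here is a rough sketch. &lt;code&gt;PostView&lt;/code&gt; is a trimmed-down assumption about the response shape and &lt;code&gt;toOutputRecord&lt;/code&gt; is a hypothetical helper; embeds (&lt;code&gt;images&lt;/code&gt;, &lt;code&gt;externalEmbed&lt;/code&gt;) are left out to keep it short.&lt;/p&gt;

```typescript
// Assumed, simplified shape of a post as returned by the feed endpoints.
interface PostView {
  uri: string; // e.g. at://did:plc:.../app.bsky.feed.post/RECORD_KEY
  author: { handle: string; displayName?: string };
  record: { text: string; createdAt: string };
  likeCount?: number;
  repostCount?: number;
  replyCount?: number;
}

// Map an API post to the output fields shown above (embeds omitted).
function toOutputRecord(post: PostView) {
  // The record key is the last path segment of the AT URI.
  const rkey = post.uri.split("/").pop() ?? "";
  return {
    url: `https://bsky.app/profile/${post.author.handle}/post/${rkey}`,
    text: post.record.text,
    authorHandle: post.author.handle,
    authorDisplayName: post.author.displayName ?? "",
    likeCount: post.likeCount ?? 0,
    repostCount: post.repostCount ?? 0,
    replyCount: post.replyCount ?? 0,
    createdAt: post.record.createdAt,
  };
}

// Made-up sample post mirroring the example record.
const examplePost: PostView = {
  uri: "at://did:plc:example/app.bsky.feed.post/3lhxxxxxxxxx",
  author: { handle: "user.bsky.social", displayName: "User Name" },
  record: { text: "Post content here", createdAt: "2025-11-15T10:30:00.000Z" },
  likeCount: 142,
  repostCount: 28,
  replyCount: 19,
};

console.log(JSON.stringify(toOutputRecord(examplePost), null, 2));
```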



&lt;p&gt;Export as JSON, CSV, or Excel directly from Apify. Plug into Zapier or Make for no-code workflows.&lt;/p&gt;

&lt;h2&gt;The actor is live&lt;/h2&gt;

&lt;p&gt;If you want to use it without building anything: &lt;strong&gt;&lt;a href="https://apify.com/boundingbog/bluesky-posts" rel="noopener noreferrer"&gt;Bluesky Posts Scraper on Apify Store&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PPE pricing: $0.25 per run + $0.003 per post ($3/1,000). No subscription.&lt;/p&gt;
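
&lt;p&gt;As a quick worked example of that pricing, here is the arithmetic in code. &lt;code&gt;estimateCostUsd&lt;/code&gt; is a hypothetical helper, not part of the actor; it works in thousandths of a dollar to sidestep floating-point rounding.&lt;/p&gt;

```typescript
const RUN_FEE_MILLS = 250; // $0.25 per run, in thousandths of a dollar
const POST_FEE_MILLS = 3; // $0.003 per post

// Estimated cost in USD of a single run that returns postCount posts.
function estimateCostUsd(postCount: number): number {
  return (RUN_FEE_MILLS + POST_FEE_MILLS * postCount) / 1000;
}

console.log(estimateCostUsd(1000)); // 3.25
console.log(estimateCostUsd(10000)); // 30.25
```

&lt;p&gt;So the $3/1,000 figure covers the posts alone; a single 1,000-post run comes to $3.25 once the per-run fee is included.&lt;/p&gt;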

&lt;p&gt;The AT Protocol makes Bluesky one of the cleanest data sources you can work with right now. If your use case involves social listening, brand monitoring, or lead gen signals from a fast-growing tech-forward audience, it's worth adding to your stack.&lt;/p&gt;

</description>
      <category>bluesky</category>
      <category>webscraping</category>
      <category>typescript</category>
      <category>apify</category>
    </item>
  </channel>
</rss>
