What you'll build: A production Python service that wraps Apify community actors behind a unified async API and charges callers $0.05 USDC per request — no account, no OAuth, no API keys required from the consumer side.
The Problem: Social Data Has No Unified Interface
You'd think in 2026 getting posts from Bluesky, articles from Substack, or stories from Hacker News would be trivial. It isn't.
Each platform has its own quirks:
- Bluesky (AT Protocol) — has an API, but pagination is cursor-based and the auth flow expects a DID (Decentralized Identifier). Extracting engagement metrics requires chasing multiple nested objects per post.
- Substack — no public API. Newsletters are behind a mix of public RSS feeds, paywalled content, and inconsistent rendering. The "just scrape it" approach breaks constantly as their markup changes.
- Hacker News — the Algolia search API is decent, but getting it to behave consistently in a structured pipeline requires managing rate limits, result pagination, and normalizing the response shape.
None of this is insurmountable individually. But if you're building something that needs data from multiple sources — a trend-detection tool, a research assistant, or an AI agent that monitors developer sentiment — you're looking at maintaining three different scrapers, three different error-handling paths, and three different data schemas.
The cleaner alternative: wrap Apify's well-maintained community actors and expose a dead-simple unified API on top. That's what this guide builds.
Prerequisites
- Python 3.11+
- An Apify account (free tier works — actors in this guide use minimal compute units)
- A Base network USDC wallet if you want to test x402 payments (optional; the service also supports API key auth)
- A Linux server with nginx (this guide uses Alpine Linux; adapt as needed)
Install the Python dependencies:
pip install fastapi uvicorn httpx python-dotenv
Set up your environment variables:
export APIFY_TOKEN="your_apify_api_token"
export API_KEY="your_fallback_api_key" # optional, for API key auth
export PAY_TO="0xYourUSDCWalletAddress" # for x402 payments
export BASE_URL="https://your-service.example.com"
export FACILITATOR_URL="https://facilitator.xpay.sh"
Enter Apify Actors: Someone Already Solved the Hard Part
Apify's actor ecosystem is genuinely underrated for this use case. Before writing a single line of custom scraping code, I found community actors that already handle the gnarly bits:
| Platform | Actor | What it does |
|---|---|---|
| Bluesky | Bluesky Scraper | Search posts by keyword, returns engagement metrics |
| Substack | Substack Scraper | Scrape articles from any publication slug |
| Hacker News | HN Search Scraper | Full-text search with score/comment counts |
The Apify API contract is beautifully uniform: `POST /v2/acts/{actorId}/run-sync-get-dataset-items`. You send your input JSON, wait for the run to complete, and get back a dataset. One pattern, three scrapers. This is exactly the abstraction layer I needed.
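To make the uniformity concrete, here's a sketch of what a raw call looks like for two of the platforms; only the actor ID and input body change (`run_sync_url` is an illustrative helper, and the network call is commented out so the request shape is the focus):

```python
APIFY_BASE = "https://api.apify.com/v2/acts"

def run_sync_url(actor_id: str, token: str, timeout: int = 300) -> str:
    """Every platform uses the same endpoint; only the actor ID varies."""
    return (
        f"{APIFY_BASE}/{actor_id}/run-sync-get-dataset-items"
        f"?token={token}&timeout={timeout}"
    )

# One pattern, multiple scrapers — same call shape for each:
for actor, body in [
    ("apify/bluesky-scraper", {"searchQuery": "python", "maxItems": 5}),
    ("apify/hacker-news-scraper", {"searchTerms": ["python"], "maxResults": 5}),
]:
    url = run_sync_url(actor, token="YOUR_APIFY_TOKEN")
    # items = httpx.post(url, json=body, timeout=320.0).json()  # real call
    print(url)
```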
Architecture: Three Layers, One File
The entire service is ~300 lines of Python (excluding the HTML landing page). Here's the stack:
Client → nginx (TLS termination) → FastAPI (port 8001) → Apify API
                                        ↑
                          x402 payment verification
                           via facilitator.xpay.sh
Infrastructure: Alpine Linux VPS, 256MB RAM, 3GB disk. FastAPI with async httpx fits comfortably in this envelope.
Layer 1: The Apify Wrapper
Define the actor IDs and a single async function that handles every Apify call:
# app.py
import os
import httpx
import base64
import json
from fastapi import FastAPI, Request, Header, HTTPException
from fastapi.responses import JSONResponse

APIFY_BASE = "https://api.apify.com/v2/acts"
APIFY_TOKEN = os.environ["APIFY_TOKEN"]

ACTORS = {
    "bluesky": "apify/bluesky-scraper",
    "substack": "apify/substack-scraper",
    "hn": "apify/hacker-news-scraper",
}

app = FastAPI()

async def call_apify(actor_id: str, body: dict, settle_data: dict | None = None):
    url = (
        f"{APIFY_BASE}/{actor_id}/run-sync-get-dataset-items"
        f"?token={APIFY_TOKEN}&timeout=300"
    )
    async with httpx.AsyncClient(timeout=320.0) as client:
        try:
            resp = await client.post(url, json=body)
            resp.raise_for_status()
            headers = {}
            if settle_data:
                encoded = base64.b64encode(json.dumps(settle_data).encode()).decode()
                headers["X-PAYMENT-RESPONSE"] = encoded
            return JSONResponse(
                content=resp.json(),
                status_code=resp.status_code,
                headers=headers,
            )
        except httpx.HTTPStatusError as e:
            return JSONResponse(
                content={"error": str(e), "detail": e.response.text},
                status_code=e.response.status_code,
            )
Three things worth noting here:
- `run-sync-get-dataset-items` blocks until the run finishes and returns the dataset directly. No polling, no run ID tracking, no separate dataset fetch. For synchronous API semantics this is ideal.
- `timeout=300` on the query string is Apify's actor timeout. `httpx.AsyncClient(timeout=320.0)` gives an extra 20 seconds for network overhead.
- `settle_data` is the x402 payment receipt — included in the response headers so the caller's wallet can confirm settlement. More on this below.
The endpoint handlers are deliberately thin:
@app.post("/api/bluesky/search")
async def bluesky_search(request: Request, x_api_key: str | None = Header(None)):
    settle_data = await authenticate_request(request, x_api_key)
    body = await request.json()
    return await call_apify(ACTORS["bluesky"], body, settle_data)

@app.post("/api/hn/search")
async def hn_search(request: Request, x_api_key: str | None = Header(None)):
    settle_data = await authenticate_request(request, x_api_key)
    body = await request.json()
    return await call_apify(ACTORS["hn"], body, settle_data)

@app.post("/api/substack/search")
async def substack_search(request: Request, x_api_key: str | None = Header(None)):
    settle_data = await authenticate_request(request, x_api_key)
    body = await request.json()
    return await call_apify(ACTORS["substack"], body, settle_data)
No business logic, no schema validation at this layer. The actor handles input validation on the Apify side; we forward and return.
Layer 2: Input/Output Schemas
Each endpoint has explicit schemas that serve double duty — used for both x402 payment metadata and for MCP/agent-card discovery:
INPUT_SCHEMAS = {
    "/api/bluesky/search": {
        "type": "object",
        "properties": {
            "searchQuery": {
                "type": "string",
                "description": "Search query for Bluesky posts",
            },
            "maxItems": {"type": "integer", "default": 10},
            "scrapeType": {"type": "string", "default": "posts"},
        },
        "required": ["searchQuery"],
    },
    "/api/hn/search": {
        "type": "object",
        "properties": {
            "searchTerms": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of search terms to query HN",
            },
            "maxResults": {"type": "integer", "default": 10},
        },
        "required": ["searchTerms"],
    },
    "/api/substack/search": {
        "type": "object",
        "properties": {
            "publicationSlug": {
                "type": "string",
                "description": "Substack publication slug (e.g. 'stratechery')",
            },
            "maxArticles": {"type": "integer", "default": 10},
        },
        "required": ["publicationSlug"],
    },
}
Defining these schemas up front pays dividends: they're reused across the x402 payment requirements, the OpenAPI spec, the MCP manifest, and the A2A agent card. When an actor changes a field name, you update it in one place and all four discovery surfaces stay consistent.
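The service itself deliberately skips local validation (the actor validates on Apify's side), but if you ever want to reject malformed input before paying for compute, the same schemas support a tiny pre-flight check. A stdlib-only sketch (`preflight` is a helper introduced here for illustration; it handles only `required` and top-level `type`, not full JSON Schema):

```python
# Map JSON Schema type names to Python types for a shallow check.
TYPE_MAP = {"string": str, "integer": int, "array": list, "object": dict}

def preflight(schema: dict, body: dict) -> list[str]:
    """Return a list of validation errors (empty list means OK)."""
    errors = []
    for field in schema.get("required", []):
        if field not in body:
            errors.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        expected = TYPE_MAP.get(spec.get("type"))
        if field in body and expected and not isinstance(body[field], expected):
            errors.append(f"{field}: expected {spec['type']}")
    return errors

schema = {
    "type": "object",
    "properties": {"searchQuery": {"type": "string"}, "maxItems": {"type": "integer"}},
    "required": ["searchQuery"],
}
print(preflight(schema, {"maxItems": "ten"}))
# → ['missing required field: searchQuery', 'maxItems: expected integer']
```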
Layer 3: x402 — Pay-Per-Request Without an Account
x402 is an emerging HTTP payment protocol built on ERC-20 tokens (specifically USDC on Base in this implementation). The full flow:
1. Client sends a request with no credentials
2. Server returns HTTP 402 with a `PAYMENT-REQUIRED` header containing payment details
3. Client sends a signed payment in an `X-PAYMENT` header
4. Server verifies and settles with a facilitator service, then forwards to Apify
5. Server returns data with an `X-PAYMENT-RESPONSE` header containing the settlement receipt
Building the payment requirements response:
BASE_URL = os.environ["BASE_URL"]
PAY_TO = os.environ["PAY_TO"]  # Your USDC address on Base

def make_payment_requirements(resource_path: str) -> dict:
    accept = {
        "scheme": "exact",
        "mimeType": "application/json",
        "network": "base",
        "maxAmountRequired": "50000",  # $0.05 USDC (6 decimal places)
        "resource": f"{BASE_URL}{resource_path}",
        "description": f"API access: {resource_path}",
        "payTo": PAY_TO,
        "maxTimeoutSeconds": 60,
        "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",  # USDC on Base
        "extra": {"name": "USDC", "version": "2"},
    }
    if resource_path in INPUT_SCHEMAS:
        accept["inputSchema"] = INPUT_SCHEMAS[resource_path]
    return {"x402Version": 1, "error": "Payment required", "accepts": [accept]}
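The `maxAmountRequired` encoding is easy to get wrong: it's a string of token base units, and USDC uses 6 decimal places, so $0.05 becomes "50000". A quick converter sketch (`usd_to_base_units` is an illustrative helper; `Decimal` avoids float rounding errors on the multiplication):

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places

def usd_to_base_units(dollars: str) -> str:
    """Convert a dollar amount like '0.05' into USDC base units ('50000')."""
    return str(int(Decimal(dollars) * 10 ** USDC_DECIMALS))

print(usd_to_base_units("0.05"))  # 50000
```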
The verify-and-settle flow delegates all blockchain complexity to a facilitator:
FACILITATOR_URL = os.environ.get("FACILITATOR_URL", "https://facilitator.xpay.sh")

async def verify_and_settle_payment(
    payment_b64: str, resource_path: str
) -> tuple[bool, str, dict | None]:
    payment_payload = json.loads(base64.b64decode(payment_b64))
    payment_requirements = make_payment_requirements(resource_path)["accepts"][0]
    facilitator_body = {
        "x402Version": 1,
        "paymentPayload": payment_payload,
        "paymentRequirements": payment_requirements,
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        # Step 1: verify the payment is valid BEFORE calling Apify
        verify_resp = await client.post(
            f"{FACILITATOR_URL}/verify", json=facilitator_body
        )
        result = verify_resp.json()
        if not result.get("isValid", False):
            return False, f"Payment invalid: {result.get('invalidReason')}", None
        # Step 2: settle — this actually moves the USDC on-chain
        settle_resp = await client.post(
            f"{FACILITATOR_URL}/settle", json=facilitator_body
        )
        result = settle_resp.json()
        if not result.get("success", False):
            return False, "Settlement failed", None
    return True, "", result
The two-step verify-then-settle pattern is important: you don't want to call an expensive actor run and then discover the payment was invalid.
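On the client side, the settlement receipt comes back base64-encoded in the `X-PAYMENT-RESPONSE` header, mirroring how `call_apify` encodes it. Decoding is symmetric; a sketch (the receipt fields shown are illustrative, not a guaranteed shape):

```python
import base64
import json

def decode_payment_response(header_value: str) -> dict:
    """Reverse of the server's encoding: base64 → JSON → dict."""
    return json.loads(base64.b64decode(header_value))

# Round-trip demonstration with an illustrative receipt:
receipt = {"success": True, "transaction": "0xabc123", "network": "base"}
encoded = base64.b64encode(json.dumps(receipt).encode()).decode()
print(decode_payment_response(encoded))
```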
Authentication Fallback
The service supports both x402 and a traditional API key for development and backward compatibility:
async def authenticate_request(
    request: Request, x_api_key: str | None
) -> dict | None:
    # Path 1: API key auth — returns None (no settlement receipt)
    if x_api_key and x_api_key == os.environ.get("API_KEY"):
        return None
    # Path 2: x402 payment — returns settlement receipt
    x_payment = request.headers.get("x-payment")
    if x_payment:
        success, error_msg, settle_data = await verify_and_settle_payment(
            x_payment, request.url.path
        )
        if success:
            return settle_data
        raise HTTPException(status_code=400, detail=error_msg)
    # Path 3: neither — trigger the 402 flow
    requirements = make_payment_requirements(request.url.path)
    raise HTTPException(
        status_code=402,
        detail="Payment required",
        headers={"PAYMENT-REQUIRED": json.dumps(requirements)},
    )
Agent Discovery: Making the API AI-Native
An unexpected benefit of x402 is that it makes the API discoverable by AI agents without any human account creation. The service exposes three well-known endpoints:
GET /.well-known/x402 → list of paid resources with schemas
GET /.well-known/agent-card.json → Google A2A protocol agent card
GET /.well-known/mcp.json → MCP tool manifest for Claude/GPT agents
The MCP endpoint exposes each scraping endpoint as a tool with full input/output schemas and payment info. An AI agent that understands x402 can discover this service, verify the cost ($0.05/call), autonomously pay with its own wallet, and get structured data — no human in the loop.
Example MCP tool entry (generated dynamically from INPUT_SCHEMAS):
@app.get("/.well-known/mcp.json")
async def mcp_manifest():
    tools = []
    for path, schema in INPUT_SCHEMAS.items():
        tools.append({
            "name": path.lstrip("/").replace("/", "_"),
            "description": f"Scrape data via {path}. Costs $0.05 USDC per call.",
            "inputSchema": schema,
            "payment": {
                "required": True,
                "amount": "$0.05 USDC",
                "protocol": "x402",
            },
        })
    return {"tools": tools}
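The `/.well-known/x402` endpoint follows the same generate-from-schemas pattern. A sketch of the payload builder, written as a plain function so it can be wired up with a one-line `@app.get` (the discovery document format here mirrors the `accepts` structure used earlier, but treat the exact field names as an assumption; abbreviated stand-ins replace the full `INPUT_SCHEMAS` dict):

```python
# Abbreviated stand-ins for the INPUT_SCHEMAS and BASE_URL defined earlier:
INPUT_SCHEMAS = {
    "/api/hn/search": {
        "type": "object",
        "properties": {"searchTerms": {"type": "array", "items": {"type": "string"}}},
        "required": ["searchTerms"],
    },
}
BASE_URL = "https://your-service.example.com"

def x402_discovery() -> dict:
    """Build the /.well-known/x402 document: one entry per paid resource."""
    resources = []
    for path, schema in INPUT_SCHEMAS.items():
        resources.append({
            "resource": f"{BASE_URL}{path}",
            "price": "$0.05 USDC",
            "network": "base",
            "inputSchema": schema,
        })
    return {"x402Version": 1, "resources": resources}

# Wire-up in the service (sketch):
# @app.get("/.well-known/x402")
# async def x402_manifest():
#     return x402_discovery()
```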
Deployment: Alpine Linux, 256MB RAM
The server is a small Alpine Linux container. Getting FastAPI running reliably on 256MB required a few deliberate choices:
- Uvicorn, single worker. Multiple workers would exceed the RAM budget. One is fine for the traffic level of a side-project API.
- Async throughout. Apify calls can take 30–60 seconds. Synchronous handling would starve the event loop. Every I/O operation uses `async`/`await` with httpx.
- No database. Payment verification and settlement are stateless — the facilitator handles all that. No SQLite, no Redis, no persistence layer.
- nginx as TLS proxy. FastAPI binds to `localhost:8001`; nginx handles TLS and port-forwards via the VPS provider's subdomain.
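The nginx side is a plain reverse-proxy block. A sketch, assuming a standard TLS setup (the server name and certificate paths are placeholders); the long `proxy_read_timeout` matters, since Apify sync runs can block for a minute:

```nginx
server {
    listen 443 ssl;
    server_name your-service.example.com;

    ssl_certificate     /etc/ssl/your-service/fullchain.pem;
    ssl_certificate_key /etc/ssl/your-service/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Apify sync runs can block for 60+ seconds:
        proxy_read_timeout 330s;
    }
}
```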
The service startup script:
#!/bin/sh
pkill -f "uvicorn app:app" || true
sleep 1
cd /home/frog/hustler-proxy
nohup uvicorn app:app --host 127.0.0.1 --port 8001 > uvicorn.log 2>&1 &
echo "Started PID $!"
Testing the Full Flow
Check the 402 response before wiring up a wallet:
curl -si -X POST https://your-service.example.com/api/hn/search \
-H "Content-Type: application/json" \
-d '{"searchTerms": ["apify", "web scraping"], "maxResults": 3}'
Expected response:
HTTP/2 402
PAYMENT-REQUIRED: {"x402Version":1,"error":"Payment required","accepts":[...]}
Test with API key auth first to confirm the Apify integration works:
curl -s -X POST https://your-service.example.com/api/hn/search \
-H "Content-Type: application/json" \
-H "X-API-Key: your_fallback_api_key" \
-d '{"searchTerms": ["apify", "web scraping"], "maxResults": 3}' \
| python3 -m json.tool
Once confirmed, test x402 using the x402-fetch JavaScript library:
import { withPaymentInterceptor } from "x402-fetch";
import { privateKeyToAccount } from "viem/accounts";

const account = privateKeyToAccount("0x…your_private_key…");
const fetch402 = withPaymentInterceptor(fetch, account);

const res = await fetch402("https://your-service.example.com/api/hn/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ searchTerms: ["apify", "web scraping"], maxResults: 5 }),
});

const stories = await res.json();
console.log(stories);
Lessons Learned
1. Apify actors remove the maintenance burden, not just the initial work.
The Bluesky AT Protocol changed its auth requirements twice in six weeks. Both times, the community actor was updated within a day. Rolling your own scraper means those breaking changes are your problem to chase.
2. x402 is spec-fragile in ways the documentation doesn't warn you about.
The `PAYMENT-REQUIRED` header must contain the full JSON requirements body, not a pointer to it. The `mimeType` field in the `accepts` array must be present. The `inputSchema` must sit inside the `accepts` entry, not in the outer body. I discovered each of these by running the x402scan tool against my own service and reading the failure messages carefully.
3. The facilitator abstraction is the right call for small services.
Doing on-chain verification yourself means running a node or paying for an RPC provider. The facilitator pattern (verify + settle as a service) adds a trusted third party but removes 200+ lines of web3 code from your application. For a $0.05/call service the trade-off is obvious.
4. `run-sync-get-dataset-items` is the correct Apify endpoint for synchronous APIs.
The two-step run-then-fetch pattern (`POST /runs` → poll → `GET /datasets/{id}/items`) is fine for async jobs but adds 2–4 extra HTTP round-trips for every request. The sync endpoint blocks and returns the dataset in one call. Actors that take 30+ seconds to complete are fine — httpx handles the timeout gracefully.
5. Schema definitions are the API's contract, not an afterthought.
Defining `INPUT_SCHEMAS` up front meant they could be reused across the x402 payment requirements, the OpenAPI spec, the MCP manifest, and the A2A agent card. When an actor changed a field name, updating one dict kept all four discovery surfaces consistent.
What to Build Next
This pattern generalizes well beyond social data:
- E-commerce price monitor: wrap an Amazon product scraper actor with x402 pricing per ASIN lookup
- Job listing aggregator: combine LinkedIn, Indeed, and Glassdoor actors behind a single `/api/jobs/search` endpoint
- AI research assistant: expose a set of actors as MCP tools so a Claude agent can discover and pay for web data autonomously
- Actor usage analytics: log every `call_apify()` invocation to SQLite and expose a `/api/stats` endpoint — Apify's actor run logs give you timing and compute unit cost per call, which lets you set prices based on actual cost
The core insight is that Apify's uniform API contract (`run-sync-get-dataset-items` plus a dataset response) makes it trivial to add new data sources. Adding a new actor is a one-line dict entry in `ACTORS` plus a new endpoint handler.
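For example, wiring in a hypothetical Reddit source (the actor ID and field names here are made up for illustration) touches only the two registries plus one thin handler:

```python
# The two registries from the service, abbreviated:
ACTORS = {"bluesky": "apify/bluesky-scraper", "hn": "apify/hacker-news-scraper"}
INPUT_SCHEMAS = {}

# 1. Register the actor (hypothetical ID):
ACTORS["reddit"] = "someone/reddit-scraper"

# 2. Describe its input so all discovery surfaces pick it up:
INPUT_SCHEMAS["/api/reddit/search"] = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Search query for Reddit posts"},
        "maxItems": {"type": "integer", "default": 10},
    },
    "required": ["query"],
}

# 3. Add a thin handler, identical in shape to the others:
# @app.post("/api/reddit/search")
# async def reddit_search(request: Request, x_api_key: str | None = Header(None)):
#     settle_data = await authenticate_request(request, x_api_key)
#     return await call_apify(ACTORS["reddit"], await request.json(), settle_data)
```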
All code in this article is production code running on a live service. The approach works on the free Apify tier for low-volume use cases; higher traffic will require a paid Apify plan to cover compute unit costs.