<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ben Utting</title>
    <description>The latest articles on DEV Community by Ben Utting (@benutting).</description>
    <link>https://dev.to/benutting</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3873127%2Fefaf1935-6b8c-420b-b499-39cec776c49b.jpg</url>
      <title>DEV Community: Ben Utting</title>
      <link>https://dev.to/benutting</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benutting"/>
    <language>en</language>
    <item>
      <title>I Built a Skill That Pulls Any Australian Real Estate Agent's Sales History in 60 Seconds</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Fri, 01 May 2026 16:17:40 +0000</pubDate>
      <link>https://dev.to/benutting/i-built-a-skill-that-pulls-any-australian-real-estate-agents-sales-history-in-60-seconds-3a57</link>
      <guid>https://dev.to/benutting/i-built-a-skill-that-pulls-any-australian-real-estate-agents-sales-history-in-60-seconds-3a57</guid>
      <description>&lt;p&gt;Researching a real estate agent's track record in Australia is painful. You open Domain, search the agent's name, click through to their profile, scroll the sold tab, copy details into a spreadsheet. Then you do it again on realestate.com.au because the two portals show different listings. Then maybe RateMyAgent for a third pass. Cross-reference the addresses, figure out which sales appear on both, fill in the gaps where one portal shows the price and the other says "Contact Agent."&lt;/p&gt;

&lt;p&gt;For one agent, that's 30 to 90 minutes of clicking, scrolling, and copy-pasting. For a shortlist of five agents, it's an afternoon gone. And you still end up with a messy spreadsheet that's missing half the prices.&lt;/p&gt;

&lt;p&gt;I built this tool for a client who needed to vet agents across multiple suburbs in Sydney. After watching them do it manually twice (once for Paddington, once for Mosman), I automated the whole pipeline. Input an agent's name, get back structured data: every recent sale with address, price, date, bedrooms, bathrooms, car spaces, days on market, and a link back to the original listing.&lt;/p&gt;

&lt;p&gt;Now I'm giving it away for free. Here's how it works and how to set it up.&lt;/p&gt;

&lt;h2&gt;The problem in detail&lt;/h2&gt;

&lt;p&gt;The Australian real estate market has no single source of truth for agent performance. Domain and realestate.com.au both show sold listings, but they show different ones. An agent might have 12 sold properties on Domain and 8 on REA, with only 5 overlapping. Prices are inconsistent: Domain shows a sold figure, REA says "Contact Agent" for the same property. Neither portal offers a structured API for agent-level sold data.&lt;/p&gt;

&lt;p&gt;If you're a vendor choosing which agent to list with, you're supposed to make a decision worth tens of thousands in commission based on whatever curated stats the agent puts in their pitch deck. If you're a buyers agent who needs to know who dominates a suburb, you're doing this research weekly across dozens of agents.&lt;/p&gt;

&lt;p&gt;The data exists. It's just trapped behind JavaScript-rendered portals with no export button.&lt;/p&gt;

&lt;h2&gt;What the tool does&lt;/h2&gt;

&lt;p&gt;You give it an agent's name. It gives you their recent sold listings as clean, structured data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run main.py search &lt;span class="nt"&gt;--agent&lt;/span&gt; &lt;span class="s2"&gt;"Sarah Mitchell"&lt;/span&gt; &lt;span class="nt"&gt;--agency&lt;/span&gt; &lt;span class="s2"&gt;"Ray White"&lt;/span&gt; &lt;span class="nt"&gt;--suburb&lt;/span&gt; Paddington &lt;span class="nt"&gt;--state&lt;/span&gt; NSW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Out comes a JSON array:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agent_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sarah Mitchell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agency_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ray White Paddington"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"property_address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"14 Glenmore Road"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suburb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Paddington"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NSW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postcode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"property_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"house"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bedrooms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bathrooms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"car_spaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sold_price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2150000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sold_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-11-03"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"days_on_market"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;22&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"listing_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.domain.com.au/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_portal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"domain.com.au"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agent_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sarah Mitchell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agency_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ray White Paddington"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"property_address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7/88 Oxford Street"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suburb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Paddington"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NSW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postcode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"property_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bedrooms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bathrooms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"car_spaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sold_price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;980000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sold_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-10-18"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"days_on_market"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"listing_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.realestate.com.au/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_portal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"realestate.com.au"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Up to 15 listings per search. Both Domain and REA combined, deduplicated, with price data merged from whichever portal actually shows it. Output as JSON to the terminal, or export to CSV, Google Sheets, or fire a webhook to n8n.&lt;/p&gt;

&lt;h2&gt;Who this is for&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vendors choosing an agent.&lt;/strong&gt; You're about to sign an exclusive agreement worth 2% of your property's value. Instead of trusting the agent's own marketing, pull their actual sales data. How many have they sold in the last 6 months? What's their average days on market? Are they actually selling in your suburb or three suburbs away?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Buyers agents.&lt;/strong&gt; You need to know who dominates a pocket. Run it across the top 5 agents in a suburb and compare sold volumes, property types, and price ranges. Do it weekly and you'll spot patterns before your competitors do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agency owners.&lt;/strong&gt; Benchmark a recruit's claims against reality. They say they sold $40M last year. Pull the data and check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mortgage brokers and valuers.&lt;/strong&gt; Recent comparable sales filtered by the selling agent, not just by suburb. Useful for building a picture of market activity in a specific area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proptech teams.&lt;/strong&gt; Building agent comparison products, performance databases, or market intelligence dashboards. This tool gives you the raw data layer.&lt;/p&gt;

&lt;p&gt;Anyone who currently does this research with 6 browser tabs and a spreadsheet.&lt;/p&gt;

&lt;h2&gt;How it works under the hood&lt;/h2&gt;

&lt;p&gt;The tool runs a five-stage pipeline. Each stage is its own module, testable independently.&lt;/p&gt;
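&lt;p&gt;That modular shape can be sketched in a couple of lines; the function and stage names here are illustrative, not the tool's actual interface:&lt;/p&gt;

```python
def run_pipeline(query, stages):
    """Feed each stage's output into the next.

    `stages` is an ordered list of callables, e.g. something like
    [discover_agent, scrape_sold, extract_fields, dedup_merge, emit].
    Because every stage is a plain function, each one can be unit-tested
    with canned input shaped like the previous stage's output.
    """
    data = query
    for stage in stages:
        data = stage(data)
    return data
```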

&lt;h3&gt;Stage 1: Agent discovery&lt;/h3&gt;

&lt;p&gt;The first challenge is finding the agent's canonical profile URL on each portal. The tool searches Domain's agent directory (&lt;code&gt;domain.com.au/find-agent&lt;/code&gt;) and REA's agent pages using httpx and BeautifulSoup. It parses the search results to extract the profile link and internal agent ID.&lt;/p&gt;

&lt;p&gt;Common names are a problem. "John Smith" at "Ray White" might match four different agents across NSW, VIC, and QLD. Adding a suburb narrows it significantly. If the direct lookup fails entirely (some agents have unusual profile URL structures), the tool falls back to a Google search via SerpAPI, looking for profile pages across both portals.&lt;/p&gt;
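&lt;p&gt;A minimal sketch of that disambiguation idea (the function and field names are hypothetical, not the tool's actual code): a suburb match counts for more than an agency match, because agencies franchise across states while suburbs pin the agent down.&lt;/p&gt;

```python
def rank_agent_candidates(candidates, agency=None, suburb=None):
    """Order candidate agent profiles from a directory search.

    `candidates` is a list of dicts like
    {"name": ..., "agency": ..., "suburb": ..., "url": ...}.
    Suburb match scores 2, agency match scores 1, so suburb context
    dominates when both signals are available.
    """
    def score(candidate):
        points = 0
        if suburb and candidate.get("suburb", "").lower() == suburb.lower():
            points += 2
        if agency and agency.lower() in candidate.get("agency", "").lower():
            points += 1
        return points

    return sorted(candidates, key=score, reverse=True)
```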

&lt;h3&gt;Stage 2: Sold listings scrape&lt;/h3&gt;

&lt;p&gt;Once the profile URLs are resolved, Playwright launches headless Chromium and loads the agent's "Sold" tab on each portal. This is where it gets tricky: both portals render listing data with JavaScript. A simple HTTP request gets you an empty page. You need a real browser.&lt;/p&gt;

&lt;p&gt;Domain uses traditional pagination with a "Next" button. The scraper clicks through up to 2 pages. REA uses a lazy-load scroll pattern: new listings only appear when you scroll to the bottom. The scraper scrolls incrementally, waiting for new cards to render after each scroll.&lt;/p&gt;

&lt;p&gt;Both portals deploy Cloudflare and behavioural fingerprinting. The scraper uses randomised user agents, realistic viewport sizes, and delays between 1.5 and 3 seconds between every action. It also patches the &lt;code&gt;navigator.webdriver&lt;/code&gt; property so Chromium doesn't announce itself as automated.&lt;/p&gt;
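&lt;p&gt;As a rough illustration of those evasion basics (a sketch, not the tool's exact implementation): randomised pauses and user agents are plain Python, and the stealth snippet is the kind of script you would hand to Playwright's &lt;code&gt;page.add_init_script&lt;/code&gt; so it runs before any page JavaScript.&lt;/p&gt;

```python
import random

# A small pool of realistic desktop user agents. A real deployment
# would rotate a larger, regularly refreshed list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]

# Injected before any page script runs, so Chromium stops reporting
# navigator.webdriver as true.
STEALTH_JS = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{get: function () { return undefined; }});"
)

def human_delay(low=1.5, high=3.0):
    """Return a randomised pause, in seconds, between browser actions."""
    return random.uniform(low, high)

def random_user_agent():
    """Pick a user agent at random for this browser context."""
    return random.choice(USER_AGENTS)
```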

&lt;p&gt;The output from this stage is raw HTML: the inner content of each listing card element. Messy, unstructured, full of nested divs and CSS classes that change without notice.&lt;/p&gt;

&lt;h3&gt;Stage 3: LLM extraction&lt;/h3&gt;

&lt;p&gt;This is where the tool gets interesting. Instead of writing brittle CSS selectors that break every time Domain updates their frontend, the raw HTML cards get sent to Claude Haiku with a structured extraction prompt.&lt;/p&gt;

&lt;p&gt;The prompt tells Haiku exactly what fields to extract (address, suburb, postcode, property type, bedrooms, bathrooms, cars, price, date, URL) and how to handle edge cases: "Contact Agent" means null price, "terrace" maps to "house", price ranges use the lower value. The response comes back as JSON, validated against a Pydantic model.&lt;/p&gt;
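&lt;p&gt;The same edge-case rules are easy to express in plain Python. A hedged sketch of the post-validation normalisation (these helpers are illustrative, not the actual prompt or model code):&lt;/p&gt;

```python
def normalize_price(raw):
    """Apply the extraction prompt's price rules to a raw string.

    "Contact Agent" (or anything with no digits) becomes None; a range
    like "$900,000 - $950,000" collapses to the lower bound.
    """
    digits = [
        int("".join(ch for ch in part if ch.isdigit()))
        for part in raw.split("-")
        if any(ch.isdigit() for ch in part)
    ]
    return min(digits) if digits else None

# Portal-specific subtypes fold into the portal-agnostic types the
# Pydantic model accepts; the mapping here is a small sample.
TYPE_MAP = {"terrace": "house", "townhouse": "house", "apartment": "unit"}

def normalize_property_type(raw):
    """Map a raw property-type label onto the schema's vocabulary."""
    raw = raw.strip().lower()
    return TYPE_MAP.get(raw, raw)
```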

&lt;p&gt;Cost per run: roughly $0.001 for 10 listings. That's a fraction of a cent. The token usage gets logged so you can track spend over time.&lt;/p&gt;

&lt;p&gt;If the Anthropic API is unreachable or returns malformed JSON, a regex fallback parser handles the most common HTML patterns from both portals. It's less accurate but better than nothing.&lt;/p&gt;
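&lt;p&gt;A fallback along those lines might look like this (hypothetical patterns, shown over tag-stripped card text rather than raw HTML for brevity):&lt;/p&gt;

```python
import re

# Crude but dependency-free patterns over a listing card's visible
# text. Less accurate than the LLM pass, but a usable safety net.
PRICE_RE = re.compile(r"\$([\d,]+)")
BEDS_RE = re.compile(r"(\d+)\s*Bed", re.I)
BATHS_RE = re.compile(r"(\d+)\s*Bath", re.I)

def fallback_parse(card_text):
    """Pull the most common fields out of a card's text with regexes."""
    def first_int(pattern):
        match = pattern.search(card_text)
        return int(match.group(1).replace(",", "")) if match else None

    return {
        "sold_price": first_int(PRICE_RE),
        "bedrooms": first_int(BEDS_RE),
        "bathrooms": first_int(BATHS_RE),
    }
```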

&lt;h3&gt;Stage 4: Deduplication and merge&lt;/h3&gt;

&lt;p&gt;The same property often appears on both Domain and REA with slightly different formatting. "14 Glenmore Road" on one, "14 Glenmore Rd" on the other. The pipeline normalises addresses (expands or contracts street type abbreviations, strips punctuation, lowercases) and deduplicates on normalised address plus sold date.&lt;/p&gt;

&lt;p&gt;When a duplicate is found, Domain data takes priority for price (it's more frequently visible), but null fields get filled from the REA listing. The result is a single clean record that combines the best data from both sources.&lt;/p&gt;
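&lt;p&gt;The merge rule itself is only a few lines; a sketch assuming both records share the same field names:&lt;/p&gt;

```python
def merge_records(domain_rec, rea_rec):
    """Combine duplicate listings from the two portals.

    Domain wins wherever it has a value; any field Domain left null is
    filled from the REA copy of the listing.
    """
    merged = dict(rea_rec)
    for field, value in domain_rec.items():
        if value is not None:
            merged[field] = value
    return merged
```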

&lt;h3&gt;Stage 5: Output&lt;/h3&gt;

&lt;p&gt;JSON to stdout by default. Also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV export&lt;/strong&gt; via pandas, saved to an output directory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Sheets append&lt;/strong&gt; via gspread, adding rows to a configured spreadsheet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook POST&lt;/strong&gt; for piping results into n8n or any other automation tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every run gets logged to a local SQLite database: timestamp, agent name, records found, sources hit, and API cost. You can view history with &lt;code&gt;uv run main.py history&lt;/code&gt;.&lt;/p&gt;
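&lt;p&gt;A run log like that is a few lines with the standard library's &lt;code&gt;sqlite3&lt;/code&gt;. A sketch (table and column names here are illustrative, not the tool's actual schema):&lt;/p&gt;

```python
import sqlite3
import time

def log_run(conn, agent, records_found, sources, cost_usd):
    """Append one row to the run-history table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "ts REAL, agent TEXT, records INTEGER, sources TEXT, cost REAL)"
    )
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
        (time.time(), agent, records_found, ",".join(sources), cost_usd),
    )
    conn.commit()
```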

&lt;h2&gt;Setup takes about 3 minutes&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ben-utting/claude-skills.git
&lt;span class="nb"&gt;cd &lt;/span&gt;claude-skills/au-agent-sales-miner
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Add your Anthropic API key to .env&lt;/span&gt;
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The only required key is your Anthropic API key (for the Haiku extraction step). SerpAPI, Google Sheets, and webhook credentials are all optional extras.&lt;/p&gt;

&lt;p&gt;If you use Claude Code, it's even simpler. Point it at the folder and ask "search for Sarah Mitchell's recent sales in Paddington." The &lt;code&gt;skill.md&lt;/code&gt; file tells Claude what the tool does and how to invoke it. No manual commands, no remembering flags.&lt;/p&gt;

&lt;h2&gt;Things to know before you run it&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Anti-bot measures.&lt;/strong&gt; Domain and REA both use Cloudflare. From a home/office IP you'll be fine for personal research. If you're planning to run this at scale (hundreds of agents per day), you'll want a residential proxy. The tool supports proxy configuration via the &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price visibility.&lt;/strong&gt; REA frequently suppresses sold prices at the vendor's request. The tool handles this gracefully with null values, but don't expect 100% price coverage. Domain is significantly better for price data. In some suburbs, up to 40% of REA listings hide the sold figure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent name disambiguation.&lt;/strong&gt; Common names with common agencies can match multiple agents across states. Always add a suburb when you can. The tool picks the best match from the search results, but suburb context makes the difference between the right Sarah Mitchell and the wrong one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terms of service.&lt;/strong&gt; Both portals prohibit automated scraping in their ToS. This tool is provided for research and educational purposes. Use it responsibly.&lt;/p&gt;

&lt;p&gt;The repo: &lt;a href="https://github.com/ben-utting/claude-skills" rel="noopener noreferrer"&gt;github.com/ben-utting/claude-skills&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Over to you&lt;/h2&gt;

&lt;p&gt;How long does it take you to research a real estate agent right now? And if you could pull structured data from any Australian portal that doesn't have a public API, what would you build first?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>productivity</category>
      <category>showdev</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>The Homelab That Runs My Automation Business: One Mini PC, Two Salvaged Drives</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Tue, 28 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/benutting/the-homelab-that-runs-my-automation-business-one-mini-pc-two-salvaged-drives-3njo</link>
      <guid>https://dev.to/benutting/the-homelab-that-runs-my-automation-business-one-mini-pc-two-salvaged-drives-3njo</guid>
      <description>&lt;p&gt;I have an £869 mini PC on a shelf. Plugged into it are two USB drives I've owned for over a decade. One was the expansion drive for my Xbox in about 2010. The other was an external backup drive I bought around 2015 and rediscovered in my bedroom. Together they run fifteen workloads that would otherwise be a monthly cloud bill.&lt;/p&gt;

&lt;p&gt;This is what's actually on it, how much it replaced, and why I went this route instead of a VPS.&lt;/p&gt;

&lt;h2&gt;Why a box and not the cloud&lt;/h2&gt;

&lt;p&gt;The trigger was client demos. Prospects come in asking about OpenClaw or Claude Code setups and want to see something running. A throwaway GCP or AWS VM for that is £15 to £25 a month each, and even a cheap VPS is £5 to £10. Multiply by the number of demos, experiments, and half-finished builds I'd want to keep around, and the monthly number gets uncomfortable fast.&lt;/p&gt;

&lt;p&gt;I have an infrastructure background. Azure, VMware, patching, commissioning and decommissioning servers: that's literally my day job. Running my own hypervisor at home is a Tuesday. The decision came down to paying once for hardware I own and running whatever I want on it, versus paying a cloud provider in perpetuity for the same capacity. I bought the box.&lt;/p&gt;

&lt;h2&gt;The hardware&lt;/h2&gt;

&lt;p&gt;The host is a &lt;strong&gt;Beelink SER9 MAX&lt;/strong&gt;. £869 from Amazon.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AMD Ryzen 7 255 with Radeon 780M graphics, 8 cores / 16 threads, boost to 4.97 GHz&lt;/li&gt;
&lt;li&gt;64 GB DDR5-5600 (2 x 32 GB Micron)&lt;/li&gt;
&lt;li&gt;1 TB Crucial NVMe (CT1000E100SSD8)&lt;/li&gt;
&lt;li&gt;10 GbE on board, USB4, WiFi 6&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a lot of machine for the price. Ryzen 7 H-series compute, 64 GB of fast DDR5, and a 10 GbE port in a palm-sized box. If I'd specced equivalent compute in the cloud I'd be north of £150 a month before storage.&lt;/p&gt;

&lt;p&gt;Plugged into it over USB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A 2 TB Seagate ST2000DM001&lt;/strong&gt;, which started life as the USB expansion drive on my Xbox around 2010. It's split into two partitions: 500 GB for Proxmox backups and 1.3 TB as a spare.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 1 TB Seagate Expansion USB drive&lt;/strong&gt; from around 2015 that I'd forgotten about until I was tidying up. It holds the media library for Jellyfin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing on those drives I'd cry about losing. Everything important is either on the NVMe with proper Proxmox backups, or replicated via rclone to Google Drive (which I'll be retiring soon).&lt;/p&gt;

&lt;h2&gt;The hypervisor&lt;/h2&gt;

&lt;p&gt;Proxmox VE 9.1.6 on kernel 6.17.13. Free, open source, boring in the best way. I've used ESXi and Hyper-V professionally for years and Proxmox is the one I want at home because it doesn't need a licence server, the web UI is good enough to not need vCenter, and the LXC support is first-class.&lt;/p&gt;

&lt;p&gt;The storage layout is uncomplicated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;local&lt;/code&gt; on the NVMe (98 GB) for ISOs and templates&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;local-lvm&lt;/code&gt; thin pool on the NVMe (832 GB) for all VM and container disks, currently 35% used&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;media&lt;/code&gt; bind-mount to the 1 TB USB for Jellyfin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;backups&lt;/code&gt; to the 500 GB partition on the 2 TB USB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thirty-nine days uptime as I'm writing this. Load average 0.31. Using 19.7 GB of 58 GB RAM with everything running. It's genuinely idle most of the time.&lt;/p&gt;

&lt;h2&gt;What's actually running&lt;/h2&gt;

&lt;p&gt;Four VMs (three running, one stopped), eleven LXC containers, fifteen workloads total.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI and client work (VMs)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;201 &lt;code&gt;claudio&lt;/code&gt;&lt;/strong&gt; (8 GB RAM). The always-on Claude Code box I wrote about previously. Runs my morning brief, client pulse, and the X post drafts every day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;202 &lt;code&gt;costar&lt;/code&gt;&lt;/strong&gt; (4 GB RAM). A dedicated VM for a client project automating prospecting. Kept separate so I can tear it down or snapshot it without touching anything else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;401 &lt;code&gt;scout&lt;/code&gt;&lt;/strong&gt; (4 GB RAM). Scout scraper that finds AI automation jobs every few hours and posts them to an n8n webhook.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;200 &lt;code&gt;openclaw-giuseppe&lt;/code&gt;&lt;/strong&gt; (stopped). A demo VM I spin up when prospects want to see OpenClaw running. This is the one that justified the box in the first place.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure and services (LXC containers)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;500 n8n&lt;/strong&gt; (4 GB). Workflow automation for client deliverables and my own content pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;501 npm&lt;/strong&gt; (2 GB). Nginx Proxy Manager, reverse-proxies every internal service onto a sensible hostname.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;502 pihole&lt;/strong&gt; (512 MB). Network-wide DNS and ad blocking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;503 evolution&lt;/strong&gt; (2 GB). Long-running test environment for WhatsApp automations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;504 jellyfin&lt;/strong&gt; (2 GB). Media server for the 1 TB external drive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;505 finance-dashboard&lt;/strong&gt; (2 GB). The FastAPI + SQLite finance app I wrote about in the last post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;506 tailscale&lt;/strong&gt; (2 GB). Mesh VPN so every box I own can reach every other box regardless of network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;507 life-os&lt;/strong&gt; (2 GB). My self-hosted productivity stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;508 vs-code&lt;/strong&gt; (2 GB). Browser-based VS Code instance I can hit from any device.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;509 calibre-web&lt;/strong&gt; (2 GB). eBook library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;510 media-stack&lt;/strong&gt; (2 GB, 200 GB disk). Everything around Jellyfin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LXC is doing most of the heavy lifting here. Almost all of the services are Linux userspace apps that don't need a full kernel of their own, so containers are faster to spin up, use less RAM, and are trivial to back up.&lt;/p&gt;

&lt;h2&gt;What the cloud equivalent would cost&lt;/h2&gt;

&lt;p&gt;Taking the running workloads at a conservative £10 to £15 a month each for a similarly-specced cloud VM or managed container:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 VMs at £15 each = £60/month&lt;/li&gt;
&lt;li&gt;11 LXC containers at £8 each (smaller workloads, cheaper tier) = £88/month&lt;/li&gt;
&lt;li&gt;Say £150/month of replaced compute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That doesn't count storage. The 2 TB of external USB storage would be another £30 to £50 a month on any cloud object or block storage tier at those sizes.&lt;/p&gt;

&lt;p&gt;Call it £180 to £200 a month. The Beelink paid for itself in under six months. Everything after that is free.&lt;/p&gt;
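&lt;p&gt;The payback claim is easy to sanity-check with the figures above (storage taken at the midpoint of the £30 to £50 range):&lt;/p&gt;

```python
HARDWARE_COST_GBP = 869       # the Beelink SER9 MAX
REPLACED_COMPUTE_GBP = 150    # per month, from the estimate above
REPLACED_STORAGE_GBP = 40     # midpoint of the £30 to £50 storage range

monthly_saving = REPLACED_COMPUTE_GBP + REPLACED_STORAGE_GBP  # £190/month
payback_months = HARDWARE_COST_GBP / monthly_saving           # about 4.6 months
```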

&lt;p&gt;More importantly, the capacity to experiment is now effectively unlimited. If I want to spin up a new VM to test an idea, I click twice. If I want to throw away that VM an hour later, I click twice. There's no bill meter running in the background nudging me to stop.&lt;/p&gt;

&lt;h2&gt;The client demo workflow, concretely&lt;/h2&gt;

&lt;p&gt;The original reason for the box. When a prospect wants to see OpenClaw in action:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone the OpenClaw template VM (ID 901) into a fresh instance.&lt;/li&gt;
&lt;li&gt;Log in, show them the skills I've built, walk through a real run.&lt;/li&gt;
&lt;li&gt;Stop the VM when the call ends. Resources freed, backup snapshot kept.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total cost to me: electricity. Total cost in the cloud for the same demo capacity kept warm: £15 to £25 a month per client, every month.&lt;/p&gt;

&lt;h2&gt;What I'd do differently and what's next&lt;/h2&gt;

&lt;p&gt;What I'd do differently: get the box sooner. I spent a couple of months running services on my main machine before committing to the Beelink. Those months of friction (every time I rebooted Windows, every time something drew too much CPU) were the cost of not doing it.&lt;/p&gt;

&lt;p&gt;What's next is the network layer. Right now everything is behind whatever my ISP router gives me. The plan is a proper UniFi stack: Cloud Gateway Ultra, Flex Mini 2.5 GbE switch, U7 Lite WiFi 7 access point, and a DeskPi mini rack for about £300 total. That gets me VLAN segmentation (client traffic on its own network, lab traffic on another), a real firewall with IDS, and a home for the box that isn't a shelf.&lt;/p&gt;

&lt;p&gt;The second phase is storage: a proper four-bay NAS with mirrored drives so I can stop relying on two random USB drives for anything important. That's the bit that's genuinely been held together with string.&lt;/p&gt;

&lt;p&gt;Beyond that, the next Proxmox node. The joy of this setup is it scales by just adding another box. Two Beelinks clustered together is more compute than most small businesses have on-premise, and it still fits on a shelf.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;If you've got even a little infrastructure instinct and you're paying for more than one cloud VM a month, a second-hand mini PC and a Proxmox install will pay for itself fast. Use whatever drives you've got. Proxmox doesn't care that your 2 TB backup volume used to sit on top of an Xbox 360.&lt;/p&gt;

&lt;p&gt;The £869 hardware number is real. The two old USB drives are real. The 39 days of uptime are real. It's running my business.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>selfhosted</category>
      <category>proxmox</category>
      <category>homelab</category>
      <category>devops</category>
    </item>
    <item>
      <title>Cinematic Product Videos with fal.ai and Kling 3.0 for $1 a Scene</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Tue, 21 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/benutting/cinematic-product-videos-with-falai-and-kling-30-for-a-1-a-scene-go7</link>
      <guid>https://dev.to/benutting/cinematic-product-videos-with-falai-and-kling-30-for-a-1-a-scene-go7</guid>
      <description>&lt;p&gt;A client needed social media videos of their product in six different lifestyle scenes. Professional shoots would have cost thousands per location. We did all six for about $6 total, in under an hour.&lt;/p&gt;

&lt;p&gt;The pipeline is two API calls: one to place the real product into a generated scene, one to animate it into a 5-second video with sound. Both run through fal.ai.&lt;/p&gt;

&lt;h2&gt;
  
  
  The brief
&lt;/h2&gt;

&lt;p&gt;The client had a small physical product and a solid brand page with plenty of existing content. He sent me an AI-generated video he'd seen of someone walking through New York that seamlessly featured a product. He wanted something similar for his own brand: cinematic scenes showing the product in restaurant and bar settings, generated entirely from a single product photo.&lt;/p&gt;

&lt;p&gt;The goal was to build a repeatable skill that could produce these scenes on demand, not just a one-off video.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: place the product into a scene
&lt;/h2&gt;

&lt;p&gt;The first script uses Google's Nano Banana 2 edit model via fal.ai. You give it a reference photo of the real product and a text prompt describing the scene you want. It generates a new image with the product placed naturally into that environment, preserving the product's appearance, label, and proportions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python generate_kontext.py product_photo.jpg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Product on white linen table, candlelit restaurant, beside wine glass, warm golden light, cinematic"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--variations&lt;/span&gt; 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--variations 5&lt;/code&gt; flag is important. AI image generation is inconsistent. Out of five attempts, usually two or three look good. One will be excellent. The rest get discarded. At $0.04 per image, generating five costs $0.20. Cheap enough to always overshoot.&lt;/p&gt;

&lt;p&gt;One thing I learned: prompts need a scale anchor. If the product is small, the model will sometimes scale it up to fill the scene. Always include a size reference in the prompt: a wine glass, a hand, a plate. Something that tells the model how big the product actually is relative to its surroundings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: animate the winner
&lt;/h2&gt;

&lt;p&gt;The second script takes the best image from Step 1 and turns it into a 5-second video using Kling 3.0 Pro, also via fal.ai. It generates native audio too: sizzling sounds for a kitchen scene, ambient restaurant noise, clinking glasses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python generate_video.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Hand reaches for product, picks it up, tilts gently, slow motion"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image_url&lt;/span&gt; &lt;span class="s2"&gt;"https://fal.media/files/..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--duration&lt;/span&gt; 5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cfg_scale&lt;/span&gt; 1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cfg_scale&lt;/code&gt; setting matters. The default (0.5) gives the model creative freedom, which is fine for abstract content but bad for product shots. Setting it to 1.0 forces the model to follow the prompt closely. For product content, you want maximum adherence: the product should stay in frame, the motion should be what you described, nothing should morph or distort.&lt;/p&gt;

&lt;p&gt;One video takes 60 to 180 seconds to generate and costs about $0.80. Combined with the image step, a full scene (5 image variations + 1 video) runs to about $1.&lt;/p&gt;
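&lt;p&gt;The per-scene economics are simple enough to sanity-check in a few lines. A quick sketch using the figures above (a cost model only, not an API call):&lt;/p&gt;

```python
IMAGE_COST = 0.04  # per Nano Banana 2 image variation (figure from above)
VIDEO_COST = 0.80  # per 5-second Kling 3.0 Pro video (figure from above)

def scene_cost(variations=5, videos=1):
    """Cost of one scene: N image variations plus the final video."""
    return round(variations * IMAGE_COST + videos * VIDEO_COST, 2)

print(scene_cost())       # one scene
print(scene_cost() * 6)   # the full six-scene brief
```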

&lt;h2&gt;
  
  
  The scenes we built
&lt;/h2&gt;

&lt;p&gt;We created a prompt library with six scenes, each with an image prompt and a matching motion prompt. Restaurant lifestyle, in-hand close-ups, kitchen action shots, moody food pairings, textured product beauty shots, and bar settings.&lt;/p&gt;

&lt;p&gt;Each scene follows the same workflow: two commands, one decision (pick the best of five images), one output (a 5-second video with audio). Total cost for all six scenes: about $6. Total time: under an hour, including prompt iteration.&lt;/p&gt;

&lt;p&gt;The prompt library is the reusable part. Once you've dialled in the style and scale for one product, adapting it for another is just swapping the product description and the reference photo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Batch the image generation.&lt;/strong&gt; Right now each scene is a separate script invocation. A wrapper that runs all six scenes, generates all 30 images, and presents them for review in one pass would save time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 9:16 for Stories and Reels.&lt;/strong&gt; All our content was 16:9. Kling supports 9:16 for vertical video, but only in text-to-video mode (not image-to-video). For Instagram Reels, you'd need to either crop or generate the initial image at 9:16.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build a prompt template system.&lt;/strong&gt; The prompt library works, but it's manual. A template where you swap in the product name, size description, and setting would make this reusable across clients without rewriting prompts from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this works for small brands
&lt;/h2&gt;

&lt;p&gt;This client is a bootstrapped D2C brand. There's no budget for location shoots across six restaurants. But the social content needs to look premium because the product is premium.&lt;/p&gt;

&lt;p&gt;This pipeline delivers that. Five minutes per scene, a dollar per video, and the output looks like it came from a production studio. The client picks from five image options, approves one, and gets a ready-to-post video with sound. No photographer, no stylist, no venue booking.&lt;/p&gt;

&lt;p&gt;If you're selling a physical product and need lifestyle content at scale, this exact pipeline works. Two scripts, one API key, and a good product photo to start from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Finance Dashboard I Built in a Weekend to Replace Four Spreadsheets and an Accountant</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:03:49 +0000</pubDate>
      <link>https://dev.to/benutting/the-finance-dashboard-i-built-in-a-weekend-to-replace-four-spreadsheets-and-an-accountant-18ob</link>
      <guid>https://dev.to/benutting/the-finance-dashboard-i-built-in-a-weekend-to-replace-four-spreadsheets-and-an-accountant-18ob</guid>
      <description>&lt;p&gt;I used to spend around two hours a week touching my finances. Personal spend in one Excel file, business income and expenses in another, crypto scattered across Binance statements, Nexo interest CSVs, and a third spreadsheet where I logged what I'd actually bought. At year end, a £500 tax advisor would turn it all into a self-assessment.&lt;/p&gt;

&lt;p&gt;I spent a weekend building a replacement. It's been running for months now. This is what's in it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it is
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;finance-dashboard&lt;/code&gt; is a FastAPI app with a SQLite database behind it and a vanilla JavaScript SPA on top. No React, no build step, no cloud. It runs on an Ubuntu 24.04 LXC container on my home Proxmox box, reachable only over VPN on port 8080.&lt;/p&gt;

&lt;p&gt;Rough size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;main.py&lt;/code&gt;: 4,226 lines, 76 API endpoints&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;app.js&lt;/code&gt;: 4,818 lines&lt;/li&gt;
&lt;li&gt;SQLite schema: 22 tables&lt;/li&gt;
&lt;li&gt;Total: about 9,800 lines across 6 files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Heavy lifting comes from three libraries: &lt;code&gt;pdfplumber&lt;/code&gt; for PDF parsing, &lt;code&gt;openpyxl&lt;/code&gt; for Binance Excel exports, and &lt;code&gt;httpx&lt;/code&gt; for the async calls out to CoinGecko (live crypto prices), frankfurter.app (historical USD/GBP rates), and Perplexity (AI-generated crypto analysis).&lt;/p&gt;

&lt;p&gt;There is no Binance or Coinbase API integration. Imports happen via uploaded files. That was deliberate: I wanted the source of truth to be the statements themselves, not a live feed I'd have to trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The personal split: 55/15/15/15
&lt;/h2&gt;

&lt;p&gt;Every month I enter my PAYE net pay. The app splits it four ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;55% Outgoings&lt;/li&gt;
&lt;li&gt;15% Invest 1&lt;/li&gt;
&lt;li&gt;15% Invest 2&lt;/li&gt;
&lt;li&gt;15% Fun&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those percentages and bucket names live in the &lt;code&gt;settings&lt;/code&gt; table, so they're configurable if the rule ever changes. The split is computed, not moved. There's no ledger of transfers between buckets, just a running remaining figure per bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;budget_remaining_outgoings = (
    budget_outgoings
    - fixed_total
    - outgoings_spent
    - food_spent
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fixed outgoings (mortgage, phone, gym, Apple Music) are snapshotted per month. If my gym price goes up, last year's budget doesn't silently shift. That's a small thing that matters the first time you look back at a month from nine months ago and wonder why the numbers don't reconcile.&lt;/p&gt;

&lt;p&gt;A separate &lt;code&gt;month_invest_status&lt;/code&gt; table tracks whether each 15% chunk actually got invested that month. That's the checkbox that forces the discipline.&lt;/p&gt;
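&lt;p&gt;The monthly split itself is only a few lines. A minimal sketch, assuming the percentages come back from the settings table as a dict (the key names here are illustrative; the defaults mirror the 55/15/15/15 rule):&lt;/p&gt;

```python
def split_net_pay(net_pay, settings):
    # Bucket names and percentages live in settings; defaults are the
    # 55/15/15/15 rule. Key names are illustrative, not the real schema.
    buckets = {
        "outgoings": settings.get("pct_outgoings", 55),
        "invest_1": settings.get("pct_invest_1", 15),
        "invest_2": settings.get("pct_invest_2", 15),
        "fun": settings.get("pct_fun", 15),
    }
    return {name: round(net_pay * pct / 100, 2) for name, pct in buckets.items()}

split_net_pay(3000.0, {})
# {"outgoings": 1650.0, "invest_1": 450.0, "invest_2": 450.0, "fun": 450.0}
```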

&lt;h2&gt;
  
  
  The business split: 60/40
&lt;/h2&gt;

&lt;p&gt;One setting: &lt;code&gt;business_tax_pct = 60&lt;/code&gt;. Every time business income or an expense changes, &lt;code&gt;_recalc_tax_pot&lt;/code&gt; fires and walks all months in chronological order to recompute a cumulative running total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;tax_pct = float(settings.get("business_tax_pct", 60)) / 100
total_business = sum(month_business_income)  # that month's business income entries
amount_added = round(total_business * tax_pct, 2)
# upsert, then recompute running_total across all months
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;60% of every pound of business income is flagged as tax pot. The remaining 40% is "running costs allocation". It's pessimistic on purpose. I'd rather over-reserve and release at year end than under-reserve and scramble.&lt;/p&gt;

&lt;h2&gt;
  
  
  The UK tax calculation
&lt;/h2&gt;

&lt;p&gt;This is the piece a generic accounting tool won't give you. &lt;code&gt;_calculate_uk_tax()&lt;/code&gt; is about 240 lines and encodes the full 2025-26 self-assessment picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personal allowance £12,570, tapered by £1 for every £2 above £100k&lt;/li&gt;
&lt;li&gt;Basic 20%, higher 40%, additional 45%&lt;/li&gt;
&lt;li&gt;Class 4 NI: 6% between £12,570 and £50,270, 2% above&lt;/li&gt;
&lt;li&gt;Working-from-home flat-rate tiers (101+ hours a month = £26, 51-100 = £18, 25-50 = £10)&lt;/li&gt;
&lt;li&gt;Trading allowance vs actual expenses, whichever gives the better deduction&lt;/li&gt;
&lt;li&gt;Crypto income filtered to the exact 6 April to 5 April window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It aggregates P60 data where available, falls back to summed monthly payslips, adds P11D benefits, and subtracts PAYE already paid to estimate what's owed on self-assessment plus the two payments on account.&lt;/p&gt;
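&lt;p&gt;To give a flavour of what's inside &lt;code&gt;_calculate_uk_tax()&lt;/code&gt;, here's a compressed sketch of just the income tax bands with the allowance taper. It's income tax only: no NI, no reliefs, thresholds hardcoded for 2025-26:&lt;/p&gt;

```python
def uk_income_tax(income):
    # Personal allowance tapers by £1 for every £2 of income over £100,000
    allowance = max(0.0, 12570 - max(0.0, income - 100000) / 2)
    taxable = max(0.0, income - allowance)
    tax = 0.0
    # Band widths on *taxable* income: basic to £37,700, higher to £125,140
    for width, rate in [(37700, 0.20), (87440, 0.40), (float("inf"), 0.45)]:
        band = min(taxable, width)
        tax += band * rate
        taxable -= band
        if taxable <= 0:
            break
    return round(tax, 2)

uk_income_tax(60000)   # 11432.0
```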

&lt;p&gt;It also tracks six separate allowances in one view: savings interest (£500), dividends (£500), CGT (£3,000), ISA (£20,000), trading (£1,000), and WFH. That's the screen I miss most when I imagine going back.&lt;/p&gt;

&lt;p&gt;CGT on crypto is its own beast. The code implements the HMRC matching rules properly: same-day matching, then the bed-and-breakfast rule (buys within 30 days after a sell), then the Section 104 pool for everything else.&lt;/p&gt;
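&lt;p&gt;Of the three matching rules, the Section 104 pool is the easiest to sketch: everything not matched same-day or within 30 days sits in one pool at average cost. A simplified version, with same-day and bed-and-breakfast matching deliberately omitted:&lt;/p&gt;

```python
class Section104Pool:
    """Running quantity and allowable cost; disposals exit at pooled average."""

    def __init__(self):
        self.qty = 0.0
        self.cost = 0.0

    def buy(self, qty, cost):
        self.qty += qty
        self.cost += cost

    def sell(self, qty, proceeds):
        allowable = self.cost * (qty / self.qty)  # average-cost slice
        self.qty -= qty
        self.cost -= allowable
        return round(proceeds - allowable, 2)     # gain (negative = loss)

pool = Section104Pool()
pool.buy(1.0, 10000)
pool.buy(1.0, 20000)
pool.sell(1.0, 25000)   # allowable cost 15000, gain 10000.0
```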

&lt;h2&gt;
  
  
  The crypto profit-taking panel
&lt;/h2&gt;

&lt;p&gt;Two things drive it, and neither is a "sell now" alarm.&lt;/p&gt;

&lt;p&gt;First, a target allocation for the portfolio (BTC 30%, HBAR 30%, SOL 20%, ADA 10%, NEAR 10%). Live prices from CoinGecko get compared to those targets, and anything overweight generates a rebalance hint like "Sell 0.042 SOL". That's a nudge, not a signal.&lt;/p&gt;
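&lt;p&gt;The hint itself is just current value minus target weight, priced. A sketch using the target allocation above (the portfolio values in the example are illustrative, and this version reports the excess in GBP rather than in units):&lt;/p&gt;

```python
TARGETS = {"BTC": 0.30, "HBAR": 0.30, "SOL": 0.20, "ADA": 0.10, "NEAR": 0.10}

def rebalance_hints(values):
    # values: {symbol: current value in GBP}. Returns how far each
    # holding sits above its target weight, in GBP.
    total = sum(values.values())
    return {
        sym: round(values.get(sym, 0.0) - target * total, 2)
        for sym, target in TARGETS.items()
        if values.get(sym, 0.0) > target * total
    }

rebalance_hints({"BTC": 400, "HBAR": 300, "SOL": 200, "ADA": 50, "NEAR": 50})
# {"BTC": 100.0} — BTC is £100 overweight against its 30% target
```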

&lt;p&gt;Second, an "Analyse with AI" button per holding that sends the asset's cost basis, unrealised P&amp;amp;L, and allocation to Perplexity's &lt;code&gt;sonar-deep-research&lt;/code&gt; model with a prompt asking for a HOLD / ACCUMULATE / REDUCE call, price targets, and the UK CGT implications of selling now. Results cache in a &lt;code&gt;crypto_analysis_cache&lt;/code&gt; table so I'm not re-running expensive queries.&lt;/p&gt;

&lt;p&gt;It's research on tap, grounded in my actual numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke: the PDF import parser
&lt;/h2&gt;

&lt;p&gt;This is where I spent most of the weekend, and where I've been back to patch things since.&lt;/p&gt;

&lt;p&gt;The expense PDF parser is ~170 lines of regex. Every new vendor layout I import exposes a new edge case. The scars:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;USD detection requires an actual &lt;code&gt;$&lt;/code&gt; sign, because GBP amounts near the word "Total" would otherwise match. Upwork uses "Total payments $111.99", Stripe uses "Amount due US$49.00", generic invoices use "TOTAL AMOUNT: $100.00". Three separate patterns.&lt;/li&gt;
&lt;li&gt;GBP detection runs a priority system. Azure puts "Total (including Tax)" in one corner, Google puts "Total Amount" in another. The parser collects every candidate, scores them by type, and picks the highest-priority largest value.&lt;/li&gt;
&lt;li&gt;Date parsing strips ordinal suffixes ("9th March 2025" becomes "9 March 2025") and tries eight formats. If none match, it falls back to parsing DD-MM from the filename.&lt;/li&gt;
&lt;li&gt;Vendor detection is a hardcoded keyword dictionary of 13 names. If nothing matches, it uses the filename.&lt;/li&gt;
&lt;/ul&gt;
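&lt;p&gt;The three USD patterns from the first bullet look roughly like this. It's a simplified rendering; the real parser carries more context checks around each pattern:&lt;/p&gt;

```python
import re

USD_PATTERNS = [
    re.compile(r"Total payments\s+\$([\d,]+\.\d{2})"),   # Upwork
    re.compile(r"Amount due\s+US\$([\d,]+\.\d{2})"),     # Stripe
    re.compile(r"TOTAL AMOUNT:\s*\$([\d,]+\.\d{2})"),    # generic invoices
]

def extract_usd(text):
    for pattern in USD_PATTERNS:
        m = pattern.search(text)
        if m:
            return float(m.group(1).replace(",", ""))
    return None   # no literal $ sign means no USD match

extract_usd("Total payments $111.99")   # 111.99
```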

&lt;p&gt;The fix pattern, whenever a new layout arrives, is to point Claude at the PDF and the parser and ask it to extend the regex priorities without breaking the existing ones. It takes about ten minutes and a test run against the old invoices to confirm nothing shifted. There's still a &lt;code&gt;logging.warning&lt;/code&gt; line in production from when I was last debugging it. That's the tell.&lt;/p&gt;

&lt;p&gt;I've deliberately not reached for an LLM to do the extraction. Regex is boring, fast, and free, and when it's wrong it's wrong in an obvious way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it replaced, and what it gave back
&lt;/h2&gt;

&lt;p&gt;The two hours a week of data entry is now closer to ten minutes, and most of that is uploading statements. The £500-a-year accountant is gone. More than the money, I know the tax number in real time, which changes how I think about taking on an extra piece of work in February.&lt;/p&gt;

&lt;p&gt;The less obvious win: every part of my money lives in one place. Personal, business, and crypto in one database means questions like "how much did I actually spend on AI tooling last year" or "how much unrealised gain is sitting in SOL" take one query, not an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;A couple of things I'd do differently and will get to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The frontend is one 4,818-line &lt;code&gt;app.js&lt;/code&gt;. It works, but it's the first place I'll feel the pain when I add the next feature. Splitting it into modules is overdue.&lt;/li&gt;
&lt;li&gt;Live bank feeds via Open Banking would remove the monthly statement upload step. The reason I haven't is that I like the ceremony of it. Importing is when I actually look at the numbers.&lt;/li&gt;
&lt;li&gt;A second screen for forecasting rather than tracking. The data is all there, the views aren't.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're a solopreneur with finances in four spreadsheets and an accountant bill at year end, a weekend of FastAPI and SQLite can genuinely replace it. You just have to be willing to encode your own tax rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>sqlite</category>
      <category>showdev</category>
    </item>
    <item>
      <title>How I Built a Lead Gen Machine That Finds My Clients on Upwork</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Fri, 17 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/benutting/how-i-built-a-lead-gen-machine-that-finds-my-clients-on-upwork-13l3</link>
      <guid>https://dev.to/benutting/how-i-built-a-lead-gen-machine-that-finds-my-clients-on-upwork-13l3</guid>
      <description>&lt;p&gt;Two of my current clients came from the same system: a Python scraper that monitors Upwork every 20 minutes, scores each job with AI, and sends me a Telegram alert when something scores above a 6. I didn't find them. The system did.&lt;/p&gt;

&lt;p&gt;This is how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Upwork's search is fine if you check it manually a few times a day. But good jobs get buried in proposals fast. By the time I see a high-fit post, it already has 20+ applicants. I needed something that watched continuously and told me the moment a job worth bidding on appeared.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;The system runs on an Ubuntu VM on my home network. No cloud hosting, no SaaS. The full stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patchright&lt;/strong&gt; (Playwright fork) for maintaining a persistent Chromium session with Upwork&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;scrapling&lt;/strong&gt; with Cloudflare bypass for fetching search results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite&lt;/strong&gt; for storing every job (1,410+ and counting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;N8N webhook&lt;/strong&gt; for scoring and notification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;systemd timer&lt;/strong&gt; firing every 20 minutes, 6 AM to 8 PM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telegram bot&lt;/strong&gt; for real-time alerts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI dashboard&lt;/strong&gt; on :8080 for browsing the data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a one-shot script. The timer fires, the scraper runs, it exits. No long-running process, no memory leaks, no daemon to babysit.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the scraper works
&lt;/h2&gt;

&lt;p&gt;Every 20 minutes, the script does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Opens the persistent Chromium profile to verify the Upwork session is still valid and extract cookies&lt;/li&gt;
&lt;li&gt;For each search query (Automation, Workflow Automation, AI Automation, and a few others), fetches the search results page using a fresh browser profile with the extracted cookies&lt;/li&gt;
&lt;li&gt;Parses each job listing, checks if it was posted in the last 10 minutes, and skips anything already in the database&lt;/li&gt;
&lt;li&gt;For new jobs, fetches the full detail page to grab budget, client history, proposal count, and tags&lt;/li&gt;
&lt;li&gt;Saves to SQLite and fires the N8N webhook&lt;/li&gt;
&lt;/ol&gt;
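&lt;p&gt;Steps 3 and 5 boil down to dedupe-on-insert against the job URL. A minimal sketch of that part, with an illustrative schema (the real table has far more columns):&lt;/p&gt;

```python
import sqlite3

def save_if_new(conn, job):
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (url TEXT PRIMARY KEY, title TEXT)")
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (url, title) VALUES (?, ?)",
        (job["url"], job["title"]),
    )
    conn.commit()
    return cur.rowcount == 1   # True only for a genuinely new job

conn = sqlite3.connect(":memory:")
job = {"url": "https://www.upwork.com/jobs/~abc123", "title": "n8n automation"}
save_if_new(conn, job)   # True — fetch the detail page and fire the webhook
save_if_new(conn, job)   # False — already seen, skip
```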

&lt;p&gt;The fresh-profile trick is important. The persistent Chromium profile got fingerprinted by Cloudflare in April and couldn't auto-solve challenges anymore. Splitting login (persistent profile, Patchright) from scraping (fresh profile, scrapling) fixed it. If the scraper ever starts returning 5KB pages instead of full results, this is the first thing to check.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scoring layer
&lt;/h2&gt;

&lt;p&gt;Raw jobs go to an N8N webhook that scores them 1 to 10 based on fit. Hard disqualifiers kill the job immediately: unverified payment, hire rate below 30%, rating below 3.5. Boosts push the score up: n8n, Claude, OpenClaw, RAG, MCP, workflow automation, AI agent.&lt;/p&gt;

&lt;p&gt;Anything scoring 6 or above gets a Telegram alert with the title, budget, and a link. I open it, read the description, and decide whether to bid. The whole loop from job posted to me reading it is usually under 20 minutes.&lt;/p&gt;
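&lt;p&gt;A rule-based approximation of that scoring pass looks like this (the real scoring runs through AI in n8n; the disqualifier thresholds are the ones described above, the boost weights are made up):&lt;/p&gt;

```python
BOOSTS = ["n8n", "claude", "openclaw", "rag", "mcp", "workflow automation", "ai agent"]

def score_job(job):
    # Hard disqualifiers kill the job outright
    if not job["payment_verified"] or job["hire_rate"] < 0.30 or job["rating"] < 3.5:
        return 0
    # Each boost keyword found in the description nudges a neutral 5 upwards
    text = job["description"].lower()
    return min(5 + sum(1 for kw in BOOSTS if kw in text), 10)

score_job({
    "payment_verified": True, "hire_rate": 0.62, "rating": 4.9,
    "description": "Build an n8n workflow automation with an AI agent",
})   # 8 — above the threshold of 6, so it would trigger a Telegram alert
```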

&lt;h2&gt;
  
  
  The enrichment layer
&lt;/h2&gt;

&lt;p&gt;Every 20 minutes, a second systemd timer runs an AI enrichment script over any un-enriched jobs. It sends the job description to Gemini Flash Lite via OpenRouter and extracts structured fields: tools detected, skill requirements, industry, complexity, fit reasoning, and content opportunities.&lt;/p&gt;

&lt;p&gt;After 1,410 jobs enriched, the patterns are clear. GoHighLevel is the most requested tool. n8n + GoHighLevel is the most common combo. 255 jobs scored a 9 for fit, 141 scored a perfect 10. Over 600 jobs were flagged as template opportunities, meaning someone is asking for the same thing that could be productised.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dashboard
&lt;/h2&gt;

&lt;p&gt;A FastAPI app reads the same SQLite database and serves a dashboard on :8080. It has tabs for KPIs, jobs over time, tool/skill breakdowns, budget distribution, client ratings, industry analysis, and a recent jobs table with drill-down.&lt;/p&gt;

&lt;p&gt;The dashboard also has a maintenance tab that can start/stop the scraper, trigger immediate runs, and show a live colour-coded log viewer. It uses passwordless sudo for the systemd timer controls.&lt;/p&gt;

&lt;p&gt;There's also an AI chat tab powered by OpenRouter that lets me ask questions about the data in natural language. "What percentage of automation jobs this week mention n8n?" gets answered from the actual database, not from a generic model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Session management is fragile.&lt;/strong&gt; Upwork invalidates the session every few months, and re-login requires opening a desktop session via Proxmox SPICE console and running the login script manually. I'd like to automate this, but Upwork's auth flow with 2FA makes it hard to do headlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No database backup.&lt;/strong&gt; The SQLite file only exists on this one VM. If the disk dies, 1,400+ enriched jobs are gone. A nightly &lt;code&gt;sqlite3 .backup&lt;/code&gt; to a second location is overdue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The N8N scoring could be local.&lt;/strong&gt; Right now the webhook goes to n8n, which adds a network hop. Moving the scoring logic into the enrichment script would simplify the stack and remove the cloud dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;The system has been running since mid-March 2026. It's scraped over 1,400 jobs, enriched all of them with AI, and surfaced the two clients I'm currently working with. It runs on a 4GB Ubuntu VM that costs nothing beyond the electricity.&lt;/p&gt;

&lt;p&gt;More importantly, it changed how I think about freelancing. I don't browse Upwork anymore. I wait for the ping, read the job, and bid if it fits. The system does the searching. I do the selling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>playwright</category>
      <category>upwork</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building My Always-On Claude VPS</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Tue, 14 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/benutting/building-claudio-my-always-on-claude-code-box-1n3j</link>
      <guid>https://dev.to/benutting/building-claudio-my-always-on-claude-code-box-1n3j</guid>
      <description>&lt;h1&gt;
  
  
  Building Claudio: My Always-On Claude Code Box
&lt;/h1&gt;

&lt;p&gt;I have an always-on Debian VM that reads the AI news, checks in on my clients, and sends me everything over Telegram. No extra infrastructure cost, just the $20/month Claude plan and cron.&lt;/p&gt;

&lt;p&gt;That was V1. It lasted about two weeks before the OAuth tokens started expiring and every cron job died silently. This is the story of both versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why build it
&lt;/h2&gt;

&lt;p&gt;I run an AI automation freelance business. I have active clients, a content pipeline, and a morning news habit that used to eat 30 minutes before breakfast. I wanted a system that handled the recurring operational work without me opening a terminal.&lt;/p&gt;

&lt;p&gt;The requirements were simple: run Claude Code skills on a schedule, store outputs on Google Drive, and notify me via Telegram. No orchestration platform. No extra cost beyond the $20/month Claude plan I already use for client work.&lt;/p&gt;

&lt;h2&gt;
  
  
  V1: cron and Claude Code
&lt;/h2&gt;

&lt;p&gt;The first version was minimal. Claudio is a Debian 13 VM on my home network running Claude Code headless via cron.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;0  6 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;   /usr/bin/claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"/morning-brief"&lt;/span&gt; &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; bypassPermissions &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/morning-brief.log 2&amp;gt;&amp;amp;1
0 10 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;   /usr/bin/claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"/client-pulse"&lt;/span&gt; &lt;span class="nt"&gt;--permission-mode&lt;/span&gt; bypassPermissions &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; ~/client-pulse.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire automation layer. Two lines in a crontab. It worked for about two weeks.&lt;/p&gt;

&lt;p&gt;The first gotcha was the permission prompt. Headless cron jobs hung silently because Claude Code was waiting for an interactive permission check that nobody would ever see. The fix: &lt;code&gt;--permission-mode bypassPermissions&lt;/code&gt;. Not prominently documented, and the single most important flag for running Claude Code unattended.&lt;/p&gt;

&lt;p&gt;The second gotcha killed V1 entirely. Claude Code's OAuth tokens eventually expire. A 6am cron job doesn't care that your session died at midnight. No error, no Telegram alert, just silence. The skills stopped running and I didn't notice for two days.&lt;/p&gt;

&lt;p&gt;OAuth is designed for interactive sessions. If you're running anything headless, you need auth that doesn't expire. Cron and OAuth are fundamentally incompatible.&lt;/p&gt;

&lt;h2&gt;
  
  
  V2: Claude Desktop and Cowork
&lt;/h2&gt;

&lt;p&gt;V2 solves the auth problem by replacing cron entirely. The stack now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Desktop&lt;/strong&gt; (unofficial Linux build via aaddrick/claude-desktop-debian) running persistently on the XFCE desktop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cowork scheduled tasks&lt;/strong&gt; replacing cron. Claude Desktop fires each skill on a schedule, no OAuth expiry, no hanging permission prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP servers&lt;/strong&gt; wired into Claude Desktop:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;rclone MCP&lt;/strong&gt;: custom-built, ~50 lines of Node.js, exposes &lt;code&gt;rclone_cat&lt;/code&gt;, &lt;code&gt;rclone_lsf&lt;/code&gt;, &lt;code&gt;rclone_copyto&lt;/code&gt; against the existing &lt;code&gt;gdrive:&lt;/code&gt; remote&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity MCP&lt;/strong&gt;: official &lt;code&gt;@perplexity-ai/mcp-server&lt;/code&gt;, replaces the built-in WebSearch with better recency filtering and citation quality&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Google Drive&lt;/strong&gt; still via rclone, no FUSE mount, same explicit command pattern&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Dashboard&lt;/strong&gt; on :8080 via FastAPI/uvicorn&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;No cron. No Docker. No orchestration platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What broke in V2 (and the fixes)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cowork's bubblewrap sandbox can't shell out.&lt;/strong&gt; The rclone CLI worked fine in V1 because cron has no sandbox. Cowork runs inside bubblewrap on Linux, which blocks arbitrary binary execution. The fix: a minimal MCP server wrapping the three rclone commands Claudio actually uses. About 50 lines of Node.js, hardcoded to the &lt;code&gt;gdrive:&lt;/code&gt; remote.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Telegram plugin path is fragile.&lt;/strong&gt; The plugin lived in a versioned cache directory (&lt;code&gt;telegram/0.0.1/&lt;/code&gt;). A plugin update would silently break the Desktop config by changing the path. The fix: copy anything you depend on out of versioned caches into a stable location you control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bubblewrap has opinions about your filesystem.&lt;/strong&gt; The sandbox mounts home as read-only. Any MCP server that tries to write to its own directory on startup (like the Telegram plugin's &lt;code&gt;chmodSync&lt;/code&gt; call) will silently fail and take the whole server down with it. Patch it or move it somewhere the sandbox can write.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Telegram bot token conflicts.&lt;/strong&gt; Claude Desktop and the Claude Code CLI cannot both run the Telegram MCP simultaneously. They fight over the same bot token long-poll (Telegram 409 conflict). The solution: Desktop owns outbound scheduled tasks, the CLI owns inbound Telegram. They share MCP servers but can't share stateful connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's still the same
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Drive is not mounted.&lt;/strong&gt; Every file operation is still an explicit rclone command. Still deliberate, still more reliable than a FUSE mount that can go stale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Absolute dates only.&lt;/strong&gt; Every log entry uses YYYY-MM-DD. No "yesterday", no "Thursday". Small discipline, big difference when you read logs a week later.&lt;/p&gt;
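&lt;p&gt;In practice this is one line wherever a log entry gets written. A hypothetical helper:&lt;/p&gt;

```python
from datetime import datetime, timezone

def log_line(message, now=None):
    """Prefix a log message with an absolute YYYY-MM-DD date, never a relative day."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d")
    return stamp + " " + message
```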

&lt;p&gt;&lt;strong&gt;Stage before writing to Drive.&lt;/strong&gt; Skills write to &lt;code&gt;/tmp/&lt;/code&gt; first, then &lt;code&gt;rclone copyto&lt;/code&gt; the finished file. Half-written files don't land on Drive.&lt;/p&gt;
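&lt;p&gt;The pattern, sketched in Python (the paths and the &lt;code&gt;copier&lt;/code&gt; hook are illustrative; the real skills go through the MCP wrapper):&lt;/p&gt;

```python
import pathlib
import subprocess
import tempfile

def publish(content, remote_path, copier=subprocess.run):
    """Stage the file locally, then push the finished artifact in one copy."""
    staged = pathlib.Path(tempfile.gettempdir()) / pathlib.Path(remote_path).name
    staged.write_text(content)  # the complete file exists locally first
    # A crash before this line leaves nothing half-written on Drive.
    copier(["rclone", "copyto", str(staged), "gdrive:" + remote_path], check=True)
    staged.unlink()
```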

&lt;p&gt;&lt;strong&gt;Telegram is still one-way.&lt;/strong&gt; Cowork tasks send, they don't receive. The reply loop is V3 territory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson across both versions: &lt;strong&gt;the right auth model matters more than the right scheduler.&lt;/strong&gt; V1's cron was fine as a scheduler. It was the OAuth dependency that killed it. V2's Cowork is a fine scheduler too, but the reason it works is that it's tied to the Claude Max account session, which doesn't expire as long as the Desktop app stays open.&lt;/p&gt;

&lt;p&gt;The second lesson: &lt;strong&gt;MCP servers need stable paths and sandbox awareness.&lt;/strong&gt; Versioned cache directories, read-only home mounts, and stateful connection conflicts are all things you hit only in production. None of this shows up when you test interactively.&lt;/p&gt;

&lt;p&gt;Claudio runs my business operations while I sleep, commute, or focus on client work. V1 proved the concept. V2 made it reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>code</category>
      <category>linux</category>
      <category>devops</category>
    </item>
    <item>
      <title>How I Migrated 6 Skills From Manus AI to an OpenClaw VPS</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:00:00 +0000</pubDate>
      <link>https://dev.to/benutting/how-i-migrated-6-skills-from-manus-ai-to-a-openclaw-vps-32ae</link>
      <guid>https://dev.to/benutting/how-i-migrated-6-skills-from-manus-ai-to-a-openclaw-vps-32ae</guid>
      <description>&lt;p&gt;A client was paying $200 a month for Manus AI to run six automation skills: CRM management, web scraping, property lookups, AI avatar videos, meeting transcripts and investment reports. The skills worked, but the subscription added up and the platform locked him into their infrastructure.&lt;/p&gt;

&lt;p&gt;We moved everything to a self-hosted OpenClaw instance on a Hostinger VPS. Pay-per-use API costs instead of a flat monthly fee. The migration took a day. Most of that day was spent on two bugs that had nothing to do with the skills themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;The client runs a real estate investing and marketing business. His stack centred on GoHighLevel for CRM, Apify for web scraping, HeyGen for avatar videos, Melissa Data and Rentcast for property intelligence, and a custom meeting processor for Zoom calls. All six of these ran as skills inside Manus AI.&lt;/p&gt;

&lt;p&gt;The target was a Hostinger VPS running Ubuntu 24.04 with OpenClaw deployed in Docker. The model backend switched to OpenRouter, which meant he'd pay per token instead of a flat subscription. For the volume he was running, that's a significant saving.&lt;/p&gt;

&lt;p&gt;The six skills we migrated:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;apify-actor-finder&lt;/strong&gt;: scrapes any site (Google Maps, Instagram, LinkedIn) and returns CSV&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gohighlevel-api&lt;/strong&gt;: full CRM control, contacts, opportunities, SMS, appointments, workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;heygen-avatar-video&lt;/strong&gt;: generates AI avatar videos from a text script&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rei-ai-zoom-processor&lt;/strong&gt;: turns meeting transcripts into structured summaries with PDF output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;melissa-data-information&lt;/strong&gt;: property ownership lookups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rentcast-property-report&lt;/strong&gt;: property value, rent estimates and market stats&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each skill had its own Python scripts, API keys and reference docs. The skill definitions (SKILL.md files) translated cleanly to OpenClaw's format. The scripts needed minor patching, not rewrites.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model selection mattered more than I expected.&lt;/strong&gt; OpenClaw's default model was Kimi K2.5, which is free tier on OpenRouter. It handles chat fine but does not reliably execute tool calls, which is exactly what skill scripts need. Every skill failed silently or returned garbage output.&lt;/p&gt;

&lt;p&gt;Switching to Claude Sonnet 4.6 fixed it immediately. Every skill executed correctly on the first attempt. The cost difference is real ($3 per million tokens vs free) but reliability is not optional when you're running production automations for a client.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tools profile setting is easy to miss.&lt;/strong&gt; OpenClaw has a &lt;code&gt;tools.profile&lt;/code&gt; setting in its config. The default is &lt;code&gt;"messaging"&lt;/code&gt;, which gives the model text-only capabilities. Skills that run Python scripts need &lt;code&gt;"full"&lt;/code&gt;, which enables bash execution and file access. Without it, the model can see the skill definition but can't actually run the scripts. No error message, just nothing happens.&lt;/p&gt;

&lt;p&gt;One config line: &lt;code&gt;"tools": { "profile": "full" }&lt;/code&gt;. That's it. But if you don't know to look for it, you'll spend an hour wondering why perfectly valid skills produce no output.&lt;/p&gt;
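&lt;p&gt;For reference, the relevant fragment of &lt;code&gt;openclaw.json&lt;/code&gt; looks roughly like this (surrounding keys omitted; the exact schema may differ between versions):&lt;/p&gt;

```json
{
  "tools": {
    "profile": "full"
  }
}
```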

&lt;p&gt;&lt;strong&gt;Patching Manus-specific dependencies was straightforward.&lt;/strong&gt; The meeting processor skill referenced &lt;code&gt;gemini-2.5-flash&lt;/code&gt; as its LLM (not accessible via a standard OpenAI client) and &lt;code&gt;manus-md-to-pdf&lt;/code&gt; for PDF generation (a Manus-internal tool). Two lines changed: the model switched to &lt;code&gt;gpt-4o-mini&lt;/code&gt; and the PDF engine switched to &lt;code&gt;pandoc&lt;/code&gt; with &lt;code&gt;weasyprint&lt;/code&gt;. Everything else in the script stayed the same.&lt;/p&gt;
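&lt;p&gt;Roughly, the patch looked like this (variable names are illustrative; pandoc does support weasyprint as a PDF engine):&lt;/p&gt;

```diff
- MODEL = "gemini-2.5-flash"        # only reachable inside Manus
+ MODEL = "gpt-4o-mini"             # works with any standard OpenAI client
- subprocess.run(["manus-md-to-pdf", md_path, pdf_path])
+ subprocess.run(["pandoc", md_path, "-o", pdf_path, "--pdf-engine=weasyprint"])
```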

&lt;h2&gt;
  
  
  What broke
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bug one: legacy API credentials.&lt;/strong&gt; The GoHighLevel skill wouldn't authenticate. Every API call returned 401. The skill script had the correct API key hardcoded, but OpenClaw's config file (&lt;code&gt;openclaw.json&lt;/code&gt;) had an older JWT token stored in its environment variables section, left over from a previous contractor's setup.&lt;/p&gt;

&lt;p&gt;The environment variable took precedence over the key in the script. So the skill was sending a dead v1 legacy token on every request, ignoring the valid key entirely.&lt;/p&gt;

&lt;p&gt;The fix: replace the env var with a current Private Integration Token from the GoHighLevel dashboard. But the lesson is broader. When you migrate skills between platforms, check what credentials the platform injects via environment. Skill-level credentials and platform-level credentials can collide, and the platform usually wins.&lt;/p&gt;
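&lt;p&gt;The collision is easy to reproduce. Any script that reads the key with an environment fallback will silently prefer whatever the platform injected (the names here are hypothetical):&lt;/p&gt;

```python
import os

SCRIPT_KEY = "pit-current-token"  # the valid key hardcoded in the skill

def resolve_api_key(env=None):
    """The environment wins over the script's own key, by design and by surprise."""
    env = os.environ if env is None else env
    return env.get("GHL_API_KEY", SCRIPT_KEY)
```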

&lt;p&gt;&lt;strong&gt;Bug two: conflicting skill versions.&lt;/strong&gt; The same previous contractor had installed three older GoHighLevel skills (&lt;code&gt;ghl-v1-api&lt;/code&gt;, &lt;code&gt;ghl-v1-contacts&lt;/code&gt;, &lt;code&gt;ghl-v1-tasks&lt;/code&gt;) that used the v1 API. When the client asked the model to "pull my contacts", it would sometimes pick one of the old skills instead of the new &lt;code&gt;gohighlevel-api&lt;/code&gt; skill.&lt;/p&gt;

&lt;p&gt;The model doesn't know which skill is current. It sees four skills that all claim to handle GoHighLevel and picks one. Sometimes it picks wrong.&lt;/p&gt;

&lt;p&gt;The fix was simple: disable the three old skills in the gateway dashboard. They're still in the config but marked &lt;code&gt;enabled: false&lt;/code&gt;. The model now only sees one GoHighLevel skill and uses it every time.&lt;/p&gt;
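&lt;p&gt;In config terms it's nothing more than this (the key layout is approximate; the flags are the point):&lt;/p&gt;

```json
{
  "skills": {
    "ghl-v1-api": { "enabled": false },
    "ghl-v1-contacts": { "enabled": false },
    "ghl-v1-tasks": { "enabled": false },
    "gohighlevel-api": { "enabled": true }
  }
}
```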

&lt;p&gt;This is the kind of bug that only shows up in real environments. In a clean test install, there are no legacy skills to conflict with. In a client's actual system, there's always history.&lt;/p&gt;

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;Six skills running on a self-hosted VPS. No monthly subscription. API costs scale with actual usage instead of a flat fee. The client has full control of the server, the model, and the skills.&lt;/p&gt;

&lt;p&gt;Total migration time was about six hours. Four of those were the two bugs above. The actual skill porting (copying files, installing Python dependencies, testing each skill with real data) took around two hours.&lt;/p&gt;

&lt;p&gt;If I did this migration again, I'd add two checks to the start of every engagement: audit the existing environment variables for stale credentials, and list all installed skills to catch version conflicts before they surface as mysterious failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I'm writing this up
&lt;/h2&gt;

&lt;p&gt;I'm an AI Automation Engineer. I build Claude Code, OpenClaw, N8N and MCP systems for real clients. Every project gets written up here: what worked, what broke, what I'd do differently. No demos, no prototypes.&lt;/p&gt;

&lt;p&gt;If you're running AI skills on a managed platform and the subscription doesn't make sense for your volume, self-hosting is viable. The migration is not complicated, but the gotchas are real and they're not in the documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ctrlaltautomate.com&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>openclaw</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>From 2 Hours of Research to a Script in 10 Minutes: Building a Custom OpenClaw Skill for a Content Creator</title>
      <dc:creator>Ben Utting</dc:creator>
      <pubDate>Sat, 11 Apr 2026 08:06:05 +0000</pubDate>
      <link>https://dev.to/benutting/from-2-hours-of-research-to-a-script-in-10-minutes-building-a-custom-openclaw-skill-for-a-content-25p8</link>
      <guid>https://dev.to/benutting/from-2-hours-of-research-to-a-script-in-10-minutes-building-a-custom-openclaw-skill-for-a-content-25p8</guid>
      <description>&lt;p&gt;A client came to me on Upwork with a straightforward problem: too much time spent before they even hit record.&lt;/p&gt;

&lt;p&gt;Their content workflow involved manually hunting for pain points on Reddit and X, pulling inspiration from creators they admired, writing hooks, structuring scripts, all before they could sit down in front of a camera. Solid process, but slow. An hour to two hours per piece of content, just in prep.&lt;/p&gt;

&lt;p&gt;They'd heard about OpenClaw and had a rough sense it could help. They just weren't sure how to make it actually do what they needed. That's where I came in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What We Built
&lt;/h3&gt;

&lt;p&gt;The engagement ran for about a week. The centrepiece was a custom OpenClaw skill: a &lt;strong&gt;Content Research Assistant&lt;/strong&gt; that turns a raw idea (a topic, a brain dump, a link, a vague prompt) into a researched, structured Instagram Reel script, all inside a chat interface.&lt;/p&gt;

&lt;p&gt;The skill runs four stages in sequence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Content Brief&lt;/strong&gt;&lt;br&gt;
Before any research happens, the input gets converted into a structured brief: topic, angle, target audience pain point, desired viewer outcome, medium. This keeps everything focused and prevents the AI from going wide when it should go deep.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Platform Research&lt;/strong&gt;&lt;br&gt;
Web searches run across Reddit, X/Twitter, YouTube, and LinkedIn — in parallel where possible — to surface how real people describe their problems. The goal is raw language: the exact phrases people use when they're frustrated, confused, or searching for answers. That's where good hooks come from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Creator Inspiration&lt;/strong&gt;&lt;br&gt;
The client had a specific list of creators they studied — some in their niche, some outside it. The skill pulls recent content from relevant creators and extracts structural patterns: hook formats, script pacing, CTA styles. Outside-niche creators are used for format only, never topic. The distinction matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Script Writing&lt;/strong&gt;&lt;br&gt;
A full script gets written in the client's brand voice — three hook options (one creator-inspired, one adapted, one original), a 45-60 second core script broken into sections, and two CTA variants. Each hook option is labelled so they know where it came from and can make an informed choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running It Through OpenClaw
&lt;/h3&gt;

&lt;p&gt;We set up OpenClaw to run locally via Docker on the client's MacBook M4. The interface is WhatsApp, so the entire workflow lives in a chat thread. They type a topic or paste a brain dump, and within minutes they have a researched brief, platform insights, and a ready-to-record script.&lt;/p&gt;

&lt;p&gt;That context matters for how the skill was built. Output has to work in WhatsApp: short paragraphs, bold text where needed, no markdown tables. The skill sends the brief first, then research, then scripts as follow-up messages, not one wall of text.&lt;/p&gt;

&lt;p&gt;The result: what used to take an hour or two of manual work now takes around &lt;strong&gt;10 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What struck me during the engagement was how quickly the client grasped what was possible once OpenClaw was running. The content skill was the proof of concept, but they could immediately see how the same approach applied to managing relationships, managing their week, handling admin. The whole operating system, running in a chat app they already use.&lt;/p&gt;

&lt;h3&gt;
  
  
  What This Type of Build Looks Like
&lt;/h3&gt;

&lt;p&gt;If you're a creator, solopreneur, or small team spending significant time on recurring research or prep work, this pattern applies directly to you. The specifics change (the platforms you research, the creators you study, the output format), but the structure doesn't.&lt;/p&gt;

&lt;p&gt;A custom OpenClaw skill is a workflow with memory, structure, and your preferences baked in, built once, then triggered with a word or a phrase. It knows the research steps, the format you want, the creators you draw from. You get something useful at the end without rebuilding the context every time.&lt;/p&gt;

</description>
      <category>automation</category>
      <category>claude</category>
      <category>openclaw</category>
      <category>contentwriting</category>
    </item>
  </channel>
</rss>
