Most "domain finders" on GitHub stop at stage one — they spit out a list of names someone might want to buy. The real work is everything after that: deciding which ones are worth money, not buying the obvious trademark traps, not blowing the monthly budget on a hype day, and actually settling a sale.
I built domain-harness because I wanted to test a small thesis — "AI-generated brandable domains + automated valuation can clear $X / month" — without giving a script direct keys to a registrar and praying.
This post is about the architecture, not the thesis. Specifically: how do you build something that touches money and registrar APIs without losing money the moment the LLM hallucinates?
The pipeline
discover ─► value ─► acquire ─► list ─► negotiate ─► settle
    │           │           │          │          │
 AI gen +    local +     multi-     Dan /     AI reply
 expired     AI Council  registrar  Afternic  + counter
 feeds       (Claude     fallback   + Sedo
             + DeepSeek)
Six stages, all gateable individually, all dry_run by default. The thing that makes it safe is what's between the stages, not the stages themselves.
Defense layer 1: two-tier valuation
Calling an LLM to value every candidate domain is how you waste $50 of tokens on garbage by 11pm. Local heuristic gate first:
- Length penalty over 12 chars
- Hard-cap punctuation / digit counts
- Dictionary-word boost
- Brandable phonotactics scoring (CVC patterns score better than xqkz)
- Hard reject on hyphens-and-numbers cocktails
Only candidates above min_local_score get billed against the AI Council. In practice this filters about 90% of the discovery output, and the surviving 10% is the only thing worth a token spend.
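The heuristic gate can be sketched in a few lines. This is an illustrative version, not the repo's actual scoring code — the function name, weights, and the 40-point cutoff are all assumptions standing in for `min_local_score`:

```python
# Sketch of the local pre-filter described above (names and weights are
# illustrative, not the repo's actual API). Cheap heuristics run first so
# no token is spent on obvious garbage.
import re

VOWELS = set("aeiou")

def local_score(domain: str) -> float:
    name = domain.split(".")[0].lower()
    score = 50.0
    # Hard reject: hyphens-and-numbers cocktails never reach the AI Council.
    if "-" in name and any(c.isdigit() for c in name):
        return 0.0
    # Hard-cap punctuation / digit counts.
    if sum(c.isdigit() for c in name) > 2 or name.count("-") > 1:
        return 0.0
    # Length penalty over 12 chars.
    if len(name) > 12:
        score -= 3.0 * (len(name) - 12)
    # Brandable phonotactics: reward consonant-vowel alternation...
    cv = "".join("v" if c in VOWELS else "c" for c in name if c.isalpha())
    score += 2.0 * len(re.findall(r"cv", cv))
    # ...and penalize unpronounceable consonant pile-ups like "xqkz".
    score -= 6.0 * len(re.findall(r"c{4,}", cv))
    return max(score, 0.0)

candidates = ["zorvia.com", "xqkz-99.net", "superlongdomainnamehere.com"]
survivors = [d for d in candidates if local_score(d) >= 40.0]
```

The point isn't that these exact weights are right — it's that a pure function over the string filters most of the stream for free.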
Defense layer 2: AI Council, not AI Oracle
Every survivor goes to two independent providers — Claude and DeepSeek — with the same prompt:
council = Council(
    [function_voter("claude", claude_valuator, weight=1.0),
     function_voter("deepseek", deepseek_valuator, weight=1.0)],
    threshold=2,  # both must clear per-provider threshold
)
Both must clear their threshold before the harness will spend a cent. One model alone is a hallucination machine; two independent models with the same hallucination at the same time is much rarer (and when it does happen, your loss is capped by layer 3 anyway).
Caveat I learned the hard way in a different project: LLM scores are not alpha. In a trading harness I weighted council confidence directly into position size and the high-confidence picks systematically lost money. So in domain-harness the council is a gate, not a size multiplier. It can say "yes / no, the domain is worth it" — it cannot say "buy 5x more of this one."
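The "gate, not multiplier" rule fits in a few lines. This is a minimal sketch with invented names (`Voter`, `council_approves`, the stub valuators) — not the repo's actual `Council` class — to show that the council's output is strictly boolean:

```python
# Minimal sketch of "gate, not multiplier" (illustrative names, not the
# repo's classes). Each voter returns a 0-100 valuation; the council only
# answers yes/no -- confidence never scales how much gets spent.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Voter:
    name: str
    valuate: Callable[[str], float]  # would wrap a Claude / DeepSeek call
    min_score: float = 60.0          # per-provider threshold

def council_approves(domain: str, voters: list[Voter], threshold: int = 2) -> bool:
    votes = sum(1 for v in voters if v.valuate(domain) >= v.min_score)
    return votes >= threshold  # with threshold=2, both providers must clear

# Stub valuators standing in for the LLM calls:
claude_stub = Voter("claude", lambda d: 72.0)
deepseek_stub = Voter("deepseek", lambda d: 55.0)
council_approves("zorvia.com", [claude_stub, deepseek_stub])
# One provider below its bar -> gate stays closed, zero spend.
```

Note what's absent: the scores are thrown away after the comparison. There is no path from "confidence 92" to "buy more."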
Defense layer 3: hard budget walls
This is the part that lets me sleep:
- daily_cap_usd: $50
- monthly_cap_usd: $1000
- per_domain_cap_usd: $15
These aren't soft warnings. Every acquisition.buy() call goes through budget_guard.check() and an over-cap call raises BudgetExceeded. The buyer literally cannot exceed them. If any single wall trips, the discovery loop stops and the daemon goes idle until a human re-enables it.
I keep a state file data/budget_state.json with today's spend and this month's spend. The CI smoke test verifies every guard:
[5] Budget guard hard brake
    ✓ over-cap single-domain buy blocked ($200 > $15)
    ✓ normal amount passes ($10)
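A sketch of what that wall looks like. The class and method names follow the post's description (`budget_guard.check()`, `BudgetExceeded`, `data/budget_state.json`), but the implementation details are my assumptions, not the repo's code:

```python
# Hedged sketch of the hard budget wall. Raise, don't warn: an over-cap
# call can never proceed, because the exception propagates out of buy().
import json
from pathlib import Path

class BudgetExceeded(Exception):
    pass

class BudgetGuard:
    def __init__(self, state_path: Path, daily_cap: float = 50.0,
                 monthly_cap: float = 1000.0, per_domain_cap: float = 15.0):
        self.daily_cap, self.monthly_cap = daily_cap, monthly_cap
        self.per_domain_cap = per_domain_cap
        self.state_path = state_path
        # Mirrors data/budget_state.json: today's and this month's spend.
        self.state = (json.loads(state_path.read_text()) if state_path.exists()
                      else {"today_spend": 0.0, "month_spend": 0.0})

    def check(self, price: float) -> None:
        if price > self.per_domain_cap:
            raise BudgetExceeded(f"${price} > per-domain cap ${self.per_domain_cap}")
        if self.state["today_spend"] + price > self.daily_cap:
            raise BudgetExceeded("daily cap would be exceeded")
        if self.state["month_spend"] + price > self.monthly_cap:
            raise BudgetExceeded("monthly cap would be exceeded")

    def record(self, price: float) -> None:
        # Only called after a successful buy; persists across restarts.
        self.state["today_spend"] += price
        self.state["month_spend"] += price
        self.state_path.write_text(json.dumps(self.state))
```

Persisting spend to disk matters more than it looks: a crash-and-restart loop must not reset the counters to zero.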
Defense layer 4: trademark + WHOIS
A surprising fraction of "great brandable domain" suggestions from LLMs are subtle trademark violations. mygoogle.com is obvious; paypall.io is less obvious; openai-clone.ai is comically obvious but the council will still rank it because it scores "ai".
So:
- Static blocklist (Fortune 500 + Anthropic / OpenAI / Stripe / etc).
- User-extensible blocklist.
- Substring match against the blocklist (so mygoogle.com and openai-clone.ai fail).
- WHOIS check before any buy attempt (some "available" feeds lie).
(Edit-distance / typo-variant matching is on the roadmap — goggle.com currently slips past unless you add it explicitly.)
[3] Trademark blocklist
    ✓ mygoogle.com blocked
    ✓ openai-clone.ai blocked
    ✓ paywall.com passes (not a trademark)
[4] WHOIS check
    ✓ registered domain recognized (google.com)
Defense layer 5: multi-registrar fallback
When the harness decides to buy, it doesn't fail-stop on a single registrar's API hiccup. The priority chain is Porkbun → Cloudflare: if Porkbun returns "unavailable" or a 5xx, the harness falls through to Cloudflare. If both fail, the candidate goes to a retry queue, not a buy.
Each registrar is a thin adapter — same register(domain, contact) interface, different SDK underneath. The adapter shape is deliberately simple so adding Namecheap or GoDaddy later is a single new file, not a refactor.
Defense layer 6: dry_run by default, opt-in spend
The default mode in .env.example is MODE=dry_run. Every stage knows the mode and short-circuits actual money calls when it's dry_run. The full pipeline runs end-to-end in dry_run, so I can backtest discovery → valuation → would-have-bought against a real registrar SDK without paying. Flipping to live is a one-line change — and the rule is to re-read the smoke test results before making it.
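The short-circuit pattern, sketched with assumed names (`buy`, `registrar_buy` is a hypothetical stand-in for the live SDK path):

```python
# Sketch of the dry_run short-circuit. Every money-touching call checks
# MODE first; the rest of the stage runs identically in both modes.
import os

def registrar_buy(domain: str, price: float) -> str:
    # Live-spend path: a real registrar SDK call would go here
    # (deliberately stubbed out in this sketch).
    raise RuntimeError("live mode disabled in this sketch")

def buy(domain: str, price: float) -> str:
    if os.getenv("MODE", "dry_run") == "dry_run":
        # Default is the safe branch: log the decision, spend nothing.
        return f"DRY-RUN: would buy {domain} at ${price:.2f}"
    return registrar_buy(domain, price)
```

The detail that matters is the default in `os.getenv`: an unset or typo'd MODE falls back to dry_run, so misconfiguration fails toward not spending.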
CI runs the 25-case smoke on every push:
Summary: 25 PASS / 0 FAIL/ERROR / 25 total
That suite covers every gate — discovery → valuation → trademark → WHOIS → budget guard → duplicate guard → registration → portfolio write → listing → AI negotiation → settle. Failing any one of them blocks merge.
What I'd build differently if starting over
- Faster registrar latency profiling. Currently the fallback is static priority. A version 2 should A/B by recent success rate and latency.
- Per-vertical AI prompts. The council right now is generic; "this is a fintech domain" vs "this is a SaaS domain" probably wants different valuation framing.
- Negotiation memory. The reply bot is stateless per inquiry; same buyer asking three times gets three independent counters.
When this is the wrong tool
- You want to register vanity domains for personal projects. This is overkill.
- You're already buying via a managed broker. They handle the registrar / settle layers.
- You don't want any LLM cost. The council is the heart of the value layer.
When this is the right tool
- You want a budget-walled testbed for a domain investing thesis.
- You're tired of "I clicked buy and it took $50 because the dropdown defaulted to 5 years."
- You want a CI-gated pipeline where every buy decision is auditable and reversible (well — except for the buy, but the decision is).
Get started
git clone https://github.com/lfzds4399-cpu/domain-harness.git
cd domain-harness
pip install -r requirements.txt
cp .env.example .env # fill in only what you use
python tests/e2e_smoke.py # 25 PASS in dry_run, no spend
Repo: github.com/lfzds4399-cpu/domain-harness. MIT.
If you're building anything that touches money + LLMs, I'd love to compare notes on which guards you ended up needing vs which ones you removed because they fired too often. Issues / PRs welcome.