If you've ever wired up "ask GPT-4 and Claude and a regex, then count the votes" inside a moderation pipeline, an agent router, or a code-review bot — you've written this code before:
```python
gpt_ok = call_gpt(prompt)
claude_ok = call_claude(prompt)
regex_ok = not contains_blocklist(prompt)
if (gpt_ok and claude_ok) or (gpt_ok and regex_ok):
    return APPROVE
else:
    return REJECT
```
Then someone asks "what about weighting the senior model 2x?" and the if-tree doubles. Then "what if regex sees a hard policy violation — that should veto everything?" and you bolt on another branch. Then the audit log is a print() you forgot to wire to a file.
I built ai-council because I've written this glue four times across four projects (trading bot, moderation layer, code-review CI, agent router) and the fifth time I gave up and made it a primitive.
## The whole API in 10 lines

```python
from ai_council import Council, Vote, function_voter

def gpt4_voter(p, ctx, peers):
    return Vote("gpt4", approve=p["safe_gpt4"], score=90)

def claude_voter(p, ctx, peers):
    return Vote("claude", approve=p["safe_claude"], score=85)

def regex_voter(p, ctx, peers):
    bad = "kill" in p["text"].lower()
    return Vote("regex", approve=not bad, score=0 if bad else 100, veto=bad)

council = Council([function_voter("gpt4", gpt4_voter),
                   function_voter("claude", claude_voter),
                   function_voter("regex", regex_voter)], threshold=2)

decision = council.deliberate({"text": "...", "safe_gpt4": True, "safe_claude": True})
print(decision.approved, decision.final_score)
```
That's it. No subclassing, no orchestrator process, no config files. Each voter is just an object with a name, a weight, and a `vote(proposal, context, peers)` method. Inside, the voter can call an LLM, run a regex, hit a database — the framework only cares about the `Vote` it returns.
## What the framework actually owns
Four things stay tangled inside every hand-rolled implementation:

1. What is being decided (the proposal shape)
2. Who votes (the voter set)
3. How votes combine (threshold, weights, veto)
4. Where the audit log lives

ai-council separates them. You bring 1 and 2. The framework owns 3 and 4.
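To make the split concrete, here is a plain-Python sketch of the combining rule described here (threshold, weights, veto). This is not the library's internal code, just the semantics spelled out under those assumptions:

```python
from dataclasses import dataclass

@dataclass
class Vote:
    name: str
    approve: bool
    score: int
    veto: bool = False

def aggregate(votes, weights, threshold):
    """Sketch of the combining rule: veto short-circuits, then
    weighted approvals are compared against the threshold."""
    # Any single veto blocks approval outright.
    if any(v.veto for v in votes):
        return False
    approving = sum(weights.get(v.name, 1.0) for v in votes if v.approve)
    total = sum(weights.get(v.name, 1.0) for v in votes)
    # threshold < 1 reads as a ratio of total weight; >= 1 as an absolute count.
    needed = threshold * total if threshold < 1 else threshold
    return approving >= needed
```

With unit weights this reduces to plain "k of n must agree"; weights and ratios fall out of the same comparison.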
| Without ai-council | With ai-council |
|---|---|
| 80 lines of `if a and b and not c:` per project | 1 `Council(...)` line + N small voter functions |
| Threshold logic re-invented (and bugged) every time | `threshold=2` (absolute) or `threshold=0.6` (ratio) |
| Veto / hard-policy bolted on with extra `if`s | `Vote(..., veto=True)` from any voter blocks approval |
| Weighting senior voters means rewriting the aggregator | `function_voter("senior", fn, weight=2.0)` |
| Audit log is a `print()` | `JsonMeetingStore("meetings.jsonl")` persists every decision |
| One flaky LLM call crashes the pipeline | Exceptions captured as `approve=False, score=0` (or `strict=True` to re-raise) |
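The JSONL audit shape itself is easy to reason about even outside the library. Here's a hypothetical helper (`log_decision` is my name for illustration, not an ai-council API) that appends one decision record per line, similar in spirit to what a `JsonMeetingStore`-style store persists:

```python
import json
import time

def log_decision(path, proposal, votes, approved):
    """Append one decision per line (JSONL) so the audit log is
    greppable and replayable. votes is a list of (name, approve, score)."""
    record = {
        "ts": time.time(),
        "proposal": proposal,
        "votes": [{"name": n, "approve": a, "score": s} for (n, a, s) in votes],
        "approved": approved,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per decision means `tail -f` works as a live audit view and a crash mid-write loses at most one record.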
## A real-shape example: order approval
Pricing, reviews, stock — three independent signals, one decision:
```python
def cheap_voter(p, ctx, peers):
    cheap = p["price_usd"] < 100
    return Vote("cheap", approve=cheap, score=80 if cheap else 20)

def reviewed_voter(p, ctx, peers):
    rated = p.get("rating", 0) >= 4.5
    return Vote("reviewed", approve=rated, score=85 if rated else 30)

def stocked_voter(p, ctx, peers):
    in_stock = p.get("stock", 0) > 0
    return Vote("stocked", approve=in_stock,
                score=90 if in_stock else 0,
                veto=not in_stock)  # out of stock → hard veto

council = Council(
    [function_voter("cheap", cheap_voter),
     function_voter("reviewed", reviewed_voter),
     function_voter("stocked", stocked_voter)],
    threshold=2,
)

decision = council.deliberate({"price_usd": 79, "rating": 4.7, "stock": 12})
```
decision = council.deliberate({"price_usd": 79, "rating": 4.7, "stock": 12})
Out-of-stock vetoes the order regardless of price and rating. 2-of-3 must approve otherwise. No nested ifs.
## What I learned wiring LLMs as voters
One painful lesson from production: LLM scores are not alpha. In one of my projects (an automated trading harness) I made the council's average score gate the trade size. Two months later the audit showed high-confidence council picks were systematically worse than low-confidence ones. The model "knew" which trades were obvious, and obvious had no edge left.
So: in ai-council you can use LLM voters for veto — "this looks like a scam, block" — but I'd argue against using LLM score as a weighted input to a financial or safety-critical aggregate. The framework lets you do either; the choice is yours, but the bias is real.
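In code, the veto-only pattern looks like this. `Vote` here is a local stand-in with the same fields as the examples above, and `classify_scam` is a stub where your real LLM call would go (both names are mine, for illustration):

```python
from dataclasses import dataclass

@dataclass
class Vote:  # stand-in mirroring the fields used in the examples above
    name: str
    approve: bool
    score: int
    veto: bool = False

def classify_scam(text: str) -> bool:
    # Stub in place of a real LLM call; swap in your SDK of choice.
    return "guaranteed returns" in text.lower()

def scam_guard_voter(p, ctx, peers):
    scam = classify_scam(p["text"])
    # Veto-only voter: a flat, neutral score so the model's confidence
    # never feeds the weighted aggregate, only the block/allow decision.
    return Vote("scam_guard", approve=not scam, score=50, veto=scam)
```

The constant score is the point: the LLM can slam the brakes, but it contributes nothing to "how good" the aggregate thinks the decision is.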
## When this is the wrong tool
- You only have one voter. Just call it.
- You need streaming partial votes (e.g., multi-round Socratic deliberation). ai-council is single-round; for streaming agent debate you want something like AutoGen or CrewAI.
- You need a UI for human voters in real time. This is a Python library, not a service.
## When this is the right tool
- Moderation gates ("approve if 2 of {gpt4, claude, regex} agree, but any single one can veto").
- Agent routing decisions ("which sub-agent handles this — let three rankers vote").
- Code-review bots ("approve if linter, model, and security scanner all agree").
- Trading or financial guard rails (use as veto layer, not scoring layer).
- Anywhere "ask multiple independent signals, combine, log" is the shape.
## Get started

```shell
pip install git+https://github.com/lfzds4399-cpu/ai-council.git
```
Requires Python 3.11+. Zero runtime dependencies (the optional LLM-voter examples pull in whatever LLM SDK you want). MIT licensed.
Repo: github.com/lfzds4399-cpu/ai-council
If you've been writing this glue too, I'd love feedback on the API and edge cases that bit you in production. Issues / PRs welcome.