DEV Community

lfzds4399-cpu

Posted on • Originally published at github.com

Stop reinventing 'ask GPT-4 and Claude and a regex, then count the votes'

If you've ever wired up "ask GPT-4 and Claude and a regex, then count the votes" inside a moderation pipeline, an agent router, or a code-review bot — you've written this code before:

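def moderate(prompt):
    gpt_ok = call_gpt(prompt)
    claude_ok = call_claude(prompt)
    regex_ok = not contains_blocklist(prompt)

    # Hand-rolled vote counting. If the intent is "2 of 3", note that the
    # (claude_ok and regex_ok) pair is never checked.
    if (gpt_ok and claude_ok) or (gpt_ok and regex_ok):
        return APPROVE
    else:
        return REJECT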

Then someone asks "what about weighting the senior model 2x?" and the if-tree doubles. Then "what if regex sees a hard policy violation — that should veto everything?" and you bolt on another branch. Then the audit log is a print() you forgot to wire to a file.

I built ai-council because I've written this glue four times across four projects (trading bot, moderation layer, code-review CI, agent router) and the fifth time I gave up and made it a primitive.

The whole API in about 10 lines

from ai_council import Council, Vote, function_voter

def gpt4_voter(p, ctx, peers):   return Vote("gpt4",   approve=p["safe_gpt4"],   score=90)
def claude_voter(p, ctx, peers): return Vote("claude", approve=p["safe_claude"], score=85)
def regex_voter(p, ctx, peers):
    bad = "kill" in p["text"].lower()
    return Vote("regex", approve=not bad, score=0 if bad else 100, veto=bad)

council = Council([function_voter("gpt4", gpt4_voter),
                   function_voter("claude", claude_voter),
                   function_voter("regex", regex_voter)], threshold=2)
decision = council.deliberate({"text": "...", "safe_gpt4": True, "safe_claude": True})
print(decision.approved, decision.final_score)

That's it. No subclassing, no orchestrator process, no config files. Each Voter is just an object with a name, a weight, and a vote(proposal, context, peers) method. Inside, the voter can call an LLM, run a regex, hit a database — the framework only cares about the Vote it returns.
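To make that voter shape concrete, here is a minimal standalone sketch of an object that fits the description. The `Vote` dataclass and the `VoterLike` protocol are stand-ins written for this illustration, not imports from ai-council, and `BlocklistVoter` is a hypothetical name; the field names simply mirror the ones used in the snippet above, so check the repo for the real definitions.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Sequence


@dataclass
class Vote:
    # Stand-in for the library's Vote; fields mirror the post's examples.
    voter: str
    approve: bool
    score: float = 0.0
    veto: bool = False


class VoterLike(Protocol):
    # Hypothetical structural type: anything with these members can vote.
    name: str
    weight: float

    def vote(self, proposal: dict, context: Any, peers: Sequence[Vote]) -> Vote: ...


class BlocklistVoter:
    """A voter with no LLM inside: it vetoes when a blocked word appears."""

    def __init__(self, name: str, blocked: set[str], weight: float = 1.0):
        self.name = name
        self.weight = weight
        self.blocked = blocked

    def vote(self, proposal, context, peers):
        words = set(proposal["text"].lower().split())
        bad = bool(words & self.blocked)
        return Vote(self.name, approve=not bad, score=0 if bad else 100, veto=bad)


voter: VoterLike = BlocklistVoter("regex", {"kill"})
clean = voter.vote({"text": "Ship the release"}, None, [])
flagged = voter.vote({"text": "Kill the process"}, None, [])
```

Swapping the body of `vote` for an LLM call or a database lookup would not change anything the caller sees, which is the point of keeping the interface this small.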

What the framework actually owns

Four things stay tangled inside every hand-rolled implementation:

  1. What is being decided (proposal shape)
  2. Who votes (the voter set)
  3. How votes combine (threshold, weights, veto)
  4. Where the audit log lives

ai-council separates them. You bring 1 and 2. The framework owns 3 and 4.
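Point 3 is the part that tends to get rewritten per project, so here is a rough standalone sketch of what combining votes can look like. It is an assumption about how such an aggregator could work, not ai-council's actual code: `tally`, the `weights` mapping, and the rule that an int threshold is an absolute amount of approving weight while a float threshold is a ratio are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    # Stand-in for the library's Vote; not imported from ai-council.
    voter: str
    approve: bool
    score: float = 0.0
    veto: bool = False


def tally(votes, weights, threshold):
    """Hypothetical aggregator: returns (approved, weighted_mean_score).

    Assumed rules: any veto blocks approval; an int threshold is an absolute
    amount of approving weight; a float threshold is a ratio of total weight.
    """
    total = sum(weights.get(v.voter, 1.0) for v in votes)
    yes = sum(weights.get(v.voter, 1.0) for v in votes if v.approve)
    mean_score = (
        sum(weights.get(v.voter, 1.0) * v.score for v in votes) / total
        if total else 0.0
    )
    if any(v.veto for v in votes):
        return False, mean_score
    needed = threshold * total if isinstance(threshold, float) else threshold
    return yes >= needed, mean_score


votes = [Vote("gpt4", True, 90), Vote("claude", False, 40), Vote("regex", True, 100)]
two_of_three = tally(votes, {}, threshold=2)
ratio = tally(votes, {}, threshold=0.6)
# Doubling the dissenting voter's weight flips the ratio-based outcome.
senior_says_no = tally(votes, {"claude": 2.0}, threshold=0.6)
vetoed = tally(votes + [Vote("policy", False, 0, veto=True)], {}, threshold=2)
```

Keeping the veto check separate from the threshold check means a hard policy violation never has to be expressed as "a very large negative weight", which is where hand-rolled versions tend to go wrong.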

| Without ai-council | With ai-council |
| --- | --- |
| 80 lines of if a and b and not c: per project | 1 Council(...) line + N small voter functions |
| Threshold logic re-invented (and bugged) every time | threshold=2 (absolute) or threshold=0.6 (ratio) |
| Veto / hard-policy bolted on with extra ifs | Vote(..., veto=True) from any voter blocks approval |
| Weighting senior voters means rewriting the aggregator | function_voter("senior", fn, weight=2.0) |
| Audit log is a print() | JsonMeetingStore("meetings.jsonl") persists every decision |
| One flaky LLM call crashes the pipeline | Exceptions captured as approve=False, score=0 (or strict=True to re-raise) |
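The audit-log and failure-handling rows are easy to picture with a small standalone sketch. `safe_vote` and `append_decision` are hypothetical helpers, not ai-council's API, and the JSON record shape is assumed, since the post does not show what JsonMeetingStore actually writes.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Vote:
    # Stand-in for the library's Vote; not imported from ai-council.
    voter: str
    approve: bool
    score: float = 0.0
    veto: bool = False


def safe_vote(name, fn, proposal, strict=False):
    """Run one voter; on failure, record a rejecting vote unless strict."""
    try:
        return fn(proposal)
    except Exception:
        if strict:
            raise
        return Vote(name, approve=False, score=0)


def append_decision(path, proposal, votes, approved):
    """Append one decision as a single JSON line (assumed record shape)."""
    record = {"proposal": proposal, "approved": approved,
              "votes": [asdict(v) for v in votes]}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def flaky_llm(proposal):
    # Simulates an upstream failure instead of calling a real model.
    raise TimeoutError("upstream LLM timed out")


proposal = {"text": "hello"}
votes = [safe_vote("flaky", flaky_llm, proposal),
         safe_vote("regex", lambda p: Vote("regex", True, 100), proposal)]

path = os.path.join(tempfile.mkdtemp(), "meetings.jsonl")
append_decision(path, proposal, votes, approved=False)
with open(path, encoding="utf-8") as f:
    logged = [json.loads(line) for line in f]
```

One JSON object per line keeps the log append-only and greppable; failing closed (a crashed voter counts as a rejection) is the conservative default, with `strict=True` left for tests where a silent rejection would hide a bug.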

A real-shape example: order approval

Pricing, reviews, stock — three independent signals, one decision:

def cheap_voter(p, ctx, peers):
    cheap = p["price_usd"] < 100
    return Vote("cheap", approve=cheap, score=80 if cheap else 20)

def reviewed_voter(p, ctx, peers):
    rated = p.get("rating", 0) >= 4.5
    return Vote("reviewed", approve=rated, score=85 if rated else 30)

def stocked_voter(p, ctx, peers):
    in_stock = p.get("stock", 0) > 0
    return Vote("stocked", approve=in_stock,
                score=90 if in_stock else 0,
                veto=not in_stock)  # out of stock → hard veto

council = Council(
    [function_voter("cheap", cheap_voter),
     function_voter("reviewed", reviewed_voter),
     function_voter("stocked", stocked_voter)],
    threshold=2,
)
decision = council.deliberate({"price_usd": 79, "rating": 4.7, "stock": 12})

An out-of-stock item vetoes the order regardless of price and rating. Otherwise, 2 of the 3 voters must approve. No nested ifs.

What I learned wiring LLMs as voters

One painful lesson from production: LLM scores are not alpha. In one of my projects (an automated trading harness) I made the council's average score gate the trade size. Two months later the audit showed high-confidence council picks were systematically worse than low-confidence ones. The model "knew" which trades were obvious, and obvious had no edge left.

So: in ai-council you can use LLM voters for veto — "this looks like a scam, block" — but I'd argue against using LLM score as a weighted input to a financial or safety-critical aggregate. The framework lets you do either; the choice is yours, but the bias is real.
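One way to act on that advice is to let an LLM voter block a decision while keeping its score out of the aggregate entirely. The sketch below is standalone and illustrative: `veto_only`, `counts_toward_score`, and both example voters are hypothetical names, `Vote` is a stand-in rather than the library's class, and the unanimous-approval rule in `decide` is just an assumption to keep the example short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Vote:
    voter: str
    approve: bool
    score: float = 0.0
    veto: bool = False
    counts_toward_score: bool = True  # hypothetical flag, not a library field


def veto_only(fn):
    """Wrap a voter so it can block a decision but never moves the score."""
    def wrapped(proposal):
        vote = fn(proposal)
        return replace(vote, veto=not vote.approve, counts_toward_score=False)
    return wrapped


def mean_score(votes):
    scored = [v.score for v in votes if v.counts_toward_score]
    return sum(scored) / len(scored) if scored else 0.0


def llm_scam_check(proposal):
    # Placeholder for a real LLM call; flags anything mentioning "guaranteed".
    scam = "guaranteed" in proposal["pitch"].lower()
    return Vote("llm", approve=not scam, score=5 if scam else 99)


def risk_limit(proposal):
    ok = proposal["size_usd"] <= 1_000
    return Vote("risk", approve=ok, score=70 if ok else 10)


def decide(proposal):
    votes = [veto_only(llm_scam_check)(proposal), risk_limit(proposal)]
    approved = all(v.approve for v in votes) and not any(v.veto for v in votes)
    return approved, mean_score(votes)


ok = decide({"pitch": "Index fund rebalance", "size_usd": 500})
blocked = decide({"pitch": "Guaranteed 40% monthly returns", "size_usd": 500})
```

In both calls the aggregate score comes only from the deterministic risk voter, so the LLM's confidence can stop a trade but cannot size one.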

When this is the wrong tool

  • You only have one voter. Just call it.
  • You need streaming partial votes (e.g., multi-round Socratic deliberation). ai-council is single-round; for streaming agent debate you want something like AutoGen or CrewAI.
  • You need a UI for human voters in real time. This is a Python library, not a service.

When this is the right tool

  • Moderation gates ("approve if 2 of {gpt4, claude, regex} agree, but any single one can veto").
  • Agent routing decisions ("which sub-agent handles this — let three rankers vote").
  • Code-review bots ("approve if linter, model, and security scanner all agree").
  • Trading or financial guard rails (use as veto layer, not scoring layer).
  • Anywhere "ask multiple independent signals, combine, log" is the shape.

Get started

pip install git+https://github.com/lfzds4399-cpu/ai-council.git

Requires Python 3.11+. Zero runtime dependencies (the optional LLM voter examples pull in whatever LLM SDK you want). MIT licensed.

Repo: github.com/lfzds4399-cpu/ai-council

If you've been writing this glue too, I'd love feedback on the API and edge cases that bit you in production. Issues / PRs welcome.
