If you've ever wired up "ask GPT-4 and Claude and a regex, then count the votes" inside a moderation pipeline, an agent router, or a code-review bot — you've written this code before:
```python
gpt_ok = call_gpt(prompt)
claude_ok = call_claude(prompt)
regex_ok = not contains_blocklist(prompt)
if (gpt_ok and claude_ok) or (gpt_ok and regex_ok):
    return APPROVE
else:
    return REJECT
```
Then someone asks "what about weighting the senior model 2x?" and the if-tree doubles. Then "what if regex sees a hard policy violation — that should veto everything?" and you bolt on another branch. Then the audit log is a print() you forgot to wire to a file.
I built ai-council because I've written this glue four times across four projects (trading bot, moderation layer, code-review CI, agent router) and the fifth time I gave up and made it a primitive.
## The whole API in 10 lines

```python
from ai_council import Council, Vote, function_voter

def gpt4_voter(p, ctx, peers):
    return Vote("gpt4", approve=p["safe_gpt4"], score=90)

def claude_voter(p, ctx, peers):
    return Vote("claude", approve=p["safe_claude"], score=85)

def regex_voter(p, ctx, peers):
    bad = "kill" in p["text"].lower()
    return Vote("regex", approve=not bad, score=0 if bad else 100, veto=bad)

council = Council([function_voter("gpt4", gpt4_voter),
                   function_voter("claude", claude_voter),
                   function_voter("regex", regex_voter)], threshold=2)

decision = council.deliberate({"text": "...", "safe_gpt4": True, "safe_claude": True})
print(decision.approved, decision.final_score)
```
That's it. No subclassing, no orchestrator process, no config files. Each voter is just an object with a name, a weight, and a `vote(proposal, context, peers)` method. Inside, the voter can call an LLM, run a regex, hit a database — the framework only cares about the `Vote` it returns.
## What the framework actually owns
Four things stay tangled inside every hand-rolled implementation:

1. What is being decided (the proposal shape)
2. Who votes (the voter set)
3. How votes combine (threshold, weights, veto)
4. Where the audit log lives

ai-council separates them. You bring 1 and 2. The framework owns 3 and 4.
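To make the split concrete, here is a plain-Python sketch of the combining rule described here (threshold, weights, veto). This is not the library's internal code, just the semantics spelled out under those assumptions:

```python
from dataclasses import dataclass

@dataclass
class Vote:
    name: str
    approve: bool
    score: int
    veto: bool = False

def aggregate(votes, weights, threshold):
    """Sketch of the combining rule: veto short-circuits, then
    weighted approvals are compared against the threshold."""
    # Any single veto blocks approval outright.
    if any(v.veto for v in votes):
        return False
    approving = sum(weights.get(v.name, 1.0) for v in votes if v.approve)
    total = sum(weights.get(v.name, 1.0) for v in votes)
    # threshold < 1 reads as a ratio of total weight; >= 1 as an absolute count.
    needed = threshold * total if threshold < 1 else threshold
    return approving >= needed
```

With unit weights this reduces to plain "k of n must agree"; weights and ratios fall out of the same comparison.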
| Without ai-council | With ai-council |
|---|---|
| 80 lines of `if a and b and not c:` per project | 1 `Council(...)` line + N small voter functions |
| Threshold logic re-invented (and bugged) every time | `threshold=2` (absolute) or `threshold=0.6` (ratio) |
| Veto / hard-policy bolted on with extra `if`s | `Vote(..., veto=True)` from any voter blocks approval |
| Weighting senior voters means rewriting the aggregator | `function_voter("senior", fn, weight=2.0)` |
| Audit log is a `print()` | `JsonMeetingStore("meetings.jsonl")` persists every decision |
| One flaky LLM call crashes the pipeline | Exceptions captured as `approve=False, score=0` (or `strict=True` to re-raise) |
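The JSONL audit shape itself is easy to reason about even outside the library. Here's a hypothetical helper (`log_decision` is my name for illustration, not an ai-council API) that appends one decision record per line, similar in spirit to what a `JsonMeetingStore`-style store persists:

```python
import json
import time

def log_decision(path, proposal, votes, approved):
    """Append one decision per line (JSONL) so the audit log is
    greppable and replayable. votes is a list of (name, approve, score)."""
    record = {
        "ts": time.time(),
        "proposal": proposal,
        "votes": [{"name": n, "approve": a, "score": s} for (n, a, s) in votes],
        "approved": approved,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per decision means `tail -f` works as a live audit view and a crash mid-write loses at most one record.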
## A real-shape example: order approval
Pricing, reviews, stock — three independent signals, one decision:
```python
def cheap_voter(p, ctx, peers):
    cheap = p["price_usd"] < 100
    return Vote("cheap", approve=cheap, score=80 if cheap else 20)

def reviewed_voter(p, ctx, peers):
    rated = p.get("rating", 0) >= 4.5
    return Vote("reviewed", approve=rated, score=85 if rated else 30)

def stocked_voter(p, ctx, peers):
    in_stock = p.get("stock", 0) > 0
    return Vote("stocked", approve=in_stock,
                score=90 if in_stock else 0,
                veto=not in_stock)  # out of stock → hard veto

council = Council(
    [function_voter("cheap", cheap_voter),
     function_voter("reviewed", reviewed_voter),
     function_voter("stocked", stocked_voter)],
    threshold=2,
)

decision = council.deliberate({"price_usd": 79, "rating": 4.7, "stock": 12})
```
decision = council.deliberate({"price_usd": 79, "rating": 4.7, "stock": 12})
Out-of-stock vetoes the order regardless of price and rating. 2-of-3 must approve otherwise. No nested ifs.
## What I learned wiring LLMs as voters
One painful lesson from production: LLM scores are not alpha. In one of my projects (an automated trading harness) I made the council's average score gate the trade size. Two months later the audit showed high-confidence council picks were systematically worse than low-confidence ones. The model "knew" which trades were obvious, and obvious had no edge left.
So: in ai-council you can use LLM voters for veto — "this looks like a scam, block" — but I'd argue against using LLM score as a weighted input to a financial or safety-critical aggregate. The framework lets you do either; the choice is yours, but the bias is real.
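In code, the veto-only pattern looks like this. `Vote` here is a local stand-in with the same fields as the examples above, and `classify_scam` is a stub where your real LLM call would go (both names are mine, for illustration):

```python
from dataclasses import dataclass

@dataclass
class Vote:  # stand-in mirroring the fields used in the examples above
    name: str
    approve: bool
    score: int
    veto: bool = False

def classify_scam(text: str) -> bool:
    # Stub in place of a real LLM call; swap in your SDK of choice.
    return "guaranteed returns" in text.lower()

def scam_guard_voter(p, ctx, peers):
    scam = classify_scam(p["text"])
    # Veto-only voter: a flat, neutral score so the model's confidence
    # never feeds the weighted aggregate, only the block/allow decision.
    return Vote("scam_guard", approve=not scam, score=50, veto=scam)
```

The constant score is the point: the LLM can slam the brakes, but it contributes nothing to "how good" the aggregate thinks the decision is.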
## When this is the wrong tool
- You only have one voter. Just call it.
- You need streaming partial votes (e.g., multi-round Socratic deliberation). ai-council is single-round; for streaming agent debate you want something like AutoGen or CrewAI.
- You need a UI for human voters in real time. This is a Python library, not a service.
## When this is the right tool
- Moderation gates ("approve if 2 of {gpt4, claude, regex} agree, but any single one can veto").
- Agent routing decisions ("which sub-agent handles this — let three rankers vote").
- Code-review bots ("approve if linter, model, and security scanner all agree").
- Trading or financial guard rails (use as veto layer, not scoring layer).
- Anywhere "ask multiple independent signals, combine, log" is the shape.
## Get started

```shell
pip install git+https://github.com/lfzds4399-cpu/ai-council.git
```
Requires Python 3.11+. Zero runtime dependencies (the optional LLM-voter examples pull in whatever LLM SDK you want). MIT licensed.
Repo: github.com/lfzds4399-cpu/ai-council
If you've been writing this glue too, I'd love feedback on the API and edge cases that bit you in production. Issues / PRs welcome.