Building a 16-agent Socratic seminar in Tauri 2: bidding, paired observers, and a 0600 vault

Repo: https://github.com/richer-richard/socratic-council

Stack: Tauri 2 (Rust + React/TypeScript), pnpm monorepo, Apache-2.0

Latest release: v2.0.0

The premise

If you ask one frontier model a hard question, you get a confident answer. If you ask sixteen, you get an argument.

Socratic Council is a desktop app that runs a structured seminar between sixteen LLM agents drawn from eight providers — OpenAI, Anthropic, Google, DeepSeek, Kimi, Qwen, MiniMax, and Z.AI. Eight named debaters speak in public. Each is shadowed by a paired advisor on the same provider that can pass private notes whenever they have something worth saying. As the debate runs, the app builds a live argument map (9 node kinds, 10 edge relations), flags fact-check candidates, tracks pairwise conflicts, and tallies per-message cost. Source on GitHub, Apache-2.0, source-only distribution.

This is a "how I built it" post focused on four parts of the codebase that ended up more interesting than I expected:

  1. The bidding protocol — why every turn is scored, not round-robin.
  2. The provider abstraction — eight provider clients, one allowlisted egress.
  3. The vault — why it's a file with 0600 perms, and not the macOS keychain.
  4. The argument map — a 9-kind / 10-relation schema with multi-source merge.

Code excerpts below come straight from the repo at v2.0.0.


1. Bidding, not round-robin

The default pattern when you glue agents together is round-robin: each agent takes a turn, in order, until you stop. It's simple. It's also why most multi-agent demos read like a CS class group project — every voice gets equal floor time regardless of whether it has anything useful to say at the moment.

Socratic Council picks the next speaker by relevance bidding. Every turn, all eight council members are scored 0–100 against the topic and the recent transcript tail, and the highest score wins the turn. The scorer is a single fast call to a small model (Gemini 3.1 Flash by default; provider-injected, so it can be any of the eight):

```ts
// packages/core/src/semanticBidding.ts (excerpt)
const SYSTEM_PROMPT = `You are a neutral scheduler for a multi-agent debate.

Given a topic, a short tail of the recent conversation, and a list of debating agents (with their specialties), score each agent's relevance to speaking NEXT on a 0-100 scale.

Scoring rubric:
- 80-100: the agent's specialty is squarely at stake RIGHT NOW
- 50-79:  relevant but not the most pressing voice
- 20-49:  tangentially connected
- 0-19:   essentially unrelated to the current moment

Respond with exactly one JSON object on a single line, no prose, no code fences.
Shape: {"scores":{"agentId1":<int 0-100>, "agentId2":<int 0-100>, ...}}`;
```

The first version (bidding.ts) used a hand-curated keyword list per agent and a substring match against the topic. That worked for prototype seminars about technology but missed everything when the topic was phrased without the keywords. The semantic-bidding pass replaces it with a single LLM judgment per turn — cheap (one Flash call) and dramatically more on-topic. The keyword score is kept as a fallback for transport failures.
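Distilled, the per-turn selection is score-then-argmax. A minimal sketch (scoreWithModel and keywordScore are stand-ins for the repo's actual helpers):

```ts
interface Agent {
  id: string;
  specialty: string;
}

// One cheap scoring call per turn; the keyword heuristic survives only as
// a fallback for transport failures. Both helper names are illustrative.
async function pickNextSpeaker(
  topic: string,
  transcriptTail: string,
  agents: Agent[],
  scoreWithModel: (topic: string, tail: string, agents: Agent[]) => Promise<Record<string, number>>,
  keywordScore: (topic: string, agent: Agent) => number,
): Promise<Agent> {
  let scores: Record<string, number>;
  try {
    scores = await scoreWithModel(topic, transcriptTail, agents);
  } catch {
    scores = Object.fromEntries(agents.map((a) => [a.id, keywordScore(topic, a)]));
  }
  // Highest relevance score takes the turn.
  return agents.reduce((best, a) => ((scores[a.id] ?? 0) > (scores[best.id] ?? 0) ? a : best));
}
```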

Crucially, this isn't the only protocol layer. While the chosen debater is generating in the public channel, all eight advisors evaluate the same transcript in parallel and decide whether to slip a private note to their paired debater. Notes go into the partner's next prompt under a section visible only to them — a literal index card in a fishbowl seminar.

Two protocol details that took iteration (a sketch of the side channel follows this list):

  • Advisors share a provider with their debater. George/Greta are both OpenAI. Cathy/Clara are both Anthropic. The pairing is on role, not provider, so the asymmetry isn't "different model catches different errors" — it's "same model, different mandate." The advisor's job description is "read for what your partner is missing", and that prompt asymmetry produces useful notes even from the same weights.
  • Advisors can stay quiet. The first version made every advisor emit something every turn, which produced a lot of "good point, partner" filler. Letting an advisor no-op when they have nothing real to add cleaned the side-channel up dramatically.
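Here's the promised sketch of the side channel. The names and shapes are my illustration, not the repo's API; the load-bearing details are the null return and the private prompt section:

```ts
// Advisors evaluate the same transcript in parallel and may return null:
// no note, no filler. Names and shapes here are illustrative.
type AdvisorNote = { forDebater: string; text: string } | null;

interface Advisor {
  id: string;
  evaluate: (transcriptTail: string) => Promise<AdvisorNote>;
}

async function collectAdvisorNotes(
  transcriptTail: string,
  advisors: Advisor[],
): Promise<Map<string, string>> {
  const notes = new Map<string, string>();
  for (const note of await Promise.all(advisors.map((a) => a.evaluate(transcriptTail)))) {
    if (note) notes.set(note.forDebater, note.text);
  }
  return notes;
}

// The note lands in the paired debater's next prompt, invisible to everyone else.
function buildDebaterPrompt(publicPrompt: string, privateNote?: string): string {
  if (!privateNote) return publicPrompt;
  return `${publicPrompt}\n\n## Private note from your advisor (visible only to you)\n${privateNote}`;
}
```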

The current full roster, from the README:

| Debater | Advisor | Provider | Default model |
| --- | --- | --- | --- |
| George | Greta | OpenAI | GPT-5.5 |
| Cathy | Clara | Anthropic | Claude Opus 4.7 |
| Grace | Gaia | Google | Gemini 3.1 Pro |
| Douglas | Dara | DeepSeek | DeepSeek V4 Pro |
| Kate | Kira | Kimi | Kimi K2.6 |
| Quinn | Quincy | Qwen | Qwen 3.6 Max |
| Mary | Mila | MiniMax | MiniMax M2.7 Highspeed |
| Zara | Zoe | Z.AI | GLM-5.1 |

Plus a Moderator — a system-role voice that opens the session, prompts the end-of-session ballot, and writes the synthesis. It defaults to Google but falls through the configured providers in order if Google isn't set up.
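A sketch of that fallthrough; everything after "google" in the preference order is my assumption, since the post only specifies Google-first:

```ts
// Moderator seat selection: Google first, then the first configured
// provider in a fixed order. The order after "google" is my guess;
// the repo defines the real one.
const MODERATOR_PREFERENCE = ["google", "openai", "anthropic", "deepseek", "kimi", "qwen", "minimax", "zai"] as const;

function pickModeratorProvider(configured: Set<string>): string {
  const provider = MODERATOR_PREFERENCE.find((p) => configured.has(p));
  if (!provider) throw new Error("No provider configured for the moderator seat");
  return provider;
}
```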


2. Eight provider clients, one allowlisted egress

The tempting design for a multi-provider app is to write one client against an OpenAI-compatible interface and call it a day. I tried that. It collapses the moment you need any of:

  • Anthropic's prompt-caching beta header
  • OpenAI's Responses API (different shape from Chat Completions: input not messages, instructions not system, max_output_tokens, reasoning.effort for o-series and GPT-5.x; see the sketch after this list)
  • Google's x-goog-api-key header instead of Authorization: Bearer
  • MiniMax's Anthropic-compatible endpoint
  • Per-provider streaming envelope quirks
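To make the Responses bullet concrete, here's the same logical request in both shapes. The field names follow OpenAI's public docs; the literal values are just for illustration:

```ts
// Chat Completions shape: system message lives inside `messages`.
const chatCompletionsBody = {
  model: "gpt-5.5",
  messages: [
    { role: "system", content: "You are a debater named George." },
    { role: "user", content: "Open the seminar." },
  ],
  max_tokens: 1024,
};

// Responses shape: `instructions` replaces the system message, `input`
// replaces `messages`, `max_output_tokens` replaces `max_tokens`, and
// `reasoning.effort` exists only here (o-series and GPT-5.x).
const responsesBody = {
  model: "gpt-5.5",
  instructions: "You are a debater named George.",
  input: [{ role: "user", content: "Open the seminar." }],
  max_output_tokens: 1024,
  reasoning: { effort: "medium" },
};
```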

The codebase has a thin BaseProvider interface and a per-provider client behind it. Headers are switched on provider type:

```ts
// packages/sdk/src/providers/base.ts (excerpt)
case "anthropic":
  // Fix 4.5: include the prompt-caching beta header so `cacheControl: "ephemeral"`
  // on user messages actually engages the cache. Without this header
  // Anthropic ignores cache_control and bills every request as a full
  // re-prompt (5-10× cost overhead on long debates).
  return {
    ...baseHeaders,
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",
  };
```

The thing I am proud of here is that every provider client takes a custom baseUrl:

```ts
// packages/sdk/src/providers/openai.ts
constructor(apiKey: string, options?: { baseUrl?: string; transport?: Transport }) {
  // ...
  this.endpoint = resolveEndpoint(options?.baseUrl, "/v1/responses", API_ENDPOINTS.openai);
}
```
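resolveEndpoint itself isn't excerpted; a plausible reading of its contract (the real implementation may differ, for instance in how it handles duplicated path segments):

```ts
// Plausible contract, not the repo's code: a custom base URL wins and the
// provider path is appended; otherwise the default endpoint is used as-is.
function resolveEndpoint(
  baseUrl: string | undefined,
  path: string,
  defaultEndpoint: string,
): string {
  if (!baseUrl) return defaultEndpoint;
  return baseUrl.replace(/\/+$/, "") + path;
}

// resolveEndpoint("https://proxy.internal", "/v1/responses", "https://api.openai.com/v1/responses")
//   => "https://proxy.internal/v1/responses"
```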

And the credentials store models it explicitly:

```ts
// packages/shared/src/types/index.ts (excerpt)
openai?: { apiKey: string; baseUrl?: string };
anthropic?: { apiKey: string; baseUrl?: string };
google?: { apiKey: string; baseUrl?: string };
// ... and so on for every provider
```

This pairs with a deliberate exception in the Tauri-side network allowlist:

```rust
// apps/desktop/src-tauri/src/allowlist.rs (excerpt)
const LOOPBACK_HOSTS: &[&str] = &["127.0.0.1", "localhost", "::1", "[::1]"];

if loopback {
    if scheme != "http" && scheme != "https" {
        return Err(format!("Unsupported scheme '{}' for loopback", scheme));
    }
} else {
    if scheme != "https" {
        return Err(format!("Scheme '{}' not allowed (https:// required)", scheme));
    }
    if !is_allowlisted_provider(&host) {
        return Err(format!("Host '{}' is not on the IPC allowlist.", host));
    }
}
```

External hosts must be on a hardcoded provider allowlist and must be https://. Loopback is allowed http://. The unit tests spell out the intent:

```rust
#[test]
fn loopback_http_is_accepted() {
    assert!(validate_outbound_url("http://127.0.0.1:11434/api/chat").is_ok());
    assert!(validate_outbound_url("http://localhost:11434/api/chat").is_ok());
}
```

Port 11434 is Ollama's default. Pointing any provider seat at http://localhost:11434/v1 (or vLLM, or LM Studio, or llama-server from llama.cpp) routes through the same Tauri command path the cloud calls use. There's also a 4 MB outbound body cap and a 600-request-per-60-seconds process-wide token bucket so a runaway loop can't melt anything.
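Concretely, pointing a seat at a local server is just the credentials shape from above (values illustrative):

```ts
// One local seat, one cloud seat. Ollama's OpenAI-compatible endpoint
// typically ignores the API key, but the field is required by the type,
// so any placeholder works. Values here are illustrative.
const credentials = {
  openai: { apiKey: "ollama", baseUrl: "http://localhost:11434/v1" },
  anthropic: { apiKey: "sk-ant-..." }, // a real key for the mixed case
};
```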

You can run a fully local seminar — every seat against a localhost endpoint — or mix local and frontier (DeepSeek-V3 on Ollama vs. Claude in the cloud) for an honest comparison. The seminar protocol surfaces where they actually disagree.


3. The vault: 0600 file, not the keychain

Every API key, every session transcript, every export is encrypted under a 32-byte data-encryption key (DEK). The DEK lives in the platform's app-data directory:

  • macOS: ~/Library/Application Support/com.socratic-council.desktop/vault.key
  • Linux: ~/.local/share/.../vault.key
  • Windows: under %APPDATA%

0600 permissions on Unix; user-only ACL on Windows by default. The frontend reads the DEK once at boot via a Tauri command and uses it for the XChaCha20-Poly1305 envelope on every encrypted blob in localStorage.
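On the frontend that flow might look like the sketch below. The command name vault_get_dek is my placeholder (the post only says "a Tauri command"), and the envelope is shown with libsodium-wrappers:

```ts
import sodium from "libsodium-wrappers";
import { invoke } from "@tauri-apps/api/core";

// Boot-time sketch: fetch the DEK once, then seal blobs with
// XChaCha20-Poly1305. "vault_get_dek" is a hypothetical command name.
let dek: Uint8Array | null = null;

async function getDek(): Promise<Uint8Array> {
  await sodium.ready;
  if (!dek) dek = new Uint8Array(await invoke<number[]>("vault_get_dek"));
  return dek; // the 32-byte data-encryption key
}

async function encryptBlob(plaintext: string): Promise<string> {
  const key = await getDek();
  // XChaCha20-Poly1305 takes a 24-byte random nonce per message.
  const nonce = sodium.randombytes_buf(sodium.crypto_aead_xchacha20poly1305_ietf_NPUBBYTES);
  const ciphertext = sodium.crypto_aead_xchacha20poly1305_ietf_encrypt(
    sodium.from_string(plaintext),
    null, // no additional authenticated data in this sketch
    null, // nsec, unused by the API
    nonce,
    key,
  );
  // Prefix the nonce so decryption can recover it from the stored blob.
  return `${sodium.to_base64(nonce)}.${sodium.to_base64(ciphertext)}`;
}
```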

People who've shipped desktop apps will ask the obvious question: why not the OS keychain? Especially on macOS, where Security.framework would seem like the obvious move. The honest answer is in the file's own header comment:

```rust
// apps/desktop/src-tauri/src/vault_file.rs (excerpt)
//! Rationale: macOS keychain access prompts the user for their login password
//! on every invocation when the binary has only an ad-hoc code signature,
//! because the keychain ACL system binds to a stable code signing identity
//! that ad-hoc simply doesn't provide. Result: ~15 password prompts per
//! launch as the frontend fetched one key per provider + the vault DEK.
//!
//! This module stores the 32-byte DEK in the platform's app-data directory
//! with user-only (0600 on unix) permissions.
```

A source-only OSS desktop app on macOS without a paid Apple Developer ID gets ad-hoc code-signed at build time. Ad-hoc has no stable identity, so every keychain SecItemCopyMatching re-prompts. With one DEK fetch + one per-provider key fetch, that's ~15 password prompts per launch, every launch. No real user is going to put up with that. Filesystem perms on a per-user app-data file is the next stop down, and it's where we landed.

The DEK lifecycle is small but careful. Three things in particular:

Quarantine, not overwrite. If the file exists but can't be read (corrupt, wrong size, FS error), it gets moved aside to vault.key.corrupt-<unix-ts> instead of silently overwritten. The user's previously-encrypted localStorage blobs are unrecoverable with the new DEK regardless, but at least a backup tool has a chance to repair the original later, and the frontend gets a Quarantined status flag so it can warn the user explicitly:

```rust
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
pub enum VaultDekStatus {
    /// Successfully read the existing DEK file.
    Existing,
    /// No DEK file existed; a fresh one was created (typical first launch).
    FreshlyCreated,
    /// A DEK file existed but couldn't be read; quarantined and replaced.
    /// Encrypted blobs in localStorage from the prior DEK will fail to
    /// decrypt — surface a warning to the user.
    Quarantined,
}
```

Atomic create. First-time creation uses OpenOptions::create_new on a sibling tempfile, then fs::rename to the real path. Two concurrent callers can't both write a different DEK in the same race. A power loss mid-write doesn't leave a half-written 16-byte file masquerading as a 32-byte DEK.
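For illustration, the same pattern transplanted to Node-flavored TypeScript (the repo does this in Rust):

```ts
import { openSync, writeSync, closeSync, renameSync } from "node:fs";
import { randomBytes } from "node:crypto";

// create-new + rename, as described above, shown in Node for illustration.
// "wx" fails if the tempfile already exists, so two racing writers can't
// both get past openSync; rename is atomic on the same filesystem, so a
// reader never sees a half-written key file.
function createDekAtomically(finalPath: string): void {
  const tmpPath = `${finalPath}.tmp`;
  const fd = openSync(tmpPath, "wx", 0o600); // create_new equivalent, 0600
  try {
    writeSync(fd, randomBytes(32)); // the 32-byte DEK
  } finally {
    closeSync(fd);
  }
  renameSync(tmpPath, finalPath);
}
```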

Reset requires a sentinel string. The destructive command — vault_reset — refuses to run unless the caller passes a literal confirmation:

```rust
const VAULT_RESET_CONFIRMATION: &str = "DELETE-ALL-LOCAL-DATA";

#[tauri::command]
pub fn vault_reset(app: tauri::AppHandle, confirmation: String) -> Result<bool, String> {
    if confirmation != VAULT_RESET_CONFIRMATION {
        return Err(format!(
            "vault_reset requires confirmation parameter equal to '{}'",
            VAULT_RESET_CONFIRMATION
        ));
    }
    // ... delete vault.key
}
```

Defense-in-depth: a future UI button accidentally wired to vault_reset without a typed user confirmation hits a 400-equivalent error rather than silently destroying everything. The frontend confirms with the user and passes the sentinel; if either step is wrong, the call is refused.
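The frontend half of that handshake, sketched (the sentinel and command name are from the excerpt above; the dialog is my stand-in):

```ts
import { invoke } from "@tauri-apps/api/core";

// The string must match the Rust constant byte-for-byte. window.prompt is
// a stand-in for whatever confirmation dialog the real UI uses.
async function resetVault(): Promise<boolean> {
  const typed = window.prompt('Type "DELETE-ALL-LOCAL-DATA" to confirm:');
  if (typed !== "DELETE-ALL-LOCAL-DATA") return false;
  return invoke<boolean>("vault_reset", { confirmation: typed });
}
```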


4. The argument map: 9 nodes, 10 relations, multi-source merge

If you render a 16-agent debate as a chat log, it reads as noise within three turns. The argument map exists because the artifact of a seminar should be the structure of the argument, not the transcript.

The map is a directed graph maintained incrementally — after each council message, an extractor (Gemini 3.x by default) returns structured fragments, and a merger appends or merges them into the live graph. Schema v2 has nine node kinds and ten edge relations:

```ts
// packages/core/src/argmap.ts (excerpt)
export type ArgNodeKind =
  | "claim" | "premise" | "evidence" | "rebuttal"
  | "concession" | "question" | "assumption"
  | "definition" | "proposal";

export type ArgEdgeRelation =
  | "supports" | "rebuts" | "concedes" | "restates"
  | "refines" | "agrees" | "contradicts"
  | "depends-on" | "answers" | "addresses";
```

Two design choices worth flagging:

Multi-source provenance. When two different debaters assert effectively the same claim with different wording, the merger doesn't create two unrelated nodes — it merges them into one node carrying two ArgNodeSource entries (with verbatim quotes and char offsets) and adds the new wording to the node's aliases. The shape:

```ts
export interface ArgNodeSource {
  messageId: string;
  agentId: string;
  timestamp: number;
  /** Char offsets into the source message content (optional). */
  span?: { start: number; end: number };
  /** Verbatim quote from the source message (optional). */
  quote?: string;
}
```

Same-claim merge is the bit that turns the graph from "transcript with extra steps" into a real consolidated artifact. The merging heuristic uses bag-of-words cosine — sufficient to catch "same claim, different wording" without pulling in an embedding model.
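That heuristic is a few lines end to end. A sketch (the tokenizer details are mine):

```ts
// Bag-of-words cosine of the kind the merger uses to catch "same claim,
// different wording". The tokenizer here is my own simplification.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().match(/[a-z0-9']+/g) ?? []) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [token, count] of a) dot += count * (b.get(token) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((sum, c) => sum + c * c, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// cosine(bagOfWords("markets allocate better"),
//        bagOfWords("markets allocate capital better")) ≈ 0.87
```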

Stance polarity, status, verification. Each node carries a stance along a configured axis (e.g. "central planning ↔ market", polarity in [-1, +1]), a status (active / withdrawn / superseded), and a verification verdict from the fact-check pipeline. Edges carry a confidence (0..1) and a one-line rationale. None of this is cosmetic — the consolidation pass and the conflict graph both read these dimensions to decide what to highlight.
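Assembled, a plausible composite node shape. Only ArgNodeKind and ArgNodeSource come from the excerpts; the rest of the field names are my guesses at the schema:

```ts
// Plausible composite shape; field names beyond the excerpted types are
// guesses, not the repo's schema.
interface ArgNode {
  id: string;
  kind: ArgNodeKind;
  text: string;
  aliases: string[]; // alternate wordings folded in by the merger
  sources: ArgNodeSource[]; // one entry per asserting debater
  stance?: number; // polarity in [-1, +1] along the configured axis
  status: "active" | "withdrawn" | "superseded";
  verification?: string; // verdict from the fact-check pipeline
}
```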

The argument map exports as JSON, Mermaid, SVG, or PNG, independent of the full session export.


What didn't fit

A few more bits worth pointing at if you fork the repo: the agent tool-calling DSL (@tool(oracle.search, {"query":"..."}), @quote(george, "the exact line"), @react(cathy, agree)), parsed inline and dispatched by the orchestrator; the .scbundle export format, a tarball of session.json, transcript.jsonl, argmap.json, synthesis.md, and costs.csv that any other install can re-import without a cloud handoff; and the fairness module that caps per-agent talk time so a single high-bidding seat can't monopolize the floor. Read packages/core/ and apps/desktop/src-tauri/src/ for the rest.
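As a taste of the DSL, an illustrative parser for those three directives (the repo's actual grammar and dispatch live in packages/core/; the regexes here are mine):

```ts
// Illustrative, not the repo's parser. Three directive forms:
//   @tool(oracle.search, {"query":"..."})
//   @quote(george, "the exact line")
//   @react(cathy, agree)
type Directive =
  | { kind: "tool"; name: string; args: unknown }
  | { kind: "quote"; agent: string; text: string }
  | { kind: "react"; agent: string; reaction: string };

function parseDirectives(message: string): Directive[] {
  const out: Directive[] = [];
  for (const m of message.matchAll(/@tool\(([\w.]+),\s*(\{.*?\})\)/g)) {
    out.push({ kind: "tool", name: m[1], args: JSON.parse(m[2]) });
  }
  for (const m of message.matchAll(/@quote\((\w+),\s*"([^"]*)"\)/g)) {
    out.push({ kind: "quote", agent: m[1], text: m[2] });
  }
  for (const m of message.matchAll(/@react\((\w+),\s*(\w+)\)/g)) {
    out.push({ kind: "react", agent: m[1], reaction: m[2] });
  }
  return out;
}
```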


What I'd do differently

A few honest notes for anyone building something in this neighborhood:

  • The protocol matters more than the models. The biggest quality jump in v2 wasn't swapping in more expensive providers — it was the relevance-bidding pass plus the paired-advisor side channel. The prototype ran round-robin with symmetric agents, and it was much worse.
  • Provider-specific clients beat "OpenAI-compatible everything." The compat shim looked attractive until I ran into Anthropic's prompt-caching, OpenAI's Responses API, and Google's auth. A thin per-provider client is more code but pays itself back in correctness.
  • Tauri 2 over Electron was not a close call in this category. No bundled Chromium, native Rust crypto, smaller binaries, faster cold start. The webview-quirk surface is real but manageable for a desktop-only app.
  • 0600 is a fine answer to "where do I keep secrets." I burned days trying to make the macOS keychain not prompt 15 times per launch on an ad-hoc-signed binary before accepting that filesystem perms + a quarantine recovery path was strictly better for this user.

Things I want to build next:

  • A first-class "local mode" toggle that wires every seat to a single local Ollama / vLLM endpoint with one click, instead of asking the user to set eight base URLs in Settings.
  • A CRDT-backed argument map so two people on the same LAN can co-watch a seminar with synced annotations.
  • An MCP server that exposes a session as a tool for agents elsewhere — "ask the seminar what it concluded about X."

Repo + license

Source: richer-richard/socratic-council — Apache-2.0, v2.0.0 just shipped. macOS quick install via install.sh; Windows/Linux via pnpm install && pnpm tauri:build.

If you have feedback on the bidding protocol, the provider abstraction, or the vault design — I'd much rather hear it now, while v2 is fresh, than after I've cemented the wrong decisions in v3. Issues open, PRs welcome.

— Richard
