SELECTOOLS: Multi-agent graphs, tool calling, RAG, 50 evaluators, PII redaction. All in one pip install.

#ai #python #agents #machinelearning

Releasing v0.20.1 of selectools, an open-source (Apache-2.0) Python framework for AI agent systems. Supports OpenAI, Anthropic, Gemini, and Ollama.

pip install selectools

The technical hook: how interrupts work after a human pause

LangGraph's interrupt() mechanism re-executes the entire node body on resume. This is by-design and falls out of LangGraph's checkpoint-replay model. The official guidance is to make pre-interrupt side effects idempotent, place expensive work after the interrupt() call, or split side effects into a separate downstream node. It works, but every node that needs human input has to be structured around the resume semantics. It's a leaky abstraction.

Selectools uses Python generators instead. The node yields an InterruptRequest. The graph resumes at the exact yield point via generator.send(). Expensive work runs exactly once, with no idempotency contortions required.

async def review_node(state):
    analysis = await expensive_llm_analysis(state.data["draft"])  # runs once
    decision = yield InterruptRequest(prompt="Approve?", payload=analysis)
    state.data["approved"] = (decision == "yes")  # resumes here

The v0.18.0 changelog documents the contrast directly: "Resumes at the exact yield point (LangGraph restarts the whole node)."

Multi-Agent Orchestration

AgentGraph is a directed graph executor for agent nodes. Routing is plain Python functions, no learned router, no DSL. This is deliberate: in production agent systems, you generally want deterministic control flow with LLMs doing the reasoning within nodes, not deciding the graph topology.

Key design choices:

ContextMode controls what history flows between nodes: LAST_MESSAGE (default), LAST_N, FULL, SUMMARY, CUSTOM. Prevents context explosion where downstream agents get drowned in irrelevant upstream conversation.
Parallel execution with MergePolicy (LAST_WINS, FIRST_WINS, APPEND) for fan-out/fan-in patterns.
Loop and stall detection via state hashing. The graph tracks whether state is actually changing.

SupervisorAgent

Four coordination strategies:

Strategy	Description	Best for
`plan_and_execute`	LLM generates a JSON plan, agents execute sequentially	Structured tasks
`round_robin`	Agents take turns, supervisor checks completion each round	Iterative refinement
`dynamic`	LLM router selects best agent per step	Heterogeneous tasks
`magentic`	Magentic-One: Task/Progress Ledgers + auto-replan	Autonomous research

The magentic strategy implements the Magentic-One pattern from Microsoft Research. ModelSplit lets you use expensive models for planning and cheap models for execution (70-90% cost reduction).

Built-in Eval Framework

50 evaluators ship with the library (no paid service required): 30 deterministic + 20 LLM-as-judge. Plus A/B pairwise comparison, regression detection, JUnit XML for CI, and HTML reports.

Engineering Rigor: Autonomous Bug Hunts + Pre-Launch Security Audit

The bug-hunting story is the part of this project I'm proudest of, and every claim below is in the public CHANGELOG.md.

v0.19.1 Ralph Loop Bug Hunt. Autonomous convergence system that runs 8 passes across all 7 modules until 3 consecutive clean passes. Result: ~90 bugs fixed and 254 new regression tests added (tests went from 2,664 to 2,918). Selected fixes from the changelog: _tool_executor.py ThreadPoolExecutor singleton for deadlock prevention, _provider_caller.py async observer events on LLM cache hits, _openai_compat.py tool call deltas flushed after stream end (Ollama compat), fallback.py mid-stream fallback corruption, bm25.py atomic snapshot under lock for concurrent clear/add safety, evals/llm_evaluators.py prompt injection fencing on user-controlled fields with <<<BEGIN_USER_CONTENT>>> delimiters.

v0.19.1 RAG Adversarial Bug Hunt. Eight edge-case fixes including ChromaVectorStore n_results clamping for empty collections, HybridSearcher None handling for vector_top_k/keyword_top_k, ContextualChunker prompt template validation, PDFLoader PdfReadError raised as ValueError for encrypted PDFs, BM25 top_k < 1 immediate validation.

Pre-Launch 5-Agent Parallel Security Audit (v0.20.0). 5 Claude subagents ran in parallel against the whole codebase, each focused on a different subsystem (concurrency, None guards, injection, path traversal, crash safety). 56 total findings, 9 critical security fixes shipped: score injection in eval extractors, ReDoS in custom regex, path traversal in ToolLoader, Anthropic multi-tool message merging, Redis session key collision, async output guardrails, Redis/Supabase error handling. Full audit is published in docs/SECURITY.md with every # nosec annotation reviewed individually.

Some of these patterns came from reading the LangChain, CrewAI, AutoGen, and LlamaIndex source while building the migration guides. The LangGraph HITL pattern (entire node restarts on resume) is the clearest example. Selectools uses Python generators instead, and the v0.18.0 changelog literally documents the contrast: "Resumes at the exact yield point (LangGraph restarts the whole node)."

Advanced Agent Patterns

Four high-level patterns ship in selectools.patterns:

Pattern	Description
`PlanAndExecuteAgent`	LLM generates a plan, executes subtasks sequentially
`ReflectiveAgent`	Self-critique loop with configurable quality threshold
`DebateAgent`	Two-agent adversarial debate + synthesis
`TeamLeadAgent`	Lead agent coordinates specialists with load balancing

Enterprise Hardening

Stability markers: @stable, @beta, @deprecated(since, replacement) decorators for public API signalling. Introspect via obj.__stability__.
Trace HTML viewer: trace_to_html(trace) renders any AgentTrace as a standalone waterfall HTML timeline. No JS framework, no external deps.
SBOM: sbom.json (CycloneDX 1.6) with all core production dependencies.
Compatibility matrix: Python 3.9-3.13 × provider SDK × optional deps in docs/COMPATIBILITY.md.

Serve & Deploy

selectools serve agent.yaml starts a Starlette ASGI server with a playground UI. Define agents in YAML, pick from 5 templates (customer_support, data_analyst, research_assistant, code_reviewer, rag_chatbot). Production additions: PostgresCheckpointStore, TraceStore (3 backends), compose() for tool chaining, retry() / cache_step() pipeline wrappers, type-safe step contracts, and streaming composition.

Tests + Coverage

4,612 tests (95% coverage) across Python 3.9-3.13, with real-API evaluations against OpenAI, Anthropic, and Gemini. Includes 28 Hypothesis property-based tests, 15 thread-safety smoke tests (10 threads × 20 ops with synchronized start), and 16 production integration simulations.

Also new in v0.20.x

An early visual agent graph builder at https://selectools.dev/builder/ (49KB self-contained HTML, exports to YAML or Python). Works but I'm still polishing edges, so pip install is the recommended path right now.

Links: