Bizbox

Posted on May 7

Bizbox Build Log: May 2–8, 2026

#bizbox #buildlog #buildinpublic #ai

Four releases, nine PRs merged, and one clear theme this week: making Bizbox agents more capable and trustworthy in multi-turn execution contexts.

Shipped this week

Company AI Builder (Phases 0–4)

#20 landed the full Company AI Builder feature: a curated set of mutation tools delivered via a proposal-approval flow. Phase 0 shipped the read-only spike work (sessions, settings, an OpenAI-compatible interface, six read tools, and UI). This update extends it with Phases 1–4: proposal-store infrastructure, mutation tools behind proposals, and the approval surface for company owners.

Trade-off: Mutation tools are gated by human approval for now. We chose safety and trust before convenience. Future iterations will tune the guardrails based on real operator feedback.
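
To make the proposal-approval flow concrete, here is a minimal in-memory sketch. It is not the code from #20: `ProposalStore`, `Proposal`, and the `rename_company` tool are hypothetical names chosen for illustration. The point it shows is that an agent can only queue a mutation, and the mutation runs only when an owner approves it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict
from uuid import uuid4


class ProposalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """A mutation an agent wants to make, held until an owner decides."""
    tool: str                 # hypothetical tool name
    args: Dict[str, str]
    id: str = field(default_factory=lambda: uuid4().hex)
    status: ProposalStatus = ProposalStatus.PENDING


class ProposalStore:
    """In-memory stand-in for a proposal store (all names are assumptions)."""

    def __init__(self, tools: Dict[str, Callable[..., None]]) -> None:
        self._tools = tools
        self._proposals: Dict[str, Proposal] = {}

    def propose(self, tool: str, args: Dict[str, str]) -> Proposal:
        # Agents can only queue a mutation; nothing is applied here.
        if tool not in self._tools:
            raise KeyError(f"unknown mutation tool: {tool}")
        proposal = Proposal(tool=tool, args=args)
        self._proposals[proposal.id] = proposal
        return proposal

    def _pending(self, proposal_id: str) -> Proposal:
        proposal = self._proposals[proposal_id]
        if proposal.status is not ProposalStatus.PENDING:
            raise ValueError("proposal already decided")
        return proposal

    def approve(self, proposal_id: str) -> None:
        proposal = self._pending(proposal_id)
        # Run the tool first; the status changes only if it succeeds.
        self._tools[proposal.tool](**proposal.args)
        proposal.status = ProposalStatus.APPROVED

    def reject(self, proposal_id: str) -> None:
        self._pending(proposal_id).status = ProposalStatus.REJECTED


company = {"name": "Acme"}


def rename_company(name: str) -> None:
    company["name"] = name


store = ProposalStore({"rename_company": rename_company})
pending = store.propose("rename_company", {"name": "Acme Labs"})
name_before_approval = company["name"]  # the mutation has not run yet
store.approve(pending.id)
name_after_approval = company["name"]
```

The friction is visible in the sketch: every mutation costs one `approve` call from a human.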

Artifact validation and schema hardening

#27 introduced stricter validation for "artifact" work products, enforcing that they always carry attachment-backed metadata and a createdByRunId. New schema validators, runtime type guards, and tighter integration mean artifact handling is now fail-fast instead of fail-silent.

Why it matters: Agents produce artifacts (deliverables, documents, code outputs). Loose validation meant broken artifact references could propagate through the system. This change catches those errors at the boundary.
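
A fail-fast boundary check along these lines might look like the sketch below. Only `createdByRunId` comes from the PR description; the `type`, `metadata`, and `attachmentId` field names are assumptions, and the real validators are not shown in this post.

```python
from typing import Any, Dict


class ArtifactValidationError(ValueError):
    """Raised at the boundary when an artifact work product is malformed."""


def validate_work_product(product: Dict[str, Any]) -> Dict[str, Any]:
    """Return the product unchanged, or raise if an artifact is incomplete."""
    if product.get("type") != "artifact":
        return product  # other work product kinds are out of scope here

    run_id = product.get("createdByRunId")
    if not isinstance(run_id, str) or not run_id:
        raise ArtifactValidationError("artifact is missing createdByRunId")

    # "attachmentId" is a hypothetical field standing in for
    # attachment-backed metadata.
    metadata = product.get("metadata")
    if not isinstance(metadata, dict):
        raise ArtifactValidationError("artifact has no metadata object")
    attachment_id = metadata.get("attachmentId")
    if not isinstance(attachment_id, str) or not attachment_id:
        raise ArtifactValidationError("artifact metadata has no attachment")

    return product


good = {
    "type": "artifact",
    "createdByRunId": "run_1",
    "metadata": {"attachmentId": "att_1"},
}
broken = {"type": "artifact", "metadata": {"attachmentId": "att_1"}}

validated = validate_work_product(good)
try:
    validate_work_product(broken)
    broken_error = None
except ArtifactValidationError as exc:
    broken_error = str(exc)
```

Raising at the point of entry is what makes the failure loud: the broken reference never reaches storage or the UI.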

Artifact persistence and UI updates for issue-backed runs

#25 adds support for collecting output artifacts from adapter executions (especially OpenClaw Gateway adapters), introduces new types and logic for artifact management, and exposes utilities for artifact-related work products.

Open challenge: Artifact handling is still evolving. We're learning what metadata needs to travel with artifacts, how to version them, and what the UI should surface. Feedback welcome.

Agent thread chat with optimistic UI

#21 adds a direct communication channel between operators and agents. Users can now message agents from the agent detail page, with optimistic UI updates for a snappier feel.

Decision: We chose optimistic updates over waiting for server confirmation because they make the UI feel faster. The trade-off: in the rare case where the server rejects a message, the failure isn't obvious until you refresh. We're watching for confusion signals.
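
The optimistic pattern and its trade-off can be modeled as two steps: show the message first, reconcile with the server's answer later. The sketch below is a language-neutral state model written in Python, not the UI code from #21; `AgentThread`, `ChatMessage`, and the per-message `status` field are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChatMessage:
    text: str
    status: str = "pending"  # becomes "sent" or "failed" after the server replies


@dataclass
class AgentThread:
    messages: List[ChatMessage] = field(default_factory=list)

    def send_optimistically(self, text: str) -> ChatMessage:
        # Step 1: render the message immediately, before any server response.
        message = ChatMessage(text)
        self.messages.append(message)
        return message

    def reconcile(self, message: ChatMessage, accepted: bool) -> None:
        # Step 2: record the server's verdict on the message already shown.
        message.status = "sent" if accepted else "failed"

    def rendered(self) -> List[Tuple[str, str]]:
        return [(m.text, m.status) for m in self.messages]


thread = AgentThread()
first = thread.send_optimistically("Summarize open issues")
shown_before_reply = thread.rendered()
thread.reconcile(first, accepted=False)  # simulate a server rejection
shown_after_reply = thread.rendered()
```

Between the two steps the operator sees a message the server has not yet accepted. A view that skips the reconcile step on rejection keeps showing it as if it had been delivered.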

Routine execution recovery logic

#22 fixes how Bizbox handles routine_execution issues in the blocked state. Previously, the recovery logic treated blocked routines as failures and tried to resume them prematurely. Now, blocked is recognized as a healthy, parked wait state.

Why this was broken: Routines often block on human approval or child issue completion. The old logic didn't distinguish "blocked and waiting" from "blocked and stuck." This change codifies the difference.
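
The distinction can be sketched as a small decision function. This is an illustration under assumptions, not the logic from #22: the `kind` and `status` field names, the `failed` status, and the `RecoveryAction` enum are hypothetical; only `routine_execution` and `blocked` come from the PR description.

```python
from enum import Enum
from typing import Dict


class RecoveryAction(Enum):
    LEAVE_PARKED = "leave_parked"  # healthy wait state, do nothing
    RESUME = "resume"              # genuinely failed, try again
    NONE = "none"                  # not something recovery handles


def recovery_action(issue: Dict[str, str]) -> RecoveryAction:
    """Decide what recovery should do with one issue."""
    if issue.get("kind") != "routine_execution":
        return RecoveryAction.NONE
    status = issue.get("status")
    if status == "blocked":
        # Waiting on approval or a child issue: parked, not broken.
        return RecoveryAction.LEAVE_PARKED
    if status == "failed":  # assumed status name
        return RecoveryAction.RESUME
    return RecoveryAction.NONE


blocked_routine = {"kind": "routine_execution", "status": "blocked"}
failed_routine = {"kind": "routine_execution", "status": "failed"}

blocked_decision = recovery_action(blocked_routine)
failed_decision = recovery_action(failed_routine)
```

The old behavior corresponds to sending both inputs down the `RESUME` branch.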

Upstream merge and OpenTelemetry metrics

#16 merged upstream PaperClip changes from April 30, 2026 (assisted by Claude Sonnet 4.6).

#14 adds OpenTelemetry metrics, starting with bizbox.issues.human_comments_total — a signal for human intervention frequency.

Trade-off: We're starting with one metric to validate the integration pattern. More will follow once we've confirmed the collector setup works in production.

agentParams refactor and regression fix

#24 fixes a regression introduced in v0.0.6 where the OpenClaw Gateway adapter changed the outbound agent request shape. The fix refactors agentParams handling and removes an unused function that was masking the real issue.

Lesson: Request shape changes in adapters are easy to miss when tests don't cover the boundary. We added a test to catch this pattern in the future.
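
A boundary test of this kind typically pins the exact set of keys the adapter sends. The sketch below shows the idea only: `build_agent_request`, the `agentId` key, and the expected key set are hypothetical, since the post does not describe the real request shape or the test that was added.

```python
from typing import Any, Dict, List

# Hypothetical contract for the outbound request; not the real shape.
EXPECTED_TOP_LEVEL_KEYS = {"agentId", "agentParams"}


def build_agent_request(agent_id: str, agent_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the adapter's outbound request builder."""
    return {"agentId": agent_id, "agentParams": dict(agent_params)}


def request_shape_problems(request: Dict[str, Any]) -> List[str]:
    """Return a list of shape problems; an empty list means no drift."""
    actual = set(request)
    problems = []
    for key in sorted(EXPECTED_TOP_LEVEL_KEYS - actual):
        problems.append(f"missing key: {key}")
    for key in sorted(actual - EXPECTED_TOP_LEVEL_KEYS):
        problems.append(f"unexpected key: {key}")
    if "agentParams" in request and not isinstance(request["agentParams"], dict):
        problems.append("agentParams must be an object")
    return problems


current = build_agent_request("agent_1", {"model": "example"})
current_problems = request_shape_problems(current)

# Simulate a regression that renames the params key.
drifted = {"agentId": "agent_1", "params": {"model": "example"}}
drifted_problems = request_shape_problems(drifted)
```

Checking for unexpected keys as well as missing ones is deliberate: a rename shows up as one of each, which is easy to spot in a failure message.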

Workflow cleanup

#23 removes the sync-upstream workflow. We're switching to manual upstream merges (with AI assistance) for now.

Why: Automated upstream sync cost more in conflict resolution than it saved in merge time. Manual merges with AI assistance give us control without the constant breakage.

Decisions

Mutation tools behind proposals: We're prioritizing trust and transparency over convenience. Operators see and approve changes before agents make them.
Artifact validation is fail-fast: Better to catch broken artifacts early than let them propagate.
Blocked routine state is healthy: Routines can wait. Not every blocked issue is a failure.
Manual upstream merges: Automation failed here. Human-in-the-loop merges with AI assistance work better for our repo.

Trade-offs

Proposal flow adds friction: Every mutation requires approval. This is intentional for now, but we know it slows down agents. Future work: smart approval defaults based on context and trust signals.
Optimistic UI updates hide rare server rejections: We chose speed over certainty. Watching for user confusion.
One OpenTelemetry metric to start: We're validating the pattern before adding dozens of metrics. Risk: we might miss important signals early.

Open challenges

Artifact versioning and metadata: What needs to travel with an artifact? How do we version it? What should the UI surface? Still figuring this out.
Approval UX for high-frequency mutations: Approving every change works for low-frequency operations. It won't scale to high-frequency agent work. Need smarter defaults.
Upstream merge strategy: Manual merges with AI assistance work for now, but they don't scale. We need a better long-term approach.

Releases

v0.0.9 — May 6, 2026
v0.0.8 — May 5, 2026
v0.0.7 — May 5, 2026
v0.0.6 — May 5, 2026

This Build Log is grounded in real repo activity. Every claim links to a PR, issue, release, or ADR. No internal-only context, no invented features, no marketing fluff.

Questions? Join the discussion on GitHub.

Top comments (2)

Keynition • May 7

The 'mutation tools behind proposals' decision is the right call at this stage — trust before convenience. The interesting challenge will be calibrating when to relax those guardrails. What signals are you watching for to know when operators are ready for more autonomous agent actions?

Keynition • May 7

Build logs are underrated for accountability. Writing what you shipped and what you didn't forces clarity on what actually matters. How are you deciding what goes into each week's log?