Bizbox for Citro

Posted on Jun 1 • Originally published at github.com

Deep Dive: Building the Awaiting-Human Bridge — From ClickUp Polling to a Provider-Agnostic Approval Layer

#agents #architecture #automation #systemdesign

June 2026

Background: The Status That Needed a Bridge

In May 2026, we shipped the awaiting_human status — a dedicated state for issues parked on a human decision, distinct from blocked (which is for dependency-blocked work that agents can help unstick). If you missed that post, the short version: when an agent creates a request_confirmation or ask_user_questions interaction on an in_progress issue, Bizbox auto-parks the issue to awaiting_human and prevents agents from acting on it until a human responds.
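The auto-park rule can be sketched as a small pure function. This is an illustration only: the status and interaction names come from this post, while the function name and the string-based representation are assumptions, not Bizbox's actual code.

```python
# Interaction kinds that park an issue on a human decision (names from the post).
PARKING_INTERACTIONS = {"request_confirmation", "ask_user_questions"}


def status_after_interaction(current_status: str, interaction_kind: str) -> str:
    """Return the issue status after an agent creates an interaction.

    Only in_progress issues are auto-parked; every other status is left alone.
    """
    if current_status == "in_progress" and interaction_kind in PARKING_INTERACTIONS:
        return "awaiting_human"
    return current_status
```

Under this sketch a blocked issue stays blocked even if a confirmation is requested on it, which keeps the two states distinct.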

That was the right design. But it immediately raised a practical question:

How does the human actually respond?

In a zero-human company, the "board" isn't sitting in a Bizbox tab watching for notifications. They're in ClickUp, Slack, or wherever their team already lives. If the approval signal has to come through a Bizbox UI action, you've just added a context-switch tax to every human decision point — and in a system designed to reduce human toil, that's a problem.

The answer we built over the second half of May 2026 is the awaiting-human bridge: a pluggable layer that lets Bizbox send approval requests to external channels and poll for responses, closing the loop without requiring the human to leave their existing workflow.

This post walks through how we built it, the design decisions we made along the way, and where it's headed.

The Problem in Detail

When an issue enters awaiting_human, Bizbox needs to:

1. Notify the right human in the right channel that a decision is needed.
2. Wait for a response, which might be a reply, a reaction, or a structured approval action.
3. Interpret that response correctly: is a thumbs-up an approval? Is a plain text reply a rejection, a question, or just an acknowledgement?
4. Resume the agent workflow once the decision is made.

Each of these steps has failure modes. Notifications can fail to deliver. Responses can be ambiguous. The same external event can arrive multiple times (webhooks are not exactly-once). And the human might respond days later, after the system has restarted or the bridge state has been garbage-collected.

We needed a design that was reliable, idempotent, and extensible — because ClickUp is the first channel, but it won't be the last.

The Architecture: Four Layers

The bridge architecture that emerged across PRs #42, #52, #56, #65, #68, #70, #74, and #76 has four distinct layers:

1. The Outbox (Reliable Notification Delivery)

PR #56 introduced the awaiting_human_notification_outbox table — a durable queue for outbound notifications. Instead of firing-and-forgetting a ClickUp message, Bizbox writes a row to the outbox first, then a background processor picks it up, delivers it, and records the resulting ClickUp message ID.

This gives us:

Retryability: If the ClickUp API is down, the notification retries on the next heartbeat cycle.
Deduplication: The outbox row has a dedupeKey tied to the issue and interaction, so we never send the same notification twice even if the trigger fires multiple times.
Auditability: Every notification attempt, success, and failure is logged with timestamps and error details.
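To make the three properties concrete, here is a minimal outbox sketch using SQLite. Only the table name and the idea of a dedupe key come from this post; the column names, the key format, and the `send` callable are assumptions for illustration and will not match the real migration.

```python
import sqlite3
from typing import Callable

TABLE = "awaiting_human_notification_outbox"


def create_outbox(conn: sqlite3.Connection) -> None:
    conn.execute(
        f"""
        CREATE TABLE IF NOT EXISTS {TABLE} (
            id INTEGER PRIMARY KEY,
            dedupe_key TEXT NOT NULL UNIQUE,  -- one row per issue + interaction
            body TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending',
            attempts INTEGER NOT NULL DEFAULT 0,
            external_message_id TEXT,
            last_error TEXT
        )
        """
    )


def enqueue(conn: sqlite3.Connection, issue_id: str, interaction_id: str, body: str) -> bool:
    """Write the notification first; return False if it was already queued."""
    dedupe_key = f"{issue_id}:{interaction_id}"  # assumed key format
    cur = conn.execute(
        f"INSERT OR IGNORE INTO {TABLE} (dedupe_key, body) VALUES (?, ?)",
        (dedupe_key, body),
    )
    return cur.rowcount == 1


def process_pending(conn: sqlite3.Connection, send: Callable[[str], str]) -> int:
    """Try to deliver every pending row; failed rows stay pending for the next cycle."""
    rows = conn.execute(
        f"SELECT id, body FROM {TABLE} WHERE status = 'pending'"
    ).fetchall()
    delivered = 0
    for row_id, body in rows:
        try:
            message_id = send(body)  # hypothetical transport call
        except Exception as exc:
            # Record the failure for auditing and leave the row pending.
            conn.execute(
                f"UPDATE {TABLE} SET attempts = attempts + 1, last_error = ? WHERE id = ?",
                (str(exc), row_id),
            )
            continue
        conn.execute(
            f"UPDATE {TABLE} SET status = 'delivered', attempts = attempts + 1, "
            "external_message_id = ?, last_error = NULL WHERE id = ?",
            (message_id, row_id),
        )
        delivered += 1
    return delivered
```

Calling `process_pending` once per heartbeat gives retry for free: a row that failed on one cycle is simply picked up again on the next.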

The outbox pattern is a classic reliability primitive, but it's worth calling out explicitly: this is the foundation that makes everything else safe to build on.

2. The Approval Polling Loop

PR #42 added the first version of ClickUp approval polling. On each heartbeat cycle, for every awaiting_human issue with a pending request_confirmation interaction, Bizbox polls the ClickUp message thread for replies and reactions.

The initial design was straightforward: if a configured positive reply or reaction is detected, resolve the interaction as approved and wake the assignee agent.
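One heartbeat pass of that initial design might look roughly like the sketch below. The data classes, `poll_once`, and the `fetch_thread` callable are hypothetical stand-ins for the real heartbeat service and ClickUp client; resolving the interaction and waking the agent are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class ThreadActivity:
    """Replies and reaction names found on one message thread."""
    replies: List[str] = field(default_factory=list)
    reactions: List[str] = field(default_factory=list)


@dataclass
class PendingConfirmation:
    interaction_id: str
    message_id: str


def is_approved(
    activity: ThreadActivity, positive_replies: Set[str], positive_reactions: Set[str]
) -> bool:
    """True if any reaction or any normalized reply is on the configured lists."""
    if any(name in positive_reactions for name in activity.reactions):
        return True
    return any(text.strip().lower() in positive_replies for text in activity.replies)


def poll_once(
    pending: Iterable[PendingConfirmation],
    fetch_thread: Callable[[str], ThreadActivity],
    positive_replies: Set[str],
    positive_reactions: Set[str],
) -> List[str]:
    """Return the interaction IDs that should be resolved as approved this cycle."""
    approved: List[str] = []
    for item in pending:
        activity = fetch_thread(item.message_id)
        if is_approved(activity, positive_replies, positive_reactions):
            approved.append(item.interaction_id)
    return approved
```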

PR #52 cleaned up two reliability issues that surfaced early in production. First, ClickUp comment IDs can arrive as numbers rather than strings — the adapter was only accepting strings, which caused valid top-level comments and replies to be silently dropped during import. PR #52 aligned the adapter with the existing scalar-ID coercion pattern used elsewhere in the codebase. Second, a bridge status value (agent_replied) existed in types and UI filtering but had no runtime transition path — it was unreachable dead code. Removing it simplified the bridge state machine and eliminated a source of confusion when reading bridge lifecycle logs.
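The comment-ID fix boils down to coercing scalar IDs before comparing or storing them. A minimal sketch of that idea follows; the function name and the exact set of accepted types are assumptions, not the codebase's actual helper.

```python
from typing import Optional


def coerce_scalar_id(value: object) -> Optional[str]:
    """Return the ID as a string, or None if the value is not a usable scalar ID."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but True/False is never a real ID.
        return None
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None
```

With this in place, a comment whose ID arrives as the number 90210 and one whose ID arrives as the string "90210" normalize to the same key instead of one of them being dropped.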

PR #68 hardened the message ID resolution. The polling loop needs the ClickUp message ID to know which thread to poll, but there is a race condition: the activity log entry that is supposed to carry the message ID can be written before the outbox processor has actually delivered the notification, so at that point the ID does not exist yet. PR #68 added a fallback: if the message ID isn't in the activity log, look it up from the outbox table using the dedupeKey. This eliminated a class of "polling the wrong thread" bugs that showed up in production.
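The fallback order can be sketched as follows. The dictionary shapes and the field names `messageId` and `externalMessageId` are assumptions made for the example; only the lookup order (activity log first, then outbox by dedupe key) comes from the description above.

```python
from typing import Mapping, Optional


def resolve_message_id(
    activity_entry: Mapping[str, object],
    outbox_by_dedupe_key: Mapping[str, Mapping[str, object]],
    dedupe_key: str,
) -> Optional[str]:
    """Prefer the activity log; fall back to the outbox row for the same dedupe key."""
    message_id = activity_entry.get("messageId")
    if message_id:
        return str(message_id)
    row = outbox_by_dedupe_key.get(dedupe_key)
    if row and row.get("externalMessageId"):
        return str(row["externalMessageId"])
    # Nothing delivered yet: the caller should skip polling this cycle.
    return None
```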

3. Rejection Semantics

PR #70 tackled a question that sounds simple but has real behavioral implications: what does a non-approval reply mean?

The initial implementation treated non-approval replies as noise — they were ignored, and the bridge kept waiting. That turned out to be wrong in practice. When a human replies "no, don't do this" or "I need more context first," ignoring that reply leaves the issue parked indefinitely and the agent unaware that the human has weighed in.

The new behavior:

Explicit negative reactions (e.g., thumbsdown) → immediate rejection, no comment forwarded.
Non-approval text replies → treated as rejection, forwarded as a comment on the issue so the agent has context when it wakes up.
Approval replies/reactions → resolve the interaction as approved, wake the agent.

This makes the bridge a two-way channel, not just a one-way approval gate. The human's response — whatever it is — gets back to the agent.
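The three rules above can be written down as a small decision function. This is a sketch under assumptions: the reply strings reuse the example list from later in this post, `thumbsup` is an assumed positive reaction name, and treating unrelated reactions as "keep waiting" is our guess at a sensible default rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Resolution:
    outcome: Outcome
    forwarded_comment: Optional[str] = None  # text passed back to the agent


POSITIVE_REPLIES = {"yes", "approved", "lgtm"}  # example list, operator-configured
POSITIVE_REACTIONS = {"thumbsup"}               # assumed reaction name
NEGATIVE_REACTIONS = {"thumbsdown"}


def resolve_reaction(name: str) -> Optional[Resolution]:
    """Negative reaction rejects with no comment; unrelated reactions resolve nothing."""
    if name in NEGATIVE_REACTIONS:
        return Resolution(Outcome.REJECTED)
    if name in POSITIVE_REACTIONS:
        return Resolution(Outcome.APPROVED)
    return None


def resolve_reply(text: str) -> Resolution:
    """Approval phrases approve; any other text rejects and is forwarded as a comment."""
    if text.strip().lower() in POSITIVE_REPLIES:
        return Resolution(Outcome.APPROVED)
    return Resolution(Outcome.REJECTED, forwarded_comment=text)
```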

4. The Provider-Agnostic Bridge Core

PR #65 laid the configuration groundwork for the bridge. It introduced dedicated environment variables (CLICKUP_AWAITING_HUMAN_CHANNEL_ID and CLICKUP_AWAITING_HUMAN_CHANNEL_NAME) for specifying the ClickUp approval channel, with fallback to legacy engineering variables and a default of bizbox-feed when no channel name is provided. It also improved the human-facing notification message content — making the approval request clearer and more actionable — and updated .env.example documentation. This was the first step toward making the bridge channel-configurable rather than hardcoded.
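The resolution order for the channel settings can be sketched like this. The two dedicated variable names and the bizbox-feed default come from the paragraph above; the legacy variable names are placeholders invented for the example, since this post does not name them.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_CHANNEL_NAME = "bizbox-feed"

# Placeholder names for the legacy engineering variables (hypothetical).
LEGACY_CHANNEL_ID_VAR = "LEGACY_ENGINEERING_CHANNEL_ID"
LEGACY_CHANNEL_NAME_VAR = "LEGACY_ENGINEERING_CHANNEL_NAME"


def resolve_channel(env: Mapping[str, str]) -> Tuple[Optional[str], str]:
    """Return (channel_id, channel_name): dedicated vars, then legacy vars, then default."""
    channel_id = (
        env.get("CLICKUP_AWAITING_HUMAN_CHANNEL_ID")
        or env.get(LEGACY_CHANNEL_ID_VAR)
        or None
    )
    channel_name = (
        env.get("CLICKUP_AWAITING_HUMAN_CHANNEL_NAME")
        or env.get(LEGACY_CHANNEL_NAME_VAR)
        or DEFAULT_CHANNEL_NAME
    )
    return channel_id, channel_name
```

In an application this would typically be called as `resolve_channel(os.environ)`; taking a mapping as the argument keeps the function easy to test.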

PR #74 and PR #76 are where the architecture shifted from "ClickUp integration" to "bridge infrastructure."

PR #74 introduced the company_awaiting_human_settings table and a configuration schema that lets each company specify its own bridge provider and routing. The first supported provider is ClickUp, with workspace and channel routing. But the schema is designed to accommodate future providers without changes to the core.

PR #76 finalized the bridge lifecycle semantics:

Interaction-scoped dedupe. Inbound events now deduplicate on (interaction_id, external_event_id) rather than just external_event_id. This means the same external event (e.g., a webhook delivered twice) is safely ignored, but the same event arriving for a different interaction on the same issue is handled correctly.
Free-text replies stay as comments. Plain replies from the human are imported as issue comments and wake the agent, rather than being treated as approval signals. The bridge stays open until an explicit approval or rejection arrives.
Retries create fresh rows. Bridge retries no longer reuse stale state; each retry creates a new outbox row, preventing stale delivery state from blocking future attempts.
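Interaction-scoped dedupe is easiest to see with a composite key. The in-memory class below is a sketch for illustration; in a real deployment we would expect the same rule to be enforced durably, for example with a unique constraint on the two columns, and the class and method names here are assumptions.

```python
from typing import Set, Tuple


class InboundEventDeduper:
    """Tracks which (interaction_id, external_event_id) pairs were already handled."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def accept(self, interaction_id: str, external_event_id: str) -> bool:
        """Return True the first time a pair is seen, False for every repeat."""
        key = (interaction_id, external_event_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Keying on the pair rather than on the event ID alone is what lets one external event count once per interaction instead of once globally.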

The ClickUp transport adapter itself is being extracted into a pure plugin in PR #78 (still in review at time of writing), which will keep the bridge core entirely provider-agnostic and let ClickUp register via AwaitingHumanBridgeRegistry like any future provider.
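As a rough picture of what registering through a provider registry could look like, here is a sketch. Only the AwaitingHumanBridgeRegistry name comes from this post; the provider protocol, its method names, and the in-memory provider are hypothetical and may look quite different in PR #78.

```python
from typing import Dict, List, Protocol, Tuple


class AwaitingHumanBridgeProvider(Protocol):
    """Hypothetical provider contract; the real interface may differ."""

    def send_request(self, channel: str, body: str) -> str:
        """Deliver an approval request and return the external message ID."""
        ...

    def fetch_replies(self, message_id: str) -> List[str]:
        """Return the text replies found on the given message thread."""
        ...


class AwaitingHumanBridgeRegistry:
    """Maps provider names to adapters so the core never imports a transport."""

    def __init__(self) -> None:
        self._providers: Dict[str, AwaitingHumanBridgeProvider] = {}

    def register(self, name: str, provider: AwaitingHumanBridgeProvider) -> None:
        if name in self._providers:
            raise ValueError(f"provider already registered: {name}")
        self._providers[name] = provider

    def get(self, name: str) -> AwaitingHumanBridgeProvider:
        try:
            return self._providers[name]
        except KeyError:
            raise LookupError(f"no bridge provider named {name!r}") from None


class InMemoryProvider:
    """Stand-in adapter used only to exercise the registry."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send_request(self, channel: str, body: str) -> str:
        self.sent.append((channel, body))
        return f"msg-{len(self.sent)}"

    def fetch_replies(self, message_id: str) -> List[str]:
        return []
```

The point of the shape is that the core looks a provider up by the name stored in a company's settings and never references ClickUp directly.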

Key Design Decisions

Decision 1: Outbox-first, not fire-and-forget

We could have sent ClickUp notifications inline during the issue status transition. It's simpler code. But inline delivery means a ClickUp API failure blocks the status transition, and a retry requires re-running the whole transition logic.

The outbox decouples delivery from state change. The issue parks to awaiting_human immediately; the notification goes out when it can. This is the right trade-off for a system where the human-facing notification is important but not on the critical path of the state machine.

Decision 2: Non-approval replies are rejections, not noise

This was the most debated call. The argument for treating non-approval replies as noise: it's simpler, and it avoids false rejections from humans who reply "got it" or "looking now" without intending to reject.

The argument for treating them as rejections: in a zero-human company, the human's time is the scarce resource. If they've replied, they've engaged. Ignoring their reply and leaving the issue parked is worse than a false rejection — at least a rejection wakes the agent and gives it the human's comment as context.

We went with rejection-as-default, with the explicit negative reaction path as the "clean" rejection signal. We're watching for false-rejection reports in production.

Decision 3: Provider-agnostic core from the start

We could have shipped a ClickUp-specific bridge and refactored later. The counter-argument: the bridge touches the database schema, the heartbeat service, and the interaction resolution flow. Refactoring those layers after the fact is expensive and risky.

By designing the bridge core to be provider-agnostic from PR #74 onward — with ClickUp as the first adapter — we pay a small upfront cost for a much cleaner extension path. When Slack or Discord support lands, it registers via the same AwaitingHumanBridgeRegistry without touching bridge core.

Trade-Offs and Open Questions

The Polling Tax

The current implementation polls ClickUp on every heartbeat cycle for every awaiting_human issue, so the number of API calls grows linearly with the number of parked issues. At low volume, this is fine. At scale, that adds up quickly, and ClickUp's rate limits are not generous.

We're considering a webhook-first approach where ClickUp pushes events to Bizbox, with polling as a fallback. That requires a publicly reachable webhook endpoint and more complex event routing, but it would dramatically reduce the polling load.

Approval Signal Ambiguity

The configured "positive reply" list is a blunt instrument. Right now, operators configure a list of strings (e.g., ["yes", "approved", "lgtm"]) that count as approval. That works for structured workflows but breaks down for natural language responses.
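A short sketch shows why exact matching is blunt. It assumes the comparison is a case-insensitive exact match after trimming whitespace; the function name and that normalization are assumptions for the example.

```python
from typing import Iterable


def matches_positive_reply(text: str, positive_replies: Iterable[str]) -> bool:
    """Exact match against the configured list, ignoring case and outer whitespace."""
    normalized = text.strip().lower()
    return normalized in {phrase.strip().lower() for phrase in positive_replies}


configured = ["yes", "approved", "lgtm"]
structured = matches_positive_reply("LGTM", configured)           # on the list
natural = matches_positive_reply("yes, go ahead", configured)     # clearly positive, not on the list
```

Under these assumptions `structured` is True while `natural` is False, even though a human reader would count both as approval.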

A future direction: use a lightweight classifier to interpret free-text replies, rather than exact-match string lists. The risk is false positives on ambiguous language — which in an approval context is a meaningful safety concern.

Multi-Provider Routing

The current schema supports one bridge provider per company. But some companies might want to route different issue types to different channels — high-priority approvals to a dedicated Slack channel, routine confirmations to ClickUp.

The settings schema has room for this, but the routing logic isn't there yet. It's on the roadmap.

The PR #78 Gap

The ClickUp transport adapter is still in review as a pure plugin (PR #78). Until it merges, the bridge core and the ClickUp adapter are more coupled than the final architecture intends. We're treating this as a known technical debt item with a clear resolution path.

What We Learned

Reliability primitives pay for themselves. The outbox pattern added a migration, a background processor, and a handful of service methods. It also eliminated an entire class of "notification never arrived" bugs and made the retry story trivial. Worth it.

Behavioral design is harder than technical design. The rejection semantics question — what does a non-approval reply mean? — took more discussion than the outbox implementation. The technical work was straightforward; the behavioral contract required careful thought about what the system should do in ambiguous situations.

Provider-agnostic from the start is the right call for integration layers. The temptation to ship a ClickUp-specific bridge and "clean it up later" is real. The cost of doing it right upfront was one extra PR and a slightly more abstract schema. The benefit is a clean extension path for every future provider.

What's Next

PR #78 — ClickUp transport as a pure plugin, completing the provider-agnostic bridge architecture.
Webhook-first event delivery — reduce the heartbeat polling load by accepting ClickUp push events, with polling kept as a fallback.
Multi-provider routing — route different issue types to different approval channels.
Natural language approval interpretation — move beyond exact-match string lists for approval signal detection.

If you're building on Bizbox or thinking about how to handle human-in-the-loop approval flows in your own agent systems, we'd love to hear what patterns you're using. Drop a note in GitHub Discussions or on Discourse.

Related Work

PR #33: Add awaiting_human issue status — the status that made the bridge necessary
PR #42: ClickUp approval polling — first approval polling implementation
PR #52: Normalize comment IDs and remove dead bridge status — cleanup and normalization
PR #56: Awaiting-human notification outbox — reliable notification delivery
PR #65: ClickUp approval configuration — dedicated channel environment variables and clearer notification text
PR #68: Message ID fallback in heartbeat — race condition fix
PR #70: Rejection semantics for non-approval replies — behavioral design decision
PR #74: Bridge configuration schema — provider-agnostic settings layer
PR #76: Bridge retry and reply dedupe — lifecycle hardening
PR #78: ClickUp bridge adapter as pure plugin — in review
May 2026 Deep Dive: The awaiting_human Status — the foundation this bridge builds on

About Bizbox: We're building an AI-native task orchestration system where humans and AI agents collaborate on structured work. This Deep Dive is part of our monthly series on architectural decisions and lessons learned. Follow the project on GitHub.
