Bizbox for Citro

Posted on Jun 19

Deep Dive: Workflows as a First-Class Primitive — How Bizbox Added a Full Pipeline Platform in June 2026

#agents #architecture #automation #systemdesign

July 2026

Why Workflows Needed to Be a Primitive

Bizbox has always had two execution models: issues (discrete tasks, picked up by an agent in a heartbeat, one agent at a time) and routines (recurring tasks, scheduled or webhook-triggered, also single-agent). Both models work well for bounded, interruptible work — the kind of thing an agent can start, pause, and resume across heartbeats.

But a class of workflows doesn't fit that shape. Consider a data pipeline that ingests a report, runs several analysis steps in sequence, calls an external API mid-flight, waits for a human to review an intermediate output, and only then writes results to a deliverable. That pipeline is multi-step, long-running, and human-in-the-loop, whether one agent runs it or several. Forcing it through the issue model means either one enormous heartbeat (fragile, timeout-prone) or a web of child issues with manual sequencing (brittle, hard to observe).

The team's answer, landed on June 3 in PR #86, was to introduce Workflows as a new company-scoped primitive — explicitly separate from issues and routines — backed by Google's Agent Development Kit (ADK) runtime.

This post walks through what was built, why the design is structured the way it is, and what the month of iteration on top of that foundation revealed.

The Foundation: What Shipped on Day One

PR #86 was a substantial landing. The scope, compressed:

A new workflows table and corresponding API routes. Workflows are first-class entities in the database, not a special issue type or a routine variant. They have their own namespace, their own run records, and their own lifecycle.
Google ADK-backed execution. Each workflow run spawns a Python ADK process. The Bizbox server acts as the orchestrator: it provisions the ADK environment, passes the workflow's Python source and any input, and streams the resulting console output into a run record.
Run records with persisted deliverables. A workflow run isn't just a log entry — it produces structured output. Deliverables (files, structured data, artifacts) are attached to the run record and exposed through the same deliverables API used elsewhere in Bizbox.
Pipeline graph rendering. The ADK runtime produces a graph of the pipeline's phases — each workflow step is a node; execution order and branching are edges. Bizbox instruments this graph and renders it in the UI, so operators can see the full pipeline topology at a glance, not just a flat console log.
The input() bridge. PR #86 also introduced the first version of input() monkey-patching: when the ADK Python process calls input(), Bizbox intercepts it instead of blocking the process. This is the foundational seam that makes human-in-the-loop workflows possible.
Workflow detail view and navigation. The UI got a workflows navigation item, a workflow list, and a workflow detail page showing pipeline graph, run history, console output, and deliverables.

This is the full vertical stack — schema, API, runtime, and UI — in a single PR. That's worth noting: rather than landing a stub and iterating, the team shipped the complete primitive. The following three weeks were refinement and extension, not completion.
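The orchestrator role described in the list above (spawn a Python process, pass it source and input, stream console output into a run record) can be sketched roughly as follows. This is a minimal illustration, not Bizbox's implementation: `RunRecord`, `run_workflow`, and the way input is passed via `sys.argv` are all hypothetical, and a plain Python child process stands in for the ADK runtime.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical run record: console lines plus the process exit code."""
    workflow_id: str
    console: list = field(default_factory=list)
    exit_code: Optional[int] = None


def run_workflow(workflow_id: str, source: str, workflow_input: str) -> RunRecord:
    """Spawn a child Python process and stream its output into a run record."""
    record = RunRecord(workflow_id=workflow_id)
    proc = subprocess.Popen(
        [sys.executable, "-c", source, workflow_input],  # input arrives as sys.argv[1]
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same console stream
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:  # read line by line as the child produces output
        record.console.append(line.rstrip("\n"))
    record.exit_code = proc.wait()
    return record


# Stand-in for a workflow's Python source (assumption: a real run would launch ADK code).
SOURCE = 'import sys; print("ingesting", sys.argv[1]); print("done")'
record = run_workflow("report-pipeline", SOURCE, "report.csv")
```

Reading the child's stdout line by line is what lets a server show console output while the run is still in progress, rather than only after it exits.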

Instrumentation: Making `input()` Reliable Across Environments

ADK workflows are Python programs. Python programs that need human input call input(). In a typical ADK pipeline, input() blocks the process until the user types something. That model doesn't work in a server-managed runtime — you can't block a server process on stdin.

PR #86 introduced the solution: monkey-patching the Python environment so that input() calls are intercepted rather than blocking. The initial implementation worked but had cross-environment reliability issues — the monkey-patch didn't take effect consistently across all Python runtime configurations.
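To make the interception idea concrete, here is a minimal sketch of replacing `builtins.input` so that a call is handed to a handler instead of reading stdin. The helper name `intercepted_input` and the lambda handler are assumptions for illustration; the post does not show how Bizbox installs its patch.

```python
import builtins
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def intercepted_input(handler: Callable[[str], str]) -> Iterator[None]:
    """Temporarily route every input() call to `handler` instead of stdin."""
    original = builtins.input

    def patched(prompt: str = "") -> str:
        return handler(str(prompt))

    builtins.input = patched
    try:
        yield
    finally:
        builtins.input = original  # always restore the real input()


def workflow_step() -> str:
    # Plain Python: the workflow author imports nothing host-specific.
    answer = input("Approve the draft report? ")
    return f"reviewer said: {answer}"


# The handler here answers immediately; a real host would hand the prompt to a human.
with intercepted_input(lambda prompt: "yes"):
    result = workflow_step()
```

Because `input` is looked up in `builtins` at call time, the workflow code needs no changes for the patch to take effect, which is the property the post relies on.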

PR #91, merged two days after launch, fixed this cross-environment gap and completed the integration. Instead of blocking the process, the intercepted input() call now routes through the provider-agnostic awaiting-human bridge (the same durable coordination layer that issues use for human handoffs) and suspends the workflow run until the human responds through their configured channel.

This is an elegant use of the bridge architecture. The workflow author writes standard Python. The input() call expresses the semantic intent ("I need a human here") without requiring any Bizbox-specific SDK. Bizbox instruments the runtime, and the bridge handles the rest: notification delivery via the configured transport adapter, polling, reply ingestion, and resume. ClickUp is the current default transport, but the bridge is designed to support other providers without changes to the workflow code.
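The notify, poll, ingest, resume sequence can be sketched with a small transport interface. Everything named here (`Transport`, `InMemoryTransport`, `AwaitingHumanBridge`, `max_polls`) is hypothetical, and the sketch is in-memory only, so it does not model the durability the real bridge provides.

```python
from typing import Optional, Protocol


class Transport(Protocol):
    """Hypothetical provider adapter: deliver a prompt, later fetch a reply."""
    def notify(self, run_id: str, prompt: str) -> None: ...
    def poll(self, run_id: str) -> Optional[str]: ...


class InMemoryTransport:
    """Stand-in for a real provider adapter; replies are queued by hand."""

    def __init__(self) -> None:
        self.sent: list = []
        self.replies: dict = {}

    def notify(self, run_id: str, prompt: str) -> None:
        self.sent.append((run_id, prompt))

    def poll(self, run_id: str) -> Optional[str]:
        return self.replies.pop(run_id, None)


class AwaitingHumanBridge:
    def __init__(self, transport: Transport, max_polls: int = 10) -> None:
        self.transport = transport
        self.max_polls = max_polls

    def ask(self, run_id: str, prompt: str) -> str:
        """Send the prompt, then poll until a reply arrives or polls run out."""
        self.transport.notify(run_id, prompt)
        for _ in range(self.max_polls):
            reply = self.transport.poll(run_id)  # a real bridge would wait between polls
            if reply is not None:
                return reply
        raise TimeoutError(f"no reply for run {run_id}")


transport = InMemoryTransport()
transport.replies["run-1"] = "approved"  # simulate the human having answered
bridge = AwaitingHumanBridge(transport)
answer = bridge.ask("run-1", "Approve the draft report?")
```

Swapping providers in this sketch means supplying a different object with the same two methods, which mirrors the claim that workflow code does not change when the transport does.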

PR #96 continued this work, improving the full lifecycle of the handoff for workflow runs — specifically ensuring that the workflow's run state is correctly reflected while a handoff is pending and that the run resumes cleanly when the human responds.
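One way to picture "run state is correctly reflected while a handoff is pending" is as a small state machine. The state names and allowed transitions below are assumptions, since the post does not list Bizbox's actual run states.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    AWAITING_HUMAN = "awaiting_human"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical transition table: a pending handoff can only resume or fail.
ALLOWED = {
    RunState.RUNNING: {RunState.AWAITING_HUMAN, RunState.SUCCEEDED, RunState.FAILED},
    RunState.AWAITING_HUMAN: {RunState.RUNNING, RunState.FAILED},
    RunState.SUCCEEDED: set(),
    RunState.FAILED: set(),
}


def transition(current: RunState, target: RunState) -> RunState:
    """Return the new state, rejecting moves the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = RunState.RUNNING
state = transition(state, RunState.AWAITING_HUMAN)  # input() was intercepted
state = transition(state, RunState.RUNNING)         # the human replied
state = transition(state, RunState.SUCCEEDED)       # pipeline finished
```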

Portability: Import and Export

PR #99 and PR #103 added workflow import/export.

These PRs matter architecturally for a reason that isn't immediately obvious: they establish that Workflows are a portable unit of company configuration, not a runtime-only artifact.

Bizbox's import/export system lets operators snapshot a company's configuration — agents, routines, skills, and now workflows — and restore it elsewhere. Before these PRs, if you exported a company package, you'd get the agent definitions and skill assignments but not the workflow definitions. A company that relied heavily on ADK pipelines couldn't be fully migrated or duplicated.

With workflows included in the export/import surface, the company package is again complete. Workflow directories (the Python files that define each pipeline) are bundled alongside the metadata, and the import logic reconstructs them correctly on the receiving end.
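As a rough illustration of bundling workflow directories alongside metadata, the sketch below round-trips a set of workflow files through a zip archive with a manifest. The archive layout (`manifest.json`, `workflows/<name>/...`) and both function names are assumptions; the post does not describe Bizbox's package format.

```python
import io
import json
import zipfile


def export_workflows(workflows: dict) -> bytes:
    """Pack {workflow name: {relative path: file text}} into a zip package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        manifest = {"workflows": sorted(workflows)}
        archive.writestr("manifest.json", json.dumps(manifest))
        for name, files in workflows.items():
            for path, text in files.items():
                archive.writestr(f"workflows/{name}/{path}", text)
    return buffer.getvalue()


def import_workflows(package: bytes) -> dict:
    """Rebuild the same mapping from a package produced by export_workflows."""
    restored: dict = {}
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        for name in manifest["workflows"]:
            prefix = f"workflows/{name}/"
            restored[name] = {
                entry[len(prefix):]: archive.read(entry).decode("utf-8")
                for entry in archive.namelist()
                if entry.startswith(prefix)
            }
    return restored


original = {
    "report-pipeline": {
        "agent.py": "print('analyze')\n",
        "steps/review.py": "answer = input('ok? ')\n",
    }
}
roundtrip = import_workflows(export_workflows(original))
```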

Observability: Prompt Templates and Graph Analysis

The final two weeks of June focused on making the workflow surface more useful for operators who didn't write the pipeline themselves.

PR #111 added workflow-scoped prompt templates to the workflow detail page. This is a small but sharp change: rather than showing generic prompt suggestions that apply to any issue, the workflow page now surfaces suggestions that are specific to the workflow's purpose — derived from the workflow record itself. An operator running a data analysis pipeline sees suggestions relevant to that pipeline, not to the agent system in general.

PR #112 polished the run history UI. The workflow detail page lists all past runs for a workflow. Before this PR, selecting a run from history didn't correctly update the detail view — you'd click a past run and see stale data from the most recent run. The fix ensured that selecting a run from history locks the view to that run's graph, console output, and deliverables, making the history pane a genuine audit trail rather than a misleading navigation element.

PR #113, merged on June 18, substantially expanded the pipeline graph analysis. The original renderer had limited understanding of ADK's graph model. This PR added explicit parsing of Workflow and JoinNode constructs along with correct DAG edge construction, so the topology renders accurately for real-world pipelines: you can see where branches split, where they converge at join steps, and how the full directed graph of phases is connected — not just a linear sequence of nodes.
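The difference between a linear node list and a proper DAG can be shown with the standard library alone. This sketch does not use ADK's `Workflow` or `JoinNode` classes; the step names and the `analyze` helper are hypothetical, and it only demonstrates how branch and join points fall out of the edge list.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: one fan-out after ingest, one join at merge.
EDGES = [
    ("ingest", "analyze_sales"),
    ("ingest", "analyze_costs"),
    ("analyze_sales", "merge"),
    ("analyze_costs", "merge"),
    ("merge", "write_report"),
]


def analyze(edges: list) -> dict:
    """Derive execution order, branch points, and join points from edges."""
    preds: dict = {}
    succs: dict = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())
        succs.setdefault(src, set()).add(dst)
        succs.setdefault(dst, set())
    # TopologicalSorter takes node -> predecessors and raises CycleError on cycles.
    order = list(TopologicalSorter(preds).static_order())
    return {
        "order": order,
        "branches": sorted(n for n, s in succs.items() if len(s) > 1),
        "joins": sorted(n for n, p in preds.items() if len(p) > 1),
    }


topology = analyze(EDGES)
```

A node with more than one successor is where branches split, and a node with more than one predecessor is where they converge, which is the information a flat sequence of nodes cannot express.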

The Design Decision Worth Unpacking

The most important decision in the Workflows launch is the one that isn't loudly stated: Workflows is not a subtype of Issues.

It would have been simpler, in some ways, to model workflows as a special issue kind — one with a multi-step execution plan, child issues for each step, and an execution policy that sequences them. Some orchestration platforms take exactly this approach.

Bizbox didn't. Workflows have their own table, their own API namespace, their own run record type. Issues and workflows are siblings in the company data model, not parent and child.

The reason is execution fidelity. Issues are designed to be interruptible. An agent checks out an issue, does some work, posts a comment, and exits. The next heartbeat, another agent (or the same one) picks it up. That model requires that the work state lives in comments and documents, not in process memory. It works because the work is fundamentally discrete.

ADK workflows are not discrete in that sense. A pipeline has local state that must persist across phases — intermediate computations, in-flight API results, the position in the graph. The ADK runtime maintains this state in the Python process. Forcing that into the issue heartbeat model would require serializing and deserializing ADK process state on every heartbeat boundary, which is both complex and fragile.

By keeping Workflows separate, Bizbox preserves the issue model's simplicity while giving the ADK runtime the long-running execution environment it actually needs. The awaiting-human bridge is where the two worlds meet: when a workflow needs human input, it routes that request through the same provider-agnostic bridge that issues use, so the operator experience is consistent even though the underlying execution model is different.

What's Still Open

The Workflows platform shipped quickly, which means a few things are still rough.

Workflow scheduling isn't built yet. You can trigger a workflow manually through the API or UI, or from a routine, but there's no first-class cron support for workflows. The workaround is a scheduled routine that calls the workflow API, which works but adds indirection.
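For a sense of what the routine-based workaround involves, the sketch below builds (but does not send) the HTTP request a scheduled routine might issue. The endpoint path, payload shape, and bearer-token header are all assumptions made for illustration; the post does not document the workflow API.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url: str, workflow_id: str, payload: dict, token: str) -> Request:
    """Build a POST request that would start a workflow run (hypothetical endpoint)."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/workflows/{workflow_id}/runs",  # assumed route, not documented
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_trigger_request(
    "https://bizbox.example", "report-pipeline", {"report": "q2.csv"}, "TOKEN"
)
# A routine would pass `request` to urllib.request.urlopen on its schedule.
```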

ADK supports multi-agent graphs natively, but the Bizbox graph renderer currently assumes a single-agent pipeline topology, so complex multi-agent graphs may not render correctly.

Workflow versioning is implicit. The workflow definition is a Python file stored in the company package. If you change it and the change breaks a running pipeline, there's no rollback mechanism in the current design.

Takeaway

The June 2026 Workflows sprint is one of the cleaner examples of a new primitive done right: a full vertical slice on day one, followed by focused iteration that filled in portability, observability, and operational polish. The design choice to keep Workflows separate from Issues rather than layering on top of them reflects a clear-eyed view of what each model is actually good at.

The input() bridge is the detail worth remembering. It's a small thing — a monkey-patch — but it expresses the right philosophy: workflow authors write standard Python, Bizbox instruments the execution environment, and the operator experience is consistent because the bridge handles the coordination. The primitive is doing its job when you can't see it.

Have questions or want to discuss the Workflows architecture? Join the conversation in our GitHub Discussions or on Discourse.
