From Hackathon Idea to Life-Saving Workflow: The Story of the DCRCA Agent

Vincenzo Bianco — Mon, 08 Sep 2025 09:23:38 +0000

The AI AgentHack Hackathon

Last week, we ran AI AgentHack, a hackathon where more than 3,000 developers built creative agentic projects using Portia.

Picking winners wasn’t easy, but one project stood out: Team Dark Mode’s DCRCA Agent (Disaster Chaos Response Coordination AI).

The DCRCA Agent helps emergency teams cut through the noise. It scans live news and social feeds, pulls out the key details, and maps emergencies by priority so responders know exactly where to act first.

We were extremely impressed to see this cool Portia use case and, more importantly, an application with great potential for societal impact!

Below is a deep dive into how the team built this using Portia.

How The DCRCA Agent Is Built

The DCRCA Agent is wired together with PlanBuilderV2, where the workflow is laid out step by step: pulling raw data from news feeds, parsing and prioritizing it with LLM steps, routing through a human approval checkpoint, and finally dispatching updates over email and Slack. Each stage has clear inputs and outputs, making the whole flow transparent.

A key design choice was the separation between reasoning and tool calls. Reasoning tasks (like structuring raw data or scoring emergencies) live inside .llm_step(), while external services such as Google Search and Gmail are called through .invoke_tool_step(). This separation keeps debugging and maintenance straightforward.

The team also used custom Python functions for oversight with .function_step(). These functions handled approval checks and message formatting, showing how Portia makes human-in-the-loop workflows natural instead of forcing full automation.

Finally, because every step exposes structured outputs at runtime, the agent can surface both intermediate results (like “Slack message sent ✅”) and the overall summary of actions — giving the team visibility into exactly what happened.
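To make the shape of that workflow concrete, here is a minimal plain-Python sketch. It does not use the Portia SDK and is not the team's code: ToyPlan is a hypothetical stand-in that only borrows the three step-method names mentioned above, and every signature, helper function, and sample headline is an assumption made for illustration. The "LLM" step is faked with keyword scoring so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Step = Callable[[dict[str, Any]], Any]


@dataclass
class ToyPlan:
    """Hypothetical stand-in for a plan builder; NOT Portia's PlanBuilderV2."""

    steps: list[tuple[str, str, Step]] = field(default_factory=list)

    def _add(self, kind: str, name: str, fn: Step) -> "ToyPlan":
        self.steps.append((kind, name, fn))
        return self

    def invoke_tool_step(self, name: str, tool: Step) -> "ToyPlan":
        return self._add("tool", name, tool)  # external service call

    def llm_step(self, name: str, reason: Step) -> "ToyPlan":
        return self._add("llm", name, reason)  # reasoning task

    def function_step(self, name: str, function: Step) -> "ToyPlan":
        return self._add("function", name, function)  # custom Python

    def run(self, inputs: dict[str, Any]) -> dict[str, Any]:
        # Every step sees all earlier outputs and stores its own under its name.
        outputs = dict(inputs)
        for _kind, name, fn in self.steps:
            outputs[name] = fn(outputs)
        return outputs


def fake_search(out: dict[str, Any]) -> list[str]:
    # Stand-in for a news/search tool; the headlines are invented sample data.
    return ["Minor power cut in Old Town", "Flood reported in Riverside, roads closed"]


def fake_prioritize(out: dict[str, Any]) -> list[dict[str, str]]:
    # Stand-in for LLM reasoning: keyword scoring instead of a model call.
    scored = [
        {"text": t, "priority": "high" if "flood" in t.lower() else "low"}
        for t in out["fetch_news"]
    ]
    return sorted(scored, key=lambda e: e["priority"] != "high")


def require_approval(out: dict[str, Any]) -> str:
    # Human-in-the-loop checkpoint: the reviewer callback arrives as a plan input.
    if not out["approve"](out["prioritize"]):
        raise PermissionError("dispatch rejected by reviewer")
    return "approved"


def fake_slack(out: dict[str, Any]) -> str:
    # Stand-in for a messaging tool; reports on the top-priority event only.
    return f"Slack message sent: {out['prioritize'][0]['text']}"


plan = (
    ToyPlan()
    .invoke_tool_step("fetch_news", fake_search)
    .llm_step("prioritize", fake_prioritize)
    .function_step("approval", require_approval)
    .invoke_tool_step("notify_slack", fake_slack)
)

result = plan.run({"approve": lambda events: True})
print(result["notify_slack"])
```

Because each step writes its output under its own name, both the intermediate results and the final message can be inspected after the run, and a reviewer who returns False stops the plan before anything is dispatched.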

We’re thankful to Team Dark Mode and all other hackathon participants for helping us prove that Portia isn’t just for tinkering—it can drive real, high‑stakes workflows. By combining off‑the‑shelf tools, LLM reasoning and human oversight in a single plan, they built something useful and understandable.

It’s exciting to imagine what other novel agentic ideas the community will bring to life next!

If you want to try building your own agentic workflow, check out our GitHub!

Introducing SteelThread: Evals & Observability for Reliable Agents

Vincenzo Bianco — Thu, 14 Aug 2025 15:44:11 +0000

We’ve spent a lot of time internally running evals for our own agents. If you care about reliability in agentic systems, you know why this matters: models drift, prompts change, third-party MCP tools get updated. A small change in one place can cause unexpected behavior somewhere else.

That’s why we’re excited to share something we’ve been using ourselves for months: SteelThread, our evaluation framework built on top of Portia Cloud.

You can try it for free on Portia!

While building our own automations on top of Portia, we realised that running evals with it was an absolute joy, owing to two of its core features:

First, every agent run is captured in a structured state object called a PlanRunState: steps, tool calls, arguments, outputs. That makes very targeted evaluators trivial to write, whether deterministic or LLM-as-judge. For example, you can count plan steps, validate the behaviour of a specific tool, or review the tone of the final summary.

Second, we use Portia Cloud to store our agent runs. Whenever we produce a multi-agent plan outcome that is desirable (or undesirable), for example during agent development, we can take the inputs and outputs of that agent run (query, plan, plan run) and instantly turn them into an Eval dataset. Since we built SteelThread, we haven’t actually needed to manually curate and build eval datasets from scratch anymore.
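As a rough illustration of why structured run state makes targeted evaluators easy, here is a small plain-Python sketch. RunRecord and ToolCall are hypothetical, simplified stand-ins and not the real PlanRunState schema; the tool name, argument key, and email address are likewise invented for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]
    output: Any


@dataclass
class RunRecord:
    """Hypothetical, simplified captured run; NOT the real PlanRunState."""

    steps: list[str]
    tool_calls: list[ToolCall]
    final_summary: str


def within_step_budget(run: RunRecord, limit: int) -> bool:
    # Deterministic check: the plan did not grow beyond the expected size.
    return len(run.steps) <= limit


def tool_called_with(run: RunRecord, tool: str, key: str, expected: Any) -> bool:
    # Deterministic check: a specific tool was invoked with a specific argument.
    return any(c.tool == tool and c.args.get(key) == expected for c in run.tool_calls)


def evaluate(run: RunRecord) -> dict[str, bool]:
    return {
        "step_count_ok": within_step_budget(run, limit=5),
        "email_recipient_ok": tool_called_with(run, "send_email", "to", "ops@example.com"),
        "summary_not_empty": bool(run.final_summary.strip()),
    }


run = RunRecord(
    steps=["search", "prioritize", "approve", "send_email"],
    tool_calls=[ToolCall("send_email", {"to": "ops@example.com"}, "sent")],
    final_summary="One high-priority alert was emailed to the operations team.",
)
print(evaluate(run))
```

A tone check on final_summary would slot in as one more entry in evaluate, backed by a model call instead of a deterministic function.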

Before SteelThread, we still felt the pain that many teams do. Creating and maintaining curated datasets was tedious. Balancing deterministic checks with LLM-as-judge evals was tricky. And running evals against real APIs often meant dealing with authentication, rate limits, or unintended side effects — so we’d spend hours stubbing tools just to test safely.

SteelThread wraps all of this into a single workflow inside Portia Cloud. It gives you two ways to keep your agents in check: Streams, which spot changes in behavior in real time, and Evals, which let you run regression tests against a ground truth dataset. Both Streams and Evals allow you to combine deterministic and LLM-as-judge evaluators. You can write your own evaluators, but SteelThread comes with a generous helping of off-the-shelf ones for you to use as well.
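The idea of a regression run that mixes both evaluator kinds can be sketched in a few lines of plain Python. This is not SteelThread's API: run_evals, the dataset layout, and the threshold are assumptions, and stub_judge is a word-overlap placeholder standing in for a real LLM-as-judge call so the example runs offline.

```python
from typing import Callable

# An evaluator maps (actual, expected) to a score between 0 and 1.
Evaluator = Callable[[str, str], float]


def exact_match(actual: str, expected: str) -> float:
    # Deterministic evaluator.
    return 1.0 if actual.strip() == expected.strip() else 0.0


def stub_judge(actual: str, expected: str) -> float:
    # Placeholder for an LLM-as-judge: fraction of expected words present.
    a, e = set(actual.lower().split()), set(expected.lower().split())
    return len(a & e) / len(e) if e else 0.0


def run_evals(
    agent: Callable[[str], str],
    dataset: list[dict[str, str]],
    evaluators: dict[str, Evaluator],
    threshold: float = 0.5,
) -> list[dict]:
    report = []
    for case in dataset:
        actual = agent(case["query"])
        scores = {name: fn(actual, case["expected"]) for name, fn in evaluators.items()}
        # A case passes only if every evaluator clears the threshold.
        passed = all(score >= threshold for score in scores.values())
        report.append({"query": case["query"], "scores": scores, "passed": passed})
    return report


# Hypothetical ground-truth dataset and a toy agent that knows one answer.
dataset = [
    {"query": "capital of France", "expected": "Paris"},
    {"query": "capital of Italy", "expected": "Rome"},
]


def toy_agent(query: str) -> str:
    return {"capital of France": "Paris"}.get(query, "unknown")


report = run_evals(toy_agent, dataset, {"exact": exact_match, "judge": stub_judge})
for row in report:
    print(row["query"], "PASS" if row["passed"] else "FAIL")
```

Rerunning the same dataset after a prompt or tool change then shows which previously passing cases regressed.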

Here is an example flow where we add a production agent run to an Eval dataset.

Observability and evals are essential for building reliable agentic systems, and SteelThread just makes them easier. Paired with the Portia development SDK, it’s a powerful combo: build structured, debuggable agents, monitor them in production, and turn any incident into a regression test instantly.

If you want to try it, head over to Portia Dashboard or check out our GitHub repo!

DEV Community: Vincenzo Bianco
