Bus control is still a very human, very reactive operation. A controller sees a late bus, a bunched pair, or a corridor starting to fail, then decides whether to hold, short-turn, gap-fill, or let the service recover naturally.
The issue is that by the time the delay is obvious, the best intervention window may already have passed.
For our hackathon project, we built PulseOps: an AI operations layer for London buses that predicts operational risk before it becomes visible on the road.
PulseOps ingests live transport signals such as TfL vehicle locations, Countdown predictions, GTFS schedules, stop sequences, road disruption context, weather, and JamCam CCTV. It then builds a per-vehicle risk score and projects that risk roughly fifteen minutes forward.
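The exact scoring model is not the point of this post, but the shape of the idea can be sketched as a weighted blend of live signals, projected forward along the recent delay trend. All field names, weights, and thresholds below are illustrative, not the production model:

```python
from dataclasses import dataclass

@dataclass
class VehicleSignals:
    """Illustrative live signals for one vehicle (not the real PulseOps schema)."""
    delay_s: float                 # current schedule deviation in seconds
    delay_trend_s_per_min: float   # how fast the delay is growing
    headway_ratio: float           # actual / scheduled headway (1.0 = on pattern)
    disruption_score: float        # 0..1 severity of nearby road disruption

def risk_score(v: VehicleSignals, horizon_min: float = 15.0) -> float:
    """Project the delay forward and blend signals into a 0..1 risk score."""
    projected_delay = v.delay_s + v.delay_trend_s_per_min * horizon_min
    # Normalise each component into 0..1 (thresholds are made up for the sketch)
    delay_risk = min(max(projected_delay / 600.0, 0.0), 1.0)  # 10 min late = max
    bunching_risk = min(abs(v.headway_ratio - 1.0), 1.0)      # drift from the pattern
    score = 0.5 * delay_risk + 0.3 * bunching_risk + 0.2 * v.disruption_score
    return round(min(max(score, 0.0), 1.0), 3)

# A bus 2 minutes late, losing 10 s/min, with a stretched headway
bus = VehicleSignals(delay_s=120, delay_trend_s_per_min=10,
                     headway_ratio=1.4, disruption_score=0.2)
print(risk_score(bus))
```

The key property is that the score moves before the delay is visible: a small current delay with a steep trend scores higher than a large but stable one.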
The goal is simple: help a human controller understand which bus is likely to become a problem, why it is happening, and what intervention is worth considering.
Why Pydantic mattered
PulseOps is not a chatbot. It is an operational decision-support system.
That changed how we approached the AI layer. We did not want an LLM returning loosely formatted text that the frontend had to parse and hope was correct. We needed structured, validated output that could move safely through the system.
This is where Pydantic became central.
We used Pydantic AI to build the PulseOps Copilot as a typed Python agent. The agent takes live operational context for a selected bus or corridor and returns a structured answer: the situation, the likely cause, the recommended action, the expected impact, and a confidence score.
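In outline, the output contract looked something like the model below. The field names are paraphrased for this post, so treat them as illustrative rather than the exact production schema:

```python
from pydantic import BaseModel, Field, ValidationError

class CopilotAnswer(BaseModel):
    """Structured Copilot output -- an illustrative sketch of the contract."""
    situation: str = Field(description="What is happening with the bus right now")
    likely_cause: str = Field(description="Most plausible explanation, grounded in context")
    recommended_action: str = Field(description="e.g. hold, short-turn, gap-fill, or monitor")
    expected_impact: str = Field(description="What the action should achieve")
    confidence: float = Field(ge=0.0, le=1.0, description="Model's own confidence, 0..1")

# An out-of-range confidence is rejected at the boundary,
# not discovered later in the UI.
try:
    CopilotAnswer(
        situation="Route 73 vehicle bunching near Angel",
        likely_cause="Roadworks upstream compressing headways",
        recommended_action="Hold the leading bus for 90 seconds",
        expected_impact="Restores roughly even headways within two stops",
        confidence=1.7,
    )
except ValidationError as e:
    print(f"rejected: {e.error_count()} error(s)")
```

Because the constraints live on the model, every consumer of a `CopilotAnswer` can rely on them without re-checking.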
That structure mattered because PulseOps is designed for operations. A bus controller cannot work with “mostly valid” model output. They need predictable fields, grounded reasoning, and clear failure modes.
Pydantic AI helped us treat the AI response as part of the software contract. Instead of hoping the model produced the right shape, we made structure and validation part of the system design.
For a Python backend, this felt natural. The same mental model we use for API request models, database records, and service boundaries also worked for agent outputs. The Copilot response became something the FastAPI service could validate, log, return to the frontend, and reason about consistently.
How Pydantic AI shaped the agent
The Copilot’s job was to answer a controller-style question: “What is happening with this bus?”
To answer that properly, the agent needed more than a route number and a delay value. We built the context around live operational signals: bus position, delay trend, headway, stop sequence, nearby vehicles, route-level status, and disruption context.
Pydantic AI gave us a clean separation of responsibilities. Python code prepared the grounded operational context. The agent reasoned over that context. The output schema defined what the application expected back. Validation decided whether the result was usable.
That structure was especially useful in a 24-hour build. It kept the AI layer from turning into a pile of prompts, string concatenation, and fragile response parsing.
We also used the Pydantic AI Gateway for inference. That gave us a single interface for model calls and made it easier to route requests without spreading provider-specific logic throughout the codebase.
Observability with Logfire
The other critical piece was Logfire.
AI systems are difficult to debug when they behave like black boxes. In PulseOps, we needed to inspect not just the final Copilot answer, but the full path that produced it.
We traced the main operations in the AI service: risk calculation, context assembly, recommendation generation, Copilot answer generation, schema validation, and controller-driver simulation rounds.
This let us inspect the full decision chain from incoming route data to final response. We could see how the risk score was calculated, what context was passed into the agent, how the model responded, whether validation passed, and what answer was returned to the user.
That mattered because a bad answer could come from many places. The model might reason poorly. The prompt might lack context. The backend might pass stale vehicle data. The risk score might be too sensitive. The output might fail validation.
With Logfire, we could see where the failure happened instead of guessing.
Architecture
PulseOps used a three-service architecture.
The frontend was built with Next.js 15 and handled the live map, route search, bus sidebar, disruption radar, and Copilot interface.
The backend was built with Fastify and handled TfL ingest, GTFS data, JamCam context, DuckDB snapshots, and worker processes.
The AI service was built with FastAPI and handled risk scoring, recommendation generation, Pydantic AI agents, and Logfire instrumentation.
That separation made the system easier to reason about. The frontend focused on the controller experience. The backend focused on transport data. The AI service focused on turning live operational context into structured decisions.
What we learned
The main lesson was that production-style AI needs software engineering discipline.
For PulseOps, the useful part of the AI was not just that it could generate an explanation. It was that the explanation was typed, validated, observable, and grounded in live data.
Pydantic gave us the structure.
Pydantic AI gave us the agent layer.
Logfire gave us the traceability.
Together, they made the system feel less like a demo chatbot and more like an inspectable Python service.
The thesis behind PulseOps is simple:
The bus that is late at 9:17 was already becoming late at 9:02. The signal was there. PulseOps reads it, reasons over it, and gives the controller a decision they can inspect before they act.
Built in 24 hours at To The Americas.