DEV Community: Boniface Alexander

Handling Retry vs Quota Errors Correctly in AI CLI Tools

Boniface Alexander — Tue, 10 Feb 2026 20:20:44 +0000

While working on an open-source contribution to the Gemini CLI, I ran into an interesting edge case involving retryable 429 errors.

The Gemini API can return multiple error signals in a single response — for example:

RetryInfo (retry after X seconds)

QuotaFailure (quota-related metadata)

A regression caused the CLI to prioritise quota exhaustion even when RetryInfo was present, leading to immediate failures instead of retries.
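To make the precedence concrete, here is a minimal sketch of the classification step. It is not the Gemini CLI's actual code. It assumes the 429 body follows the standard `google.rpc.Status` JSON shape, where each entry in `details` is tagged with an `@type`. The names `classify_429`, `Classification`, and `parse_duration` are hypothetical, and treating a bare 429 as retryable is my own assumption.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo"
QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure"


@dataclass
class Classification:
    retryable: bool
    retry_after_s: Optional[float] = None
    reason: str = ""


def parse_duration(value: object) -> Optional[float]:
    """Parse a protobuf JSON duration such as "30s" or "1.5s" into seconds."""
    if not isinstance(value, str) or not value.endswith("s"):
        return None
    try:
        return float(value[:-1])
    except ValueError:
        return None


def classify_429(error_body: dict) -> Classification:
    """Decide whether a 429 should be retried (hypothetical helper)."""
    details = error_body.get("error", {}).get("details", [])
    retry_delay: Optional[float] = None
    has_quota_failure = False
    for detail in details:
        kind = detail.get("@type")
        if kind == RETRY_INFO:
            retry_delay = parse_duration(detail.get("retryDelay"))
        elif kind == QUOTA_FAILURE:
            has_quota_failure = True

    # Order matters: a usable retry hint wins over quota metadata.
    if retry_delay is not None:
        return Classification(True, retry_delay, "retry hint present")
    if has_quota_failure:
        return Classification(False, None, "quota exhausted")
    # Assumption: a 429 with no details is treated as retryable.
    return Classification(True, None, "no details")
```

The regression described above corresponds to swapping the two `if` checks at the end: the quota branch fires first and the retry hint is never consulted.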

The fix required:

Parsing RetryInfo correctly

Giving retry hints precedence over terminal quota errors

Implementing exponential backoff with jitter

Adding tests to cover mixed error scenarios
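The backoff step can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `RetryableError`, `backoff_delay`, and `call_with_retry` are hypothetical names, the defaults (1 second base, 30 second cap, 5 attempts) are arbitrary, and using the server's hint as a lower bound on the delay is my own design choice.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error type carrying an optional server retry hint."""

    def __init__(self, message: str, retry_after_s: Optional[float] = None):
        super().__init__(message)
        self.retry_after_s = retry_after_s


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Never retry sooner than the server asked for.
            delay = max(err.retry_after_s or 0.0, backoff_delay(attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` and `rng` as parameters keeps the mixed-error tests fast and deterministic, since a test can record delays instead of actually waiting.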

The PR was reviewed by Google maintainers and taken for internal validation, which is expected for changes affecting retry and quota behaviour.

If you’re building client tooling for AI APIs, this is a good reminder that error classification logic matters just as much as request logic.

Happy to discuss retry strategies or similar edge cases others have run into.

Boniface.dev — An AI Operator Portfolio Built with Gemini on Cloud Run

Boniface Alexander — Sat, 24 Jan 2026 14:37:43 +0000


This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI

About Me

Hi! I’m Boniface (Bon) — an AI Architect & Engineer focused on building production-grade AI systems that operate reliably at scale. My passion lies in designing and deploying advanced GenAI systems, agent orchestration frameworks, and retrieval-augmented workflows — all with strong attention to safety, clarity, and real-world usability.

Rather than listing technologies, I focus on solving real engineering problems — optimizing for performance, reliability, and real-world constraints. My portfolio reflects that mindset.

Portfolio

My live portfolio is deployed on Google Cloud Run, as required for this challenge.

🤖 Nexus AI — Interactive Portfolio Guide

This portfolio includes Nexus AI, an AI-powered interaction layer built using Google Gemini.

Nexus AI allows reviewers to:

Ask questions about any project
Get architecture explanations in plain English or deep technical detail
Navigate the portfolio intelligently instead of manually scanning
Understand trade-offs, decisions, and system design choices

Example prompts reviewers can try:

“Explain this portfolio like I’m a Google AI judge”
“Walk me through your RAG architecture”
“What problem does your agent framework solve?”

This feature is designed to help reviewers understand not just what I built — but why.

💡 Tip for reviewers: Try asking Nexus AI “Explain this portfolio like I’m a Google AI judge.”

How I Built It

🧰 Tech Stack

Frontend: React / Next.js
Backend: Node / FastAPI-style services
AI Integration: Google Gemini via Cloud Run service account + IAM
Deployment: Google Cloud Run (serverless)
Containerization: Docker

🛠 Design & Development Approach

Purpose-first design: The UI uses a “Mission Control” metaphor to communicate my approach as an engineer — systematic, intentional, and operationally grounded.
Secure AI usage: Gemini is called only from the server, authenticated with IAM and Application Default Credentials, so no API keys are exposed.
Scalable deployment: Cloud Run provides reliable, autoscaled hosting with HTTPS built in and minimal ops overhead.
Responsive layout: The interface adapts to different screen sizes and focuses on readability and discovery.

🧠 Google AI Tools Used

Gemini models for backend contextual insights and content exploration
Google Cloud Run for secure hosting
Service Account IAM for authenticated AI access
Google Antigravity as the IDE

What I'm Most Proud Of

💡 Innovation & Technical Implementation

Integrated Gemini in a secure, backend-only fashion
Designed portfolio UI that reflects an engineering mindset
Deployed on Google Cloud Run with autoscaling and HTTPS

🚀 User Experience

Fast, accessible navigation
Clear project storytelling and context
Mission Control theme that ties design and function

🎯 Demonstrated Skills & Depth

This portfolio isn’t just a showcase — it’s a living system demonstrating real architectural decisions and scalable cloud deployments, which directly reflects how I build production AI systems.

Thanks for reviewing my submission, and thank you to the Google AI team for hosting this challenge! 🚀

Why AI Agents Fail in Production Without an Execution Runtime

Boniface Alexander — Fri, 23 Jan 2026 10:48:48 +0000

LLMs reason well, but without a runtime that handles lifecycle, state, and governance, AI agents are unreliable in production.

That’s the pattern I kept running into while working with LLM-based agents.

Modern models like Google Gemini can reason, plan, and invoke tools impressively well. Interactive CLIs and agent frameworks make it easy to prototype workflows in minutes.

But once you try to use these agents for real operational work, cracks appear quickly.

This post explains:

Why agent systems break down in production
Why prompts and agent loops are not enough
What kind of infrastructure is actually missing

The problem: agents are good at thinking, bad at executing

Most agent systems today follow a familiar loop:

Generate a plan
Execute a step
Observe the result
Repeat
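The loop above can be sketched in a few lines. This is a generic illustration rather than any specific framework's implementation: `run_agent_loop`, `plan`, and `execute` are hypothetical names, and the planner and executor are passed in as plain callables so the example runs without a model.

```python
from typing import Callable, List, Optional


def run_agent_loop(plan: Callable[[List[str]], Optional[str]],
                   execute: Callable[[str], str],
                   max_steps: int = 10) -> List[str]:
    """Naive plan/execute/observe loop with purely in-memory state."""
    observations: List[str] = []  # lost if the process dies
    for _ in range(max_steps):
        step = plan(observations)    # generate (or revise) the plan
        if step is None:             # planner signals it is done
            break
        result = execute(step)       # execute a step; may raise
        observations.append(result)  # observe the result, then repeat
    return observations


# Usage with stub callables standing in for the model and the tools.
steps = ["read", "analyze", "report"]
history = run_agent_loop(
    plan=lambda obs: steps[len(obs)] if len(obs) < len(steps) else None,
    execute=lambda step: f"did {step}",
)
```

Note that `observations` is the only record of progress. If `execute` raises on the third step, the first two results vanish with the stack frame.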

This works surprisingly well for demos.

It fails when:

A task spans multiple steps
A process takes minutes or hours
A failure occurs halfway through
An action requires approval
You need to know what actually happened

In practice, agents lack:

Durable task state
An explicit execution lifecycle
Governance and safety controls
Recovery and resume guarantees
Auditable behavior

When something goes wrong, the system usually does one of two things:

Restart everything from scratch
Fail silently

Neither is acceptable in production.

Why interactive CLIs and agent frameworks don’t solve this

Interactive tools and agent frameworks are not flawed — they’re just scoped differently.

They are optimized for:

Human-in-the-loop usage
One-off execution
Exploration and iteration
Fast feedback

They are not designed to be:

Long-running execution engines
Durable workflow systems
Policy-enforced runtimes
Auditable automation layers

This distinction matters.

An interactive agent loop is not the same thing as an execution runtime — just like a shell script is not the same thing as a workflow engine.

The missing layer: why AI agents need an execution runtime

What’s missing between LLM reasoning and real-world automation is a runtime layer that treats AI work like actual work.

That means introducing first-class concepts such as:

Task lifecycle (created → running → paused → completed / failed)
Persistent state and checkpoints
Explicit retries and failure handling
Approval and policy enforcement
Observability and traceability
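As a rough illustration of the first item, a task lifecycle can be modeled as a small state machine. The state names come from the list above. The transition table, `Task`, and `InvalidTransition` are my own assumptions for this sketch, not Taskcraft's API.

```python
from enum import Enum
from typing import Dict, List, Set


class TaskState(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table: completed and failed are terminal.
ALLOWED: Dict[TaskState, Set[TaskState]] = {
    TaskState.CREATED: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.PAUSED, TaskState.COMPLETED, TaskState.FAILED},
    TaskState.PAUSED: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


class InvalidTransition(Exception):
    """Raised when a task is asked to make a move the table forbids."""


class Task:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = TaskState.CREATED
        self.history: List[TaskState] = [TaskState.CREATED]

    def transition(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(
                f"{self.task_id}: {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)  # every move is recorded
```

Making illegal transitions raise, instead of silently overwriting the state, is what turns "the agent is doing something" into a status you can actually query and audit.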

Without this layer, agents remain:

Impressive
Unreliable
Unsafe to trust with real operations

A concrete example

Imagine an AI Ops Analyst tasked with generating a weekly incident report:

Read incident data
Analyze trends
Generate a report
Request approval
Send the report

If step 3 fails:

Should the system restart everything?
Retry only that step?
Pause and ask for human input?
Resume later from the last checkpoint?

Most agent systems today don’t know how to answer these questions.

A runtime does.
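One possible answer, "resume later from the last checkpoint", can be sketched like this. It is a simplified illustration, not how Taskcraft or any particular runtime works: `run_with_checkpoints` is a hypothetical helper, the checkpoint is a local JSON file, and each step is assumed to take and return a JSON-serializable dict.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def run_with_checkpoints(steps: List[Step], checkpoint: Path) -> Dict:
    """Run named steps in order, skipping any finished in an earlier run."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": [], "state": {}}

    for name, fn in steps:
        if name in saved["done"]:
            continue  # completed before the last failure; do not redo
        # If fn raises, the file still holds the last good state.
        saved["state"] = fn(saved["state"])
        saved["done"].append(name)
        checkpoint.write_text(json.dumps(saved))  # persist after each step
    return saved["state"]
```

With this shape, a failure in "Generate a report" leaves the results of the first two steps on disk, and calling the function again with the same checkpoint path retries only the failed step. A production runtime would also want atomic writes and a durable store instead of a single local file.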

What an execution runtime actually does

An execution runtime is deliberately boring — and that’s a good thing.

It focuses on:

Lifecycle management instead of prompting tricks
State persistence instead of stateless loops
Governance instead of trust
Recovery instead of hope

The LLM still plans and reasons.

The runtime decides how and when actions happen.

This separation turns an assistant into something closer to a governed coworker.
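A minimal sketch of that separation, assuming a deliberately simple policy: the model proposes an `Action`, and the runtime decides whether it runs, waits for approval, or is refused. Every name here (`Decision`, `Action`, `evaluate`, `run_action`) and the prefix-based rules are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    name: str
    target: str


def evaluate(action: Action) -> Decision:
    """Assumed policy: reads run freely, sends need sign-off, the rest is denied."""
    if action.name.startswith("read_"):
        return Decision.ALLOW
    if action.name.startswith("send_"):
        return Decision.NEEDS_APPROVAL
    return Decision.DENY


def run_action(action: Action, handler: Callable[[Action], str],
               approved: Set[Action]) -> str:
    """The runtime, not the model, decides whether the handler is called."""
    decision = evaluate(action)
    if decision is Decision.DENY:
        raise PermissionError(f"policy denies {action.name}")
    if decision is Decision.NEEDS_APPROVAL and action not in approved:
        return "paused: awaiting approval"
    return handler(action)
```

In the incident report example, "Send the report" would come back as paused until a human adds that exact action to `approved`. Unknown action names fall through to `DENY`, so the default is to refuse rather than to trust.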

A reference implementation: Taskcraft Runtime

While exploring these problems, I built Taskcraft Runtime — an open-source, Gemini-first execution runtime designed to explore this missing layer.

Taskcraft is intentionally not:

A chatbot
A UI
A prompt framework
A SaaS product

It is a runtime.

It provides:

Structured task lifecycles
Persistent state and resume
Policy enforcement and approval gates
Explicit execution boundaries
Observability by default

The current implementation runs on Gemini, but the architecture is deliberately model-agnostic.

The goal is not to replace existing agent tools — but to complement them with execution guarantees they intentionally don’t provide.

Why this matters now

As LLMs get more capable, the bottleneck is no longer reasoning.

It’s reliability.

The difference between:

“AI that can do things”

and

“AI you can trust with work”

is infrastructure — not prompts.

Execution runtimes are how we cross that gap.

Closing thoughts

Agent demos will keep getting better.

But production systems are built on:

Clear boundaries
Predictable behavior
Explicit failure handling
Governance and auditability

If we want AI coworkers — not just assistants — execution must be treated as a first-class problem.

Links

Taskcraft Runtime (v0.1.0)

https://github.com/BonifaceAlexander/taskcraft-runtime