Tobi Lekan Adeosun
Building Offline-First AI Agents: Why "Always-Online" Architectures Fail in the Real World

The "Happy Path" Problem
If you look at the documentation for most AI agent frameworks (LangChain, AutoGPT, CrewAI), they all share a dangerous assumption: Abundant Connectivity.

They assume your API calls to OpenAI will always succeed. They assume your websocket will never drop. They assume your user has stable 5G.

But I build software for Lagos, Nigeria. Here, power flickers, fiber cuts happen, and latency is a physical constraint, not an edge case. When I tried deploying standard agentic workflows here, they didn't just fail; they failed catastrophically. Users lost data, workflows hallucinated, and API credits were wasted on timeouts.

I call this the "Agentic Gap": the massive divide between how AI works in a demo video in San Francisco and how it works in a resource-constrained environment.

We Need "Contextual Engineering"
I spent the last year re-architecting how we build these systems. I call the approach Contextual Engineering. It’s not about making models smarter; it’s about making the system around them resilient.

Here are two architectural patterns I built to fix this, which you can use in your own Python projects today.

Pattern 1: The "Sync-Later" Queue
Most agents use a synchronous User -> LLM -> Response loop. If the network dies in the middle, the context is lost.

Instead, we treat every user intent as a Transaction.

  1. Serialize the Intent: When a user prompts the agent, we don't hit the API immediately. We serialize the request and store it in a local SQLite queue.

  2. Cryptographic Signing: We sign the request before it enters the queue to ensure its integrity.

  3. Opportunistic Sync: A background worker checks for connectivity (ping/heartbeat). Only when $N(t) = 1$ (network is available) do we flush the queue.
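Step 2 deserves a concrete sketch. One minimal way to sign queued intents is an HMAC over the serialized payload using Python's standard library; the `DEVICE_KEY`, `sign_intent`, and `verify_intent` names here are hypothetical illustrations, not part of the framework's API:

```python
import hashlib
import hmac
import json

# Hypothetical per-device secret; in practice, load this from secure
# storage (keystore/env), never hard-code it.
DEVICE_KEY = b"per-device-secret"

def sign_intent(tx_id, user_input):
    """Serialize an intent deterministically and sign it, so the sync
    worker (or server) can detect tampering while it sat in the queue."""
    payload = json.dumps({"id": tx_id, "input": user_input}, sort_keys=True).encode()
    signature = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return payload, signature

def verify_intent(payload, signature):
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The signature can be stored in an extra column alongside the pending action and checked just before the queue is flushed.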

The Python Implementation
Instead of a direct requests.post, we use a local buffer. Here is the logic from the open-source framework:

import sqlite3
import uuid

def queue_action(user_input, intent_type):
    # 1. Create a transaction ID
    tx_id = str(uuid.uuid4())

    # 2. Store locally first (Offline-First), including the intent type
    conn = sqlite3.connect('agent_state.db')
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO pending_actions (id, input, intent, status) VALUES (?, ?, ?, 'PENDING')",
        (tx_id, user_input, intent_type)
    )
    conn.commit()
    conn.close()

    # 3. Try to sync (if online)
    if check_connectivity():
        sync_manager.flush()
    else:
        print(f"Network down. Action {tx_id} queued for later.")
    return tx_id

This ensures Zero Data Loss. The user can keep working, and the agent "catches up" when the internet comes back.
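The background worker side can be sketched as two small functions: a cheap heartbeat (here, opening a TCP socket to a public DNS resolver, an assumption on my part rather than the framework's actual probe) and a flush loop that only marks an action `SYNCED` after the send succeeds:

```python
import socket
import sqlite3

def check_connectivity(host="8.8.8.8", port=53, timeout=2):
    """Heartbeat: can we open a TCP socket to a well-known resolver?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def flush_pending(db_path="agent_state.db", send=None):
    """Drain the queue oldest-first. Each row is marked SYNCED only after
    `send` (e.g. the real LLM API call) returns without error, so a
    mid-flush network drop leaves the remaining rows PENDING for retry."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, input FROM pending_actions WHERE status = 'PENDING' ORDER BY rowid"
    ).fetchall()
    for tx_id, user_input in rows:
        try:
            send(tx_id, user_input)
        except OSError:
            break  # network dropped mid-flush; retry on the next heartbeat
        conn.execute("UPDATE pending_actions SET status = 'SYNCED' WHERE id = ?", (tx_id,))
        conn.commit()
    conn.close()
```

Because the commit happens per row, a crash or disconnect never loses an acknowledged action and never re-sends a synced one.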

Pattern 2: The Hybrid Inference Router
Why route a simple "Hello" or "Summarize this text" to GPT-4? It’s slow, expensive, and requires a heavy internet connection.

I implemented a Router Logic Gate that inspects the prompt before it leaves the device.

  • Low Complexity? → Route to a local SLM (like Llama-3-8B or Phi-2) running on-device. (Cost: $0, Latency: Low).
  • High Complexity? → Route to the Cloud (GPT-4o).

The decision function looks like this:

# The Routing Logic
if network_is_down() or complexity < threshold:
    model = "Local Llama-3 (8B)" # Free, Fast, Offline
else:
    model = "GPT-4o"             # Smart, Costly, Online

This simple check saved us about 40-60% on API costs and made the application feel "instant" for basic tasks, even on 3G networks.
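As a runnable sketch of that logic gate: the complexity heuristic below (word count plus question density) is my own illustrative stand-in, not the framework's actual scorer, and the model names are placeholders:

```python
def estimate_complexity(prompt):
    """Crude heuristic stand-in: longer, question-dense prompts score
    higher, capped at 1.0."""
    score = len(prompt.split()) / 50.0
    score += 0.3 * prompt.count("?")
    return min(score, 1.0)

def route(prompt, threshold=0.5, online=True):
    """Pick a model tier: the local SLM when offline or for simple
    prompts, the cloud model otherwise."""
    if not online or estimate_complexity(prompt) < threshold:
        return "local-llama-3-8b"   # free, fast, works offline
    return "gpt-4o"                 # smart, costly, needs network
```

In production you would replace `estimate_complexity` with something sturdier (a tiny classifier, or token-count plus task-type rules), but the routing shape stays the same.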

The "Contextual Engineering" Framework
These patterns aren't just hacks; they are part of a broader discipline I’m trying to formalize called Contextual Engineering. It’s about building AI that respects the Contextual Tuple (C = {I, K, R}): Infrastructure, Knowledge (Culture), and Regulation.

I’ve open-sourced the entire reference architecture. It includes the routing logic, the SQLite queue wrappers, and the "Constitutional Sentinel" for safety.

Where to find the code
I want to see more engineers building specifically for the Global South. You can find the full Python implementation here:

👉 Star the GitHub Repository

The Deep Dive (Free Book)
For those who want the math and the full architectural theory, I also wrote a 90-page reference manuscript titled "Contextual Engineering: Architectural Patterns for Resilient AI." It covers the full "Agentic Gap" theory and detailed diagrams.

📖 Download the PDF (Open Access)

Let me know in the comments: How do you handle network flakes in your LLM apps?
