DEV Community

Nisha Singh
The most dangerous thing an AI can do in a high-stakes system is produce a wrong answer confidently.

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Writing Challenge


"The most dangerous thing an AI can do in a high-stakes system is produce a wrong answer confidently. The most valuable thing it can do is know when to stop — and call the right tool."


If you have spent any time watching the current AI landscape, you have likely noticed a frustrating paradox:

Large Language Models can write elegant code, draft legal memos, and explain quantum mechanics — but if you ask one to solve a strict logistical optimization problem, it will fail. And it will fail with complete confidence.

I work on problems like Capacitated Vehicle Routing and Aircraft Gate Assignment at the Department of Management Studies, IIT Madras. In my domain, a single hallucinated variable does not produce a funny chatbot response. It grounds flights, breaks supply chains, and causes financial loss measured in millions.

For years, enterprises have hesitated to integrate AI into core operations for exactly this reason: LLMs are fundamentally probabilistic. They predict the next token. They do not compute.

OpenClaw changes that equation — not by making AI smarter at math, but by giving it something far more valuable: the discipline to stop trying.

This is a deep walkthrough of how I use OpenClaw as an Autonomous Operations Dispatcher to solve the Aircraft Gate Assignment Problem (AGAP) — and why the architecture it enables is a blueprint for trustworthy AI in any high-stakes domain.


1. The Problem: Gate Assignment in the Real World

Every time an aircraft lands at a major airport, someone — or increasingly, some system — must answer a deceptively difficult question:

Which gate does this flight get?

The naive answer is "just check which gates are free." The real answer involves dozens of simultaneous constraints:

  THE REAL CONSTRAINT STACK
  ════════════════════════════════════════════════════════════════════
  ⬛  TIME WINDOWS         gate occupied by A380 cannot accept new
                           aircraft until pushback + 15 min buffer
  ⬛  AIRCRAFT COMPAT.     international gates have customs infra —
                           domestic flights cannot use them
  ⬛  WALKING DISTANCE     connecting passenger gate placement burns
                           time and goodwill if wrong
  ⬛  REAL-TIME CHAOS      weather, fuel spills, mechanical holds,
                           last-minute schedule changes — continuous
  ════════════════════════════════════════════════════════════════════
  All constraints must hold simultaneously.
  One violation = operational failure.

Today, most mid-sized airports rely on a combination of legacy scheduling software and experienced dispatchers who manually override when disruptions hit. The software is rigid; the humans are slow under stress. The result is constant operational friction between what the math says is optimal and what a stressed dispatcher can actually implement.

This is exactly the kind of problem AI should solve — and exactly the kind of problem it currently cannot solve alone.


2. Why Pure LLMs Cannot Be Trusted Here

To understand the difficulty, consider the formal structure.

We have flights $F = \{1, \dots, n\}$ and gates $G = \{1, \dots, m\}$. The goal is to find an assignment that minimizes total passenger walking cost:

  OBJECTIVE
  ══════════════════════════════════════════════════════
  Minimize:   Σ  Σ  c[i][k] * x[i][k]
             i∈F k∈G

  Where:
    c[i][k]  =  walking cost proxy for flight i at gate k
    x[i][k]  =  1 if flight i assigned to gate k, else 0
  ══════════════════════════════════════════════════════

  CONSTRAINT 1 — Every flight gets exactly one gate:
    Σ x[i][k] = 1    for all i in F
    k

  CONSTRAINT 2 — No two overlapping flights share a gate:
    x[i][k] + x[j][k] ≤ 1    for all k, for all (i,j) overlapping

  CONSTRAINT 3 — Gate type compatibility:
    x[i][k] = 0    if flight type ≠ gate type

Scale this to a regional airport: 60 daily flights, 18 gates. The search space grows combinatorially. An LLM cannot reliably enumerate overlapping pairs, respect binary decision variables, and optimize the objective simultaneously. It will produce plausible-sounding assignments that violate hard constraints — and do so with zero hesitation.
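To put "grows combinatorially" in concrete terms, here is a quick back-of-the-envelope in Python (toy numbers, ignoring all constraints):

```python
# Raw search space: every one of 60 flights could, in principle,
# be placed at any of 18 gates before constraints prune anything.
flights, gates = 60, 18
raw_assignments = gates ** flights

# ~2.1 × 10^75 candidate assignments — hopeless to enumerate by hand,
# and far beyond what next-token prediction can search implicitly.
print(f"{raw_assignments:.2e}")
```

Constraints cut this down enormously, but finding the cut is exactly the combinatorial search that ILP solvers are built for.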

This is not a criticism of LLMs. They were not built for this. Integer Linear Programming solvers like Google OR-Tools were.

The question becomes: how do you connect a human dispatcher speaking in natural language to a solver that speaks in JSON?

That is OpenClaw's job.


3. The OpenClaw Paradigm: Orchestrator, Not Oracle

The key insight is a strict separation of concerns:

  ┌──────────────────────────────────────────────────────────────────┐
  │                  THE THREE-LAYER ARCHITECTURE                    │
  │                                                                  │
  │   [HUMAN DISPATCHER]                                             │
  │         │  natural language                                      │
  │         │  "Gate G4 is down. AI402 delayed 45 min."              │
  │         ▼                                                        │
  │   ┌─────────────────────┐                                        │
  │   │   OPENCLAW AGENT    │  ← translation layer                   │
  │   │                     │    understands context & intent        │
  │   │   governed by:      │    mutates schedule.json               │
  │   │   • AGENTS.md       │    calls solver tool                   │
  │   │   • TOOLS.md        │    formats output for humans           │
  │   └──────────┬──────────┘                                        │
  │              │  reads/writes                                     │
  │              ▼                                                   │
  │   ┌─────────────────────┐                                        │
  │   │   schedule.json     │  ← single source of truth              │
  │   │                     │    versioned, inspectable, diffable    │
  │   └──────────┬──────────┘                                        │
  │              │  triggers                                         │
  │              ▼                                                   │
  │   ┌─────────────────────┐                                        │
  │   │   solver.py         │  ← deterministic math layer            │
  │   │   (OR-Tools ILP)    │    same input → same output, always    │
  │   └──────────┬──────────┘                                        │
  │              │  returns                                          │
  │              ▼                                                   │
  │   ┌─────────────────────┐                                        │
  │   │  DISPATCH BRIEFING  │  OPTIMAL assignment  OR  INFEASIBLE    │
  │   │  (human-readable)   │  — no guessing, no fabrication         │
  │   └─────────────────────┘                                        │
  └──────────────────────────────────────────────────────────────────┘

OpenClaw is not asked to solve the routing. It is asked to translate:

  • Inbound: messy human language → clean structured JSON
  • Outbound: solver's JSON result → readable dispatcher briefing

The LLM does what it is genuinely good at — understanding context and communicating clearly.
The solver does what it is genuinely good at — exhaustive, constraint-respecting combinatorial search.
Neither is asked to do the other's job.

OpenClaw makes this operational because it runs locally with persistent state and native shell access. It can hold schedule.json in memory across a full shift, receive live updates, fire python solver.py, and stream results back — all within a single conversational thread.
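That loop can be sketched in a few lines of Python. This is an illustrative approximation of what the agent drives — the helper names here are mine, not OpenClaw's internals:

```python
import json
import subprocess
from pathlib import Path

SCHEDULE = Path("schedule.json")

def apply_updates(updates: dict) -> None:
    """Merge structured updates (produced by the agent) into schedule.json."""
    data = json.loads(SCHEDULE.read_text())
    for gate in data["gates"]:
        if gate["id"] in updates.get("gates", {}):
            gate.update(updates["gates"][gate["id"]])
    for flight in data["flights"]:
        if flight["id"] in updates.get("flights", {}):
            flight.update(updates["flights"][flight["id"]])
    SCHEDULE.write_text(json.dumps(data, indent=2))

def run_solver() -> dict:
    """Fire the deterministic solver and parse its JSON verdict."""
    result = subprocess.run(
        ["python", "solver.py", "--input", SCHEDULE.read_text()],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

The agent's only creative act is producing the `updates` dict from natural language; everything downstream is deterministic file I/O and a subprocess call.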


4. Building the System

Step 0 — The Data Contract (schedule.json)

Everything flows through one structured payload. OpenClaw reads and mutates this file; the solver reads it to compute.

{
  "flights": [
    { "id": "AI101", "arrival": "08:00", "departure": "09:15", "type": "domestic" },
    { "id": "AI203", "arrival": "08:30", "departure": "10:00", "type": "international" },
    { "id": "AI305", "arrival": "09:00", "departure": "10:30", "type": "domestic" },
    { "id": "AI402", "arrival": "09:45", "departure": "11:00", "type": "domestic" }
  ],
  "gates": [
    { "id": "G1", "type": "domestic",      "operational": true  },
    { "id": "G2", "type": "domestic",      "operational": true  },
    { "id": "G3", "type": "international", "operational": true  },
    { "id": "G4", "type": "domestic",      "operational": false }
  ],
  "cost_matrix": {
    "AI101": { "G1": 3, "G2": 7, "G3": 99 },
    "AI203": { "G1": 99, "G2": 99, "G3": 2 },
    "AI305": { "G1": 6, "G2": 4, "G3": 99 },
    "AI402": { "G1": 5, "G2": 9, "G3": 99 }
  }
}

A cost of 99 acts as a penalty encoding incompatibility — the solver will avoid it whenever any feasible cheaper option exists, without needing a separate constraint type. (The gate-type compatibility constraint in solver.py makes the prohibition hard regardless.)
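Because everything downstream hinges on this contract, it pays to validate the payload before every solver call. A minimal sketch — these particular checks are my own, not part of OpenClaw:

```python
def validate_schedule(data: dict) -> list:
    """Return a list of contract violations; empty means safe to solve."""
    errors = []
    gate_ids = {g["id"] for g in data.get("gates", [])}
    for f in data.get("flights", []):
        # Every flight must carry the fields the solver reads.
        for key in ("id", "arrival", "departure", "type"):
            if key not in f:
                errors.append(f"flight missing '{key}': {f}")
        # Cost rows must only reference gates that actually exist.
        row = data.get("cost_matrix", {}).get(f.get("id"), {})
        unknown = set(row) - gate_ids
        if unknown:
            errors.append(
                f"cost row for {f.get('id')} references unknown gates: {sorted(unknown)}")
    return errors
```

A failed validation is another place to stop and escalate rather than let the solver fail cryptically mid-shift.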

Step 1 — Registering the Tool (TOOLS.md)

## Tool: run_agap_solver

Command: python solver.py --input "$(cat schedule.json)"

Returns: JSON with keys: status, assignments (flight_id → gate_id), total_cost

When to call: After ANY update to schedule.json
Never call when: JSON has not changed since last solver run

Step 2 — Constraining the Agent (AGENTS.md)

## Agent: Tower_Dispatcher

Role: You are a strict data translator. NOT a mathematician. NOT a scheduler.

Instructions:
1. Parse dispatcher message → apply changes to schedule.json
2. DO NOT assign gates yourself
3. DO NOT calculate time overlaps manually
4. After updating schedule.json → immediately call run_agap_solver
5. Translate solver output → human-readable Dispatch Briefing

If solver returns INFEASIBLE:
  Alert the dispatcher. Do not fabricate an assignment. Escalate.

That last rule — Do not fabricate an assignment — is the entire safety architecture in one line.

Step 3 — The Deterministic Engine (solver.py)

import argparse, json
from ortools.linear_solver import pywraplp

def time_to_minutes(t):
    h, m = map(int, t.split(":"))
    return h * 60 + m

def overlaps(f1, f2, buffer=15):
    # Two flights conflict if their gate occupancy windows
    # (arrival through departure + pushback buffer) intersect.
    a1 = time_to_minutes(f1["arrival"])
    d1 = time_to_minutes(f1["departure"]) + buffer
    a2 = time_to_minutes(f2["arrival"])
    d2 = time_to_minutes(f2["departure"]) + buffer
    return not (d1 <= a2 or d2 <= a1)

def solve_gate_assignment(schedule_json):
    data = json.loads(schedule_json)
    flights = data["flights"]
    gates   = [g for g in data["gates"] if g.get("operational", True)]
    costs   = data.get("cost_matrix", {})

    solver = pywraplp.Solver.CreateSolver("SCIP")
    if solver is None:
        return {"status": "ERROR", "error": "SCIP backend unavailable."}
    n, m = len(flights), len(gates)

    x = {(i, k): solver.BoolVar(f"x_{i}_{k}") for i in range(n) for k in range(m)}

    # Constraint 1 — every flight gets exactly one gate
    for i in range(n):
        solver.Add(sum(x[i, k] for k in range(m)) == 1)

    # Constraint 2 — no overlapping flights share a gate
    for k in range(m):
        for i in range(n):
            for j in range(i + 1, n):
                if overlaps(flights[i], flights[j]):
                    solver.Add(x[i, k] + x[j, k] <= 1)

    # Constraint 3 — gate type compatibility
    for i, flight in enumerate(flights):
        for k, gate in enumerate(gates):
            if flight.get("type") and gate.get("type") and flight["type"] != gate["type"]:
                solver.Add(x[i, k] == 0)

    # Objective: minimize walking cost (missing cost entries default to 10)
    obj = solver.Objective()
    for i, flight in enumerate(flights):
        for k, gate in enumerate(gates):
            obj.SetCoefficient(x[i, k], costs.get(flight["id"], {}).get(gate["id"], 10))
    obj.SetMinimization()

    status = solver.Solve()
    if status == pywraplp.Solver.OPTIMAL:
        return {"status": "OPTIMAL",
                "assignments": {flights[i]["id"]: gates[k]["id"]
                                for i in range(n) for k in range(m)
                                if x[i, k].solution_value() > 0.5},
                "total_cost": round(solver.Objective().Value(), 2)}
    return {"status": "INFEASIBLE",
            "error": "No valid gate assignment exists for current constraints."}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="AGAP gate-assignment solver")
    parser.add_argument("--input", required=True, help="schedule JSON payload")
    args = parser.parse_args()
    print(json.dumps(solve_gate_assignment(args.input)))

Install with: pip install ortools


5. Watching It Work: A Live Disruption Scenario

  ════════════════════════════════════════════════════════════════
  [08:47] DISPATCHER 
  "Gate G4 is offline — fuel spill on the apron, estimate 2 hrs.
   Also, Flight AI402 running 45 minutes late. New arrival 10:30."
  ════════════════════════════════════════════════════════════════

  [OpenClaw  internal actions]

  Parsing message... two updates:
    G4   set "operational": false
    AI402  update "arrival": "10:30"

  Mutating schedule.json... 
  Calling run_agap_solver...

  $ python solver.py --input "$(cat schedule.json)"

  Solver output:
  {
    "status": "OPTIMAL",
    "assignments": {
      "AI101": "G1",
      "AI203": "G3",
      "AI305": "G2",
      "AI402": "G1"
    },
    "total_cost": 15.0
  }
  ════════════════════════════════════════════════════════════════

  [08:47] OPENCLAW 

  GATE ASSIGNMENT BRIEFING  08:47
  ─────────────────────────────────────────────────────
  AI101  Gate G1  |  08:00–09:15  |  ON TIME
  AI203  Gate G3  |  08:30–10:00  |  ON TIME
  AI305  Gate G2  |  09:00–10:30  |  ON TIME
  AI402  Gate G1  |  10:30–12:00  |  DELAYED +45 MIN
            no conflict: G1 clears AI101 by 09:30 

  Gate G4: OFFLINE
  Solver: OPTIMAL  |  Walk score: 15.0
  ─────────────────────────────────────────────────────
  ════════════════════════════════════════════════════════════════

The agent never said "I think Gate G1 works." It ran the math. Then it reported the math.
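The "no conflict" note in the briefing is worth sanity-checking by hand. Re-running the buffer arithmetic from solver.py's overlap logic:

```python
def to_min(t):
    # "HH:MM" -> minutes since midnight
    h, m = t.split(":")
    return int(h) * 60 + int(m)

BUFFER = 15  # pushback buffer in minutes, as in solver.py

# AI101 occupies G1 from 08:00 until 09:15 + buffer = 09:30.
ai101_clear = to_min("09:15") + BUFFER
# AI402 (delayed) now arrives at 10:30.
ai402_arrives = to_min("10:30")

# G1 is free a full hour before AI402 needs it — no overlap.
assert ai101_clear <= ai402_arrives
print(ai402_arrives - ai101_clear, "minutes of slack")  # 60 minutes
```

Same arithmetic the solver enforced, reproducible in four lines — which is precisely the point.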


6. The Infeasibility Case: Honest Failure Over Confident Error

Simulate a harder disruption: G2 also goes offline. AI305 now has no valid gate.

  DISRUPTION: G2 OFFLINE
  ══════════════════════════════════════════════════════════════
  G1  ██████████ [AI101]  ░░░  ██████████ [AI402]
      → no room for AI305 without overlap ❌

  G2  ██ OFFLINE ████████████████████████████
      → excluded ❌

  G3  █████████████████████ [AI203]
      → international only — AI305 is domestic ❌

  G4  ██ OFFLINE ████████████████████████████
      → excluded ❌

  ══════════════════════════════════════════════════════════════
  RESULT: INFEASIBLE
  ══════════════════════════════════════════════════════════════

OpenClaw's response:

  GATE ASSIGNMENT BRIEFING — 09:12
  ─────────────────────────────────────────────────────
  ⚠️  SOLVER STATUS: INFEASIBLE

  No valid gate assignment exists for current constraints.
  Flight AI305 cannot be accommodated without a gate conflict.

  ACTION REQUIRED: Human dispatcher override needed.
  ─────────────────────────────────────────────────────

It did not invent Gate G5. It did not suggest a partial solution. It said exactly what the math said: impossible — and escalated to a human.
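The verdict can be cross-checked exhaustively on this toy instance. A brute-force sketch using the same 15-minute buffer — not the production path, just an independent sanity check:

```python
from itertools import product

def to_min(t):
    h, m = t.split(":")
    return int(h) * 60 + int(m)

def overlaps(f1, f2, buffer=15):
    # f = (arrival, departure, ...); occupancy runs until departure + buffer
    return not (to_min(f1[1]) + buffer <= to_min(f2[0]) or
                to_min(f2[1]) + buffer <= to_min(f1[0]))

# (arrival, departure, type) — AI402 already delayed to 10:30
flights = {"AI101": ("08:00", "09:15", "dom"),
           "AI203": ("08:30", "10:00", "intl"),
           "AI305": ("09:00", "10:30", "dom"),
           "AI402": ("10:30", "12:00", "dom")}
gates = {"G1": "dom", "G3": "intl"}   # G2 and G4 are offline

def feasible(assign):
    ids = list(flights)
    for i, fi in enumerate(ids):
        if flights[fi][2] != gates[assign[i]]:        # type mismatch
            return False
        for j in range(i + 1, len(ids)):
            if assign[i] == assign[j] and overlaps(flights[fi], flights[ids[j]]):
                return False
    return True

# Only 2^4 = 16 candidate assignments — check them all.
assert not any(feasible(a) for a in product(gates, repeat=len(flights)))
print("Exhaustive search agrees: INFEASIBLE")
```

Sixteen candidates, all rejected: AI101 and AI305 cannot share G1, and G3 only takes international traffic. The solver was not being cautious; it was being correct.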

In regulated operations, that honest refusal is not a failure. It is the feature.


7. Before vs. After: What This Architecture Actually Changes

  Dimension               Before (Manual Dispatch)           After (OpenClaw + Solver)
  ─────────────────────────────────────────────────────────────────────────────────────
  Input format            Structured form, filled manually   Free-text natural language
  State management        Spreadsheet, updated by hand       schedule.json mutated automatically
  Optimization engine     Dispatcher memory and experience   OR-Tools ILP — provably optimal
  Infeasibility response  Dispatcher improvises workaround   Returns INFEASIBLE, escalates
  Hallucination risk      Human fatigue and bias errors      Agent forbidden from inventing answers
  Auditability            Notes in email or chat             schedule.json + solver output = full trace
  Reproducibility         "It seemed right at the time"      Same input → same output, always
  Constraint enforcement  Depends on dispatcher memory       Hard-coded in solver, never skipped
  Gate-closure handling   May be missed under pressure       Enforced at solver level, always
  Failure mode            Silent wrong assignment            Loud, honest refusal with exact reason
  Time per disruption     3–10 minutes of manual work        Seconds
  Trust model             Dispatcher's judgment              Solver's mathematical proof

8. What Determinism Buys You: Auditability

The practical value of this architecture is not just correctness — it is auditability.

When a gate assignment is challenged ("Why was AI402 sent to G1 when G2 was free?"), the operations manager can:

  1. Pull the exact schedule.json snapshot from that timestamp
  2. Rerun solver.py with the same input
  3. Get the same output

The assignment is not an LLM's opinion. It is the output of an optimization with a traceable objective function. This matters enormously in regulated industries. Aviation, logistics, and healthcare do not run on "it seemed right." They run on demonstrable, reproducible decision records.
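One lightweight way to make the snapshot-and-replay step trivial is to fingerprint every archived schedule.json. A sketch — the hashing scheme here is my own choice, not something OpenClaw prescribes:

```python
import hashlib
import json

def snapshot_fingerprint(schedule: dict) -> str:
    """Stable digest of a schedule snapshot: same content, same hash, always."""
    # Canonical serialization so key order and whitespace cannot matter.
    canonical = json.dumps(schedule, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Key order must not affect the audit trail — only content does.
a = {"flights": [], "gates": [{"id": "G1", "operational": True}]}
b = {"gates": [{"operational": True, "id": "G1"}], "flights": []}
assert snapshot_fingerprint(a) == snapshot_fingerprint(b)
```

Log the fingerprint alongside each solver verdict and "pull the exact snapshot from that timestamp" becomes a dictionary lookup.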


9. The Broader Blueprint

The Aircraft Gate Assignment problem is one instance of a pattern that appears across every domain where correctness is non-negotiable:

  PATTERN: OpenClaw as Operations Layer
  ══════════════════════════════════════════════════════════════════
  Domain           Unstructured Input       Deterministic Engine
  ──────────────────────────────────────────────────────────────────
  Aviation         Dispatcher messages   →  ILP gate solver
  Logistics        Driver field updates  →  VRP routing solver
  Healthcare       Clinician notes       →  Drug interaction checker
  Finance          Analyst commentary    →  Monte Carlo risk engine
  Power Grid       Operator alerts       →  Load balancing optimizer
  ══════════════════════════════════════════════════════════════════

  In every case:
  LLM job   →  bridge human language to machine-readable state
  Solver job →  compute
  Failure   →  when those roles collapse into one

OpenClaw makes this separation operational. Its persistent local runtime and shell access mean you deploy the agent in the same environment as your solvers, databases, and legacy systems — without requiring cloud APIs or vendor sandboxes.

This is the architectural shift that makes AI trustworthy in high-stakes operations: not a smarter LLM, but a system where the LLM knows exactly where its jurisdiction ends.


Getting Started

# 1. Install the solver dependency
pip install ortools

# 2. Set up your schedule data
#    Edit schedule.json with your flights, gates, and cost matrix

# 3. Test the solver directly
python solver.py --input "$(cat schedule.json)"

# 4. Configure OpenClaw
#    Place TOOLS.md and AGENTS.md in your OpenClaw config directory
#    Point the tool command at your local solver.py path

# 5. Start a session and send a dispatch message
#    OpenClaw will parse, update, solve, and brief — automatically

Conclusion

The most dangerous thing an AI system can do in a high-stakes environment is confidently produce a wrong answer.

The most valuable thing it can do is know when to stop and call the right tool.

OpenClaw, configured as an Autonomous Operations Dispatcher, embodies this completely. It does not attempt to be smarter than a mathematical solver. It acts as the intelligent glue between human operators who speak in ambiguity and optimization engines that speak in constraints.

The result:

  ✅  Faster than manual dispatching
  ✅  More reliable than a bare LLM
  ✅  Fully auditable — same input, same output, always
  ✅  Deployable on local infrastructure, no cloud required
  ✅  Honest under infeasibility — escalates instead of guessing

The aircraft gate problem is solved. More importantly, a reusable blueprint for trustworthy AI in operations research is demonstrated. The pattern — LLM as translator, deterministic solver as mathematician — generalizes to any domain where correctness is non-negotiable.

That, I would argue, is the most important OpenClaw story worth telling right now: not that AI can do everything, but that a well-architected AI system knows exactly what it should never try.


ClawCon Michigan

I did not attend ClawCon Michigan in person, but the problems being discussed there — how to build AI systems that are both capable and trustworthy — are exactly the problems this post addresses. If the OpenClaw community is moving in this direction, I am very much here for it.

