Rodrigo Giuliani

Posted on Jun 24

I Gave an AI One Sentence. A Drone Flew the Mission — and When the AI Guessed Wrong, the System Caught It.

#ai #robotics #iot #architecture

A build log: connecting DoSync, an open semantic-intent protocol, to a real autonomous-vehicle stack — then letting an AI model command it in plain language, and watching what happened when the model made a mistake.

I've spent the last while building DoSync, an open protocol (Apache 2.0) that lets AI agents act on physical devices through semantic intent instead of commands. The AI says "there's an emergency"; the devices figure out their own roles, from the capabilities they declared when they joined.
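As a rough illustration of "devices figure out their own roles," here is a minimal Python sketch. It assumes a hypothetical registry in which each device declares a set of capability strings when it joins, and each intent maps to the capabilities that can serve it. `Device`, `INTENT_ROLES`, `resolve_roles`, and the capability names are all invented for illustration. DoSync's real schema and matching logic may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Device:
    """A device as the hub sees it: an id plus the capabilities declared on join."""
    device_id: str
    capabilities: Set[str] = field(default_factory=set)

# Hypothetical vocabulary: which declared capabilities can serve a given intent.
INTENT_ROLES: Dict[str, Set[str]] = {
    "emergency": {"illuminate", "alarm", "unlock_door"},
}

def resolve_roles(intent: str, devices: List[Device]) -> Dict[str, List[str]]:
    """Return, for each device that can help, the capabilities it contributes."""
    wanted = INTENT_ROLES.get(intent, set())
    roles: Dict[str, List[str]] = {}
    for device in devices:
        matched = sorted(device.capabilities & wanted)
        if matched:  # devices with nothing relevant simply sit this intent out
            roles[device.device_id] = matched
    return roles

devices = [
    Device("hall_light", {"illuminate", "dim"}),
    Device("siren", {"alarm"}),
    Device("thermostat", {"set_temperature"}),
]
print(resolve_roles("emergency", devices))
# {'hall_light': ['illuminate'], 'siren': ['alarm']}
```

The caller names a situation, not a device. A new device that declares `alarm` would join the emergency response without any change to the code that fires the intent.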

The protocol was designed to be domain-agnostic from the start — nothing in it assumes a house, a vehicle, or a factory. But a design claim is cheap. The only honest way to find out whether "domain-agnostic" is true is to take the protocol to the hardest possible device and see if it holds. So I connected it to a drone — and then I did the thing the protocol was always for but that I'd never actually shown: I let an AI model fire the intents, from a single sentence in plain language.

This post is the build log of both halves: the hard plumbing, and the moment it all closed into a loop. That includes the moment the AI got it wrong, which turned out to be the most important part, because the lesson underneath all of it is a simple one: the AI can be wrong; the protocol does not have to be.

Why a drone is the hard case

A light bulb is the easy case. You say "turn on," it turns on, done. The action is instantaneous and it never fails in a way that matters.

A drone is the opposite on every axis:

Its actions take time. "Go to this point" isn't done when the command is accepted — it's done minutes later, when the aircraft arrives.
It passes through states: arming, taking off, navigating, holding position in wind.
Silence is not success. If the drone stops reporting, that doesn't mean it arrived. It might mean the radio link dropped while it kept flying.
A human can take over at any instant by grabbing the RC sticks — and when they do, the system must know and back off, not fight them.
Getting it wrong has physical consequences.

If an orchestration protocol can handle all of that, the rest is downhill.

What I had to build

Until the drone, every action DoSync knew was instantaneous — fire it, get a result, done. A vehicle breaks that assumption completely. To speak to one, I built a few new layers — and the interesting part is how few of them are actually about drones.

A general model for long-running operations

First, a state machine for any action that takes time:

pending → in_progress → completed | failed
        + rejected, cancelled, interrupted
        + sub-states: preparing, paused_by_device, reconciling

Crucially, this knows nothing about drones. It's the same model an oven, a 3D printer, or a robotic arm would use. interrupted means a human took control — of a drone or of anything else.
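The state machine above could be encoded roughly as follows. This is a sketch, not DoSync's code: the post lists the states but not every allowed edge, so the `TRANSITIONS` table (for example, that `interrupted` is only reachable from `in_progress`) is an assumption, as are the class and method names. Sub-states are modeled as an annotation on `in_progress`.

```python
from enum import Enum
from typing import Optional

class OpState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    REJECTED = "rejected"
    CANCELLED = "cancelled"
    INTERRUPTED = "interrupted"  # a human took control

# Assumed edges; terminal states have no outgoing transitions.
TRANSITIONS = {
    OpState.PENDING: {OpState.IN_PROGRESS, OpState.REJECTED, OpState.CANCELLED},
    OpState.IN_PROGRESS: {OpState.COMPLETED, OpState.FAILED,
                          OpState.CANCELLED, OpState.INTERRUPTED},
}

SUB_STATES = {"preparing", "paused_by_device", "reconciling"}

class Operation:
    """A device-agnostic long-running operation (drone, oven, printer, arm...)."""

    def __init__(self, op_id: str):
        self.op_id = op_id
        self.state = OpState.PENDING
        self.sub_state: Optional[str] = None

    def transition(self, new_state: OpState) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.sub_state = None  # sub-states only describe the current in_progress phase

    def set_sub_state(self, sub_state: str) -> None:
        if self.state is not OpState.IN_PROGRESS or sub_state not in SUB_STATES:
            raise ValueError(f"sub-state {sub_state!r} not valid in {self.state.value}")
        self.sub_state = sub_state

op = Operation("op-1")
op.transition(OpState.IN_PROGRESS)
op.set_sub_state("paused_by_device")
op.transition(OpState.INTERRUPTED)   # operator grabbed the sticks
print(op.state.value, op.sub_state)  # interrupted None
```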

A command channel and a telemetry channel — over one link

A MAVLink adapter translates DoSync's aerial intents (take_off, go_to, return_home, land, loiter) into the drone's native protocol. The key design rule: when the drone accepts a command, that's dispatch acceptance, not completion. "I received the order" ≠ "I arrived." The protocol keeps that distinction honest.
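A sketch of the command half, assuming a simple lookup table. The MAVLink command names and `MAV_RESULT` values come from the MAVLink common message set. The specific mapping (in particular using `MAV_CMD_DO_REPOSITION` for `go_to`, one of several ways to send a copter to a position) and the function names are illustrative assumptions, not DoSync's adapter. The point is the last function: an accepted `COMMAND_ACK` moves the operation to `in_progress`, never to `completed`.

```python
# Standard MAVLink command names; which intent maps to which command is an
# illustrative assumption, not DoSync's actual adapter table.
INTENT_TO_MAV_CMD = {
    "take_off": "MAV_CMD_NAV_TAKEOFF",
    "go_to": "MAV_CMD_DO_REPOSITION",
    "return_home": "MAV_CMD_NAV_RETURN_TO_LAUNCH",
    "land": "MAV_CMD_NAV_LAND",
    "loiter": "MAV_CMD_NAV_LOITER_UNLIM",
}

# A subset of the MAV_RESULT values carried by COMMAND_ACK.
MAV_RESULT_ACCEPTED = 0
MAV_RESULT_FAILED = 4
MAV_RESULT_IN_PROGRESS = 5

def command_for(intent: str) -> str:
    """Translate an aerial intent into its MAVLink command name."""
    if intent not in INTENT_TO_MAV_CMD:
        raise KeyError(f"no MAVLink mapping for intent {intent!r}")
    return INTENT_TO_MAV_CMD[intent]

def state_after_ack(result: int) -> str:
    """Map a COMMAND_ACK result to an operation state.

    Acceptance means "I received the order", not "I arrived": the operation
    becomes in_progress and waits for telemetry to confirm completion.
    """
    if result in (MAV_RESULT_ACCEPTED, MAV_RESULT_IN_PROGRESS):
        return "in_progress"
    if result == MAV_RESULT_FAILED:
        return "failed"
    return "rejected"  # temporarily rejected, denied, unsupported, ...

print(command_for("return_home"), state_after_ack(MAV_RESULT_ACCEPTED))
# MAV_CMD_NAV_RETURN_TO_LAUNCH in_progress
```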

The other half is a telemetry listener that reads the drone's continuous stream and only marks an operation complete on a positive signal — the aircraft actually reaching its target altitude, actually arriving at a waypoint, actually disarming after it lands. Never on the absence of an error.
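The rule "never on the absence of an error" fits in a few lines. In this sketch (names hypothetical), `None` stands for a gap in the telemetry stream. Gaps and uneventful samples leave the operation `in_progress`; only a sample that positively satisfies the completion predicate marks it `completed`. What counts as "positive" is per intent and supplied by the caller.

```python
from typing import Callable, Iterable, Optional

def reconcile(samples: Iterable[Optional[dict]],
              is_done: Callable[[dict], bool]) -> str:
    """Decide an operation's state from a telemetry stream.

    None marks a gap in the stream (nothing arrived). Gaps and samples that
    merely show no error leave the operation in_progress; only a sample that
    positively satisfies is_done completes it.
    """
    for sample in samples:
        if sample is None:
            continue  # silence is not success
        if is_done(sample):
            return "completed"
    return "in_progress"

def reached_30m(sample: dict) -> bool:
    """Hypothetical take_off predicate: within 1 m of a 30 m target."""
    return sample["alt_m"] >= 29.0

print(reconcile([{"alt_m": 5.0, "errors": 0}, None, {"alt_m": 29.6}], reached_30m))  # completed
print(reconcile([None, None, None], reached_30m))                                   # in_progress
print(reconcile([{"alt_m": 12.0, "errors": 0}], reached_30m))                       # in_progress
```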

Getting these two channels to share one connection turned out to be the whole game, and it's worth being honest about why. My first instinct was two connections: one to send commands, one to listen. That fails immediately, because a real drone over a serial radio is a single bidirectional link, and a serial port can't be opened twice. Trying to fake two channels over UDP produced a cascade of subtle failures: a port bind conflict, then lost acknowledgements, then commands addressed to "system 0" because the command channel never heard the heartbeat that identifies the vehicle. Each one looked like a different bug. They were all the same bug: a blind command channel.

The fix was to stop pretending. One connection, owned by the listener (the single reader); the command side writes on it and waits for its acknowledgement through the listener. That's exactly how a real ground station handles a single radio link. It's the architecture a physical aircraft demands, not a shortcut that only holds up in simulation.
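Here is a minimal sketch of the single-reader pattern, using Python threads and a fake in-memory radio instead of a real serial link. `FakeRadio`, `Listener`, and the message dictionaries are hypothetical stand-ins for MAVLink messages; a real implementation would parse frames from the port. The listener is the only code that reads. `send_command` writes, then blocks until the listener routes back the matching acknowledgement, and it refuses to send anything until a heartbeat has identified the target system.

```python
import queue
import threading

class FakeRadio:
    """Stand-in for one bidirectional link: a single inbox plus write()."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.inbox.put({"type": "HEARTBEAT", "sysid": 1})  # vehicle announces itself
    def write(self, msg):
        # The simulated vehicle acknowledges every command it receives.
        self.inbox.put({"type": "COMMAND_ACK", "command": msg["command"], "result": 0})
    def read(self, timeout):
        try:
            return self.inbox.get(timeout=timeout)
        except queue.Empty:
            return None

class Listener:
    """Sole reader of the link; commands write through it and wait on it for ACKs."""
    def __init__(self, radio):
        self.radio = radio
        self.target_sysid = None
        self._heartbeat = threading.Event()
        self._waiters = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            msg = self.radio.read(timeout=0.1)
            if msg is None:
                continue
            if msg["type"] == "HEARTBEAT":
                self.target_sysid = msg["sysid"]
                self._heartbeat.set()
            elif msg["type"] == "COMMAND_ACK":
                with self._lock:
                    waiter = self._waiters.pop(msg["command"], None)
                if waiter is not None:
                    waiter.put(msg["result"])
            # telemetry messages would be handed to the reconcilers here

    def send_command(self, command, timeout=2.0):
        # Never address "system 0": wait until a heartbeat has identified the vehicle.
        if not self._heartbeat.wait(timeout):
            raise TimeoutError("no heartbeat; vehicle not identified")
        reply = queue.Queue(maxsize=1)
        with self._lock:
            self._waiters[command] = reply  # register before writing so no ACK is lost
        self.radio.write({"command": command, "target_system": self.target_sysid})
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            with self._lock:
                self._waiters.pop(command, None)
            raise TimeoutError(f"no COMMAND_ACK for {command}")

    def close(self):
        self._stop.set()
        self._thread.join()

link = Listener(FakeRadio())
print(link.send_command("MAV_CMD_NAV_TAKEOFF"))  # 0 (MAV_RESULT_ACCEPTED)
link.close()
```

Keying waiters by command mirrors how `COMMAND_ACK` identifies the command it answers, and registering the waiter before writing avoids losing an ACK that arrives faster than the sender starts waiting. This sketch allows only one outstanding command of each type at a time.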

Safety policies that filter intent

Two policies, both absolute (neither is bypassed, not even in an emergency):

A geofence that blocks any destination outside a permitted perimeter. An emergency is never a reason to fly somewhere dangerous.
A manual-control lockout that blocks commands to a vehicle a human is flying. An emergency is very likely why they took control, so the system must never wrestle it back.

These policies filter the intent before dispatch. They are not the flight safety system: that lives in the drone's firmware and acts on its own, even if DoSync vanishes. DoSync coordinates; it is never the failsafe. That boundary is the whole philosophy.
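Both policies can be sketched as a pure check that runs before dispatch. The assumptions: the geofence is a circle around a center point (real fences are often polygons), distances use the haversine formula on a spherical Earth, and `Intent` and `Policies` are hypothetical types. Neither check reads `priority`, so an emergency cannot unlock them.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters (spherical-Earth approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

@dataclass
class Intent:
    name: str
    vehicle_id: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    priority: str = "normal"

@dataclass
class Policies:
    fence_center: Tuple[float, float]
    fence_radius_m: float
    manual_control: Set[str] = field(default_factory=set)  # vehicles a human is flying

    def check(self, intent: Intent) -> Tuple[bool, str]:
        # Deliberately ignores intent.priority: both policies are absolute.
        if intent.vehicle_id in self.manual_control:
            return False, "blocked: vehicle under manual control"
        if intent.lat is not None and intent.lon is not None:
            d = haversine_m(*self.fence_center, intent.lat, intent.lon)
            if d > self.fence_radius_m:
                return False, f"blocked: destination {d:.0f} m outside geofence"
        return True, "allowed"

policies = Policies(fence_center=(-35.3632, 149.1652), fence_radius_m=500.0)
print(policies.check(Intent("go_to", "drone-1", -35.3630, 149.1650)))  # (True, 'allowed')
# About 1 km north of center: blocked even as an emergency.
print(policies.check(Intent("go_to", "drone-1", -35.3542, 149.1652, priority="emergency"))[0])  # False
policies.manual_control.add("drone-1")  # operator grabs the sticks
print(policies.check(Intent("go_to", "drone-1", -35.3630, 149.1650, priority="emergency")))
# (False, 'blocked: vehicle under manual control')
```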

Testing it without risking a real aircraft

I validated everything against ArduPilot SITL — the same software-in-the-loop simulator the drone industry develops against. A complete virtual drone, speaking real MAVLink, that you can command and that reports real telemetry.

(A note on honesty: this is validated in SITL, not yet on physical hardware. SITL is serious validation — it's what the industry builds against — but a real aircraft flying is the next step, not a claim I'm making today.)

And here's where the build log stops being about plumbing and starts being about the point.

The part that matters: an AI commands the drone

For most of this build, I was the one firing the intents — a POST to the hub, by hand. The loop worked, but there was a quiet gap between what DoSync is and what I'd actually shown. DoSync's whole premise is that an AI expresses a goal and the physical system figures out the rest. Yet if the "AI" is a curl command I type, a reader can't tell it apart from a scheduled task. The essence — a model deciding to act — wasn't on screen.

So I closed the gap. DoSync ships a native MCP server, which means any LLM that speaks MCP can use the hub as a tool. I pointed a model at it — a small, fast one, Claude Haiku — and, in plain Spanish, told it:

"Use the drone to inspect the perimeter of the area centered at latitude -35.3632, longitude 149.1652, with a 50-meter radius at 30 meters altitude."

I never named an intent. I never gave a waypoint. I described a goal.

The model did the translation itself: it recognized that "inspect the perimeter" maps to the inspect_area intent, built the structured context (the center, the radius, the altitude, the target vehicle), and fired it at the hub. It even inferred the city from the coordinates and chose a priority on its own. That translation — from a human goal to a semantic intent — is the entire reason DoSync exists. And notably, it didn't take a frontier model to do it: a small, fast model handled the mapping cleanly, because the hard part isn't raw reasoning — it's having a protocol with a vocabulary the model can target.
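For concreteness, the structured context the model produced might look something like the dictionary below. The field names are hypothetical (the post doesn't show the actual payload), and the validator sketches the kind of schema check a hub could run before accepting an intent. A check like this catches malformed input. It cannot catch a plausible but wrong value, which is why confirmation has to come from telemetry.

```python
# Hypothetical schema for the inspect_area context; real field names may differ.
INSPECT_AREA_FIELDS = {
    "center_lat": (int, float),
    "center_lon": (int, float),
    "radius_m": (int, float),
    "altitude_m": (int, float),
    "target": str,
}

def validate_inspect_area(context: dict) -> list:
    """Return a list of problems; an empty list means the context is well-formed."""
    problems = [f"missing {k}" for k in INSPECT_AREA_FIELDS if k not in context]
    problems += [f"{k} has wrong type" for k, t in INSPECT_AREA_FIELDS.items()
                 if k in context and not isinstance(context[k], t)]
    if problems:
        return problems
    if not (-90 <= context["center_lat"] <= 90 and -180 <= context["center_lon"] <= 180):
        problems.append("center out of range")
    if context["radius_m"] <= 0 or context["altitude_m"] <= 0:
        problems.append("radius_m and altitude_m must be positive")
    return problems

intent = {
    "intent": "inspect_area",
    "context": {"center_lat": -35.3632, "center_lon": 149.1652,
                "radius_m": 50, "altitude_m": 30, "target": "drone-1"},
}
print(validate_inspect_area(intent["context"]))  # []
print(validate_inspect_area({"center_lat": -35.3632, "center_lon": 149.1652,
                             "altitude_m": 30, "target": "drone-1"}))
# ['missing radius_m']
```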

One detail worth highlighting for anyone building on MCP: the model didn't have inspect_area hardcoded anywhere. The MCP server reads the available intents from the hub at runtime — the hub is the single source of truth. I had declared inspect_area on the hub earlier in the day; the model simply saw it appear as an available capability. Declare a new intent on the hub, and the AI layer can use it with no code change. That's the protocol's "everything is declared" principle reaching all the way up to the model.
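One way to get that behavior is for the MCP server to rebuild its tool list from whatever the hub reports each time a client asks. In this sketch, `declared` mimics a hypothetical hub response (its shape is an assumption). The output follows the fields of an MCP tool definition: `name`, `description`, and `inputSchema` as JSON Schema. The post doesn't say whether DoSync exposes one tool per intent or a single generic tool; this sketch assumes one per intent.

```python
def hub_intents_to_mcp_tools(declared: list) -> list:
    """Translate intents declared on the hub into MCP-style tool definitions."""
    tools = []
    for intent in declared:
        params = intent.get("params", [])
        tools.append({
            "name": intent["name"],
            "description": intent.get("description", ""),
            "inputSchema": {
                "type": "object",
                "properties": {p["name"]: {"type": p["type"]} for p in params},
                "required": [p["name"] for p in params if p.get("required", False)],
            },
        })
    return tools

declared = [
    {"name": "go_to", "description": "Fly to a position",
     "params": [{"name": "lat", "type": "number", "required": True},
                {"name": "lon", "type": "number", "required": True}]},
]
print([t["name"] for t in hub_intents_to_mcp_tools(declared)])  # ['go_to']

# Declaring a new intent on the hub is enough for it to show up as a tool.
declared.append({"name": "inspect_area", "description": "Fly the perimeter of an area",
                 "params": [{"name": "radius_m", "type": "number", "required": True}]})
print([t["name"] for t in hub_intents_to_mcp_tools(declared)])  # ['go_to', 'inspect_area']
```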

Watching the loop close

The moment the intent fired, the hub composed the route — take off, fly the four corners of the perimeter, return home — and drove it step by step. Each step waited for confirmed arrival before the next began.
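The route expansion can be sketched as a function that turns one `inspect_area` intent into ordered steps. The post says only "the four corners of the perimeter," so the geometry here is an assumption: corners of the square that encloses the circle, offset by the radius north/south and east/west of the center, using a local flat-earth approximation that is adequate over tens of meters. Step names match the post's intents; everything else is hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0

def offset(lat: float, lon: float, north_m: float, east_m: float) -> Tuple[float, float]:
    """Shift a point by small north/east distances (local flat-earth approximation)."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon

def compose_inspect_area(lat: float, lon: float, radius_m: float,
                         altitude_m: float) -> List[Tuple[str, dict]]:
    """Expand one inspect_area intent into ordered, individually confirmed steps."""
    steps = [("take_off", {"altitude_m": altitude_m})]
    # Assumed pattern: corners of the square enclosing the circle, NE, SE, SW, NW.
    for north, east in [(1, 1), (-1, 1), (-1, -1), (1, -1)]:
        c_lat, c_lon = offset(lat, lon, north * radius_m, east * radius_m)
        steps.append(("go_to", {"lat": c_lat, "lon": c_lon, "altitude_m": altitude_m}))
    steps.append(("return_home", {}))
    return steps

route = compose_inspect_area(-35.3632, 149.1652, radius_m=50, altitude_m=30)
print([name for name, _ in route])
# ['take_off', 'go_to', 'go_to', 'go_to', 'go_to', 'return_home']
```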

Here is the hub's own log. Notice that every go_to accepted (the command was received) is followed, seconds later, by a reached go_to target — FINISHED (the aircraft actually got there). Command acceptance and completion are kept distinct, exactly as designed:

And here is the drone itself, in SITL — arming, taking off, climbing, flying the pattern, switching to RTL, and landing under its own firmware:

DO_SET_MODE → ARM → NAV_TAKEOFF → height 25 → RTL → Hit ground → DISARMED. (That Hit ground at 0.5 m/s is SITL's blunt way of logging a normal touchdown — a controlled landing, not a crash.) A complete autonomous mission, from a sentence spoken to an AI.

The last link took the most care. A take_off confirms by reaching its commanded altitude. A go_to confirms by reaching a position. But return_home? RTL ends when the aircraft lands and shuts its motors down — so it confirms by the disarm. Three different real-world signals for "done," each one a positive fact pulled from telemetry, none of them an assumption. Silence completes nothing.
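The three "done" signals can be written as separate predicates over a telemetry sample. In MAVLink, position and relative altitude typically arrive in `GLOBAL_POSITION_INT`, and the armed state as the `MAV_MODE_FLAG_SAFETY_ARMED` bit of `HEARTBEAT.base_mode`; the `Sample` type below assumes those have already been decoded. The tolerances (1 m of altitude, 2 m of horizontal distance) are placeholders, not DoSync's values.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    lat: float             # degrees
    lon: float             # degrees
    relative_alt_m: float  # height above home
    armed: bool

def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation; fine for the few meters a waypoint check needs."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(x, y)

def take_off_done(s: Sample, target_alt_m: float, tol_m: float = 1.0) -> bool:
    # Positive fact: the aircraft has actually reached its commanded altitude.
    return s.relative_alt_m >= target_alt_m - tol_m

def go_to_done(s: Sample, lat: float, lon: float, radius_m: float = 2.0) -> bool:
    # Positive fact: the aircraft is actually at the waypoint.
    return distance_m(s.lat, s.lon, lat, lon) <= radius_m

def return_home_done(s: Sample) -> bool:
    # RTL ends with landing and motor shutdown, so the positive fact is the disarm.
    # Only meaningful because RTL is dispatched while the vehicle is armed and flying.
    return not s.armed

climbing = Sample(-35.3632, 149.1652, 29.4, armed=True)
landed = Sample(-35.3632, 149.1652, 0.0, armed=False)
print(take_off_done(climbing, 30), go_to_done(climbing, -35.3632, 149.1652),
      return_home_done(climbing), return_home_done(landed))
# True True False True
```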

When the AI gets it wrong — and why that's the good part

The first time I asked, I gave no coordinates. The model, trying to be helpful, filled the gap with my location (Córdoba, Argentina) and fired the intent. The drone in SITL starts in Canberra. So the AI had just ordered an aircraft to fly to a waypoint more than 11,000 km away.

Here's what I care about: the system did not pretend that worked. The command was accepted, the supervisor waited for a confirmed arrival, none came, and after its timeout it aborted the mission with a clear diagnosis — go_to stalled (no positive signal).

That's the whole safety posture in one event — the lesson from the intro, now with a drone behind it. The AI can be wrong. The protocol does not have to be. An LLM filling a missing parameter with a plausible guess is exactly the kind of failure these systems will have in the real world — and the right answer is not to trust the model harder. It's to build a substrate that confirms physical reality and refuses to fake success. DoSync coordinates; the drone's firmware flies; the telemetry tells the truth; and when the truth is "it never arrived," the mission stops.
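A step-by-step supervisor with a per-step deadline can be sketched in a few lines. `dispatch` and `is_done` are hypothetical callbacks (`dispatch` returns whether the command was accepted; `is_done` asks the reconciler for a positive signal), and the clock and sleep functions are injectable so the example runs instantly. The abort message mirrors the diagnosis quoted above, but its format here is illustrative.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, dict]

def supervise(steps: List[Step],
              dispatch: Callable[[str, dict], bool],
              is_done: Callable[[str, dict], bool],
              timeout_s: float,
              poll_s: float = 0.5,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Run steps in order; each must be confirmed by telemetry before the next starts."""
    for name, params in steps:
        if not dispatch(name, params):
            return f"aborted: {name} rejected"
        deadline = clock() + timeout_s
        while not is_done(name, params):
            if clock() >= deadline:
                # No positive signal in time: stop rather than assume success.
                return f"aborted: {name} stalled (no positive signal)"
            sleep(poll_s)
    return "mission completed"

class FakeClock:
    """Simulated time so the example doesn't actually wait."""
    def __init__(self):
        self.t = 0.0
    def now(self) -> float:
        return self.t
    def sleep(self, dt: float) -> None:
        self.t += dt

clock = FakeClock()
steps = [("take_off", {"altitude_m": 30}), ("go_to", {"lat": -31.42, "lon": -64.19})]
# take_off is confirmed immediately; the far-away go_to never reports arrival.
result = supervise(steps, dispatch=lambda n, p: True,
                   is_done=lambda n, p: n == "take_off",
                   timeout_s=60, clock=clock.now, sleep=clock.sleep)
print(result)  # aborted: go_to stalled (no positive signal)
```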

What it actually revealed

Here's the thing I didn't fully expect. Of everything I built, almost none of it is drone code.

The state machine is generic. The telemetry reconciler is generic. The supervisor that waits for confirmed arrival is device-agnostic — an oven finishing a bake would use it identically. The manual-control policy applies to anything a human can grab. Only the MAVLink adapter and its message parser are drone-specific — and those are exactly the pieces a new device would swap out.
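The split described here, generic machinery on one side and a swappable adapter on the other, can be sketched as an abstract interface plus code that only talks to that interface. `DeviceAdapter`, `OvenAdapter`, and `run_operation` are hypothetical names. The oven is a toy device, included to show that nothing in `run_operation` knows what kind of device it drives.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List, Optional, Tuple

class DeviceAdapter(ABC):
    """The device-specific part: intents out to native commands, native messages in."""

    @abstractmethod
    def dispatch(self, intent: str, params: dict) -> bool:
        """Send the native command; True means the device accepted it."""

    @abstractmethod
    def parse(self, raw: dict) -> Optional[dict]:
        """Normalize one native message into a telemetry sample, or None if irrelevant."""

class OvenAdapter(DeviceAdapter):
    """Toy oven: understands 'bake' and reports a countdown timer."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, int]] = []

    def dispatch(self, intent: str, params: dict) -> bool:
        if intent != "bake":
            return False
        self.sent.append(("START_BAKE", params["minutes"]))
        return True

    def parse(self, raw: dict) -> Optional[dict]:
        if raw.get("kind") != "status":
            return None
        return {"finished": raw["timer_remaining_s"] == 0}

def run_operation(adapter: DeviceAdapter, intent: str, params: dict,
                  raw_stream: Iterable[dict], is_done: Callable[[dict], bool]) -> str:
    """Generic: knows nothing about ovens, drones, or MAVLink."""
    if not adapter.dispatch(intent, params):
        return "rejected"
    for raw in raw_stream:
        sample = adapter.parse(raw)
        if sample is not None and is_done(sample):
            return "completed"
    return "in_progress"

stream = [{"kind": "status", "timer_remaining_s": 120},
          {"kind": "door_event"},
          {"kind": "status", "timer_remaining_s": 0}]
print(run_operation(OvenAdapter(), "bake", {"minutes": 20}, stream,
                    lambda s: s["finished"]))  # completed
```

A MAVLink adapter would implement the same two methods, and `run_operation` would not change.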

So the same infrastructure already reaches:

A robotic arm — reach, grip, move, release; an operator can take physical control.
A 3D printer — hours-long jobs with phases that can fail midway.
An EV charger — "charge to 80%," interruptible, reporting state.
A ground rover — the same go_to, the same geofence, without the flying.

The drone didn't expand DoSync's scope. It proved the scope the protocol was designed to have all along. It was the hardest test, not the origin.

The takeaway

DoSync is an orchestration layer for cyber-physical systems — an AI expresses what should happen, the device executes with its own safety reflexes, a human can always take over, and the telemetry never lies about what actually occurred. The lights and sensors it can also drive are just the easy end of that same spectrum; the drone is the hard end.

The drone was simply the hardest way to prove it. And the proof isn't that a drone flew a square. It's that a sentence to a model became a confirmed autonomous mission — and that when the model guessed wrong, the system caught it.

The protocol is open. The hard case works. The next case is whatever device you have that does something that takes time.

GitHub: github.com/giulianireg-spec/dosync-protocol
Web: https://dosync.dev/
License: Apache 2.0

DoSync — the semantic layer between AI agents and physical systems.