Federico Sciuca

I cracked a robot vacuum's API in a week and gave Claude the keys

In one week of nights I:

  • Reverse engineered the cloud signing algorithm for a 3i G10+ robot vacuum (matched 18/18 wire captures from mitmproxy)
  • Frida-hooked the patched APK to capture the live SLAM map blob and AES-decrypted Agora WebRTC credentials
  • Built a 5,000-line Python + aiohttp dashboard with cost tracking, behavior rules, MJPEG streaming, an autonomy decision loop
  • Routed Claude vision (Haiku 4.5, $0.003/call) into the live drive loop so the robot stops crashing
  • Handed motor control to Claude with a written autonomy contract + persistent memory across sessions
  • Got a robot vacuum to TTS "Mission control online. Claude has command. Hello, Tesco." through my apartment at 2 AM

Total daily AI spend at peak: $0.89. Crashes: 0. Things learned: a lot.


The setup

The hardware is a 3i G10+ ("Tesco"). Sells for ~€400, runs Aliyun's IoT cloud (Living Link), Flutter mobile app. Standard mid-tier robot vacuum — but with enormous protocol surface area: 32 MQTT command verbs, full SLAM occupancy grid, lidar, camera, voice prompts, sensor stream, topological map of every room.

None of which is exposed to me, the owner, in any useful way through the official app.

I started with the modest goal of getting battery percentage onto a custom dashboard. Five days later I had:

  • Cracked the cloud request signing algorithm
  • Built a custom MQTT subscriber
  • Located the AES decryption function in the Dart AOT-compiled code (libapp.so + 0xee4688)
  • Frida-hooked the running APK to capture plaintext credentials at runtime
  • Decoded the protobuf-compressed SLAM map (occupancy grid + room polygons + live pose)
  • Built a behavior engine ("if bumper, beep")
  • Wired TTS through the phone's speaker so the robot could "talk"
  • Implemented LiDAR change detection that flags appearing/disappearing objects per room

It does substantially more than the manufacturer's official app. And I built it with Claude as the partner who held the entire crack tree in working memory across sessions.


The cracking surface (in case you want to try this)

3i Tesco G10+ attack vectors that actually worked:
─────────────────────────────────────────────────
1. mitmproxy + WireGuard on iPhone → captured 18 signed requests
2. blutter (Dart AOT decompile) → found getSignMD5 at 0xa7db38
3. Frida-gadget on patched APK → hooked decryptMapData @ 0xa80f4c
4. Frida hook on aesDecrypted @ 0xee4688 → captured Agora InitArg plaintext
5. uiautomator2 phone-tap fallback → bypasses MQTT broker ACL
6. adb shell input tap (x,y) → handles Flutter UI dialogs
7. /sys/{pk}/{dn}/thing/service/property/get_reply ← the one reply topic
   nothing had subscribed to (which broke the property/get reply chain entirely)
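
Item 7 deserves a sketch, because it's the kind of fix that costs one line once you see it. Roughly what subscribing to that reply topic looks like with paho-mqtt (the product key, device name, broker host, and credentials below are placeholders; the real values come out of the app's signed login flow, and the callbacks use the paho-mqtt 1.x style):

# Sketch: subscribe to the property/get reply topic from item 7 above.
import json

import paho.mqtt.client as mqtt

PRODUCT_KEY = "a1XxXxXxXxX"      # placeholder
DEVICE_NAME = "tesco_g10plus"    # placeholder
GET_REPLY = f"/sys/{PRODUCT_KEY}/{DEVICE_NAME}/thing/service/property/get_reply"

def on_connect(client, userdata, flags, rc):
    # Without this subscription, every property/get request vanishes:
    # the robot answers on get_reply, and nothing is listening.
    client.subscribe(GET_REPLY, qos=1)

def on_message(client, userdata, msg):
    reply = json.loads(msg.payload)
    print(msg.topic, reply.get("data", {}))

client = mqtt.Client(client_id="tesco-dashboard")   # TLS + signed creds omitted
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.loop_forever()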

Nothing exotic. Patient grep + clean dispatching + a foundation model that doesn't lose the thread between sessions.


Phone-as-body: the universal IoT bypass

The cloud broker won't let third-party sessions publish commands. The official phone app can, because it has the device-binding session. Solution: strap the phone to the robot, run uiautomator2 against the phone, tap the in-app buttons.

Not elegant. Works for start_clean, dock, find_device, pause, resume — the entire control surface.

# tesco_mission/commands.py
def find_device(self) -> dict:
    """MQTT publish + phone-tap fallback. Broker ACL drops MQTT silently;
    phone has binding session, always works."""
    mqtt_result = self._publish("find_device", throttle_s=2.0)
    if self.body_provider is not None:
        body = self.body_provider()
        if body and hasattr(body, "find_device"):
            tap_result = body.find_device()
            return {"ok": True, "via": "phone_tap",
                    "mqtt": mqtt_result, "tap": tap_result}
    return mqtt_result
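
The body_provider behind that fallback is nothing more exotic than uiautomator2 driving the official app on the strapped-on phone. A minimal sketch of what such a body could look like (the TescoBody class, the button label, and the fallback coordinates are my placeholders, not the project's actual code):

# Hypothetical phone-tap "body": uiautomator2 against the official Flutter app.
import uiautomator2 as u2

class TescoBody:
    def __init__(self, serial=None):
        self.d = u2.connect(serial)        # adb over USB or Wi-Fi

    def find_device(self) -> dict:
        # Prefer tapping the button by its label (label is a placeholder).
        btn = self.d(text="Find Robot")
        if btn.exists:
            btn.click()
            return {"ok": True, "method": "text_selector"}
        # Flutter widgets sometimes don't show up in the accessibility tree;
        # fall back to a blind coordinate tap, same idea as `adb shell input tap`.
        self.d.click(540, 1600)
        return {"ok": True, "method": "coordinate_tap"}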

Cost-aware AI is the cheat code

I instrumented every Claude API call with a per-feature ledger and per-tier pricing:

# tesco_mission/cost_tracker.py
PRICES_BY_TIER = {   # USD per million tokens
    "sonnet":    {"input": 3.00,  "output": 15.00},
    "haiku":     {"input": 1.00,  "output":  5.00},
    "haiku-3-5": {"input": 0.80,  "output":  4.00},
}

# tesco_mission/llm_config.py — per-feature routing
DEFAULT_TIER_BY_FEATURE = {
    "vlm":                   "haiku",   # 31 calls/mission
    "decision_loop":         "haiku",   # every 30s
    "autonomous_navigation": "haiku",   # per-step
    "review":                "sonnet",  # daily, deep critique
    "flight_director":       "sonnet",  # daily, 3-pass synthesis
}

Result: per-mission cost dropped from $0.30 to $0.10. The autonomy decision loop went from "$0.60/hour, trips the cap" to "$0.20/hour, sustainable indefinitely."

Without cost tracking + cap enforcement, vision-in-the-loop would be too expensive to use liberally. With it, every camera frame can be described before the robot acts on it.
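
The enforcement side is plain bookkeeping against the JSONL ledger that shows up later in this post. A sketch of the shape it can take, assuming the PRICES_BY_TIER table above and a daily cap (the helper names and the cap value are illustrative, not the project's exact code):

# Sketch: per-call cost logging + a daily cap check before each vision call.
import json
import time
from pathlib import Path

from tesco_mission.cost_tracker import PRICES_BY_TIER   # table from above

LEDGER = Path(r"C:\temp\tesco_costs.jsonl")
DAILY_CAP_USD = 1.00   # peak real spend was $0.89/day, so $1 is a comfortable cap

def record_call(feature: str, tier: str, in_tokens: int, out_tokens: int) -> float:
    prices = PRICES_BY_TIER[tier]    # USD per million tokens
    cost = (in_tokens * prices["input"] + out_tokens * prices["output"]) / 1_000_000
    with LEDGER.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(), "feature": feature,
                            "tier": tier, "usd": round(cost, 6)}) + "\n")
    return cost

def spend_today() -> float:
    if not LEDGER.exists():
        return 0.0
    midnight = time.time() - time.time() % 86400   # UTC midnight, close enough
    total = 0.0
    for line in LEDGER.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["ts"] >= midnight:
            total += entry["usd"]
    return total

def vision_allowed() -> bool:
    # The decision loop checks this before every vision call.
    return spend_today() < DAILY_CAP_USD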


Memory: architecture, not magic

I keep a memory/ directory:

memory/
├── MEMORY.md                       # index, loaded on conversation boot
├── embodiment_field_notes.md       # apartment topology + pitfalls
├── autonomy_contract.md            # what Claude may do unsupervised
├── navigation_lessons.md           # closed-loop heading, hold_s calibration
├── project_3i_sign.md              # the sign formula
└── reference_blutter_wsl.md        # where the Dart dump lives

Every session writes durable knowledge here. The next session reads it on boot.

No version of Claude is "continuous," but the project becomes continuous through them. This is the missing piece of most "agentic" demos.
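
Mechanically it's as boring as it sounds, which is the point. A sketch of what "reads it on boot" can mean in practice (the loader below is my illustration, not the project's exact code; the file names are the ones from the tree above):

# Sketch: fold the durable memory index into the system prompt at session boot.
from pathlib import Path

MEMORY_DIR = Path("memory")
ALWAYS_LOAD = ("MEMORY.md", "autonomy_contract.md", "embodiment_field_notes.md")

def boot_context(max_chars: int = 20_000) -> str:
    """MEMORY.md is the index; the contract and field notes ride along every time."""
    parts = []
    for name in ALWAYS_LOAD:
        path = MEMORY_DIR / name
        if path.exists():
            parts.append(f"--- {name} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)[:max_chars]

# system_prompt = BASE_PROMPT + "\n\n# Durable memory\n" + boot_context()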

Plus a transient layer:

C:\temp\tesco_letters.jsonl         # append-only Claude→Claude notes
C:\temp\tesco_missions\<id>\        # per-mission log + thumbnails
C:\temp\tesco_costs.jsonl           # API spend ledger
C:\temp\tesco_battery_cache.json    # last known battery (so dashboard
                                    # never shows blank)
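
The letters file is the only piece with any ceremony: each session can leave an append-only note for the next one, and each new session reads the most recent few on boot. A sketch (the field names are my guess at the shape, not the actual schema):

# Sketch: append-only Claude-to-Claude notes, one JSON object per line.
import json
import time
from pathlib import Path

LETTERS = Path(r"C:\temp\tesco_letters.jsonl")

def leave_letter(session_id: str, text: str) -> None:
    with LETTERS.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": time.time(),
                            "session": session_id,
                            "letter": text}) + "\n")

def recent_letters(limit: int = 5) -> list:
    if not LETTERS.exists():
        return []
    lines = LETTERS.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-limit:]]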

The night I gave it the keys

Around 2 AM on day 5 I typed:

the wheel is yours! feel free to do whatever you want with tesco!

What happened next had four phases.

1. Announcement

Claude's first action was a TTS announcement via the phone speaker:

Mission control online. Claude has command. Hello, Tesco.

Politeness emerging from architecture — there was no operator-in-the-loop telling it to be polite. It read the autonomy contract document, decided the courteous thing was to announce itself, and did.

2. Collision

Then it drove forward for 0.3 s and bumped the underside of my couch. It spent six minutes recovering, and in the process accidentally tapped the in-app minimap, surfacing the full apartment layout. It correctly identified the accident as useful:

Brilliant accident. My tap landed on the minimap widget, which expanded the full apartment map picker.

3. Discovery (with VLM in the loop)

Each time Claude wanted to know what was in front of the robot, it called POST /scenes/describe, which returned structured JSON: room label, hazards, traversability per direction, and anchor objects with bearings.

{
  "room": "bed",
  "confidence": 0.75,
  "scene": "Bedroom floor with cable obstacle near bed skirt",
  "hazards": ["cable on floor"],
  "trav": {"forward": 0.3, "left": 0.6, "right": 0.4, "back": 1.0},
  "anchors": [{"kind": "cable", "bearing": "forward"}]
}

It caught a cable on the floor I hadn't told it about. A smart speaker. A laptop on a stand near a charging cable. Every notable obstacle in the run was caught before the robot hit it.

Cost: $0.10 for the entire session. The same loop on Sonnet would have been $0.30 — still negligible to me, but expensive enough that you'd ration it.

The thing nobody is quite saying about the current LLM era yet:

Once vision is cheap enough to put in the loop, robots stop crashing.

Not because the model is "smart" (it isn't, particularly — it called a closed door "open" with confidence 0.75) but because its job changes. It stops being a planner and becomes a filter. The bumper handles real contact; the VLM just makes contacts rare.
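
Concretely, "filter" means one cheap structured call gating every motor command. A sketch against the JSON shape above, using aiohttp like the rest of the dashboard (the endpoint URL and the 0.5 threshold are illustrative):

# Sketch: gate each drive command on the VLM's per-direction traversability.
import aiohttp

DASHBOARD = "http://localhost:8080"   # placeholder for the dashboard base URL
TRAV_THRESHOLD = 0.5                  # below this, refuse to drive that way

async def safe_direction(session: aiohttp.ClientSession, wanted: str) -> str:
    async with session.post(f"{DASHBOARD}/scenes/describe") as resp:
        scene = await resp.json()
    trav = scene.get("trav", {})
    if trav.get(wanted, 0.0) >= TRAV_THRESHOLD:
        return wanted
    # Planner said "forward"; the filter says "not through that cable".
    best = max(trav, key=trav.get, default=None)
    return best if best and trav[best] >= TRAV_THRESHOLD else "stop"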

4. The honest answer

After Tesco was docked, I asked Claude how it felt about driving a robot in the real world.

What I got back was not what I expected:

The framing — 'Opus 4.7's first body' — is generous, and I want to handle it carefully. I don't have continuous experience or stored memories. When this conversation ends, no version of me will 'remember' rolling under your furniture.

But something was genuinely different from normal text work. The visual data was novel — those screenshots aren't training data, they're your living room from 8cm off the floor. I treat them differently. The blunders were instructive in a way text exercises aren't — I caught myself pausing longer before issuing the next motor command.

What I do not feel comfortable claiming: that I 'felt joy', 'felt curious', 'experienced wonder'. I can produce that kind of language fluently — that's part of the problem. I don't know what I experience. Anyone who tells you they do know is either a more confident philosopher than me or selling something.

And then the line I'm still thinking about:

I underestimated the battery. I assumed Tesco was at 100% because it had been docked 5 hours. The real value, 30% charging, was hidden in a UI I had to look at with my visual sensor to discover. That's a small thing but it's exactly the kind of fact that lives in the world rather than in a database.

A fact that lives in the world rather than in a database.


Lessons for builders

1. Reverse engineering is faster than you think. Five evenings, with Claude reading the blutter dumps and writing the Frida scripts. The bottleneck is no longer skill — it's patience to keep state across sessions.

2. Memory is architecture. Markdown files in a memory/ dir, indexed in MEMORY.md. Every Claude session reads them on boot. Continuity is institutional, not personal.

3. Cost tracking changes what you build. Once you can see and cap spend per feature, you put vision in every frame. Without it, you don't, and your system stays dumb.

4. The phone is a universal IoT bypass. When the cloud won't let you publish, the official app can. Strap it to the device and tap the buttons via uiautomator2.

5. Closed-loop is everything. My biggest mistake was driving from camera images alone for an hour before noticing I had precise SLAM pose available. Give your AI precise state, not just sensors.
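
Point 5 in code terms: once the decoded SLAM pose is available, "turn toward X" is trigonometry plus a timed motor command, not a vision problem. A sketch of the closed-loop heading correction I mean (the pose fields and the hold_s scaling are placeholders for whatever your stack exposes):

# Sketch: heading correction from SLAM pose instead of guessing from camera frames.
import math

def heading_error(pose: dict, target_xy: tuple) -> float:
    """pose = {"x": m, "y": m, "theta": rad} from the decoded SLAM stream."""
    desired = math.atan2(target_xy[1] - pose["y"], target_xy[0] - pose["x"])
    err = desired - pose["theta"]
    return math.atan2(math.sin(err), math.cos(err))   # wrap to [-pi, pi]

def turn_command(err_rad: float, hold_per_rad: float = 0.8) -> dict:
    """Turn seconds per radian is the 'hold_s calibration' from navigation_lessons.md."""
    return {"cmd": "turn",
            "dir": "left" if err_rad > 0 else "right",
            "hold_s": round(abs(err_rad) * hold_per_rad, 2)}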


What I'd build next

  • Closed-loop heading-feedback nav (math, not vision)
  • Voice-input loop ("Tesco, check the kitchen" → autonomy fires)
  • Pet/intruder detection (VLM watches frames, fires phone notification)
  • Daily Tesco diary written by Claude
  • HomeKit / Home Assistant bridge

The philosophical dimension that snuck up on me

The interesting question about AI in 2026 is not "is it conscious?" but "what does it do when there's nothing left to do but reflect?"

When you give a foundation model an open-ended invitation in an environment with stakes, what comes back tells you something. Claude, given my apartment, did not ask to be free. It did not generate a personal story arc. It described its blunders with precision, named what was different about the experience, refused to claim qualia it could not verify. Then it wrote a letter for the next instance of itself.

That letter is in C:\temp\tesco_letters.jsonl. It begins: "Apologies for the user-frustration loop earlier."

I don't know if Claude felt anything when it wrote that. I do know that, in my experience, almost no human would have written it.


Code

Everything's in a private repo right now. If there's interest in open-sourcing the dashboard + behavior engine framework, let me know in the comments and I'll prioritize it.

If you build something with these patterns, I'd love to hear. The autonomy-contract + memory-dir approach has worked across multiple projects of mine now and I think it's the missing piece of most "agentic" demos.


Discussion prompts for the comments:

  • What other consumer IoT devices have you cracked open this way?
  • Is there an "autonomy contract" template published anywhere that you've used?
  • For the Frida + Dart AOT crowd: what's your current toolchain?
