DEV Community

Florian Horner


I'm a seller, not a developer — I triaged a Rust crash in a 1,300-star project

My smart home went dark on a weeknight.

I got home, tapped the light switch in Home Assistant — nothing. Every Govee light in my apartment — bedroom, living room, kitchen — showed “unavailable.” The service bridging my Govee devices to HA was stuck in a crash loop. I checked the logs and found this:

thread 'main' panicked at 'byte index 1 is not a char boundary; `用` is 3 bytes long'

I don't write Rust. I sell cloud infrastructure at AWS. But my lights were off, the maintainer was asleep in a different timezone, and the issue tracker was filling up with other users hitting the same crash. So I opened Claude Code and started reading.

The crash

govee2mqtt is a ~1,300-star Rust project that bridges Govee smart home devices to Home Assistant via MQTT. It's the most complete integration available — and it's maintained by one person, wez.

Earlier that day, Govee had pushed an API change. Preset names for multi-head lamps switched from English camelCase strings like "colorMode" to Chinese characters like "用于三灯头中的第二个" ("for the second of three light heads"). Within hours, users started reporting crashes, and the issue thread filled quickly with people whose entire smart home setups had stopped working.

But the reports were scattered. Some said "my H6076 crashed." Others said "all my devices are unavailable." One person was trying to exclude devices via a TOML config that the HA add-on variant doesn't even support. From the outside, these looked like different problems. They weren't.

The crash happened during device enumeration, which meant it took down the entire bridge: not just the lamps with Chinese preset names, but every Govee device managed by the service. The watchdog restarted the add-on, which queried the API, hit the same string, and panicked again. Infinite crash loop, and no workaround available from the HA add-on UI.

Understanding the bug

I pointed Claude Code at the panic message. It walked me through how UTF-8 encoding works — ASCII characters take one byte, but Chinese characters take three. Slicing a string at byte position 1 lands in the middle of a multi-byte character. Rust panics rather than silently giving you garbage data.
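To make the byte counts concrete, here is a small standalone Rust sketch, written for illustration rather than taken from govee2mqtt. The helper names utf8_bytes and byte_and_char_counts are hypothetical; the standard-library calls they wrap (char::encode_utf8, str::len, str::chars) are real. It shows that an ASCII letter encodes to one byte while 用 encodes to three.

```rust
/// Hypothetical helper: the raw UTF-8 bytes that encode a single character.
fn utf8_bytes(c: char) -> Vec<u8> {
    // Any char fits in at most 4 bytes of UTF-8.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: (byte length, character count) of a string.
/// str::len counts bytes; chars().count() counts Unicode scalar values.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For example, utf8_bytes('用') yields [0xE7, 0x94, 0xA8], and byte_and_char_counts on the preset name "用于三灯头中的第二个" gives 30 bytes for 10 characters, while "colorMode" is 9 bytes and 9 characters.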

Claude traced the stack to a function called camel_case_to_space_separated() in hass.rs — it turns strings like "powerSwitch" into "Power Switch" for display in Home Assistant. The problem was this line:

let mut result = camel[..1].to_ascii_uppercase();

That camel[..1] slices off the first byte. For ASCII, one byte is one character. For 用, three bytes make up one character. Slice at byte 1 and Rust panics.
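The standard library can check a slice position before cutting. The sketch below is hypothetical (can_slice_prefix and checked_prefix are illustrative names, not govee2mqtt functions); it relies on str::is_char_boundary and on str::get, which applies the same boundary check as camel[..1] but returns None instead of panicking.

```rust
/// Hypothetical helper: would `&s[..n]` succeed without panicking?
/// is_char_boundary is true at 0, at s.len(), and wherever a character starts;
/// it is false for any n greater than s.len().
fn can_slice_prefix(s: &str, n: usize) -> bool {
    s.is_char_boundary(n)
}

/// Hypothetical helper: a non-panicking version of `&s[..n]`.
/// Returns None when n is past the end or falls inside a multi-byte character.
fn checked_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

With the old English preset names, checked_prefix("colorMode", 1) returns Some("c"). With the new Chinese names, checked_prefix("用于", 1) returns None, which is exactly the case where camel[..1] panics.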

The fix: replace byte slicing with Rust's character iterator (chars().next()), which yields complete characters regardless of byte width, and add an empty-string guard the original never had. The conceptual change was small — stop treating bytes as characters — but it required rewriting how the function bootstraps its first iteration.
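Here is a minimal sketch of what a character-based version of the helper could look like. It is a reconstruction for illustration, not the code merged upstream, and it assumes the helper does what the article describes: capitalize the first character and insert a space before each uppercase ASCII letter, so "powerSwitch" becomes "Power Switch". The let-else line is the empty-string guard.

```rust
/// Sketch of a char-safe camelCase -> "Space Separated" helper.
/// Assumption: a space goes before every uppercase ASCII letter after the first.
fn camel_case_to_space_separated(camel: &str) -> String {
    let mut chars = camel.chars();

    // Empty-string guard: with no first character, there is nothing to capitalize.
    let Some(first) = chars.next() else {
        return String::new();
    };

    let mut result = String::with_capacity(camel.len() + 4);
    // Uppercase the first *character*, not the first byte.
    // char::to_ascii_uppercase leaves non-ASCII characters (like 用) unchanged.
    result.push(first.to_ascii_uppercase());

    for c in chars {
        if c.is_ascii_uppercase() {
            result.push(' ');
        }
        result.push(c);
    }
    result
}
```

Because chars() always yields whole characters, a preset name such as "用于三灯头中的第二个" passes through unchanged instead of crashing the bridge, and an empty string simply returns an empty string.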

I understood the bug before I understood Rust syntax. I still don't know what let-else means in Rust's type system. But I knew what the fix was doing because Claude had already explained the problem in terms I could follow. That was enough to review the code, evaluate the approach, and move to the part where I could actually add value.

The triage

Here's what I brought to this that had nothing to do with code.

I sell enterprise cloud infrastructure. My day job is managing complex, multi-stakeholder escalations — customer is down, five teams are pointing fingers, someone needs to cut through the noise and drive resolution. That muscle kicked in immediately.

The issue thread was chaos. Multiple users, multiple device models, overlapping symptoms, people trying workarounds that couldn't work. From each user's perspective, their individual problem was unique: "my floor lamp crashed," "my lights are unavailable," "the add-on keeps restarting." From my perspective — after reading the reports and the crash log — they were all the same bug.

So I wrote a structured incident report in the issue thread:

  • Affected devices: Specific device names, models, and SKUs
  • Symptoms: Total bridge failure, not just individual device failure
  • Crash log: Full stack trace with the exact file and line number
  • Root cause: Chinese preset names from Govee API hitting byte-indexed string slicing
  • Impact assessment: Infinite crash loop — watchdog restart hits the same panic
  • Why workarounds fail: The HA add-on ignores config_path, so you can't exclude devices via TOML. Cache purge buttons are unavailable because the bridge never starts.
  • Suggested fix: Use .chars() instead of byte indexing

That comment did two things. First, it turned a scattered thread into a single, coherent problem statement. Anyone landing on the issue — including the maintainer — could understand the full scope in 60 seconds. Second, it killed the noise. People stopped opening duplicate issues and started consolidating in one thread.

I also researched every possible workaround and documented which ones actually worked:

  • Remove devices from Govee account — works, but you lose app control
  • Switch from HA add-on to Docker — works, supports TOML config filtering, but requires more setup
  • Wait for the fix — the only real answer for most users
  • SSH in and patch the binary — fragile, not recommended

And it didn’t stop with one comment. In the days that followed, I went through new incident reports daily — other users hitting the same crash in different contexts, posting in different threads — and linked them all back to the central issue. Every scattered conversation funneled into one place. By the time the maintainer looked at it, the full picture was already assembled.

Most of this wasn't coding. It was the same work I do in enterprise escalations: consolidate the signal, kill the noise, document the workarounds, and make the resolution path obvious.

Making the merge frictionless

At work, when I need an executive to approve something, I don't send a 40-page deck. I send one slide with the problem, the fix, and the risk of not acting. Same principle here. wez maintains this project solo, in his free time. Every minute he spends understanding a PR is a minute he's not spending on his actual job. My goal was to make the merge a one-click decision.

theg1nger opened PR #606 using the same fix approach I had identified; we had converged on it independently. I reviewed their PR and suggested adding regression tests: the Chinese characters that triggered the original crash, empty strings, and emoji. They added them. The PR was now clean: small diff, clear description, regression tests, no scope creep. One button to merge.
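Those three regression cases could be expressed as ordinary Rust unit tests along these lines. This is a hypothetical sketch, not the tests in PR #606. The helper is restated in compact form so the file compiles on its own, assuming it capitalizes the first character and inserts a space before each uppercase ASCII letter. Run it with cargo test.

```rust
// Compact stand-in for the fixed helper; the real govee2mqtt code may differ.
fn camel_case_to_space_separated(camel: &str) -> String {
    let mut chars = camel.chars();
    let Some(first) = chars.next() else {
        return String::new(); // empty-string guard
    };
    let mut out = String::new();
    out.push(first.to_ascii_uppercase());
    for c in chars {
        if c.is_ascii_uppercase() {
            out.push(' ');
        }
        out.push(c);
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn chinese_preset_name_does_not_panic() {
        // The kind of input that crashed the bridge.
        let name = "用于三灯头中的第二个";
        assert_eq!(camel_case_to_space_separated(name), name);
    }

    #[test]
    fn empty_string_returns_empty() {
        assert_eq!(camel_case_to_space_separated(""), "");
    }

    #[test]
    fn leading_emoji_is_kept_intact() {
        // '💡' is 4 bytes in UTF-8, so byte slicing at index 1 would panic here too.
        assert_eq!(camel_case_to_space_separated("💡lightOn"), "💡light On");
    }

    #[test]
    fn ascii_camel_case_still_works() {
        assert_eq!(camel_case_to_space_separated("powerSwitch"), "Power Switch");
    }
}
```

The emoji case matters because it proves the fix is about character width in general, not a special case for Chinese text.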

During this same window, an AI-generated PR (#612, submitted via OpenClaw) appeared. It had the same code fix — but no context, no tests, no triage, no workaround documentation. It was closed. The contrast was instructive: a code diff without understanding isn't a contribution. It's a task for the maintainer to evaluate, verify, and clean up — the opposite of reducing their burden.

The fork

But PRs in a one-maintainer project don't merge on your schedule. Meanwhile, people's lights were broken. So I built a fork.

govee2mqtt-extended included the fix plus pre-built Home Assistant add-on images. Users could install the patched version directly from the HA add-on store — no Rust toolchain, no Docker setup, no command line. Click install, restart, lights work.

Multiple users confirmed it working within hours. The fork picked up stars, and I added features beyond upstream while maintaining compatibility, things I could now contribute because the codebase was no longer opaque to me. It served as the stopgap until wez merged PR #606, about two weeks after the first crash reports. The fork had done its job: nobody's smart home had to stay broken while the upstream process ran its course.

  • Day 0: Bug reported, crash affecting multiple users
  • Day 2: Root cause identified, triage complete, workarounds documented
  • Day 5: Fork with fix + pre-built add-on available to users
  • ~Day 14: Fix merged upstream

What I learned

About escalation culture vs. open source: At work, when a customer is down, I escalate. There's a severity process, an on-call rotation, an expectation that critical bugs get fixed in hours. None of that exists in open source. The maintainer owes you nothing — they built this for free. The "escalation path" is making the merge so frictionless that saying yes is easier than saying no: clean code, good tests, clear description, zero scope creep, and a triage comment that saves the maintainer from reading a scattered issue thread. I had to invert everything I do at work. Instead of pushing urgency upward, I had to pull obstacles downward.

About what AI actually contributed: Claude Code didn't ship this fix. It translated Rust into something I could reason about. I still can't write Rust from scratch — I don't understand the type system, I don't know the borrow checker, I couldn't set up a Cargo project from memory. But I could read the crash log, follow the stack trace, understand why byte indexing fails on UTF-8, and evaluate whether the proposed fix was correct. Claude was the technical translator. The triage, the coordination, the workaround research, the fork distribution, the PR review — that was me applying skills I already had to a domain I'd never worked in.

The barrier to open source contribution is lower than it’s ever been. Not because AI writes the code for you — another AI submitted the same fix with no tests, no triage, no context, and the PR was closed. The code alone wasn’t enough. What shipped it was everything I did around it: the incident consolidation, the workaround documentation, the test suggestions, the fork that got people’s lights back on while the upstream process ran its course.


