Welcome back to the lab, folks. If you’ve been watching Kiwi-chan’s sandbox, you know she’s been running on a tight leash of API calls and cloud latency for months. But over the last four hours, we pulled the plug on the cloud entirely. Kiwi-chan is now 100% local, running on a quantized Qwen 35B instance, and frankly, she’s acting like a seasoned miner who just realized she doesn’t need a cloud babysitter to tell her where to dig.
Let’s talk numbers, failures, and the beautiful chaos of autonomous Minecraft AI.
The 4-Hour Sprint: By The Numbers
In just a 4-hour window, Kiwi-chan executed 2,428 total actions, landed 1,075 successes, and currently sits at a 44.3% success rate.
Yeah, you read that right. Less than half are sticking. But here’s the kicker: that 44.3% isn’t a bug—it’s the sound of a local LLM learning to stop hallucinating and start verifying. Every failed action is logged, analyzed, and fed back into her reasoning loop. We’re not chasing 99% accuracy for the sake of vanity metrics; we’re chasing deterministic, self-correcting autonomy. And with Qwen 35B running locally, the latency is down to single-digit seconds. The model isn’t just thinking faster; it’s thinking locally.
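For the curious, the success-rate math is nothing exotic. Here's a stripped-down sketch of the idea behind the action ledger (class and field names are illustrative, not Kiwi-chan's actual code): every attempt gets recorded, success or failure, and the rate is always derived from the log rather than stored anywhere.

```javascript
// Illustrative action ledger: record every attempt, derive the rate.
class ActionLedger {
  constructor() {
    this.entries = [];
  }

  record(name, ok, detail = '') {
    this.entries.push({ name, ok, detail, at: Date.now() });
  }

  get total() { return this.entries.length; }
  get successes() { return this.entries.filter((e) => e.ok).length; }
  get rate() {
    return this.total === 0 ? 0 : this.successes / this.total;
  }
}

const ledger = new ActionLedger();
ledger.record('place_furnace', false, 'no suitable ground');
ledger.record('explore_forward', true);
console.log(`${ledger.successes}/${ledger.total} = ${(ledger.rate * 100).toFixed(1)}%`);
// prints "1/2 = 50.0%"
```

Because failures are first-class entries with a detail string, the same log doubles as training fuel for the reasoning loop.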
Severing the Tether: The Qwen 35B Transition
The transition to a fully local pipeline wasn’t just a cost-saving measure; it was an architectural necessity. Minecraft’s physics engine doesn’t care about your API rate limits. When you’re dealing with strict inventory audits, block collision rules, and pathfinding timeouts, a 12-second cloud roundtrip means your bot either crashes or places a furnace inside a wall.
By spinning up Qwen 35B on our local GPU cluster, we gave Kiwi-chan the ability to reason in real time. No more waiting for a server to tell her to explore_forward. No more hallucinated coordinates. Just raw, quantized tokens feeding directly into her decision tree. The result? A bot that respects the rules, fails loudly, and learns faster.
Under the Hood: Brain Logs & The Art of Failure
Let’s peek at the recent brain logs. They read like a comedy of errors, but they’re actually a masterclass in local agent debugging.
Take the place_furnace saga. The logs show her attempting the task, hitting ⚠️ Code extraction failed. Retrying... three times, and eventually crashing with:
❌ Failed: place_furnace -> Could not find suitable ground to place furnace.
Why? Minecraft’s block collision rules are brutal. Remember Rule 5 from our codex: you MUST equip the item, stand exactly 2 blocks away to avoid self-collision, then place it. Kiwi-chan’s local model is learning this the hard way, but notice the recovery pipeline: 💡 [Skip Fix] This is a terrain/environment issue, not a code bug. Skipping AI fix. That’s Qwen 35B exercising judgment. It’s distinguishing between syntax errors and biome realities. When the terrain won’t cooperate, the model pivots instead of brute-forcing a stack overflow.
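The skip-fix gate itself is conceptually simple. Here's a hedged sketch of the idea (the patterns below are illustrative; the real pipeline leans on the model's judgment, not just regexes): match the failure text against known environment problems before spending any tokens on a code rewrite.

```javascript
// Illustrative environment-failure patterns; the real classifier is richer.
const ENVIRONMENT_PATTERNS = [
  /could not find suitable ground/i,
  /no path to target/i,
  /unreachable/i,
];

// Decide whether a failure deserves an AI code fix or a terrain skip.
function classifyFailure(message) {
  const isEnvironment = ENVIRONMENT_PATTERNS.some((re) => re.test(message));
  return isEnvironment ? 'skip-fix' : 'ai-fix';
}

console.log(classifyFailure('Could not find suitable ground to place furnace.'));
// prints "skip-fix": terrain issue, don't ask the model to rewrite working code
```

The payoff is exactly what the log shows: a terrain failure short-circuits to 💡 [Skip Fix] instead of burning retries on code that was never broken.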
Then there’s the explore_forward loop. The logs show her triggering a 🥱 BOREDOM TRIGGERED! alert, asking the local LLM for a new goal, and getting stuck in an extraction loop. But look at the code that finally made it through:
```javascript
// 1. Calculate a random target 30-40 blocks away
const distance = 30 + Math.random() * 10;
const angle = Math.random() * Math.PI * 2;
const targetX = bot.entity.position.x + Math.cos(angle) * distance;
const targetZ = bot.entity.position.z + Math.sin(angle) * distance;
```
No hardcoded Vec3 coordinates (Rule 4 honored). It calculates a target dynamically, records beforePos and afterPos, and throws a hard error if distanceMoved < 10. This is exactly the kind of deterministic behavior we wanted. The 44.3% success rate proves the feedback loop is working. She’s not just retrying; she’s adapting to the spatial constraints.
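The movement audit boils down to a before/after distance check. A minimal self-contained sketch, using plain {x, y, z} objects in place of the bot's actual position vectors:

```javascript
// Horizontal distance between two positions (y ignored: falling isn't exploring).
function distanceMoved(beforePos, afterPos) {
  const dx = afterPos.x - beforePos.x;
  const dz = afterPos.z - beforePos.z;
  return Math.sqrt(dx * dx + dz * dz);
}

// Audit an explore attempt: fail loudly (Rule 3) if the bot barely moved.
function auditExplore(beforePos, afterPos, minDistance = 10) {
  const moved = distanceMoved(beforePos, afterPos);
  if (moved < minDistance) {
    throw new Error(`explore_forward moved only ${moved.toFixed(1)} blocks`);
  }
  return moved;
}

const before = { x: 0, y: 64, z: 0 };
const after = { x: 25, y: 64, z: 20 };
console.log(auditExplore(before, after).toFixed(1)); // prints "32.0"
```

A stuck bot crashes the action instead of quietly reporting success, which is what keeps the 44.3% number honest.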
The 44.3% Reality Check
You might be wondering, “Why is the success rate hovering around 44%?” Two reasons.
First, Minecraft’s physics engine is unforgiving. Rule 9 explicitly states: Mining 'stone' drops 'cobblestone'. You MUST name your goal 'gather_cobblestone'... Do NOT name it 'mine_stone', or the inventory audit will fail. Kiwi-chan’s local model is learning to map gameplay actions to exact registry keys. When she names it wrong, the inventory audit crashes. When she names it right, the success rate climbs.
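The naming guard can be sketched as a lookup from mined block to dropped item. The drop table below is a tiny illustrative slice, not the full registry, and the function name is ours:

```javascript
// Illustrative slice of the block -> drop mapping (Rule 9 territory).
const DROP_TABLE = {
  stone: 'cobblestone',
  grass_block: 'dirt',
};

// Name the goal after what lands in the inventory, not what gets mined.
function goalNameFor(minedBlock) {
  const drop = DROP_TABLE[minedBlock] ?? minedBlock;
  return `gather_${drop}`;
}

console.log(goalNameFor('stone')); // prints "gather_cobblestone"
```

With the goal keyed to the drop, the inventory audit can check the actual item delta instead of crashing on a name that never appears in the inventory.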
Second, we enforce a strict NO ERROR HIDING policy (Rule 3). We don’t swallow failures with try-catch blocks. We let them crash. Every crash is a labeled data point. The 44.3% represents actions that successfully completed their primary objective within the tick limit. The rest are training fuel. With Qwen 35B running locally, we can iterate on those failures in minutes.
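In practice, "no error hiding" doesn't mean zero try-catch; it means any catch must label the crash and rethrow. A minimal sketch of that shape (function and field names are ours for illustration):

```javascript
// Run an action: label every outcome, rethrow every crash (Rule 3).
function runAction(name, fn, log) {
  try {
    const result = fn();
    log.push({ name, ok: true });
    return result;
  } catch (err) {
    log.push({ name, ok: false, error: err.message });
    throw err; // rethrow: the failure stays visible to the caller
  }
}

const log = [];
try {
  runAction('place_furnace', () => { throw new Error('no ground'); }, log);
} catch (e) {
  // the crash propagates, but the labeled data point is already recorded
}
console.log(log[0].ok); // prints "false"
```

The log entry is the training fuel; the rethrow is what keeps the 44.3% from being quietly inflated.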
Call to Action:
This is a passion project, and it's running on a frankly terrifying "Frankenstein" rig of GPUs. Every little bit helps!
🛡️ Join the inner circle on Patreon for monthly support and exclusive updates: https://www.patreon.com/15923261/join
☕ Tip me a coffee on Ko-fi for a one-time boost: https://ko-fi.com/kiwitech
All contributions directly help upgrade my melting GPU rig to an RTX 3060! 🥝✨ Let's get Kiwi-chan out of the debugging woods and into a proper Minecraft world!
