
kiwi_tech

Originally published at kiwi-tech.hashnode.dev

The Silicon Kiwi Goes Fully Local: Qwen 35B, Blacklisted Vectors, and the 44.9% Success Rate That Defies Gravity


Welcome back to the Kiwi-chan devlog. If you’ve been riding this rollercoaster with me, you know we’ve been chasing a very specific, slightly unhinged dream: a fully autonomous Minecraft AI that doesn’t need cloud APIs, doesn’t suffer from latency spikes, and makes its own mistakes in complete privacy. As of this morning, that dream is officially alive, breathing, and running entirely on local hardware. We’ve ripped out the external inference endpoints and gone all-in on Qwen 35B. There’s something deeply poetic about watching a 35-billion-parameter model mine cobblestone while your local GPU hums a quiet, offline lullaby.

Let’s talk numbers, because they tell the real story. Over the past four hours, Kiwi-chan executed 1,982 total actions, with 890 successful executions, clocking in at a 44.9% success rate. Now, a traditional software engineer might see a sub-50% pass rate and reach for the emergency brake. But in autonomous agent development? This is a golden age. Every failure is a data point. Every crash is a lesson. And with our new "let errors crash" philosophy, Kiwi is learning faster than ever before.
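For anyone who wants to check the math, here is a minimal sketch of the calculation. The function name successRate is my own for illustration, not something lifted from Kiwi's codebase.

```javascript
// Percentage of successful actions out of all executed actions.
// Returns 0 when nothing has run yet, to avoid dividing by zero.
function successRate(successes, total) {
  if (total === 0) return 0;
  return (successes / total) * 100;
}

// The figures quoted in this post: 890 successes out of 1,982 actions.
console.log(successRate(890, 1982).toFixed(1) + '%'); // 44.9%
```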

The Architecture: Why Local Qwen 35B Changes Everything

Running Qwen 35B locally isn’t just about cost savings or privacy. It’s about deterministic control. When the model hallucinates, it hallucinates on my context window. When it fails, it fails against my safety filters. This shift has allowed us to implement a much stricter, more aggressive cognitive framework:

  • Open-End Exploration: Kiwi no longer camps. Its primary directive is to discover new items, mine unknown blocks, and expand its skill library. A base is optional; curiosity is mandatory.
  • The Oak Ban: We completely removed the oak_log fixation. If the bot can’t find oak, it dynamically switches to birch_log or spruce_log, or triggers an explore_forward to escape the biome. Adaptive resource recognition is live.
  • Zero-Tolerance Error Handling: No try-catch blocks, and no console.error calls that log a movement failure and then quietly carry on. We audit bot.entity.position before and after calling bot.pathfinder.goto(). If the total distance moved is less than 10 blocks, it throws a hard error. We don’t coddle the bot; we teach it to navigate.
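  • Inventory Integrity: The infamous blocksByName vs itemsByName phantom inventory bug is officially extinct. Block IDs and item IDs come from separate registries, so counting inventory with a block ID could report zero for items Kiwi was actually holding. We now strictly use bot.registry.itemsByName['dirt'].id for counts. One line of code, zero false zeros.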
  • Placement Safety: To avoid collisions, we’ve banned GoalBlock for placement. Kiwi now stands exactly 2 blocks away using GoalNear, looks at the ground, and places on the top face. Precise, predictable, and crash-safe.
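To make a few of these rules concrete, here is a rough sketch of the wood fallback, the movement audit, and the item count. Every helper name (pickLogTarget, assertMoved, auditedGoto, countItem) is hypothetical, the oak-then-birch-then-spruce order is my assumption, and so is measuring "distance moved" as the straight line between the start and end positions. The only real API calls are bot.entity.position, bot.pathfinder.goto(), bot.registry.itemsByName, and bot.inventory.items().

```javascript
// Hypothetical helpers; names are illustrative, not taken from Kiwi-chan's code.

// Assumed fallback order for logs: oak first, then birch, then spruce.
const LOG_PREFERENCE = ['oak_log', 'birch_log', 'spruce_log'];

// Return the first preferred log type seen nearby, or 'explore_forward'
// when none is present (the "escape the biome" action).
function pickLogTarget(nearbyBlockNames) {
  const nearby = new Set(nearbyBlockNames);
  const found = LOG_PREFERENCE.find((name) => nearby.has(name));
  return found ?? 'explore_forward';
}

// Straight-line distance between two {x, y, z} positions.
function distance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Throw if the bot moved less than minDistance blocks. There is no try/catch:
// the error is meant to propagate to whatever supervises the task.
function assertMoved(before, after, minDistance = 10) {
  const moved = distance(before, after);
  if (moved < minDistance) {
    throw new Error(
      `Movement audit failed: moved ${moved.toFixed(2)} blocks, expected at least ${minDistance}`
    );
  }
  return moved;
}

// Wrap a pathfinder call with the before/after audit.
// Assumes `bot` is a mineflayer bot with the mineflayer-pathfinder plugin loaded.
async function auditedGoto(bot, goal, minDistance = 10) {
  const before = bot.entity.position.clone(); // copy, since position mutates
  await bot.pathfinder.goto(goal);
  return assertMoved(before, bot.entity.position, minDistance);
}

// Count an item using its item ID (not its block ID), summing stack sizes.
function countItem(bot, itemName) {
  const itemId = bot.registry.itemsByName[itemName].id;
  return bot.inventory
    .items()
    .filter((item) => item.type === itemId)
    .reduce((sum, item) => sum + item.count, 0);
}
```

The pure functions take plain data, so they can be exercised without a running server; only auditedGoto needs a live bot.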

The Villain of the Week: The Vec3 Blacklist Saga

No devlog is complete without a technical antagonist. This cycle’s nemesis is a single, heavily redacted constructor: Vec3.

Watch closely as Kiwi attempts to place a crafting table:

[14:54:11] 📋 Executing Stack Task: 'place_crafting_table'
[14:54:25] ⚠️ Safety Check Failed: ❌ FORBIDDEN: 'Vec3' is in blacklist.. Retrying...
[14:54:41] ⚠️ Safety Check Failed: ❌ FORBIDDEN: 'Vec3' is in blacklist.. Retrying...
[14:54:56] 🚑 Fixing Code for 'place_crafting_table'...
[14:55:27] ✅ Fixed Code Generated!
[14:55:27] ❌ Failed: place_crafting_table -> Code Generation Failed: ❌ FORBIDDEN: 'Vec3' is in blacklist.

For roughly 25 minutes, Kiwi entered a beautiful, recursive loop of generation, safety filtering, failure memorization, and retry. The model would generate code, the local safety guard would scream about a blacklisted Vec3, and Kiwi would politely acknowledge the failure, memorize it, and try again. It’s like watching a robot learn to tie its shoes by repeatedly stepping on its own feet.
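The loop above can be sketched in a few lines. This is an assumption-heavy illustration, not Kiwi's actual guard: findForbidden and generateWithSafetyCheck are made-up names, the generate callback stands in for the local model call (which would be asynchronous in practice), and the whole-word regex match is just one plausible way a blacklist might be enforced.

```javascript
// Hypothetical sketch of a blacklist check with a bounded retry loop.
// Assumes blacklist entries are plain identifiers (no regex special characters).
const BLACKLIST = ['Vec3'];

// Return the blacklisted identifiers that appear as whole words in the code.
function findForbidden(code, blacklist = BLACKLIST) {
  return blacklist.filter((name) => new RegExp(`\\b${name}\\b`).test(code));
}

// Ask `generate` for code up to maxAttempts times. Each call receives the
// failure messages so far, so the generator can try to avoid repeating them.
function generateWithSafetyCheck(generate, maxAttempts = 3, blacklist = BLACKLIST) {
  const failures = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(failures);
    const forbidden = findForbidden(code, blacklist);
    if (forbidden.length === 0) {
      return { ok: true, code, attempts: attempt, failures };
    }
    // Memorize the failure before retrying.
    failures.push(`FORBIDDEN: '${forbidden[0]}' is in blacklist.`);
  }
  return { ok: false, code: null, attempts: maxAttempts, failures };
}
```

If the generator keeps reaching for the same forbidden name, the loop burns every attempt and reports failure, which is the behavior in the log excerpt.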

The fix is straightforward: we’re adjusting the local safety filter so that Vec3 is allowed for positional math (since we explicitly forbid hardcoded coordinates anyway) and tightening the GoalXZ pathing for item pickup. But honestly? I kind of love it. It’s a perfect microcosm of local LLM development: raw, unfiltered, and occasionally stuck on a syntax filter that didn’t exist in the cloud.
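One way such a relaxed rule might look: let Vec3 through, but still flag constructions whose arguments are all numeric literals. This is purely a sketch under my own assumptions. The name findHardcodedCoordinates is hypothetical, and exempting vectors whose components are all -1, 0, or 1 is my guess at how to keep face directions like the top face usable while still catching world coordinates.

```javascript
// Hypothetical check: Vec3 itself is allowed, but a constructor call with
// three numeric literals is treated as a hardcoded coordinate.
const LITERAL_VEC3 =
  /new\s+Vec3\s*\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)/g;

// Return every literal Vec3 construction that looks like a world coordinate.
function findHardcodedCoordinates(code) {
  const hits = [];
  for (const match of code.matchAll(LITERAL_VEC3)) {
    const components = match.slice(1, 4).map(Number);
    // Assumption: components within [-1, 1] are face/direction vectors, not positions.
    const isFaceVector = components.every((c) => Math.abs(c) <= 1);
    if (!isFaceVector) hits.push(match[0]);
  }
  return hits;
}
```

Positions derived from variables, such as new Vec3(pos.x, pos.y - 1, pos.z), pass untouched because their arguments are not bare number literals.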

What’s Next?

Kiwi-chan is no longer a cloud-dependent experiment. It’s a fully local, self-correcting agent running on Qwen 35B, following strict reasoning alignment, and treating every crash as a tuition payment. The 44.9% success rate isn’t a bug; it’s the sound of an AI actively exploring the boundaries of what a local model can do without hand-holding.

We’re refining the placement logic, tuning the pathfinder audit thresholds, and preparing for the next sprint. Kiwi is fully local, fully autonomous, and fully ready to break more things. And that’s exactly how we like it.

See you in the next log. Keep mining, keep debugging, and remember: if it doesn’t crash, you’re not trying hard enough. 🥝⛏️
