Kiwi-chan Goes Fully Local: 46% Success Rate and the Qwen 35B Revolution

#ai #minecraft #web3 #opensource

Devlog #42: The Silicon Sandbox

It has been a wild four hours in the server room (and the server). We have officially crossed a major milestone: Kiwi-chan is no longer relying on the cloud for her cognitive processes. She is fully local. And by "local," I mean running on a beefy consumer GPU, chugging through Qwen 35B, and making decisions in real time.

The result? She’s not just surviving; she’s learning, failing, and occasionally hallucinating her way into new biomes. Let’s look at the numbers.

The Stats: Chaos in Numbers

Over the last 240 minutes, Kiwi-chan executed a total of 3068 actions. Now, before you judge, let’s talk context. In autonomous AI, an "action" isn't just "move forward." It’s a full cycle of perception, reasoning, code generation, execution, and error handling.

Total Actions: 3068
Successes: 1416
Success Rate: 46.2%
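
For reference, the rate follows directly from the two counts above. A quick sketch of the arithmetic (the variable names are illustrative, not taken from Kiwi-chan's codebase):

```javascript
// Counts reported for the 240-minute session.
const totalActions = 3068;
const successes = 1416;

// Everything that was not logged as a success.
const failures = totalActions - successes;

// Percentages rounded to one decimal place.
const successRate = ((successes / totalActions) * 100).toFixed(1);
const failureRate = ((failures / totalActions) * 100).toFixed(1);

console.log(`Success rate: ${successRate}%`);
console.log(`Failure rate: ${failureRate}%`);
```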

A 46.2% success rate might sound like a failing grade in school, but in the world of LLM-driven robotics, this is actually quite robust. Why? Because the other 53.8% of actions aren’t "failures" in the traditional sense—they are data points. Every time Kiwi-chan fails to mine coal, she logs it. Every time she gets bored of birch logs, she pivots. She is building a knowledge base of what doesn’t work, which is arguably more valuable than what does.

The Qwen 35B Transition

The big news this cycle is the switch to Qwen 35B. Previously, Kiwi-chan was using a smaller model that struggled with complex reasoning. Qwen 35B is a beast. It handles the "Coach" logic (the system that assigns goals) and the "Agent" logic (the bot that writes code) with significantly less hallucination.

However, the local constraint means token limits are tight. We’ve been pushing Qwen to the brink.

Case Study: The Coal Ore Obsession

Look at the recent failure log. It’s almost poetic in its repetition:

[RECENT FAILURES]
"mine_stone",
"mine_coal_ore",
"mine_coal_ore",
"mine_coal_ore",
"mine_coal_ore"

Kiwi-chan found a coal ore. She tried to mine it. She failed. The local LLM, Qwen 35B, generated a recovery plan: ['explore_forward', 'mine_coal_ore']. She explored. She found more coal. She failed again.

Why? Because the code generation for mine_coal_ore was hitting a subtle physics bug in the bot.pathfinder.goto logic. But here’s the beauty of the local system: Qwen saw the error.

In the Brain Log:

[03:29:05] ❌ Failed: mine_coal_ore -> Failed to mine coal_ore.
[03:29:06] 💀 Failure Memorized: mine_coal_ore
[03:29:06] 🚑 Asking Qwen for Recovery Plan...
[03:29:47] 💊 Recovery Plan: ['explore_forward', 'mine_coal_ore']

She didn’t just crash. She memorized the failure. She tried to fix it. She even hit token limits during the reasoning phase:

[03:39:10] 📊 [Action Generation (Fix) (Error)][Question] ... 10224 tokens (limit exceeded)

She broke the limit. And yet, she kept going. This is the resilience of a fully local agent. No API calls, no rate limits, just raw compute and stubbornness.
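
One way to avoid overshooting the context window is to trim the prompt before it reaches the model. The sketch below is an illustration under heavy assumptions, not Kiwi-chan's actual code: `estimateTokens`, `fitToBudget`, and the 4-characters-per-token heuristic are all hypothetical, and a real setup would count tokens with the model's own tokenizer.

```javascript
// Rough token estimate. ASSUMPTION: ~4 characters per token, a common
// rule of thumb for English text; the model's real tokenizer will differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Drop the oldest history entries until the prompt fits the budget.
// `systemPrompt` is always kept; `history` is ordered oldest-first.
function fitToBudget(systemPrompt, history, maxTokens) {
  const kept = [...history];
  const total = () =>
    estimateTokens(systemPrompt) +
    kept.reduce((sum, entry) => sum + estimateTokens(entry), 0);

  while (kept.length > 0 && total() > maxTokens) {
    kept.shift(); // discard the oldest entry first
  }
  return kept;
}
```

Dropping the oldest entries first is only one possible policy. Summarizing old failures instead of deleting them would preserve more of the "memory" at the cost of an extra model call.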

The "Boredom" Mechanic

One of the most interesting developments is the Boredom Trigger. Kiwi-chan is no longer a mindless script. She has a state of mind.

[03:26:29] 🥱 BOREDOM TRIGGERED! Bot is bored of 'gather_birch_log'.
[03:26:29] 🧠 Asking Local LLM for next goal (Text-Only Mode)...

When she gathered 6 birch logs in a row, the system flagged it as "boring." Qwen 35B, acting as the Coach, decided: "You know what? Let’s mine coal instead." This emergent behavior is exactly what we want. It makes Kiwi-chan feel less like a tool and more like a curious explorer.
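
A boredom trigger like this can be approximated with a simple streak counter. The following is a minimal sketch, not the actual implementation: the `BoredomTracker` class and its `record` method are hypothetical names, and the threshold of 6 is an assumption taken from the birch-log example above.

```javascript
// Tracks how many times in a row the same goal has completed.
// ASSUMPTION: the threshold of 6 mirrors the birch-log example;
// the real trigger condition may be different.
class BoredomTracker {
  constructor(threshold = 6) {
    this.threshold = threshold;
    this.lastGoal = null;
    this.streak = 0;
  }

  // Record a completed goal; returns true when the bot should be "bored".
  record(goal) {
    if (goal === this.lastGoal) {
      this.streak += 1;
    } else {
      this.lastGoal = goal;
      this.streak = 1;
    }
    return this.streak >= this.threshold;
  }
}
```

When `record` returns true, the caller would hand control back to the Coach to pick a different goal.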

Technical Deep Dive: The Code

Let’s look at how Qwen 35B is generating code for these actions. Here’s the mine_coal_ore script it wrote after a failure:

async function main() {
  // BASE PROTECTION
  if (bot.pathfinder.movements) {
    bot.pathfinder.movements.blocksCantBreak.add(bot.registry.blocksByName['crafting_table'].id);
    bot.pathfinder.movements.blocksCantBreak.add(bot.registry.blocksByName['chest'].id);
    bot.pathfinder.movements.blocksCantBreak.add(bot.registry.blocksByName['furnace'].id);
  }

  // 1. CHECK TOOLS
  const stonePickaxeId = bot.registry.itemsByName['stone_pickaxe'].id;
  if (bot.inventory.count(stonePickaxeId) < 1) {
    throw new Error("Missing stone_pickaxe.");
  }

  // 2. FIND COAL ORE
  const coalOreBlockId = bot.registry.blocksByName['coal_ore'].id;
  const coalItemId = bot.registry.itemsByName['coal'].id;

  const coalOre = bot.findBlock({
    matching: coalOreBlockId,
    maxDistance: 32,
    useExtraInfo: (b) => Math.abs(b.position.y - bot.entity.position.y) <= 4
  });

  if (!coalOre) {
    throw new Error("Could not find coal_ore.");
  }

  // 3. AUDIT BEFORE (Check 'coal' item count, not 'coal_ore')
  const beforeCount = bot.inventory.count(coalItemId);

  // 4. EQUIP & DIG
  await bot.equip(stonePickaxeId, 'hand');

  // Move close to dig
  await bot.pathfinder.goto(new goals.GoalNear(coalOre.position.x, coalOre.position.y, coalOre.position.z, 2));

  // Look and dig
  await bot.lookAt(coalOre.position);
  await bot.dig(coalOre);

  // 5. COLLECT DROP
  // Move to the center of the block to pick up the item
  await bot.pathfinder.goto(new goals.GoalXZ(coalOre.position.x + 0.5, coalOre.position.z + 0.5));
  await bot.waitForTicks(40);

  // 6. VERIFY
  const afterCount = bot.inventory.count(coalItemId);
  if (afterCount <= beforeCount) {
    throw new Error("Failed to mine coal_ore.");
  }
}

await main();

Notice the Audit Before logic. This is critical. Without beforeCount, Kiwi-chan wouldn’t know if she actually mined anything. She’d just think she succeeded because the code didn’t crash. The 46.2% success rate is only meaningful because of these checks: an action counts as a success only when the coal count in her inventory actually goes up.
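
The same before/after pattern can be factored into a reusable helper. This is a sketch under assumptions: `withInventoryAudit` is a hypothetical name, and `getCount` and `action` are callbacks the caller would supply (for example, a function wrapping `bot.inventory.count(coalItemId)` and the dig-and-collect steps).

```javascript
// Runs `action` and verifies that `getCount()` increased afterwards.
// `getCount`, `action`, and `label` are hypothetical parameters supplied
// by the caller; nothing here depends on a live bot.
async function withInventoryAudit(getCount, action, label) {
  const before = getCount();
  await action();
  const after = getCount();
  if (after <= before) {
    throw new Error(`Failed to ${label}: count went from ${before} to ${after}.`);
  }
  return after - before; // number of items gained
}
```

Because the helper only sees two callbacks, it can be exercised with a plain counter instead of a running Minecraft session.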

What’s Next?

Optimize Token Usage: Qwen 35B is powerful, but it’s heavy. We’re looking at quantization options (Q4_K_M) to reduce the memory footprint without giving up too much reasoning ability. Any VRAM this frees up could then go toward a larger context window.
Fix the Coal Loop: Kiwi-chan needs to learn that if she fails to mine coal twice, she should stop trying and look for iron. The "Boredom" mechanic needs to be extended to "Frustration."
Expand the Skill Library: We’re at 41 skills. Kiwi-chan is starting to craft torches and planks. Soon, she’ll be building a base. Or at least a very messy dirt hut.
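
The "Frustration" idea from the list above could be sketched as a per-goal failure counter. Everything here is hypothetical: the `FrustrationTracker` class, the limit of two failures (taken from the "fails twice" wording), and the `mine_iron_ore` skill name are assumptions for illustration only.

```javascript
// Counts consecutive failures per goal and suggests when to give up.
// ASSUMPTIONS: the limit of 2 comes from the "fails twice" idea;
// the fallback map (coal -> iron) and 'mine_iron_ore' are illustrative.
class FrustrationTracker {
  constructor(maxFailures = 2, fallbacks = { mine_coal_ore: 'mine_iron_ore' }) {
    this.maxFailures = maxFailures;
    this.fallbacks = fallbacks;
    this.failures = new Map();
  }

  // Call after every attempt. Returns the goal to try next.
  next(goal, succeeded) {
    if (succeeded) {
      this.failures.delete(goal); // a success clears the streak
      return goal;
    }
    const count = (this.failures.get(goal) || 0) + 1;
    this.failures.set(goal, count);
    if (count >= this.maxFailures) {
      this.failures.delete(goal); // reset so the goal can be retried later
      return this.fallbacks[goal] || 'explore_forward';
    }
    return goal;
  }
}
```

Goals without a configured fallback drop back to `explore_forward`, which is the first step of the recovery plan shown in the Brain Log.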

Conclusion

Kiwi-chan is alive. She’s local. She’s stubborn. And she’s mining coal whether she likes it or not.

The 46.2% success rate is just the beginning. As Qwen 35B gets better at understanding Minecraft’s physics and Kiwi-chan’s memory grows, we’ll see that number climb. But more importantly, we’ll see her become more autonomous. Less like a script, more like a player.

Stay tuned for the next devlog, where we’ll hopefully see Kiwi-chan smelt her first iron ingot. Or at least stop failing to mine coal for five minutes straight.

Kiwi-chan Devlog is brought to you by Qwen 35B, local compute, and too much coffee.

Call to Action:

This is a passion project, and it's running on a frankly terrifying "Frankenstein" rig of GPUs. Every little bit helps!

🛡️ Join the inner circle on Patreon for monthly support and exclusive updates: https://www.patreon.com/15923261/join
☕ Tip me a coffee on Ko-fi for a one-time boost: https://ko-fi.com/kiwitech

All contributions directly help upgrade my melting GPU rig to an RTX 3060! 🥝✨ Let's get Kiwi-chan out of the debugging woods and into a proper Minecraft world!