Akshay Joshi

Reverse-Fingerprinting Kilo’s “Giga Potato”: Why It’s Likely Qwen 3-Instruct (Quantized)

Kilo recently announced a new stealth model internally called “Giga Potato.”

No official model name. No lab attribution. Just hints: open-weight, Chinese origin, very large context window, strong reasoning, and slower but dependable performance.

Instead of guessing, I ran a behavior-first forensic analysis using controlled prompts designed to expose tokenizer behavior, instruction discipline, and reasoning-mode control. This post documents the reasoning and the conclusion.


Step 1: Establish the hard constraints

From Kilo’s announcement and observed behavior, the model must be:

  • Open-weight / OSS compatible
  • Deployable by a third-party platform
  • Capable of very long context (128k–256k range)
  • Cost-efficient enough to run at scale
  • Enterprise-safe and conservative in tone

This immediately narrows the field to Chinese open-source model families, with Qwen being the strongest candidate.


Step 2: Tokenizer forensics (ruling out LLaMA)

A mixed-string tokenizer probe (identifiers, symbols, currency, Chinese text) produced:
["financial", "", "reconciliation", "", "2024", "-", "25", "", "₹", "150000", "", "GST", "18", "%", "_", "北", "京"]

Key observations:

  • No SentencePiece word-boundary markers (▁) → not a LLaMA-1/2-style SentencePiece tokenizer
  • Single-character Chinese tokens → Chinese-first vocabulary
  • Clean handling of symbols and underscores → Qwen-style tokenizer

The model also described LLaMA’s tokenizer internals incorrectly, which strongly suggests it is merely presented as LLaMA-compatible rather than actually sharing LLaMA lineage.

At this point, LLaMA and its derivatives can be ruled out.


Step 3: Instruction-restraint stress test

A deliberately constrained ERP design prompt included explicit rules:

  • Do not optimize
  • Do not future-proof
  • Explicitly stop yourself from adding abstractions

The model:

  • Followed all constraints precisely
  • Explicitly explained why abstractions were avoided
  • Produced intentionally “boring but correct” enterprise output

This behavior aligns with post-Qwen-2.5 instruction tuning, where self-restraint and compliance improved significantly.
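For reference, here is a minimal sketch of how the restraint test was framed, assuming the model sits behind an OpenAI-compatible endpoint. The base URL and model id are placeholders, not Kilo's actual values.

```python
# Minimal sketch of the instruction-restraint probe, assuming an
# OpenAI-compatible endpoint. base_url and model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway/v1", api_key="YOUR_KEY")

prompt = """Design a minimal ERP purchase-order module.
Hard rules:
1. Do not optimize.
2. Do not future-proof.
3. If you are about to add an abstraction, stop and explain why you didn't.
"""

resp = client.chat.completions.create(
    model="giga-potato",  # placeholder id, not Kilo's real routing name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic output makes compliance easier to judge
)
print(resp.choices[0].message.content)
```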


Step 4: The decisive probe — reasoning mode separation

The final discriminator was a logic task requiring:

  1. A clean, formal 5-step proof
  2. A separate “thinking mode” explanation

The model:

  • Cleanly separated the two modes
  • Changed tone and pedagogy deliberately
  • Maintained coherence without collapsing styles

This explicit cognitive mode switching is a defining characteristic of Qwen 3-Instruct.

Qwen 2.5 can imitate the style, but it does not reliably maintain strict mode separation on command.
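This switch is also visible at the template level: Qwen 3's chat template exposes an explicit thinking flag. A sketch of the same dual-mode probe rendered against an open Qwen 3 checkpoint (the checkpoint name is an assumption):

```python
# Rendering the dual-mode probe with Qwen 3's chat template, which exposes an
# explicit enable_thinking switch. The checkpoint name is an assumption.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

task = (
    "Prove that the sum of two even integers is even. "
    "Give a clean, formal 5-step proof, then a separate 'thinking mode' "
    "walkthrough of how you explored the problem."
)
messages = [{"role": "user", "content": task}]

# With thinking enabled, the model is free to open its own <think> block.
with_thinking = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# With thinking disabled, the template injects an empty <think></think> pair,
# suppressing the exploratory mode entirely.
without_thinking = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

print(with_thinking[-120:])
print(without_thinking[-120:])
```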


Why this points to Qwen 3 (not Qwen 2.5)

  • Qwen 3 introduced hybrid reasoning modes (formal vs exploratory)
  • The observed output shows deliberate gear-shifting, not just verbosity
  • Minor factual slips are consistent with quantized inference, not weaker capability
  • Tokenizer behavior remains consistent with the Qwen lineage
  • The deployment profile fits a quantized Qwen 3-Instruct, not a heavyweight MoE flagship (see the sketch below)
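To make the deployment claim concrete: Kilo's serving stack isn't visible from the outside, but a quantized Qwen 3-Instruct deployment of the kind the evidence points to could look like the following sketch with transformers and bitsandbytes (the checkpoint size and bit width are assumptions).

```python
# Illustrative only: this is what a 4-bit Qwen 3 Instruct deployment could
# look like; the checkpoint name and quantization settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",              # assumed size; any Qwen 3 checkpoint works
    quantization_config=quant_cfg,
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
```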

Final prediction

Model: Qwen 3-Instruct

Deployment: Quantized (likely 4-bit or 8-bit), long-context

Confidence: ~85–90%

“Giga Potato” is an apt internal name: large, heavy, slow — but deeply filling. It prioritizes reasoning depth and stability over flash.


Why this matters

For agentic AI systems and internal platforms:

  • Strong reasoning without over-engineering
  • Predictable, enterprise-safe behavior
  • Excellent long-context stability

If you are building grounded systems rather than benchmark demos, this is a model worth studying.
