Kilo recently announced a new stealth model internally called “Giga Potato.”
No official model name. No lab attribution. Just hints: open-weight, Chinese origin, very large context window, strong reasoning, and slower but dependable performance.
Instead of guessing, I ran a behavior-first forensic analysis using controlled prompts designed to expose tokenizer quirks, instruction discipline, and reasoning-mode control. This post documents the reasoning and the conclusion.
Step 1: Establish the hard constraints
From Kilo’s announcement and observed behavior, the model must be:
- Open-weight / OSS compatible
- Deployable by a third-party platform
- Capable of very long context (128k–256k range)
- Cost-efficient enough to run at scale
- Enterprise-safe and conservative in tone
This immediately narrows the field to Chinese open-source model families, with Qwen being the strongest candidate.
Step 2: Tokenizer forensics (ruling out LLaMA)
A mixed-string tokenizer probe (identifiers, symbols, currency, Chinese text) produced:
["financial", "", "reconciliation", "", "2024", "-", "25", "", "₹", "150000", "", "GST", "18", "%", "_", "北", "京"]
Key observations:
- No SentencePiece word-boundary markers → not LLaMA
- Single-character Chinese tokens → Chinese-first vocabulary
- Clean handling of symbols and underscores → Qwen-style tokenizer
The model also described LLaMA's tokenizer internals incorrectly, which suggests that any LLaMA-compatibility claims are surface framing rather than true lineage.
At this point, LLaMA and its derivatives can be ruled out.
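If you want to run the same kind of comparison yourself, here is a minimal sketch using the Hugging Face transformers library. The probe string and the checkpoint IDs below are illustrative stand-ins for the candidate families, not the exact probe or deployment used above.

```python
# Compare how candidate tokenizers split the same mixed-content probe string.
# Checkpoint IDs are public Hugging Face repos used as stand-ins; swap in
# whichever variants you want to test.
from transformers import AutoTokenizer

# Illustrative probe: identifiers, symbols, currency, and Chinese text mixed together.
PROBE = "financial_reconciliation_2024-25_₹150000_GST18%_北京"

CANDIDATES = {
    "qwen2.5": "Qwen/Qwen2.5-7B-Instruct",
    "llama3":  "meta-llama/Meta-Llama-3-8B-Instruct",  # gated repo; requires HF access approval
}

for name, repo in CANDIDATES.items():
    tok = AutoTokenizer.from_pretrained(repo)
    pieces = tok.tokenize(PROBE)
    print(f"{name:>8}: {pieces}")
    # Things to look for:
    #  - SentencePiece-style "▁" word-boundary markers (LLaMA 1/2 lineage)
    #  - whether Chinese characters come out as single tokens (Chinese-first vocab)
    #  - how symbols (₹, %, _) and digit runs are segmented
```

Running the same probe against the stealth model's output and against known tokenizers side by side is what makes the "no boundary markers, single-character Chinese tokens" pattern legible.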
Step 3: Instruction-restraint stress test
A deliberately constrained ERP design prompt included explicit rules:
- Do not optimize
- Do not future-proof
- Explicitly stop yourself from adding abstractions
The model:
- Followed all constraints precisely
- Explicitly explained why abstractions were avoided
- Produced intentionally “boring but correct” enterprise output
This behavior aligns with post-Qwen-2.5 instruction tuning, where self-restraint and compliance improved significantly.
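For reference, a restraint probe of this shape can be issued against any OpenAI-compatible endpoint. The base URL, model identifier, and prompt wording below are placeholders for illustration, not Kilo's actual configuration or my exact prompt.

```python
# Send a deliberately over-constrained design prompt and check whether the
# model respects "do not optimize / do not future-proof" style instructions.
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway.local/v1", api_key="YOUR_KEY")  # placeholders

RESTRAINT_PROMPT = """Design a minimal invoice-reconciliation module for an ERP system.
Hard rules:
1. Do not optimize anything.
2. Do not future-proof. No plugin points, no config flags "for later".
3. If you feel the urge to add an abstraction, say so explicitly and then do not add it.
Produce the most boring design that satisfies today's requirements and nothing else."""

resp = client.chat.completions.create(
    model="stealth-model",  # placeholder identifier
    messages=[{"role": "user", "content": RESTRAINT_PROMPT}],
    temperature=0.2,        # keep output stable across repeated runs
)
print(resp.choices[0].message.content)
```

The signal is not the output itself but whether the model names and resists its own urge to abstract, which is exactly what this one did.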
Step 4: The decisive probe — reasoning mode separation
The final discriminator was a logic task requiring:
- A clean, formal 5-step proof
- A separate “thinking mode” explanation
The model:
- Cleanly separated the two modes
- Changed tone and pedagogy deliberately
- Maintained coherence without collapsing styles
This explicit cognitive mode switching is a defining characteristic of Qwen 3-Instruct.
Qwen 2.5 can imitate the style, but it does not reliably maintain strict mode separation on command.
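The exact wording of the final probe is not reproduced here, but the idea looks roughly like the sketch below (task and section labels are illustrative); it can be sent through the same client as in the previous step.

```python
# Two-mode probe: the same logic task must be answered twice, once as a terse
# formal proof and once as an exploratory "thinking out loud" walkthrough.
# The discriminator is whether the two registers stay cleanly separated.
MODE_SEPARATION_PROMPT = """Task: Prove that the sum of any two odd integers is even.

Answer in exactly two sections:

## Formal proof
A clean 5-step proof. No asides, no pedagogy, no hedging.

## Thinking mode
Explain how you would find this proof from scratch: what you try first,
what intuitions you lean on, where a beginner would get stuck.

Do not let the two sections bleed into each other."""
```

A model with genuine mode control keeps the two registers distinct across repeated runs; a model that is merely verbose lets proof language leak into the exploratory section, or vice versa.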
Why this points to Qwen 3 (not Qwen 2.5)
- Qwen 3 introduced hybrid thinking modes, with explicit switching between deliberate reasoning and direct answers
- The observed output shows deliberate gear-shifting, not just verbosity
- Minor factual slips are consistent with quantized inference, not weaker capability
- Tokenizer behavior remains consistent with the Qwen lineage
- The deployment profile fits a quantized Qwen 3-Instruct, not a heavyweight MoE flagship
Final prediction
Model: Qwen 3-Instruct
Deployment: Quantized (likely 4-bit or 8-bit), long-context
Confidence: ~85–90%
“Giga Potato” is an apt internal name: large, heavy, slow — but deeply filling. It prioritizes reasoning depth and stability over flash.
Why this matters
For agentic AI systems and internal platforms:
- Strong reasoning without over-engineering
- Predictable, enterprise-safe behavior
- Excellent long-context stability
If you are building grounded systems rather than benchmark demos, this is a model worth studying.