Kilo recently announced a new stealth model internally called “Giga Potato.”
No official model name. No lab attribution. Just hints: open-weight, Chinese origin, very large context window, strong reasoning, and slower but dependable performance.
Instead of guessing, I ran a behavior-first forensic analysis using controlled prompts designed to expose tokenizer quirks, instruction discipline, and reasoning-mode control. This post documents the reasoning and the conclusion.
Step 1: Establish the hard constraints
From Kilo’s announcement and observed behavior, the model must be:
- Open-weight / OSS compatible
- Deployable by a third-party platform
- Capable of very long context (128k–256k range)
- Cost-efficient enough to run at scale
- Enterprise-safe and conservative in tone
This immediately narrows the field to Chinese open-source model families, with Qwen being the strongest candidate.
Step 2: Tokenizer forensics (ruling out LLaMA)
A mixed-string tokenizer probe (identifiers, symbols, currency, Chinese text) produced:
["financial", "", "reconciliation", "", "2024", "-", "25", "", "₹", "150000", "", "GST", "18", "%", "_", "北", "京"]
Key observations:
- No SentencePiece word-boundary markers → not LLaMA
- Single-character Chinese tokens → Chinese-first vocabulary
- Clean handling of symbols and underscores → Qwen-style tokenizer
The model also described LLaMA's tokenizer internals incorrectly, which suggests that any LLaMA-compatibility claims are surface framing rather than true lineage.
At this point, LLaMA and its derivatives can be ruled out.
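If you want to run the same kind of comparison yourself, here is a minimal sketch using the Hugging Face transformers library. The probe string and the checkpoint IDs below are illustrative stand-ins for the candidate families, not the exact probe or deployment used above.

```python
# Compare how candidate tokenizers split the same mixed-content probe string.
# Checkpoint IDs are public Hugging Face repos used as stand-ins; swap in
# whichever variants you want to test.
from transformers import AutoTokenizer

# Illustrative probe: identifiers, symbols, currency, and Chinese text mixed together.
PROBE = "financial_reconciliation_2024-25_₹150000_GST18%_北京"

CANDIDATES = {
    "qwen2.5": "Qwen/Qwen2.5-7B-Instruct",
    "llama3":  "meta-llama/Meta-Llama-3-8B-Instruct",  # gated repo; requires HF access approval
}

for name, repo in CANDIDATES.items():
    tok = AutoTokenizer.from_pretrained(repo)
    pieces = tok.tokenize(PROBE)
    print(f"{name:>8}: {pieces}")
    # Things to look for:
    #  - SentencePiece-style "▁" word-boundary markers (LLaMA 1/2 lineage)
    #  - whether Chinese characters come out as single tokens (Chinese-first vocab)
    #  - how symbols (₹, %, _) and digit runs are segmented
```

Running the same probe against the stealth model's output and against known tokenizers side by side is what makes the "no boundary markers, single-character Chinese tokens" pattern legible.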
Step 3: Instruction-restraint stress test
A deliberately constrained ERP design prompt included explicit rules:
- Do not optimize
- Do not future-proof
- Explicitly stop yourself from adding abstractions
The model:
- Followed all constraints precisely
- Explicitly explained why abstractions were avoided
- Produced intentionally “boring but correct” enterprise output
This behavior aligns with post-Qwen-2.5 instruction tuning, where self-restraint and compliance improved significantly.
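For reference, a restraint probe of this shape can be issued against any OpenAI-compatible endpoint. The base URL, model identifier, and prompt wording below are placeholders for illustration, not Kilo's actual configuration or my exact prompt.

```python
# Send a deliberately over-constrained design prompt and check whether the
# model respects "do not optimize / do not future-proof" style instructions.
from openai import OpenAI

client = OpenAI(base_url="https://example-gateway.local/v1", api_key="YOUR_KEY")  # placeholders

RESTRAINT_PROMPT = """Design a minimal invoice-reconciliation module for an ERP system.
Hard rules:
1. Do not optimize anything.
2. Do not future-proof. No plugin points, no config flags "for later".
3. If you feel the urge to add an abstraction, say so explicitly and then do not add it.
Produce the most boring design that satisfies today's requirements and nothing else."""

resp = client.chat.completions.create(
    model="stealth-model",  # placeholder identifier
    messages=[{"role": "user", "content": RESTRAINT_PROMPT}],
    temperature=0.2,        # keep output stable across repeated runs
)
print(resp.choices[0].message.content)
```

The signal is not the output itself but whether the model names and resists its own urge to abstract, which is exactly what this one did.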
Step 4: The decisive probe — reasoning mode separation
The final discriminator was a logic task requiring:
- A clean, formal 5-step proof
- A separate “thinking mode” explanation
The model:
- Cleanly separated the two modes
- Changed tone and pedagogy deliberately
- Maintained coherence without collapsing styles
This explicit cognitive mode switching is a defining characteristic of Qwen 3-Instruct.
Qwen 2.5 can imitate the style, but it does not reliably maintain strict mode separation on command.
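The exact wording of the final probe is not reproduced here, but the idea looks roughly like the sketch below (task and section labels are illustrative); it can be sent through the same client as in the previous step.

```python
# Two-mode probe: the same logic task must be answered twice, once as a terse
# formal proof and once as an exploratory "thinking out loud" walkthrough.
# The discriminator is whether the two registers stay cleanly separated.
MODE_SEPARATION_PROMPT = """Task: Prove that the sum of any two odd integers is even.

Answer in exactly two sections:

## Formal proof
A clean 5-step proof. No asides, no pedagogy, no hedging.

## Thinking mode
Explain how you would find this proof from scratch: what you try first,
what intuitions you lean on, where a beginner would get stuck.

Do not let the two sections bleed into each other."""
```

A model with genuine mode control keeps the two registers distinct across repeated runs; a model that is merely verbose lets proof language leak into the exploratory section, or vice versa.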
Why this points to Qwen 3 (not Qwen 2.5)
- Qwen 3 introduced hybrid thinking modes, with explicit switching between deliberate reasoning and direct answers
- The observed output shows deliberate gear-shifting, not just verbosity
- Minor factual slips are consistent with quantized inference, not weaker capability
- Tokenizer behavior remains consistent with the Qwen lineage
- The deployment profile fits a quantized Qwen 3-Instruct, not a heavyweight MoE flagship
Final prediction
Model: Qwen 3-Instruct
Deployment: Quantized (likely 4-bit or 8-bit), long-context
Confidence: ~85–90%
“Giga Potato” is an apt internal name: large, heavy, slow — but deeply filling. It prioritizes reasoning depth and stability over flash.
Why this matters
For agentic AI systems and internal platforms:
- Strong reasoning without over-engineering
- Predictable, enterprise-safe behavior
- Excellent long-context stability
If you are building grounded systems rather than benchmark demos, this is a model worth studying.