Broken Tokens: Heap Corruption in Google Sentencepiece
Vulnerability ID: CVE-2026-1260
CVSS Score: 8.5
Published: 2026-01-22
A classic C-string assumption error in Google's Sentencepiece library lets a malicious model file trigger a heap-based buffer over-read and a potential overflow, which can lead to arbitrary code execution.
TL;DR
Google Sentencepiece, the backbone of many NLP pipelines, treated string views that are not null-terminated as C-strings. By crafting a malicious model file, an attacker can trick the library into reading past heap boundaries, leading to crashes, information leaks, or remote code execution (RCE). Fixed in version 0.2.1.
⚠️ Exploit Status: POC
Technical Details
- CWE: CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer)
- CVSS v4.0: 8.5 (High)
- Attack Vector: Local (User Interaction Required)
- Impact: Arbitrary Code Execution / Information Disclosure
- Affected Component: src/normalizer.cc (PrefixMatcher)
- Patch Commit: d856b67fdb3492e035489abf9b3aaf486144b2c0
Affected Systems
- Google Sentencepiece < 0.2.1
- PyTorch (via bundled sentencepiece)
- TensorFlow (via bundled sentencepiece)
- HuggingFace Transformers (dependent on sentencepiece)
- sentencepiece: < 0.2.1 (Fixed in: 0.2.1)
Code Analysis
Commit: d856b67
Fix memory leak and invalid memory access in PrefixMatcher
@@ -10,7 +10,8 @@
- if (trie_->build(key.size(), const_cast<char **>(&key[0]), nullptr, nullptr) != 0) {
+ if (trie_->build(key.size(), const_cast<char **>(key.data()), const_cast<size_t *>(lengths.data()), nullptr) != 0) {
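The two calls differ in whether the trie builder is told how long each key is. As a rough illustration of that bug class (not Sentencepiece's own code), the Python sketch below uses ctypes to read the same memory two ways: as a NUL-terminated C-string and with an explicit length. The buffer contents and helper names are invented for the example, and the extra bytes are deliberately kept inside the same allocation so the read stays in bounds.

```python
import ctypes

# One allocation holding a 3-byte "key" followed by unrelated bytes.
# create_string_buffer appends a single trailing NUL after the data.
KEY_LEN = 3
buf = ctypes.create_string_buffer(b"abcSECRET")
addr = ctypes.addressof(buf)


def read_as_c_string(address: int) -> bytes:
    """Read until the first NUL byte, as a C-string consumer would."""
    return ctypes.string_at(address)


def read_with_length(address: int, length: int) -> bytes:
    """Read exactly `length` bytes, as a length-aware consumer would."""
    return ctypes.string_at(address, length)


unbounded = read_as_c_string(addr)
bounded = read_with_length(addr, KEY_LEN)

print(unbounded)  # runs past the 3-byte key into the neighbouring bytes
print(bounded)    # stops at the key boundary
```

In the scenario the advisory describes, the bytes after the key are whatever happens to follow it on the heap, and nothing guarantees a NUL inside the allocation, so the unbounded read can leave the buffer entirely.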
Exploit Details
- Theory: Exploitation requires crafting a protobuf model with non-null-terminated dictionary entries.
Mitigation Strategies
- Upgrade Sentencepiece to version 0.2.1 or later.
- Implement strict model provenance checks (only load signed/trusted models).
- Run model loading and tokenization in a sandboxed environment (e.g., gVisor or nsjail).
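In its simplest form, the provenance check above could be a digest allow-list applied before a model file is handed to the tokenizer. The sketch below assumes you keep a set of SHA-256 digests for models you have vetted; the function names and the allow-list are hypothetical, and a digest check is a weaker stand-in for real signature verification.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_model(path: Path, trusted_digests: set) -> bool:
    """True only if the file's digest is on the caller's allow-list."""
    return sha256_of_file(path) in trusted_digests
```

A caller would refuse to load any model for which is_trusted_model returns False, so an untrusted file never reaches the parsing code.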
Remediation Steps:
- Identify all services using sentencepiece or the Python bindings sentencepiece.
- Check installed version: pip show sentencepiece or inspect shared library versions.
- Upgrade via pip: pip install --upgrade sentencepiece.
- Rebuild any C++ applications linking against libsentencepiece.
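For Python environments, the version check can also be scripted. The sketch below reads the installed package version through importlib.metadata and compares it with 0.2.1, the fixed release named in this advisory. The helper names are illustrative, and the parser is deliberately simple: it looks only at the first three numeric fields, so a pre-release such as a hypothetical 0.2.1rc1 would be treated as 0.2.1.

```python
from importlib import metadata

FIXED_VERSION = (0, 2, 1)  # first fixed release according to this advisory


def parse_version(text: str) -> tuple:
    """Turn '0.2.0' into (0, 2, 0); non-numeric suffixes are ignored."""
    parts = []
    for field in text.split(".")[:3]:
        digits = ""
        for ch in field:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_vulnerable(version_text: str) -> bool:
    """True if the version sorts before the fixed release."""
    return parse_version(version_text) < FIXED_VERSION


def installed_sentencepiece_version():
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version("sentencepiece")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sentencepiece_version()
    if found is None:
        print("sentencepiece is not installed in this environment")
    elif is_vulnerable(found):
        print(f"sentencepiece {found} predates the fix; upgrade")
    else:
        print(f"sentencepiece {found} is at or above the fixed release")
```

This only covers the Python package in the current environment; C++ binaries linked against libsentencepiece still need to be checked and rebuilt separately.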