CVE-2026-1260: Broken Tokens: Heap Corruption in Google Sentencepiece

#security #cve #cybersecurity


Vulnerability ID: CVE-2026-1260
CVSS Score: 8.5
Published: 2026-01-22

A classic C-string assumption error in Google's Sentencepiece library allows a malicious model file to trigger a heap-based buffer over-read and a potential overflow, which can lead to information disclosure or arbitrary code execution.

TL;DR

Google Sentencepiece, the backbone of many NLP pipelines, treated non-null-terminated string views as C-strings. By crafting a malicious model file, an attacker can trick the library into reading past heap boundaries when the victim loads that model, leading to crashes, information leaks, or arbitrary code execution. Fixed in version 0.2.1.
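To make the C-string assumption concrete, here is a minimal sketch that is not taken from the Sentencepiece source. The names `kBuffer`, `kToken`, `LengthAsCString`, and `LengthFromView` are hypothetical. The view deliberately sits inside a larger null-terminated array so the wrong scan stays in bounds and the example remains well-defined; in the vulnerable scenario the scan would simply continue into whatever follows on the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// One buffer holding a 5-byte token followed directly by unrelated bytes.
// Only the very last byte of the array is a NUL terminator.
static const char kBuffer[] = "tokenSECRET";

// The view covers only the first 5 bytes. Nothing in memory marks where it
// ends; the length lives in the view object, not in the data.
static const std::string_view kToken(kBuffer, 5);

// Wrong: treats the view's pointer as a C-string and scans for a NUL.
// Here the scan stops at the end of kBuffer, so the call is well-defined,
// but in general the next NUL may lie outside the allocation.
std::size_t LengthAsCString(std::string_view v) {
  assert(v.data() != nullptr);
  return std::strlen(v.data());
}

// Right: trusts the length carried by the view.
std::size_t LengthFromView(std::string_view v) {
  return v.size();
}
```

With these definitions, `LengthFromView(kToken)` reports the 5 bytes the view actually covers, while `LengthAsCString(kToken)` walks on through the neighbouring bytes until it happens to hit a terminator.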

⚠️ Exploit Status: POC

Technical Details

CWE: CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer)
CVSS v4.0: 8.5 (High)
Attack Vector: Local (User Interaction Required)
Impact: Arbitrary Code Execution / Information Disclosure
Affected Component: src/normalizer.cc (PrefixMatcher)
Patch Commit: d856b67fdb3492e035489abf9b3aaf486144b2c0

Affected Systems

Google Sentencepiece < 0.2.1 (fixed in 0.2.1)
PyTorch (via bundled sentencepiece)
TensorFlow (via bundled sentencepiece)
HuggingFace Transformers (dependent on sentencepiece)

Code Analysis

Commit: d856b67

Fix memory leak and invalid memory access in PrefixMatcher

@@ -10,7 +10,8 @@
-  if (trie_->build(key.size(), const_cast<char **>(&key[0]), nullptr, nullptr) != 0) {
+  if (trie_->build(key.size(), const_cast<char **>(key.data()), const_cast<size_t *>(lengths.data()), nullptr) != 0) {
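The patched call supplies a `lengths` array where the old call passed `nullptr`, so each key presumably no longer needs a terminator to mark its end. The sketch below illustrates that pointer-plus-length pattern in isolation. `KeyArrays`, `MakeKeyArrays`, and `ReadKeys` are hypothetical names for illustration, and `ReadKeys` is only a stand-in for a consumer such as a trie builder, not the actual trie API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Parallel arrays in the shape the patched call uses: one pointer and one
// explicit length per key.
struct KeyArrays {
  std::vector<const char*> key;
  std::vector<std::size_t> lengths;
};

// Builds the parallel arrays from views that may not be null-terminated.
KeyArrays MakeKeyArrays(const std::vector<std::string_view>& entries) {
  KeyArrays out;
  out.key.reserve(entries.size());
  out.lengths.reserve(entries.size());
  for (std::string_view e : entries) {
    out.key.push_back(e.data());
    out.lengths.push_back(e.size());
  }
  return out;
}

// Hypothetical consumer: copies exactly lengths[i] bytes for each key and
// never scans for a NUL terminator.
std::vector<std::string> ReadKeys(std::size_t n, const char* const* key,
                                  const std::size_t* lengths) {
  assert(n == 0 || (key != nullptr && lengths != nullptr));
  std::vector<std::string> result;
  result.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    result.emplace_back(key[i], lengths[i]);
  }
  return result;
}
```

Because every read is bounded by the recorded length, two keys can sit back to back in one blob with no separator and still be recovered exactly.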

Exploit Details

Theory: Exploitation requires crafting a protobuf model with non-null-terminated dictionary entries.

Mitigation Strategies

Upgrade Sentencepiece to version 0.2.1 or later.
Implement strict model provenance checks (only load signed/trusted models).
Run model loading and tokenization in a sandboxed environment (e.g., gVisor or nsjail).

Remediation Steps:

Identify all services using the sentencepiece C++ library or its Python bindings.
Check installed version: pip show sentencepiece or inspect shared library versions.
Upgrade via pip: pip install --upgrade sentencepiece.
Rebuild any C++ applications linking against libsentencepiece.
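For inventory tooling, the version check in the steps above can be automated. The following is a rough sketch, assuming the installed version is available as a plain `major.minor.patch` string, for example copied from the `pip show sentencepiece` output. `ParseVersion` and `IsPatched` are hypothetical helpers, and any suffix after the third number (such as a pre-release tag) is ignored.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Parses "major.minor.patch" into three integers. Components that are
// missing or not numeric are left at 0.
std::array<int, 3> ParseVersion(const std::string& version) {
  std::array<int, 3> parts{0, 0, 0};
  std::istringstream in(version);
  char separator = '\0';
  for (int i = 0; i < 3; ++i) {
    if (!(in >> parts[i])) {
      parts[i] = 0;
      break;
    }
    in >> separator;  // Skip the '.' between components, if present.
  }
  return parts;
}

// Returns true when the version is 0.2.1 or later, the release the
// advisory lists as fixed. std::array compares lexicographically, which
// matches component-wise version ordering.
bool IsPatched(const std::string& installed) {
  const std::array<int, 3> fixed{0, 2, 1};
  return ParseVersion(installed) >= fixed;
}
```

Comparing numeric components rather than raw strings matters here: a plain string comparison would rank "0.10.0" below "0.2.1".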

