
CVE Reports

Originally published at cvereports.com

CVE-2026-1260: Broken Tokens: Heap Corruption in Google Sentencepiece

Broken Tokens: Heap Corruption in Google Sentencepiece

Vulnerability ID: CVE-2026-1260
CVSS Score: 8.5
Published: 2026-01-22

A classic C-string assumption error in Google's Sentencepiece library lets a malicious model file trigger a heap-based buffer over-read and a potential overflow, which may lead to arbitrary code execution.

TL;DR

Google Sentencepiece, the backbone of many NLP pipelines, treated non-null-terminated string views as C-strings. A crafted model file can make the library read past heap buffer boundaries, causing crashes, information leaks, or potentially arbitrary code execution. Fixed in version 0.2.1.
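To make the C-string assumption concrete, here is a minimal sketch of how a string view's length and the length seen by `strlen()` can disagree. The names `view_length`, `cstring_length`, `kBacking`, and `kKey` are hypothetical and do not come from Sentencepiece. The backing buffer is deliberately null-terminated further along so the demo stays well-defined; in the vulnerable scenario no such terminator is guaranteed.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Length as seen by code that honours the view's explicit size.
std::size_t view_length(std::string_view sv) { return sv.size(); }

// Length as seen by code that treats the view's pointer as a C-string.
// This is only safe here because the demo buffer below ends in '\0'.
std::size_t cstring_length(std::string_view sv) {
  return std::strlen(sv.data());
}

// Backing storage for the demo: ten characters plus the literal's '\0'.
constexpr char kBacking[] = "tokenEXTRA";

// A view over only the first five bytes ("token").
// Nothing terminates the string at byte 5.
constexpr std::string_view kKey{kBacking, 5};
```

Calling `view_length(kKey)` yields 5, while `cstring_length(kKey)` keeps scanning until it reaches the terminator at the end of the backing buffer and yields 10. When the bytes after the view belong to unrelated heap data, that extra scan is the over-read.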


⚠️ Exploit Status: POC

Technical Details

  • CWE: CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer)
  • CVSS v4.0: 8.5 (High)
  • Attack Vector: Local (User Interaction Required)
  • Impact: Arbitrary Code Execution / Information Disclosure
  • Affected Component: src/normalizer.cc (PrefixMatcher)
  • Patch Commit: d856b67fdb3492e035489abf9b3aaf486144b2c0

Affected Systems

  • Google Sentencepiece < 0.2.1 (fixed in 0.2.1)
  • PyTorch (via bundled sentencepiece)
  • TensorFlow (via bundled sentencepiece)
  • HuggingFace Transformers (dependent on sentencepiece)

Code Analysis

Commit: d856b67

Fix memory leak and invalid memory access in PrefixMatcher

@@ -10,7 +10,8 @@
-  if (trie_->build(key.size(), const_cast<char **>(&key[0]), nullptr, nullptr) != 0) {
+  if (trie_->build(key.size(), const_cast<char **>(key.data()), const_cast<size_t *>(lengths.data()), nullptr) != 0) {
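The diff replaces a `nullptr` lengths argument with an explicit array of key lengths. The sketch below illustrates that pattern in isolation. `collect_keys` is a hypothetical stand-in for a builder that accepts optional lengths, not the real trie API, and `collect_from_views` is likewise an illustrative helper.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a builder that takes key pointers and optional
// lengths. With lengths == nullptr it falls back to strlen(), which is the
// unsafe path for keys that are not null-terminated.
std::vector<std::string> collect_keys(std::size_t n,
                                      const char* const* keys,
                                      const std::size_t* lengths) {
  std::vector<std::string> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t len = lengths ? lengths[i] : std::strlen(keys[i]);
    out.emplace_back(keys[i], len);  // copies exactly len bytes
  }
  return out;
}

// Builds parallel pointer and length arrays from the views and always
// passes the lengths, so no key is scanned for a terminator.
std::vector<std::string> collect_from_views(
    const std::vector<std::string_view>& views) {
  std::vector<const char*> ptrs;
  std::vector<std::size_t> lens;
  ptrs.reserve(views.size());
  lens.reserve(views.size());
  for (std::string_view v : views) {
    ptrs.push_back(v.data());
    lens.push_back(v.size());
  }
  return collect_keys(ptrs.size(), ptrs.data(), lens.data());
}
```

Given views over the first two and the next three bytes of a buffer holding "abcdef", `collect_from_views` returns "ab" and "cde". Each key stays within its own bounds because the length travels with the pointer.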

Exploit Details

  • Theory: Exploitation requires crafting a protobuf model with non-null-terminated dictionary entries.

Mitigation Strategies

  • Upgrade Sentencepiece to version 0.2.1 or later.
  • Implement strict model provenance checks (only load signed/trusted models).
  • Run model loading and tokenization in a sandboxed environment (e.g., gVisor or nsjail).

Remediation Steps:

  1. Identify all services using the sentencepiece C++ library or its Python bindings.
  2. Check installed version: pip show sentencepiece or inspect shared library versions.
  3. Upgrade via pip: pip install --upgrade sentencepiece.
  4. Rebuild any C++ applications linking against libsentencepiece.
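For C++ deployments that record the library version as a string, the version check in the steps above could be automated roughly as follows. This is a sketch under assumptions: `parse_version` and `is_patched` are hypothetical helpers, the version is assumed to use a plain `major.minor.patch` format, and obtaining the string (for example from package metadata) is left to the caller.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Parses "major.minor.patch"; components that are missing or fail to parse
// stay at 0.
std::array<int, 3> parse_version(const std::string& v) {
  std::array<int, 3> parts{0, 0, 0};
  std::sscanf(v.c_str(), "%d.%d.%d", &parts[0], &parts[1], &parts[2]);
  return parts;
}

// True when the given version is 0.2.1 or later, the release that
// contains the fix according to the advisory.
bool is_patched(const std::string& installed) {
  const std::array<int, 3> fixed{0, 2, 1};
  return parse_version(installed) >= fixed;  // lexicographic comparison
}
```

With this helper, `is_patched("0.2.0")` is false and `is_patched("0.2.1")` is true. Pre-release or suffixed version strings are not handled and would need extra parsing.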


Read the full report for CVE-2026-1260 on our website for more details including interactive diagrams and full exploit analysis.
