GHSA-8RFP-98V4-MMR6: Protocol-Filtering Bypass via Unicode Obfuscation in Mozilla Bleach
Vulnerability ID: GHSA-8RFP-98V4-MMR6
CVSS Score: 0.0
Published: 2026-06-16
Mozilla Bleach is an open-source HTML sanitizing library for Python. In versions up to and including 6.3.0, the URI validation logic ('sanitize_uri_value') filters protocols incompletely. It fails to detect disallowed protocols, such as 'javascript:', when the scheme contains invisible Unicode characters, Unicode whitespace, or any character with a code point above U+00A0. Standards-compliant web browsers do not execute URI schemes that contain these characters, so the sanitized output is not directly exploitable. However, downstream systems that normalize Unicode text by stripping invisible or non-ASCII characters can unintentionally restore the 'javascript:' prefix, resulting in Cross-Site Scripting (XSS). The behavior also violates Bleach's core sanitization contract, because the output contains URIs that bypass the protocol allowlist configured by the caller.
TL;DR
Mozilla Bleach versions up to and including 6.3.0 fail to sanitize URLs whose scheme prefix contains invisible or non-ASCII Unicode characters (code points above U+00A0). This lets blocked protocols such as 'javascript:' pass the protocol allowlist, creating a stored Cross-Site Scripting (XSS) risk in downstream environments that normalize or strip Unicode data.
⚠️ Exploit Status: POC
Technical Details
- CWE ID: CWE-184 (Incomplete List of Disallowed Inputs)
- Attack Vector: Network (AV:N)
- CVSS v3.1 Score: 0.0 (no direct impact; exploitation depends on downstream processing)
- Impact: Bypass of protocol validation filters / Secondary stored XSS
- Exploit Status: Proof-of-Concept (PoC) available
- KEV Status: Not listed in CISA KEV
Affected Systems
- Mozilla Bleach <= 6.3.0
- bleach: <= 6.3.0 (Fixed in: 6.4.0)
Code Analysis
Commit: 7c4867c
Fix protocol bypass vulnerability with unicode characters
@@ -488,14 +488,15 @@ def sanitize_uri_value(self, value, allowed_protocols):
# Convert all character entities in the value
normalized_uri = html5lib_shim.convert_entities(value)
- # Nix backtick, space characters, and control characters
+ # Strip backtick, whitespace, and control characters
normalized_uri = re.sub(r"[`\000-\040\177-\240\s]+", "", normalized_uri)
- # Remove REPLACEMENT characters
- normalized_uri = normalized_uri.replace("\ufffd", "")
+ # Strip non-ASCII characters so that urlparse can parse the url into
+ # components correctly. This drops invisible and whitespace unicode
+ # characters among other things.
+ normalized_uri = re.sub(r"[^\x00-\x7f]", "", normalized_uri)
- # Lowercase it--this breaks the value, but makes it easier to match
- # against
+ # Lowercase value to make matching easier
normalized_uri = normalized_uri.lower()
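The effect of the change can be reproduced with a small standard-library sketch. The two regular expressions and the U+FFFD removal are taken from the diff above. Everything else is an assumption made for illustration: the entity-conversion step is omitted, 'urllib.parse.urlsplit' stands in for whatever parser Bleach calls, and the 'is_allowed' helper, the 'ALLOWED' set, and the fallback that accepts scheme-less values as relative URLs are hypothetical, not Bleach's actual code.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist; a real caller configures its own.
ALLOWED = {"http", "https", "mailto"}


def normalize_pre_fix(value: str) -> str:
    """Normalization as shown on the removed/context lines of the diff."""
    uri = re.sub(r"[`\000-\040\177-\240\s]+", "", value)
    uri = uri.replace("\ufffd", "")
    return uri.lower()


def normalize_post_fix(value: str) -> str:
    """Normalization as shown on the added/context lines of the diff."""
    uri = re.sub(r"[`\000-\040\177-\240\s]+", "", value)
    uri = re.sub(r"[^\x00-\x7f]", "", uri)  # drop every non-ASCII character
    return uri.lower()


def is_allowed(normalized: str, allowed=ALLOWED) -> bool:
    """Assumed allowlist check, not copied from Bleach.

    A recognized scheme must be in the allowlist. A value with no
    recognized scheme is treated as a relative URL and accepted when
    http or https is allowed.
    """
    try:
        scheme = urlsplit(normalized).scheme
    except ValueError:
        return False
    if scheme:
        return scheme in allowed
    return "http" in allowed or "https" in allowed


payload = "java\u200bscript:alert(1)"  # U+200B ZERO WIDTH SPACE inside the scheme

if __name__ == "__main__":
    print("pre-fix accepted: ", is_allowed(normalize_pre_fix(payload)))
    print("post-fix accepted:", is_allowed(normalize_post_fix(payload)))
```

In this model the zero-width space survives the pre-fix normalization, so the parser sees no valid scheme and the value falls through to the relative-URL branch. After the added substitution the string collapses to 'javascript:alert(1)' and is rejected by the allowlist.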
Exploit Details
- Research Context: The PoC targets Bleach v6.3.0 and inserts a Zero-Width Space (U+200B) into the protocol scheme so that urlparse does not recognize the scheme. XSS is triggered once a downstream system strips the character during normalization.
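The downstream half of the attack can be illustrated with a short sketch. It assumes an 'href' value that has already passed through an affected Bleach version unchanged, and shows two common cleanup styles that would restore the scheme. Both helper functions are hypothetical examples of downstream processing, not code from any particular product.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Hypothetical cleanup: drop Unicode format (Cf) characters such as U+200B."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def force_ascii(text: str) -> str:
    """Hypothetical cleanup: silently discard everything outside ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


# Value assumed to have survived sanitization with its zero-width space intact.
sanitized_href = "java\u200bscript:alert(1)"

reactivated_by_strip = strip_format_chars(sanitized_href)
reactivated_by_ascii = force_ascii(sanitized_href)

if __name__ == "__main__":
    print(repr(sanitized_href))
    print(repr(reactivated_by_strip))
    print(repr(reactivated_by_ascii))
```

Either cleanup turns the inert value into a working 'javascript:' URI, which is why the order of normalization and validation matters.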
Mitigation Strategies
- Upgrade to Mozilla Bleach version 6.4.0.
- Migrate from the deprecated Bleach library to an actively maintained alternative such as nh3.
- Preprocess untrusted strings to remove non-ASCII whitespace and invisible Unicode characters before passing them to the sanitizer.
- Deploy a strict Content Security Policy (CSP) that does not permit 'unsafe-inline' scripts.
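For the preprocessing mitigation, stripping characters from all user text can damage legitimate content, so one narrower option is to reject URL values whose scheme-like prefix contains non-ASCII characters. The following is a minimal sketch of that idea; the function name and the exact rules are assumptions for illustration and are not part of Bleach.

```python
import re

# ASCII control characters, space, DEL, and backtick are ignored before checking.
_ASCII_JUNK = re.compile(r"[\x00-\x20\x7f`]+")


def scheme_prefix_is_clean(url: str) -> bool:
    """Return False when the text before the first ':' contains non-ASCII characters.

    Hypothetical pre-check: it does not decide whether a scheme is allowed,
    only whether the scheme-like prefix could be hiding one.
    """
    compact = _ASCII_JUNK.sub("", url)
    prefix, sep, _rest = compact.partition(":")
    if not sep:
        return True  # no colon, so nothing that looks like a scheme
    if any(ch in prefix for ch in "/?#"):
        return True  # the colon belongs to a path, query, or fragment
    return prefix.isascii()


if __name__ == "__main__":
    for candidate in ("https://example.com/", "java\u200bscript:alert(1)", "/путь/a:b"):
        print(repr(candidate), scheme_prefix_is_clean(candidate))
```

A plain 'javascript:alert(1)' passes this check on purpose, since rejecting it is the sanitizer's job. Bleach's documentation describes passing a callable as the 'attributes' argument of 'bleach.clean', which is one possible place to call such a helper; how attribute values are encoded at that point is not covered here.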
Remediation Steps:
- Audit your application's dependencies to locate every use of 'bleach' and review how it is configured.
- Upgrade Bleach to 6.4.0: 'pip install bleach==6.4.0'.
- If a downstream processor or database normalizes text, ensure that normalization runs before validation rather than after.
- Transition the application codebase to 'nh3' for ongoing security support and HTML sanitization.
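The audit step can be partly automated. The sketch below checks the installed 'bleach' version against the fixed release named in this advisory using only the standard library. The version parsing is deliberately crude (it reads the first three numeric components and ignores pre-release suffixes), and the function names are hypothetical.

```python
import re
from importlib import metadata

FIXED_VERSION = (6, 4, 0)  # first fixed release according to this advisory


def parse_version(version: str) -> tuple:
    """Crude parser: take up to three numeric components, pad with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_affected(version: str) -> bool:
    """True when the version is older than the fixed release."""
    return parse_version(version) < FIXED_VERSION


def installed_bleach_status() -> str:
    """Report on the bleach distribution in the current environment, if any."""
    try:
        version = metadata.version("bleach")
    except metadata.PackageNotFoundError:
        return "bleach is not installed"
    state = "affected" if is_affected(version) else "not affected"
    return f"bleach {version}: {state}"


if __name__ == "__main__":
    print(installed_bleach_status())
```

This only inspects the active environment; pinned versions in lock files or other virtual environments still need a separate review.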