GHSA-8RFP-98V4-MMR6: Protocol-Filtering Bypass via Unicode Obfuscation in Mozilla Bleach

#security #cve #cybersecurity #ghsa

Vulnerability ID: GHSA-8RFP-98V4-MMR6
CVSS Score: 0.0
Published: 2026-06-16

Mozilla Bleach is an open-source HTML sanitizing library for Python. In versions up to and including 6.3.0, the URI validation logic ('sanitize_uri_value') filters incompletely: it fails to detect disallowed protocols, such as 'javascript:', when the scheme contains invisible Unicode characters, Unicode whitespace, or any character with a code point greater than U+00A0. Standards-compliant web browsers do not execute URI schemes containing these characters, so the sanitized output is not directly exploitable. However, a downstream system that normalizes text by stripping invisible or non-ASCII characters can unintentionally restore the 'javascript:' prefix and cause Cross-Site Scripting (XSS). The behavior also violates Bleach's core sanitization contract, because the library emits URIs that bypass the protocol allowlist configured by the caller.

TL;DR

Mozilla Bleach versions up to and including 6.3.0 fail to sanitize URLs that contain non-ASCII or invisible Unicode characters in the scheme prefix. Blocked protocols such as 'javascript:' can therefore pass the sanitizer, creating a stored Cross-Site Scripting (XSS) risk in downstream environments that normalize or strip Unicode data.

⚠️ Exploit Status: POC

Technical Details

CWE ID: CWE-184 (Incomplete List of Disallowed Inputs)
Attack Vector: Network (AV:N)
CVSS v3.1 Score: 0.0 (no direct impact scored; exploitation depends on downstream processing)
Impact: Bypass of protocol validation filters / Secondary stored XSS
Exploit Status: Proof-of-Concept (PoC) available
KEV Status: Not listed in CISA KEV

Affected Systems

Mozilla Bleach (PyPI package 'bleach'): <= 6.3.0 (fixed in 6.4.0)

Code Analysis

Commit: 7c4867c

Fix protocol bypass vulnerability with unicode characters

@@ -488,14 +488,15 @@ def sanitize_uri_value(self, value, allowed_protocols):
         # Convert all character entities in the value
         normalized_uri = html5lib_shim.convert_entities(value)

-        # Nix backtick, space characters, and control characters
+        # Strip backtick, whitespace, and control characters
         normalized_uri = re.sub(r"[`\000-\040\177-\240\s]+", "", normalized_uri)

-        # Remove REPLACEMENT characters
-        normalized_uri = normalized_uri.replace("\ufffd", "")
+        # Strip non-ASCII characters so that urlparse can parse the url into
+        # components correctly. This drops invisible and whitespace unicode
+        # characters among other things.
+        normalized_uri = re.sub(r"[^\x00-\x7f]", "", normalized_uri)

-        # Lowercase it--this breaks the value, but makes it easier to match
-        # against
+        # Lowercase value to make matching easier
         normalized_uri = normalized_uri.lower()
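
To make the effect of the change concrete, the sketch below re-creates the normalization steps from the diff using only the standard library. It is a simplified stand-in, not Bleach's code: the name check_uri, the allowlist, the strip_non_ascii switch, and the fallback that accepts scheme-less values as relative URLs are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist, used for illustration only.
ALLOWED = frozenset({"http", "https", "mailto"})


def check_uri(value, allowed=ALLOWED, strip_non_ascii=False):
    """Return value if its scheme looks allowed, else None (simplified sketch)."""
    # Same character class as in the diff: backtick, controls, whitespace.
    uri = re.sub(r"[`\000-\040\177-\240\s]+", "", value)
    if strip_non_ascii:
        # Step added by the fix: drop everything outside ASCII.
        uri = re.sub(r"[^\x00-\x7f]", "", uri)
    else:
        # Old behaviour: only REPLACEMENT CHARACTER was removed.
        uri = uri.replace("\ufffd", "")
    uri = uri.lower()

    try:
        parsed = urlparse(uri)
    except ValueError:
        return None

    if parsed.scheme:
        return value if parsed.scheme in allowed else None
    if ":" in uri and uri.split(":")[0] in allowed:
        return value
    # Assumption: with no recognizable scheme the value is treated as a
    # relative URL and accepted when http/https is allowed.
    if "http" in allowed or "https" in allowed:
        return value
    return None


payload = "java\u200bscript:alert(1)"  # U+200B ZERO WIDTH SPACE in the scheme
old_result = check_uri(payload)                        # accepted unchanged
new_result = check_uri(payload, strip_non_ascii=True)  # rejected
```

Under these assumptions the old normalization lets the payload through because no scheme is recognized, while stripping non-ASCII characters first exposes 'javascript' to the allowlist check.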

Exploit Details

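Research Context: The PoC demonstrates a bypass of Bleach 6.3.0. A zero-width space (U+200B) inserted into the protocol scheme keeps urlparse from recognizing the scheme, and XSS is triggered once a downstream component normalizes the value and removes the character.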
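
The downstream half of the attack can be sketched with the standard library alone. The function downstream_ascii_fold is a hypothetical example of a later processing step (an export job, a search indexer, a legacy database layer) that silently drops non-ASCII characters; it is not taken from the PoC.

```python
from urllib.parse import urlparse

# Payload with U+200B ZERO WIDTH SPACE hidden inside the scheme.
payload = "java\u200bscript:alert(1)"


def downstream_ascii_fold(text: str) -> str:
    """Hypothetical downstream step that drops every non-ASCII character."""
    return text.encode("ascii", "ignore").decode("ascii")


# Before folding, urlparse sees no valid scheme at all.
scheme_before = urlparse(payload).scheme

# After folding, the dangerous scheme is back.
folded = downstream_ascii_fold(payload)
scheme_after = urlparse(folded).scheme

print(repr(scheme_before), repr(scheme_after), folded)
```

The sanitizer and the downstream step each look harmless in isolation; the problem only appears when validation runs before the character stripping.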

Mitigation Strategies

Upgrade to Mozilla Bleach version 6.4.0.
Migrate from the deprecated Bleach library to active alternatives such as nh3.
Preprocess untrusted strings to remove non-ASCII whitespace and invisible Unicode characters before passing them to the sanitizer.
Deploy a strong Content Security Policy (CSP) that does not permit 'unsafe-inline' scripts.
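
For deployments that cannot upgrade immediately, the preprocessing mitigation might look like the sketch below. The function name preprocess_untrusted and the choice of Unicode categories are assumptions, and the filter is narrower than the upstream fix: it does not remove visible non-ASCII letters, which a sufficiently aggressive downstream normalizer could also strip.

```python
import unicodedata

# Cf covers format characters such as ZERO WIDTH SPACE (U+200B),
# WORD JOINER (U+2060) and the BOM (U+FEFF).
_DROP = {"Cf"}
# Space, line and paragraph separators.
_SPACE = {"Zs", "Zl", "Zp"}


def preprocess_untrusted(text: str) -> str:
    """Drop invisible format characters and fold Unicode spaces to ' '."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category in _DROP:
            continue  # invisible: remove entirely
        if category in _SPACE and ch != " ":
            out.append(" ")  # non-ASCII whitespace: replace with a plain space
            continue
        out.append(ch)
    return "".join(out)


cleaned = preprocess_untrusted("java\u200bscript:alert(1)")
```

Removing format characters also affects legitimate text, for example emoji sequences joined with U+200D or scripts that rely on U+200C, so this trade-off should be weighed before applying the filter to user-visible content.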

Remediation Steps:

Audit your application's dependencies to locate every use of 'bleach' and review how it is configured.
Upgrade Bleach to 6.4.0: 'pip install bleach==6.4.0'.
If a downstream processor or database normalizes text, ensure that normalization happens before validation rather than after it.
Transition application codebase to 'nh3' for ongoing security support and HTML sanitization.
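
As a starting point for the audit step, a small check can flag environments that still run an affected release. This is a sketch: the helper names are hypothetical, and the version parsing is deliberately simplistic (it compares only the leading numeric parts, so a pre-release such as '6.4.0rc1' is treated like '6.4.0').

```python
from importlib import metadata
from itertools import takewhile

FIXED_VERSION = (6, 4, 0)  # first fixed release according to this advisory


def parse_version(version: str) -> tuple:
    """Turn '6.3.0' into (6, 3, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True when the given bleach version predates the fixed release."""
    return parse_version(version) < FIXED_VERSION


def installed_bleach_is_affected():
    """Return True/False for the installed bleach, or None if not installed."""
    try:
        return is_affected(metadata.version("bleach"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("bleach affected:", installed_bleach_is_affected())
```

The script only inspects the current environment; pinned versions in lock files and transitive dependencies still need to be checked separately.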
