Abstract
Modern messaging platforms enforce message length limits in terms of characters rather than bytes. Because UTF-8 is a variable-width encoding, an emoji can count as a single “character” while occupying four bytes. This paper introduces the SCP Emoji Attack (Single Code Point Emoji Attack), a method that leverages high-byte, single-code-point emojis to inflate storage usage and cause denial-of-service conditions in databases.
1. Introduction
Messaging systems typically restrict messages to a maximum number of characters (e.g., 100 characters). Developers often assume that one character ≈ one byte. However, UTF-8 encoding breaks this assumption:
- ASCII letters/numbers → 1 byte each
- Single Code Point (SCP) Emojis outside the Basic Multilingual Plane (🚀, 🔥, 😀) → 4 bytes each
- Complex multi-code-point emojis (e.g., the pirate flag 🏴‍☠️, a 4-code-point ZWJ sequence) → 13 bytes or more each
This discrepancy creates an amplification gap between perceived limit and actual storage. By choosing SCP emojis, attackers achieve a balance of high byte-per-character efficiency and maximum scalability, making the SCP Emoji Attack more practical than complex multi-code-point payloads.
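The gap between code points and bytes can be checked directly. The following is a minimal sketch; the helper name `utf8_cost` is hypothetical, and the pirate flag is written out as the ZWJ sequence U+1F3F4 U+200D U+2620 U+FE0F.

```python
def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# Escapes are used so the exact code points are unambiguous.
samples = {
    "ASCII letter": "A",
    "rocket (single code point)": "\U0001F680",
    "fire (single code point)": "\U0001F525",
    "pirate flag (ZWJ sequence)": "\U0001F3F4\u200D\u2620\uFE0F",
}

for name, s in samples.items():
    code_points, n_bytes = utf8_cost(s)
    print(f"{name}: {code_points} code point(s), {n_bytes} byte(s)")
# ASCII letter: 1 code point(s), 1 byte(s)
# rocket (single code point): 1 code point(s), 4 byte(s)
# fire (single code point): 1 code point(s), 4 byte(s)
# pirate flag (ZWJ sequence): 4 code point(s), 13 byte(s)
```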
2. Why SCP Emojis Instead of Complex Emojis?
At first glance, the emoji with the largest per-emoji byte payload (e.g., 🏴‍☠️ = 13 bytes) seems like the strongest attack vector. However, SCP emojis are superior in practice for three reasons:
1- Character Limit Constraints
- Most messaging systems enforce limits by code points or characters.
- A complex emoji like 🏴‍☠️ (U+1F3F4, U+200D, U+2620, U+FE0F) consumes 4 code points, so at most 25 fit inside a 100-character message.
- SCP emojis consume only 1 code point, allowing a full 100 to fit, maximizing throughput.
2- Simplicity & Reliability
- SCP emojis are widely supported, and a single code point cannot be split or recombined by Unicode normalization in databases or programming languages.
- Complex emojis may behave inconsistently across platforms (some count them as multiple characters, others collapse them).
3- Amplification Efficiency
- 🏴‍☠️: 25 × 13 bytes = 325 bytes in a 100-character field.
- 🚀: 100 × 4 bytes = 400 bytes in the same field.
→ Despite lower per-emoji size, SCP emojis produce a larger final payload due to character-limit efficiency.
Thus, SCP emojis are the optimal choice for maximizing stored bytes within character-limited messaging fields.
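The comparison above can be reproduced with a short calculation. This is a sketch that assumes the platform counts its limit in Unicode code points; `max_payload_bytes` is a hypothetical helper, not part of any library.

```python
def max_payload_bytes(unit, char_limit=100):
    """Largest UTF-8 payload obtainable by repeating `unit` whole,
    assuming the limit is counted in code points."""
    copies = char_limit // len(unit)  # len() counts code points in Python 3
    return copies * len(unit.encode("utf-8"))


ROCKET = "\U0001F680"                         # 1 code point, 4 bytes
PIRATE_FLAG = "\U0001F3F4\u200D\u2620\uFE0F"  # 4 code points, 13 bytes

print(max_payload_bytes(ROCKET))       # 400
print(max_payload_bytes(PIRATE_FLAG))  # 325
```

On a platform that counts UTF-16 code units or grapheme clusters instead, the numbers would differ, so the counting rule should be checked before relying on them.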
3. Attack Example: MariaDB with 50 GB Memory
Setup:
- Database: MariaDB, utf8mb4, column VARCHAR(100)
- Message limit: 100 characters
- Payload: 🚀 (single-code-point emoji, 4 bytes each)
Calculation:
- Per message: 100 × 4 = 400 bytes
If an attacker inserts 150 million messages:
- Naïve developer expectation (1 byte/char): ~15 GB
- Actual usage (SCP Emoji Attack): ~60 GB
This overshoots a 50 GB memory allocation, triggering:
- Buffer pool exhaustion
- Slow queries and replication lag
- Backup inflation
- Eventual denial-of-service
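The estimate can be restated as a small calculation. This sketch counts only raw message payload in decimal gigabytes; row overhead, indexes, and logs are ignored, so real usage would be higher. The constant and function names are illustrative.

```python
MESSAGES = 150_000_000   # messages inserted by the attacker
CHARS_PER_MESSAGE = 100  # character limit of the field


def raw_payload_gb(bytes_per_char):
    """Raw payload size in decimal GB (10**9 bytes), excluding all overhead."""
    return MESSAGES * CHARS_PER_MESSAGE * bytes_per_char / 1e9


expected = raw_payload_gb(1)  # naive assumption: 1 byte per character
actual = raw_payload_gb(4)    # 4-byte single-code-point emojis

print(f"expected: {expected:.0f} GB, actual: {actual:.0f} GB")
# expected: 15 GB, actual: 60 GB
```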
4. Mitigation Strategies
- Enforce byte-based limits (e.g., size storage for the 400 bytes a utf8mb4 VARCHAR(100) can hold rather than 100, and cap the encoded byte length of each message).
- Validate and normalize emojis at input.
- Monitor for emoji-dense traffic patterns.
- Rate-limit message creation per user.
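A byte-aware check at the application layer might look like the sketch below. The 200-byte budget, the function names, and the density heuristic are assumptions for illustration, not values taken from any platform.

```python
MAX_CHARS = 100  # character limit from the example above
MAX_BYTES = 200  # hypothetical byte budget, chosen only for illustration


def validate_message(text, max_chars=MAX_CHARS, max_bytes=MAX_BYTES):
    """Accept a message only if it fits both the character and byte limits."""
    if len(text) > max_chars:
        return False
    return len(text.encode("utf-8")) <= max_bytes


def four_byte_ratio(text):
    """Fraction of code points above U+FFFF (4 bytes each in UTF-8).
    A ratio near 1.0 may indicate emoji-dense traffic worth monitoring."""
    if not text:
        return 0.0
    wide = sum(1 for ch in text if ord(ch) > 0xFFFF)
    return wide / len(text)


print(validate_message("hello world"))        # True
print(validate_message("\U0001F680" * 100))   # False: 400 bytes exceeds the budget
print(four_byte_ratio("\U0001F680" * 3 + "a"))  # 0.75
```

A byte budget below 4 × the character limit also rejects legitimate emoji-heavy messages, so the threshold is a trade-off between usability and storage predictability.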
5. Conclusion
The SCP Emoji Attack demonstrates that even “simple” single-code-point emojis can be weaponized to amplify database storage consumption. By carefully choosing SCP emojis over complex grapheme clusters, attackers maximize per-message payload efficiency, achieve broader compatibility, and maintain stealth.
This highlights the necessity of byte-aware validation and Unicode-conscious design in modern messaging platforms.