Drawbacks of Traditional WAFs
Traditional WAFs typically use regular expressions to define attack patterns. Take the well-known ModSecurity engine as an example: it reportedly powers 80% of the world's WAFs. Let's analyze what its rules look like.
Example Rules
- union[\w\s]*?select: This rule flags an SQL injection attack when the traffic contains the word "union" followed, after any run of word characters or whitespace, by the word "select."
- \balert\s*\(: This rule flags an XSS attack when the traffic contains the word "alert" followed, after optional whitespace, by a left parenthesis "(".
Bypass Examples
Real attackers can easily bypass these keywords, thus circumventing the protection of the WAF. Here are a couple of examples of false negatives:
- union /**/ select: Inserting an empty comment between "union" and "select" breaks the keyword pattern, so the attack goes undetected.
- window['\x61lert'](: Replacing the letter "a" with the escape sequence "\x61" breaks the keyword pattern, so the attack goes undetected.
From these examples, we can conclude that traditional regex-based WAFs cannot reliably prevent attacks, because attackers can evade keyword patterns with trivial rewrites like these.
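The bypasses above can be reproduced in a few lines of Python. This is only a sketch: it assumes the two example rules take the forms union[\w\s]*?select and \balert\s*\(, and the payload strings and the is_blocked helper are made up for illustration rather than taken from any real rule set.

```python
import re

# Assumed forms of the two example rules (illustrative, not a real rule set).
SQLI_RULE = re.compile(r"union[\w\s]*?select", re.IGNORECASE)
XSS_RULE = re.compile(r"\balert\s*\(", re.IGNORECASE)


def is_blocked(payload: str) -> bool:
    """Return True if either keyword rule matches the payload."""
    return bool(SQLI_RULE.search(payload) or XSS_RULE.search(payload))


# Hypothetical payloads: the plain attacks and their disguised variants.
plain_sqli = "1 union select password from users"
plain_xss = "<script>alert (1)</script>"
bypass_sqli = "1 union /**/ select password from users"
bypass_xss = r"window['\x61lert'](1)"

for payload in (plain_sqli, plain_xss, bypass_sqli, bypass_xss):
    print(f"{payload!r}: blocked={is_blocked(payload)}")
```

The two plain payloads are caught, while the comment-split and hex-escaped variants pass straight through, even though a database or browser would treat them the same way as the plain versions.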
False Positives
Furthermore, regular expressions can also cause a high rate of false positives, resulting in genuine website users being affected. Here are a couple of examples of false positives:
- The union select members from each department to form a committee: This triggers the above-mentioned rule and is mistakenly identified as an SQL injection attack, though it's just a simple English sentence.
- Her down on the alert(for the man) and walked into a world of rivers: This triggers the above-mentioned rule and is mistakenly identified as an XSS attack, though it's just a simple English sentence.
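To see the false positives concretely, here is a small Python check that runs the two sentences above through the same kind of keyword rules. The rule patterns are assumed forms of the example rules and matched_rules is a hypothetical helper, so treat this as a sketch rather than how any particular WAF evaluates traffic.

```python
import re

# Assumed forms of the two example rules (illustrative only).
RULES = {
    "sqli": re.compile(r"union[\w\s]*?select", re.IGNORECASE),
    "xss": re.compile(r"\balert\s*\(", re.IGNORECASE),
}


def matched_rules(text: str) -> list[str]:
    """Return the names of all rules that fire on the given text."""
    return [name for name, rule in RULES.items() if rule.search(text)]


benign_inputs = [
    "The union select members from each department to form a committee",
    "Her down on the alert(for the man) and walked into a world of rivers",
]

for sentence in benign_inputs:
    print(matched_rules(sentence), "<-", sentence)
```

Both sentences trigger a rule even though neither contains anything a database or browser would execute.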
How Attackers Bypass Regex-Based WAFs
Here are two Black Hat conference presentations that explore how to automate bypassing regex-based WAF protections:
- AutoSpear: Towards Automatically Bypassing and Inspecting Web Application Firewalls
- Web Application Firewalls: Attacking detection logic mechanisms
How to Use Syntax Analysis in WAF
Syntax Analysis Algorithm in SafeLine WAF
Syntax analysis is the core capability of SafeLine WAF. Instead of using simple regex patterns to match attack traffic, it truly understands the user inputs in the traffic and deeply analyzes potential attack behaviors.
Example: SQL Injection
To successfully carry out SQL injection attacks, attackers need to meet two conditions:
- The traffic contains an SQL statement, and it must be syntactically valid.
  - Example: "union select xxx from xxx where" is a syntactically valid SQL statement fragment.
  - Example: "union select xxx from xxx xxx xxx xxx xxx where" is not a syntactically valid SQL statement fragment.
- SQL statements must have malicious behavior, not just meaningless statements.
  - Example: "union select xxx from xxx where" has the potential for malicious behavior.
  - Example: "1 + 1 = 2" has no practical meaning.
SafeLine WAF detects attacks based on the essence of SQL injection attacks, using the following process:
- Parsing the HTTP traffic to find positions with potential inputs.
- Recursively decoding the parameters to recover the original user input.
- Checking if the user input conforms to SQL syntax.
- Detecting the possible intentions behind the SQL syntax.
- Scoring the malicious intentions and deciding whether to intercept.
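The steps above can be sketched as a toy pipeline in Python. To be clear, this is not SafeLine's actual algorithm: the grammar only covers a tiny "UNION SELECT" fragment, and every name here (recursive_decode, parse_union_select, threat_score, BLOCK_THRESHOLD) as well as the scoring values are assumptions made for illustration.

```python
import html
import re
from urllib.parse import unquote

KEYWORDS = {"union", "all", "select", "from", "where"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")
BLOCK_THRESHOLD = 5  # hypothetical cut-off


def recursive_decode(value: str, max_rounds: int = 5) -> str:
    """Undo nested URL / HTML-entity encoding until the value stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    return value


def tokenize(text: str) -> list[str]:
    """Drop /* ... */ comments, then split into lowercase tokens."""
    without_comments = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)
    return [t.lower() for t in TOKEN_RE.findall(without_comments)]


def is_name(token: str) -> bool:
    return bool(re.fullmatch(r"[a-z_]\w*", token)) and token not in KEYWORDS


def parse_union_select(tokens: list[str]) -> bool:
    """Toy grammar: UNION [ALL] SELECT col {, col} FROM table [alias] [WHERE ...]."""
    if "union" not in tokens:
        return False
    rest = tokens[tokens.index("union") + 1:]
    if rest[:1] == ["all"]:
        rest = rest[1:]
    if rest[:1] != ["select"]:
        return False
    rest = rest[1:]
    while True:  # column list
        if not rest or not (is_name(rest[0]) or rest[0].isdigit() or rest[0] == "*"):
            return False
        rest = rest[1:]
        if rest[:1] != [","]:
            break
        rest = rest[1:]
    if rest[:1] != ["from"] or len(rest) < 2 or not is_name(rest[1]):
        return False
    rest = rest[2:]
    if rest and is_name(rest[0]):  # optional table alias
        rest = rest[1:]
    return not rest or rest[0] == "where"


def threat_score(raw: str) -> int:
    """Score 0 unless the decoded input parses as a UNION SELECT reading a table."""
    tokens = tokenize(recursive_decode(raw))
    return 10 if parse_union_select(tokens) else 0


def should_block(raw: str) -> bool:
    return threat_score(raw) >= BLOCK_THRESHOLD
```

With this sketch, a double URL-encoded payload such as "union%2520/**/select%2520password%2520from%2520users" is decoded, parsed, and blocked, while the English sentence "The union select members from each department to form a committee" does not parse as SQL and is allowed. A production engine would need a full SQL grammar and a graded scoring model rather than the fixed values used here.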
How SafeLine WAF Works
SafeLine WAF has built-in compilers covering common programming languages. After deeply decoding the HTTP payload, it selects the syntax compiler that matches the language type of the content, then evaluates the result against a threat model to obtain a threat rating, which determines whether the request is allowed or blocked.
Why Semantic Analysis is More Powerful
Anyone who has studied compiler principles will have encountered the Chomsky hierarchy, which divides formal grammars into four types:
- Type 0 Grammar (Unrestricted Grammar): Recognizable by Turing Machines
- Type 1 Grammar (Context-Sensitive Grammar): Recognizable by Linear Bounded Automata
- Type 2 Grammar (Context-Free Grammar): Recognizable by Pushdown Automata
- Type 3 Grammar (Regular Grammar): Recognizable by Finite State Automata
Expressive power decreases from Type 0 to Type 3. The syntax of languages such as SQL, HTML, and JavaScript is typically described by Type 2 grammars. Regular expressions, however, correspond to Type 3 grammars, the weakest class in the hierarchy.
Limitations of Regular Expressions
Regular expressions are fundamentally limited: they cannot count without bound, so they cannot recognize, for example, a string of correctly matched parentheses nested to arbitrary depth. This makes them poorly suited to detecting attack payloads whose structure varies. These inherent limitations of regex-based rules are the primary reason traditional WAFs offer weaker protection.
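The matched-parentheses limitation is easy to demonstrate. In the Python sketch below, is_balanced keeps a depth counter (the role a pushdown automaton's stack plays), while TWO_LEVELS is a hand-written regex that hard-codes two levels of nesting. Both names are invented for this illustration.

```python
import re


def is_balanced(text: str) -> bool:
    """Track nesting depth with a counter, much as a pushdown automaton uses its stack."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0


# A classical regex has to spell out each nesting level explicitly.
# This one accepts balanced parentheses nested at most two levels deep.
TWO_LEVELS = re.compile(r"(?:[^()]|\((?:[^()]|\([^()]*\))*\))*")

for sample in ("(a(b))", "((()))", "(()"):
    regex_ok = TWO_LEVELS.fullmatch(sample) is not None
    print(f"{sample!r}: counter={is_balanced(sample)} regex={regex_ok}")
```

The regex agrees with the counter on "(a(b))" but rejects the perfectly balanced "((()))", because three levels of nesting exceed what was written into the pattern. Adding another level to the pattern only moves the limit; it never removes it.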
Why Syntax Analysis is Superior
Compared to regex-based pattern matching threat detection methods, syntax analysis offers higher accuracy and lower false positive rates. By understanding the underlying structure and intentions behind the attack traffic, SafeLine WAF provides much more robust protection.
Final Recommendation
If you're looking for a powerful, open-source Web Application Firewall (WAF) to protect your website, I highly recommend SafeLine. Its semantic analysis algorithm, bot protection (human verification, dynamic protection, replay prevention), CC protection (rate limiting, waiting rooms), and authentication features provide comprehensive security for your site.
You can check it out on GitHub.
Official Website: https://ly.safepoint.cloud/eGtfrcF
Live Demo: https://ly.safepoint.cloud/DQywpL7