Semgrep provides a large number of rules, but sometimes you may want to customize a rule or create a new one.
For example, when a vulnerability is found in a product developed by our organization, we want to:
- check whether similar vulnerabilities exist in other products
- detect similar vulnerabilities in the future
In such cases, rules for finding vulnerabilities will help maintain the security of the product.
This article will guide you through the process of creating a rule to detect DOM-Based XSS and help you understand the features and options required to create a rule.
Prerequisite
This tutorial uses Semgrep 1.2.1.
$ semgrep --version
1.2.1
Search for similar rules
The Semgrep Registry provides rules created by r2c (the developer of Semgrep) and by the community.
The rule you are about to create, or a similar one, may already exist, so search the Registry first.
In this tutorial, we want to create a rule to detect DOM-Based XSS, so we search the JavaScript rules for "dom xss".
The search finds the rule javascript.browser.security.dom-based-xss.dom-based-xss, so we will build our rule on it.
Part of javascript.browser.security.dom-based-xss.dom-based-xss:
pattern-either:
  - pattern: document.write(<... document.location.$W ...>)
  - pattern: document.write(<... location.$W ...>)
This rule appears to detect only document.write() calls whose argument itself contains a location property.
Create test cases
Before you start writing the rule, create test cases:
- test code you want to detect (unsafe)
- test code you do not want to detect (safe)
The test case is as follows:
dom-based-xss.js
const qs = window.location.search;
const hash = window.location.hash;
// ok
document.write("<p>ok</p>");
// unsafe
document.write(qs);
document.write(hash);
Test cases do not need to cover all patterns from the beginning. You can add test cases as you create rules.
Running the current rule on this test code detects nothing yet, because the location values are stored in variables before they reach document.write():
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Ran 1 rule on 1 file: 0 findings.
Taint tracking
Injection attacks such as XSS are characterized by a source and a sink. The source is where the attack code enters the program (untrusted input), and the sink is where the attack code is executed.
Semgrep has a feature called taint tracking that analyzes whether data from an untrusted source reaches a vulnerable sink.
Compared with matching patterns alone, taint tracking can reduce both false negatives (for example, values passed through variables) and false positives (for example, values that have been sanitized).
Taint mode
To use taint tracking, set mode to taint and write pattern-sources and pattern-sinks.
dom-based-xss.yaml
rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern: window.location
  pattern-sinks:
  - pattern-either:
    - pattern: document.write(...)
This rule has the following settings:
- mode: taint
- pattern-sources: window.location, the source of DOM-XSS
- pattern-sinks: document.write(...), the sink of DOM-XSS
With this rule, taint tracking analyzes the test code as follows:
- const qs = window.location.search;
  - window.location is tainted
  - window.location.search is tainted too
  - the constant qs is also tainted
- document.write(qs);
  - the tainted qs is used in a vulnerable sink
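The reasoning above can be imitated with a toy model. This is not how Semgrep works internally (Semgrep analyzes the parsed program, not hand-written records); the statement records and the analyze function are hypothetical and only show the idea: a value read from a source taints the variable it is assigned to, and a tainted variable reaching a sink produces a finding.

```javascript
// Toy model of taint tracking; NOT Semgrep's implementation.
// Statements are hypothetical, simplified records:
//   { assign: "qs", from: "window.location.search" }  ->  const qs = window.location.search;
//   { sink: "document.write", arg: "qs" }             ->  document.write(qs);
const SOURCES = ["window.location"];

// An expression is tainted if it is a source, a property of a source,
// or a variable that was tainted earlier.
function isTainted(expr, taintedVars) {
  const fromSource = SOURCES.some((s) => expr === s || expr.startsWith(s + "."));
  return fromSource || taintedVars.has(expr);
}

function analyze(statements) {
  const taintedVars = new Set();
  const findings = [];
  for (const stmt of statements) {
    if ("assign" in stmt) {
      // Assigning a tainted value taints the variable; a safe value clears it.
      if (isTainted(stmt.from, taintedVars)) taintedVars.add(stmt.assign);
      else taintedVars.delete(stmt.assign);
    } else if ("sink" in stmt && isTainted(stmt.arg, taintedVars)) {
      findings.push(`${stmt.sink}(${stmt.arg})`);
    }
  }
  return findings;
}

// The test case from above, written as records.
const program = [
  { assign: "qs", from: "window.location.search" },
  { assign: "hash", from: "window.location.hash" },
  { sink: "document.write", arg: '"<p>ok</p>"' },
  { sink: "document.write", arg: "qs" },
  { sink: "document.write", arg: "hash" },
];

console.log(analyze(program)); // [ 'document.write(qs)', 'document.write(hash)' ]
```

The literal "<p>ok</p>" is neither a source nor a tainted variable, so it produces no finding, matching the "ok" case in the test file.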
 
Run this rule on the previous test case and you will see that it is now detectable.
$ semgrep --config dom-based-xss.yaml dom-based-xss.js 
Scanning 1 file.
Findings:
  dom-based-xss.js
     dom-based-xss  
        dom-xss     
          5┆ document.write(qs);
          ⋮┆----------------------------------------
          6┆ document.write(hash);
Ran 1 rule on 1 file: 2 findings
Enhance Source and Sink
DOM-XSS sources are not limited to window.location, and sinks are not limited to document.write().
For example, PortSwigger's article "Introducing DOM Invader: DOM XSS just got a whole lot easier to find" presented 11 sources and 86 sinks.
Covering all of them would make this article too long, so we select five sources and five sinks.
sources
- location
- location.href
- location.hash
- location.search
- document.URL
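Why are these values dangerous? Everything after "?" and "#" in a link is chosen by whoever writes the link. Node's global URL class follows the same WHATWG URL standard as window.location and exposes the same href, search and hash getters, so it can illustrate what these sources return. The link below is a made-up example. The URL parser percent-encodes "<" and ">", so in practice the payload often becomes dangerous once the page decodes the value, for example with decodeURIComponent.

```javascript
// Node's global URL class follows the WHATWG URL standard, like window.location.
// The link is a hypothetical attacker-crafted URL sent to a victim.
const link =
  "https://victim.example/page?q=<script>alert(1)</script>#<img src=x onerror=alert(1)>";
const loc = new URL(link);

console.log(loc.search); // query part, with < and > percent-encoded
console.log(loc.hash);   // fragment part, with < and > percent-encoded

// A page that decodes these values gets the attacker's markup back verbatim.
const decodedSearch = decodeURIComponent(loc.search);
const decodedHash = decodeURIComponent(loc.hash.slice(1));
console.log(decodedSearch); // ?q=<script>alert(1)</script>
console.log(decodedHash);   // <img src=x onerror=alert(1)>
```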
sinks
- document.write()
- document.writeln()
- jQuery.html()
- element.innerHTML
- location.href
These sources and sinks are written in the rules as follows.
dom-based-xss.yaml
rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern-either:
    - pattern: location
    - pattern: window.location
    - pattern: document.location
    - pattern: document.URL
  pattern-sinks:
  - pattern-either:
    - pattern: document.write($PAYLOAD)
    - pattern: document.writeln($PAYLOAD)
    - pattern: $JQ.html($PAYLOAD)
    - pattern: $ELEMENT.innerHTML = $PAYLOAD
    - pattern: location.href = $PAYLOAD
Notes:
- Once location is set as a source, location.href, location.hash and location.search are automatically treated as sources too.
- location is listed alongside window.location and document.location because all three refer to the same Location object.
Add test cases for the newly added sources and sinks.
dom-based-xss.js
const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
// unsafe
$("div.test").html(query)
// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;
// unsafe
location.href = qs
After adding the test cases, let's run the rule; there are 5 unsafe cases, so 5 should be detected.
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Findings:
  dom-based-xss.js
     dom-based-xss
        dom-xss
         10┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         11┆ document.writeln("unsafe" + hash);
          ⋮┆----------------------------------------
         14┆ $("div.test").html(query)
          ⋮┆----------------------------------------
         18┆ e1.innerHTML = url;
          ⋮┆----------------------------------------
         21┆ location.href = qs
Ran 1 rule on 1 file: 5 findings.
Properly detected!
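Counting findings by eye works for a handful of cases but becomes error-prone as the test file grows. Semgrep has a built-in semgrep --test mode for this, driven by "// ruleid:" and "// ok:" comments in the test file. As a lighter alternative, the sketch below compares the lines you expect to be flagged with the start.line of each entry in the results array printed by semgrep --json. compareFindings is a hypothetical helper, and the sample JSON is hand-written and trimmed to the fields the script reads; it is not real Semgrep output.

```javascript
// Hypothetical helper: compare expected unsafe lines with Semgrep's JSON results.
// Assumes the shape of `semgrep --json` output: { results: [{ start: { line } }, ...] }.
function compareFindings(semgrepJson, expectedLines) {
  const found = new Set(JSON.parse(semgrepJson).results.map((r) => r.start.line));
  const expected = new Set(expectedLines);
  return {
    missed: [...expected].filter((line) => !found.has(line)),     // false negatives
    unexpected: [...found].filter((line) => !expected.has(line)), // false positives
  };
}

// Hand-written, trimmed sample matching the five findings above.
const sample = JSON.stringify({
  results: [10, 11, 14, 18, 21].map((line) => ({ check_id: "dom-based-xss", start: { line } })),
});

console.log(compareFindings(sample, [10, 11, 14, 18, 21])); // { missed: [], unexpected: [] }
```

To use it on real output, you could save the result of semgrep --config dom-based-xss.yaml --json dom-based-xss.js to a file and read it with fs.readFileSync.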
Propagator
Taint tracking can lose track of tainted data when it passes through certain functions.
For example, the following code is vulnerable to DOM-XSS, but the rule does not detect it.
// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));
This is because Semgrep does not know that arr is tainted by push(url). This is where propagators come in.
Propagators are set as follows:
pattern-propagators:
- pattern: $ARR.push($E)
  from: $E
  to: $ARR
With this propagator, the rule also detects the test case above. In addition to push, methods such as shift and unshift need to be set as propagators as well.
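Why the propagator is needed can be shown with another toy model (again, not Semgrep's implementation). The records and the reachesSink function are hypothetical. PROPAGATORS mirrors rules of the form "pattern: $ARR.<method>($E), from: $E, to: $ARR", with unshift included as one of the extra entries mentioned above. Without a propagator, the taint of url never reaches arr, so the sink is missed.

```javascript
// Toy model of pattern-propagators; NOT Semgrep's implementation.
// Each method name stands for: pattern: $ARR.<method>($E), from: $E, to: $ARR
const PROPAGATORS = new Set(["push", "unshift"]);

// The test case above as hypothetical records:
//   arr.push(url);                  -> a method call that may move taint
//   document.write(arr.join(' '));  -> a sink whose value is derived from arr
const program = [
  { call: "push", receiver: "arr", arg: "url" },
  { sinkUses: "arr" },
];

// Returns true if tainted data reaches the sink.
function reachesSink(statements, initiallyTainted, propagators) {
  const tainted = new Set(initiallyTainted);
  for (const stmt of statements) {
    if ("call" in stmt && propagators.has(stmt.call) && tainted.has(stmt.arg)) {
      tainted.add(stmt.receiver); // taint flows from the argument ($E) into the array ($ARR)
    } else if ("sinkUses" in stmt && tainted.has(stmt.sinkUses)) {
      return true;
    }
  }
  return false;
}

console.log(reachesSink(program, ["url"], PROPAGATORS)); // true
console.log(reachesSink(program, ["url"], new Set()));   // false: taint stops at push()
```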
Sanitizer
If a variable is properly sanitized, DOM-XSS will not occur.
For example, a value sanitized with DOMPurify does not cause DOM-XSS, but the current rule still detects it.
// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);
Setting a sanitizer stops the tracking, on the assumption that a value passed through it has been sanitized:
pattern-sanitizers:
- pattern: DOMPurify.sanitize(...)
This will prevent the previous test case from being detected.
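If a project uses its own escaping helper instead of DOMPurify, it can be registered the same way, for example with "- pattern: escapeHtml(...)" under pattern-sanitizers. escapeHtml below is a hypothetical helper, shown only to make that concrete. Keep in mind that HTML escaping only makes a value safe as text inside HTML; it does not help with a sink such as location.href, where a javascript: URL needs no HTML-special characters at all, as the last line of the sketch shows.

```javascript
// Hypothetical sanitizer: escapes the characters that are special in HTML.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;

// Escaping leaves a javascript: URL unchanged, so it is not a sanitizer for location.href.
console.log(escapeHtml("javascript:alert(1)")); // javascript:alert(1)
```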
Summary of taint tracking
We have now created a rule to detect DOM-Based XSS using taint mode.
For taint mode, we used the following settings:
- mode: taint
- pattern-sources
- pattern-sinks
- pattern-propagators
- pattern-sanitizers
The completed YAML file and test code are as follows:
dom-based-xss.yaml
rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern-either:
    - pattern: location
    - pattern: window.location
    - pattern: document.location
    - pattern: document.URL
  pattern-sinks:
  - pattern-either:
    - pattern: document.write($PAYLOAD)
    - pattern: document.writeln($PAYLOAD)
    - pattern: $JQ.html($PAYLOAD)
    - pattern: $ELEMENT.innerHTML = $PAYLOAD
    - pattern: location.href = $PAYLOAD
  pattern-propagators:
  - pattern: $ARR.push($E)
    from: $E
    to: $ARR
  pattern-sanitizers:
  - pattern: DOMPurify.sanitize(...)
dom-based-xss.js
const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
// unsafe
$("div.test").html(query);
// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;
// unsafe
location.href = qs;
// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));
// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);
Execution Result
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Findings:
  dom-based-xss.js
     dom-based-xss
        dom-xss
         10┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         11┆ document.writeln("unsafe" + hash);
          ⋮┆----------------------------------------
         14┆ $("div.test").html(query);
          ⋮┆----------------------------------------
         18┆ e1.innerHTML = url;
          ⋮┆----------------------------------------
         21┆ location.href = qs;
          ⋮┆----------------------------------------
         26┆ document.write(arr.join(' '));
Ran 1 rule on 1 file: 6 findings.
Extract JavaScript embedded in other languages
By default, Semgrep does not scan JavaScript embedded in HTML.
Consider the following test code:
dom-based-xss.html
<html>
    <body>
        <script>
const qs = window.location.search;
const hash = document.location.hash;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
        </script>
    </body>
</html>
Let's run the rule we just created on this test code.
$ semgrep --config dom-based-xss.yaml dom-based-xss.html
Nothing to scan.
Ran 1 rule on 0 files: 0 findings.
Nothing was detected; in fact, no file was scanned, because the rule targets only JavaScript and TypeScript files.
To analyze code in one language that is embedded in a file of another language, use extract mode.
Extract from HTML
The following rule extracts JavaScript from HTML:
extract-html-to-javascript.yaml
rules:
- id: extract-html-to-javascript
  mode: extract
  languages:
    - html
  pattern: <script>$...SCRIPT</script>
  extract: $...SCRIPT
  dest-language: javascript
Extract mode requires the following five settings:
- mode: extract
- languages
- pattern
- extract
- dest-language
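Conceptually, extract mode cuts the matched $...SCRIPT text out of the HTML file, analyzes it as JavaScript, and reports findings against lines of the original file (the findings shown for dom-based-xss.html point into the HTML file). The rough sketch below imitates that with a regular expression. extractScripts and toHtmlLine are hypothetical helpers, and the regex only handles attribute-less <script> tags.

```javascript
// Rough imitation of an extract rule; NOT Semgrep's implementation.
// Finds each <script>...</script> body and records the line on which it starts.
function extractScripts(html) {
  const scripts = [];
  for (const m of html.matchAll(/<script>([\s\S]*?)<\/script>/g)) {
    const bodyStart = m.index + "<script>".length;
    const startLine = html.slice(0, bodyStart).split("\n").length;
    scripts.push({ startLine, code: m[1] });
  }
  return scripts;
}

// Maps a 1-based line in the extracted code back to a line in the HTML file.
function toHtmlLine(script, codeLine) {
  return script.startLine + codeLine - 1;
}

const html = [
  "<html>",
  "  <body>",
  "    <script>",
  "const qs = window.location.search;",
  "document.write(qs);",
  "    </script>",
  "  </body>",
  "</html>",
].join("\n");

const [script] = extractScripts(html);
// The extracted code starts with the newline after <script>, so document.write is on its line 3.
const codeLine = script.code.split("\n").indexOf("document.write(qs);") + 1;
console.log(codeLine, toHtmlLine(script, codeLine)); // 3 5
```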
 
This rule allows us to detect javascript in HTML.
$ semgrep --config dom-based-xss.yaml --config extract-html-to-javascript.yaml dom-based-xss.html
Scanning 1 file.
Findings:
  dom-based-xss.html
     dom-based-xss
        dom-xss
         11┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         12┆ document.writeln("unsafe" + hash);
Ran 2 rules on 1 file: 2 findings.
Note: the extract rule must be specified after the normal rule. If the extract rule is specified first, nothing is detected:
$ semgrep --config extract-html-to-javascript.yaml --config dom-based-xss.yaml dom-based-xss.html
Scanning 1 file.
Ran 2 rules on 1 file: 0 findings.
Extract from ERB
Here is also a rule that extracts JavaScript from ERB templates used in Ruby on Rails.
extract-erb-to-javascript.yaml
rules:
- id: extract-erb-to-javascript
  mode: extract
  languages:
    - generic
  options:
    generic_ellipsis_max_span: 500
  pattern: ...<script>$...SCRIPT</script>
  extract: $...SCRIPT
  dest-language: javascript
  paths:
    include:
      - "*.erb"
There are two points to note:
Point 1:
The rule uses generic because ERB is not a supported language, and uses paths to target only files with the .erb extension.
Point 2:
In generic mode, an ellipsis ($...SCRIPT here) can span only a limited number of lines by default (generic_ellipsis_max_span defaults to 10), so from the 11th line onward, a long <script> body is not extracted. Therefore, generic_ellipsis_max_span is set to 500 to allow extraction of script bodies up to roughly 500 lines. (Adjust the value for your code, since it affects performance.)
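To pick a value, one option is to measure the longest <script> body in your templates. The hypothetical helper below counts the newlines in each body; according to the Semgrep documentation, generic_ellipsis_max_span is expressed as the maximum number of newlines an ellipsis may match. Treat the result as a starting point rather than an exact setting.

```javascript
// Hypothetical helper: the largest number of newlines inside any <script>
// body of a template, as a starting point for generic_ellipsis_max_span.
function maxScriptNewlines(template) {
  let max = 0;
  for (const m of template.matchAll(/<script>([\s\S]*?)<\/script>/g)) {
    const newlines = (m[1].match(/\n/g) || []).length;
    max = Math.max(max, newlines);
  }
  return max;
}

// A made-up ERB snippet whose script body has 12 code lines (13 newlines in total).
const body = Array.from({ length: 12 }, (_, i) => `console.log(${i});`).join("\n");
const erb = `<%= render "header" %>\n<script>\n${body}\n</script>\n`;
console.log(maxScriptNewlines(erb)); // 13
```

In practice you could run it over each .erb file read with fs.readFileSync and use the maximum, plus some margin, as the option value.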
Conclusion
Through the process of creating rules to detect DOM-Based XSS in Semgrep, the following features were introduced:
- taint mode
  - source
  - sink
  - propagator
  - sanitizer
- extract mode
  - pattern
  - extract
  - dest-language
- option
  - generic_ellipsis_max_span
Use it as a reference when creating your own rules.
The rules and test code created for this tutorial have been placed on GitHub.
https://github.com/takutoy/my-semgrep-rules/tree/master/javascript/browser/security
Trial and Error Records (in Japanese)
https://zenn.dev/takutoy/scraps/6c0f9c20bf1d86
    