DEV Community

Theo Pendle


Implementing Hash-Based Strict CSP on AEM

As always, the full solution is available on GitHub. Scroll to the bottom of the article for the link.

Introduction to CSP

As a reminder, CSP stands for Content Security Policy: a security standard that helps prevent cross-site scripting (XSS), clickjacking, and other code injection attacks by controlling what resources a user agent is allowed to load for a given page using the Content-Security-Policy header.

Perhaps the most critical directive of CSP is the script-src directive, which controls what JS code the browser can load and execute.

The highest level of protection is achieved by using the so-called 'strict' CSP. If you want to know more about strict vs non-strict CSPs, including examples and rationales, please visit these excellent resources:

  1. CSP Is Dead, Long Live Strict CSP
  2. OWASP Content Security Policy Cheat Sheet

There are two methods for implementing a strict CSP: nonces or hashes.

Defining the requirements

Let's first list the features of the ideal CSP solution to help us pick the right approach:

  1. It should provide the highest level of security (ie: strict CSP)
  2. It should work for both author and publish instances (ie: it should not rely on a publish-only dispatcher)
  3. It should be easy to maintain
  4. It should be cacheable
  5. It should work for <script> tags that:
    1. point to internal clientlibs
    2. contain inline JS
    3. point to external clientlibs (eg: external analytics script)

Hashes vs Nonce-nse

While there are certainly cases where the nonce approach makes sense, in my opinion, hashes are a superior solution for most CMS use cases. That's because using nonces presents the following disadvantages:

  1. Because it requires the HTML document to contain nonces that are unique to each request, it makes the result impossible to cache and fails requirement 4. Since caching is a critical performance optimization, this is usually disqualifying.
  2. It's not a useful mechanism for protection against untrusted external scripts, so it fails requirement 5.3. This kind of protection would require an integrity check which is, you guessed it, a hash (which we can re-use for our CSP!)

Hashes, by comparison, can be cached because they depend only on the script content, which typically changes only with a software release. They can also be used to validate the integrity of scripts from untrusted sources.
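
To make the caching argument concrete, here is a minimal sketch (not part of the original solution) that renders the same <script> tag twice with each approach. The class and method names are made up for illustration, and the hash value is a placeholder rather than a real digest.

```java
import java.security.SecureRandom;
import java.util.Base64;

public class NonceVsHashDemo {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh random nonce, as a nonce-based CSP needs for every response. */
    public static String newNonce() {
        final byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** Nonce-based markup: the nonce is part of the HTML, so the HTML differs per request. */
    public static String renderWithNonce(final String src, final String nonce) {
        return "<script src=\"" + src + "\" nonce=\"" + nonce + "\"></script>";
    }

    /** Hash-based markup: the hash depends only on the script content, so the HTML is stable. */
    public static String renderWithHash(final String src, final String hash) {
        return "<script src=\"" + src + "\" integrity=\"" + hash + "\"></script>";
    }

    public static void main(final String[] args) {
        final String src = "/etc.clientlibs/demo/clientlibs/clientlib-site.min.js";
        // Placeholder for illustration only, not a real digest
        final String hash = "sha256-PLACEHOLDER";

        // Two "requests" with nonces: the two lines differ, so a cached copy would be stale
        System.out.println(renderWithNonce(src, newNonce()));
        System.out.println(renderWithNonce(src, newNonce()));

        // Two "requests" with a hash: the two lines are identical and safe to cache
        System.out.println(renderWithHash(src, hash));
        System.out.println(renderWithHash(src, hash));
    }
}
```

The nonce-based output changes on every call, which is exactly what makes it secure and also what prevents a dispatcher or CDN from serving a stored copy.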

Therefore, we can conclude that a hash-based CSP is the best solution for most AEM use cases.

If the above rationale doesn't apply to your use case, then this Medium article by Saravana Prakash can show you how to achieve a nonce-based CSP in AEM.

Solution design

Now that we know what approach to take, let's design the solution.

Where do scripts come from anyway?

There are typically 3 ways in which <script> tags are added to a page's HTML:

  1. Added directly via HTL files that make up the Page component (eg: customfooterlibs.html or customheaderlibs.html)
  2. Added indirectly via the Page Policy
  3. Added as dependencies to clientlibs defined in points 1 and 2.

So the sequence of events should be:

  1. Let AEM add all the <script> tags in the page HTML
  2. For each <script> tag:
    • Calculate the hash using one of the following:
      • The inline content of the script
      • The content of the referenced clientlib
      • The integrity attribute of the untrusted script
    • Add the hash to the tag
    • Add the hash to the CSP header
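
The choice of hash source in step 2 can be sketched as a small decision function. This is only an illustration of the rule described above: the class, enum and parameter names are hypothetical and do not come from the actual solution.

```java
public class ScriptHashSource {

    /** Where the hash for a script tag comes from (illustrative names). */
    public enum Source { INLINE_CONTENT, CLIENTLIB_CONTENT, INTEGRITY_ATTRIBUTE, NONE }

    /**
     * @param src            value of the src attribute, or null for an inline script
     * @param inlineContent  text between the script tags, may be null or empty
     * @param srcIsClientlib whether src resolves to a clientlib served by AEM
     * @param integrity      value of the integrity attribute, or null if absent
     */
    public static Source choose(final String src, final String inlineContent,
                                final boolean srcIsClientlib, final String integrity) {
        if (src == null) {
            // No src attribute: the script is inline, so hash its text (if there is any)
            return inlineContent == null || inlineContent.isEmpty()
                    ? Source.NONE
                    : Source.INLINE_CONTENT;
        }
        if (srcIsClientlib) {
            // Internal clientlib: hash the content that AEM will serve
            return Source.CLIENTLIB_CONTENT;
        }
        // External script: only usable when it ships with an integrity attribute
        return integrity != null ? Source.INTEGRITY_ATTRIBUTE : Source.NONE;
    }

    public static void main(final String[] args) {
        System.out.println(choose(null, "console.log('hi');", false, null));              // INLINE_CONTENT
        System.out.println(choose("/etc.clientlibs/demo/site.min.js", null, true, null)); // CLIENTLIB_CONTENT
        System.out.println(choose("https://example.com/a.js", null, false, "sha384-abc")); // INTEGRITY_ATTRIBUTE
        System.out.println(choose("https://example.com/a.js", null, false, null));         // NONE
    }
}
```

A result of NONE means no hash reaches the CSP, so the browser will refuse to run that script.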

Transformers to the rescue!


Unfortunately I'm not talking about Optimus Prime, but rather a SAX output pipeline that will use a Transformer to add the CSP hashes to the <script> tags on the HTML page.

Implementation

In this section I will highlight the most important parts of the solution. For a complete solution, see the GitHub link at the bottom of the article.

Creating the transformer

The transformer handles the following cases:

Inline scripts

Example:

<script>
    console.log('Hello, World!');
</script>

If the <script> tag is inline, the hash can be calculated using the innerText of the element.
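
One detail worth seeing in isolation: the browser hashes the exact text between the tags, so whitespace and line breaks count. The following standalone sketch assumes SHA-256 and uses a made-up class name; it is not the transformer code itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class InlineScriptHash {

    /** Returns a CSP hash source value (without the surrounding quotes) for inline script text. */
    public static String cspHash(final String scriptText) {
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            final byte[] hash = digest.digest(scriptText.getBytes(StandardCharsets.UTF_8));
            return "sha256-" + Base64.getEncoder().encodeToString(hash);
        } catch (final NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        // The text exactly as it sits between <script> and </script>, indentation included
        final String asWritten = "\n    console.log('Hello, World!');\n";
        final String trimmed = asWritten.trim();

        // The two values differ: only the first one matches what the browser computes
        System.out.println(cspHash(asWritten));
        System.out.println(cspHash(trimmed));
    }
}
```

This is why the hash has to be calculated from the text as it is actually serialized to the page, not from a tidied-up copy.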

Clientlib scripts

Example:

   <script src="/etc.clientlibs/demo/clientlibs/clientlib-site.min.js"></script>

If the <script> tag points to a clientlib served from AEM, the hash can be calculated using the content of the clientlib.

External scripts

Example:

   <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-YvpcrYf0tY3lHB60NNkmXc5s9fDVZLESaAA55NDzOxhy9GkcIdslK1eN7N6jIeHz" crossorigin="anonymous"></script>
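
Because the integrity value of an external script ends up in the CSP header as-is, it can be worth checking its shape before trusting it. The sketch below is an optional extra, not part of the original solution: the class name is hypothetical, and the pattern is deliberately simplified (a single hash, standard base64 alphabet only).

```java
import java.util.regex.Pattern;

public class IntegrityCheck {

    // One hash expression: algorithm, dash, base64 digest with optional padding
    private static final Pattern SINGLE_HASH =
            Pattern.compile("^(sha256|sha384|sha512)-[A-Za-z0-9+/]+={0,2}$");

    /** Returns true if the integrity value looks like a single, well-formed hash. */
    public static boolean isUsableAsCspHash(final String integrity) {
        return integrity != null && SINGLE_HASH.matcher(integrity.trim()).matches();
    }

    public static void main(final String[] args) {
        final String bootstrap =
                "sha384-YvpcrYf0tY3lHB60NNkmXc5s9fDVZLESaAA55NDzOxhy9GkcIdslK1eN7N6jIeHz";
        System.out.println(isUsableAsCspHash(bootstrap));            // true
        System.out.println(isUsableAsCspHash("md5-abc"));            // false
        System.out.println(isUsableAsCspHash("sha256-a sha384-b"));  // false (several hashes)
    }
}
```

An integrity attribute may legally list several space-separated hashes; a fuller version would split the value and add each hash to the CSP separately.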

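If the <script> tag points to an external script, the hash is not calculated but taken from the integrity attribute of the element.

Here is the code for the transformer. I've been very generous with the comments to explain the rationale behind each step. This snippet refers to some POJOs and configuration that you can see in the GitHub diff at the end of the article.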

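import com.adobe.granite.ui.clientlibs.HtmlLibrary;
import com.adobe.granite.ui.clientlibs.HtmlLibraryManager;
import com.adobe.granite.ui.clientlibs.LibraryType;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.NonExistingResource;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.rewriter.DefaultTransformer;
import org.apache.sling.rewriter.ProcessingComponentConfiguration;
import org.apache.sling.rewriter.ProcessingContext;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// ContentSecurityPolicy and TransformerElement are POJOs from the same project
@RequiredArgsConstructor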
@Slf4j
public class CspHashTransformer extends DefaultTransformer {

    @Getter
    private final String hashingAlgorithm;

    private final HtmlLibraryManager htmlLibraryManager;

    private SlingHttpServletRequest request;
    private SlingHttpServletResponse response;
    private ContentSecurityPolicy csp;
    private TransformerElement currentElement;

    @Override
    public void init(final ProcessingContext context, final ProcessingComponentConfiguration config) throws IOException {
        super.init(context, config);
        request = context.getRequest();
        response = context.getResponse();
        csp = new ContentSecurityPolicy();

        // We initialize the CSP with a strict-dynamic directive to allow for our trusted scripts to
        // load other scripts without being blocked by the browser.
        csp.addScriptSrcElem("'strict-dynamic'");
    }

    /**
     * Process the start of an element. If the element has a src attribute pointing to a clientlib, calculate the hash
     * and add it to the element as an integrity attribute and to the CSP header.
     *
     * @param namespaceUri  the namespace URI of the element
     * @param localName     the local name of the element
     * @param qualifiedName the qualified name of the element
     * @param attributes    the attributes of the element
     * @throws SAXException if an error occurs during processing
     */
    @Override
    public void startElement(final String namespaceUri, final String localName,
                             final String qualifiedName, final Attributes attributes) throws SAXException {

        currentElement = new TransformerElement(namespaceUri, localName, qualifiedName, attributes);

        log.debug("Start processing element {}", currentElement);

        addIntegrityAttributeAndCspForSrc();

        super.startElement(currentElement.namespaceUri(), currentElement.localName(), currentElement.qualifiedName(), currentElement.attributes());
    }

    /**
     * Called by the SAX parser when it encounters character data. Used to append the character data to the inner text
     * of the current element.
     *
     * @param ch     the character array being read
     * @param start  the start index of the character array
     * @param length the length of the character array
     * @throws SAXException if an error occurs during processing
     */
    @Override
    public void characters(final char[] ch, final int start, final int length) throws SAXException {
        if (currentElement != null) {
            currentElement.innerText().append(ch, start, length);
        }
        super.characters(ch, start, length);
    }

    /**
     * Process the end of an element. If the element has inner text, calculate the hash and add it to the CSP header.
     *
     * @param namespaceUri  the namespace URI of the element
     * @param localName     the local name of the element
     * @param qualifiedName the qualified name of the element
     * @throws SAXException if an error occurs during processing
     */
    @Override
    public void endElement(final String namespaceUri, final String localName,
                           final String qualifiedName) throws SAXException {

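        // Only elements seen by startElement carry inner text to hash
        if (currentElement != null) {
            log.debug("End processing element {}", currentElement);

            addCspForInnerText();

            // Reset so that text following this element is not appended to it
            currentElement = null;
        }

        // Always forward the event, otherwise the closing tag would be dropped from the output
        super.endElement(namespaceUri, localName, qualifiedName);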
    }

    private void addIntegrityAttributeAndCspForSrc() {

        // Get the source of the script
        final String src = currentElement.attributes().getValue("src");
        if (src == null) {
            log.debug("No src attribute found for element {}", currentElement);
            return;
        }

        // Attempt to find a clientlib associated with the src attribute
        final HtmlLibrary clientlib = getHtmlLibrary(src);

        // Attempt to read the integrity attribute
        final String integrity = currentElement.attributes().getValue("integrity");

        // If no clientlib can be found using the src, then assume the src is external
        final boolean isExternal = clientlib == null;

        // For security reasons, we consider that an external script without an integrity attribute is invalid. It will
        // not be added to the CSP and therefore will fail to load/execute in the browser.
        if (isExternal && integrity == null) {
            log.error("Integrity attribute missing from external src <{}>. Hash cannot be calculated.", src);
            return;
        }

        // Re-use the integrity hash if possible, else calculate the hash from the clientlib content
        final String hash = isExternal
                ? integrity
                : getHashFromClientlib(clientlib);

        // If no hash can be calculated, then the script will not be added to the CSP and therefore will fail to load
        if (hash == null) {
            log.debug("No clientlib or external hash found for src <{}>. Hash cannot be calculated.", src);
            return;
        }

        // For internal script, add the integrity attribute containing the hash. Security-wise this does not provide any
        // benefit as the CSP will already enforce the hash, but it is good practice to include it so that you can
        // easily identify which script corresponds to which hash for debugging purposes.
        if (!isExternal) {
            addIntegrityAttribute(hash);
        }

        // Finally, add the hash to the CSP
        addHashToCsp(hash);
    }

    private String getHashFromClientlib(final HtmlLibrary clientlib) {
        try (final InputStream inputStream = clientlib.getInputStream(true)) {
            final String hash = calculateHashAndEncodeBase64(inputStream);
            log.debug("Hash for <{}>: <{}>", clientlib.getPath(), hash);
            return hash;
        } catch (final IOException e) {
            log.error("Could not read clientlib <{}>", clientlib.getPath(), e);
            return null;
        }
    }

    private void addCspForInnerText() {
        final String innerText = currentElement.innerText().toString();
        if (innerText.isEmpty()) {
            log.debug("Element {} has no inner text", currentElement);
            return;
        }

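        final String hash = calculateHashAndEncodeBase64(innerText);

        // A null hash means the algorithm was unavailable; do not add 'null' to the CSP
        if (hash != null) {
            addHashToCsp(hash);
        }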
    }

    /**
     * Adds the hash as an integrity attribute to the current element.
     *
     * @param hash the hash to add
     */
    private void addIntegrityAttribute(final String hash) {
        final AttributesImpl attributes = new AttributesImpl(currentElement.attributes());
        attributes.addAttribute(currentElement.namespaceUri(), "integrity", "integrity", "CDATA", hash);
        currentElement.attributes(attributes);
    }

    /**
     * Adds the hash to the Content-Security-Policy header.
     *
     * @param hash the hash to add
     */
    private void addHashToCsp(final String hash) {
        csp.addScriptSrcElem("'" + hash + "'");
        response.setHeader("Content-Security-Policy", csp.toString());
    }

    /**
     * Find the clientlib associated with the src attribute if such a clientlib exists.
     *
     * @param src the src attribute of the element
     * @return the clientlib associated with the src attribute, or null if no such clientlib exists
     */
    private HtmlLibrary getHtmlLibrary(final String src) {
        final String path;
        try {
            path = new URI(src).getPath();
        } catch (final URISyntaxException e) {
            log.error("src attribute element {} is not a valid URI", currentElement, e);
            return null;
        }

        // Find true path of clientlib in /apps (or /libs, via overlay)
        final String appsPath = path
                .replace("etc.clientlibs", "apps")
                .replace(".min.js", "");

        final Resource resource = request.getResourceResolver().resolve(appsPath);
        if (resource instanceof NonExistingResource) {
            log.error("Could not find resource using path <{}>", appsPath);
            return null;
        }

        return htmlLibraryManager.getLibrary(LibraryType.JS, resource.getPath());
    }

    private String calculateHashAndEncodeBase64(final InputStream inputStream) {
        try {
            final MessageDigest digest = MessageDigest.getInstance(hashingAlgorithm);
            final byte[] buffer = new byte[4096];
            int bytesRead;

            while ((bytesRead = inputStream.read(buffer)) != -1) {
                digest.update(buffer, 0, bytesRead);
            }

            final byte[] hash = digest.digest();
            final String hashString = Base64.getEncoder().encodeToString(hash);
            return hashAndAlgorithm(hashString);

        } catch (final IOException e) {
            log.error("Error reading input stream for hashing", e);
            return null;
        } catch (final NoSuchAlgorithmException e) {
            log.error("Hashing algorithm not found", e);
            return null;
        }
    }

    private String calculateHashAndEncodeBase64(final String string) {
        try {
            final MessageDigest digest = MessageDigest.getInstance(hashingAlgorithm);

            final byte[] hash = digest.digest(string.getBytes(StandardCharsets.UTF_8));
            final String hashString = Base64.getEncoder().encodeToString(hash);
            return hashAndAlgorithm(hashString);

        } catch (final NoSuchAlgorithmException e) {
            log.error("Hashing algorithm not found", e);
            return null;
        }
    }

    private String hashAndAlgorithm(final String hash) {
        return hashingAlgorithm.toLowerCase().replace("-", "") + "-" + hash;
    }
}

Adding the transformer to the pipeline

To create a pipeline that includes our transformer, we need to create a Rewriter by adding a node at /apps/demo/config/rewriter/links-pipeline with the following properties:

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
          jcr:primaryType="nt:unstructured"
          contentTypes="text/html"
          generatorType="htmlparser"
          order="1"
          paths="[/content]"
          serializerType="htmlwriter"
          transformerTypes="[csp-hash-transformer]">
    <generator-htmlparser
            jcr:primaryType="nt:unstructured"
            includeTags="[SCRIPT]"/>
</jcr:root>

The result

Now, if we load scripts onto our page using customfooterlibs.html:

<!-- This HTL include demonstrates the loading of a clientlib -->
<sly data-sly-use.clientlib="core/wcm/components/commons/v1/templates/clientlib.html">
    <sly data-sly-call="${clientlib.js @ categories='demo.base', async=true}"/>
</sly>

<!-- This script element demonstrates the loading of an external script -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js"
        integrity="sha384-YvpcrYf0tY3lHB60NNkmXc5s9fDVZLESaAA55NDzOxhy9GkcIdslK1eN7N6jIeHz"
        crossorigin="anonymous"></script>

<!-- This script element demonstrates the inline script hashing -->
<script>
    console.log('This was logged from an inline script!');
</script>

<!-- This inline script loads an external script to demonstrate the 'dynamic' principle -->
<script>
    const script = document.createElement('script');
    script.src = 'https://cdn.jsdelivr.net/npm/jquery@3.7.1/dist/jquery.min.js';
    document.body.appendChild(script);

    addEventListener("load", (event) => console.log('jQuery version:',$().jquery));
</script>

We should receive a CSP like this (yours will vary depending on your clientlibs/dependencies):

Content-Security-Policy: script-src-elem 'strict-dynamic' 'sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=' 'sha256-XjA+iLg5j0FvhFkZc7LcXfbQJ0b3gvw2c2jj9vv65q0=' 'sha256-Uv8dzRTPRZ2++L/ZWgfN9lPjdvzsDYVS4rEfvWCA0x0=' 'sha256-3dfW5u+XJXRPqcC3F8wewmnAr6oxejxP7ArjOE38P2Q=' 'sha256-5hrKOpQWBa1NuajxV3udxJCgNMQMD/lUApbmGxMmpuM=' 'sha256-wlCSQBL9yeqVFrMGUIlSAc0Wfb1JydFIkk8wiBq/o5M=' 'sha256-WJ3od+zqoblT5apcuXdUh4o1UWwVnb5AjQhmHWIu2OY=' 'sha384-YvpcrYf0tY3lHB60NNkmXc5s9fDVZLESaAA55NDzOxhy9GkcIdslK1eN7N6jIeHz' 'sha256-QFYsdZ/eGhCq89XHZ7IOsy5A9dTKeyvduAx2RnCqvAA=' 'sha256-Fb8sXaPGzkkQJOoKLIpn0I5s+VOOiBlZMYGgv8wHxZI=' 'sha256-g2cCN9gX44Hp5lFL/iomg3hI3LeG/LRkzeNJfQZjJGI=';
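
For readers wondering how a header like this is assembled, here is one way a holder class behind csp.addScriptSrcElem(...) and csp.toString() could look. This is a guess for illustration only: the real ContentSecurityPolicy POJO lives in the GitHub diff and may differ, which is why the class below carries a different name.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ContentSecurityPolicySketch {

    // LinkedHashSet keeps insertion order and silently drops duplicate sources
    private final Set<String> scriptSrcElem = new LinkedHashSet<>();

    /** Adds one source expression, e.g. 'strict-dynamic' or a quoted hash. */
    public void addScriptSrcElem(final String source) {
        scriptSrcElem.add(source);
    }

    /** Serializes the directive in the same shape as the header shown above. */
    @Override
    public String toString() {
        return "script-src-elem " + String.join(" ", scriptSrcElem) + ";";
    }

    public static void main(final String[] args) {
        final ContentSecurityPolicySketch csp = new ContentSecurityPolicySketch();
        csp.addScriptSrcElem("'strict-dynamic'");
        csp.addScriptSrcElem("'sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='");
        // The same clientlib included twice on a page yields the same hash only once
        csp.addScriptSrcElem("'sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU='");
        System.out.println(csp);
    }
}
```

Using a set rather than a list keeps the header from growing when the same script appears several times on a page.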

And you should see that the demonstration scripts have executed as expected in the browser console:

This was logged from an inline script!
jQuery version: 3.7.1

What about the other CSP directives?

Good question! There are indeed dozens of other directives you can use to fine-tune your CSP.

This article will not give you a comprehensive strategy for dealing with all your CSP requirements. It only shows you how to automate the configuration of the script-src-elem directive.

Thankfully, using multiple CSP headers is a valid approach, so you can add the rest of the directives anywhere you like as additional CSP headers. Just make sure you understand how multiple CSP headers interact with each other to avoid surprises.
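
The key rule is that every policy is enforced independently, so a resource must be allowed by all of them and an extra header can only tighten things. The toy model below illustrates just that rule: each "policy" is reduced to a simple URL check, which is an assumption made for brevity and not how browsers actually match sources.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class MultiplePoliciesDemo {

    /** A script is allowed only if every policy allows it. */
    public static boolean isAllowed(final List<Predicate<String>> policies, final String scriptUrl) {
        return policies.stream().allMatch(policy -> policy.test(scriptUrl));
    }

    public static void main(final String[] args) {
        // Hypothetical stand-ins for two separate Content-Security-Policy headers
        final Predicate<String> httpsOnly = url -> url.startsWith("https://");
        final Predicate<String> jsdelivrOnly = url -> url.startsWith("https://cdn.jsdelivr.net/");
        final List<Predicate<String>> policies = Arrays.asList(httpsOnly, jsdelivrOnly);

        // Passes both policies
        System.out.println(isAllowed(policies, "https://cdn.jsdelivr.net/npm/jquery@3.7.1/dist/jquery.min.js"));
        // Passes the first policy but not the second, so it is blocked
        System.out.println(isAllowed(policies, "https://example.com/script.js"));
    }
}
```

In practice this means a second header cannot re-allow a script that the hash-based header has already blocked.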

Conclusion

As promised, all the code is available in one easy-to-read diff on GitHub. You can find it here.

If you have any comments or ideas about this article, the topic matter or the format, don't hesitate to leave a comment or to reach out to me on LinkedIn!
