<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arcjet</title>
    <description>The latest articles on DEV Community by Arcjet (@arcjet).</description>
    <link>https://dev.to/arcjet</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9240%2F340e54de-ab24-45af-929b-5a71289be1ef.png</url>
      <title>DEV Community: Arcjet</title>
      <link>https://dev.to/arcjet</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arcjet"/>
    <language>en</language>
    <item>
      <title>Devcontainers, Little Snitch, macOS TCC - protecting developer laptops</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Tue, 08 Jul 2025 09:39:51 +0000</pubDate>
      <link>https://dev.to/arcjet/devcontainers-little-snitch-macos-tcc-protecting-developer-laptops-3e7g</link>
      <guid>https://dev.to/arcjet/devcontainers-little-snitch-macos-tcc-protecting-developer-laptops-3e7g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79113a24hsh01t6rpm7e.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79113a24hsh01t6rpm7e.jpg" alt="Devcontainers, Little Snitch, macOS TCC - protecting developer laptops" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A single compromised npm package on a developer's laptop is all it takes - a quiet threat that executes with the familiar &lt;code&gt;npm install&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;The potential for damage is significant - compromised commit rights to source repositories, stolen session tokens, exposed secrets from environment variables, and even direct access to production networks. Once you gain a foothold on a developer laptop, there are many opportunities to reach sensitive production systems.&lt;/p&gt;
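&lt;p&gt;As a sketch of how such a payload is typically wired up - the package name, version, and script below are invented for illustration - a malicious dependency only needs a lifecycle script in its &lt;code&gt;package.json&lt;/code&gt;:&lt;/p&gt;

```json
{
  "name": "innocuous-helper",
  "version": "1.0.2",
  "scripts": {
    "postinstall": "node ./collect.js"
  }
}
```

&lt;p&gt;The &lt;code&gt;postinstall&lt;/code&gt; script runs automatically during &lt;code&gt;npm install&lt;/code&gt; with the installing user's permissions, which is why the layers below focus on containing what that process can reach.&lt;/p&gt;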

&lt;p&gt;At &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt;, developer laptops consistently come top in our assessments of the "most likely" threats. By the very nature of the job, developers are regularly expected to install dependencies, execute code on their local systems, use third-party editor extensions, and connect to sensitive environments. This inherent risk likely explains the recent surge in developer-focused exploits, such as malware bundled within &lt;a href="https://socket.dev/blog/malicious-pypi-package-targets-discord-developers-with-token-theft-and-backdoor?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Python&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://socket.dev/blog/typosquatted-go-packages-deliver-malware-loader?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Go&lt;/u&gt;&lt;/a&gt;, and &lt;a href="https://socket.dev/blog/npm-package-wipes-codebases-with-remote-trigger?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Node&lt;/u&gt;&lt;/a&gt; packages; &lt;a href="https://control-plane.io/posts/abusing-vscode-from-malicious-extensions-to-stolen-credentials-part-1/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;VS Code extension&lt;/u&gt;&lt;/a&gt; exploits; and &lt;a href="https://github.blog/security/vulnerability-research/attacking-browser-extensions/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Chrome extension hijacking&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Arcjet is a devtools startup. Our security-as-code SDK helps developers implement features like bot detection and signup form spam detection. We’re thinking about security all day, every day - not just in our product, but also in how we run the company. In this blog post, I’ll talk through some of the work we’ve done to improve our own developer security.&lt;/p&gt;

&lt;h2&gt;
  
  
  Devcontainers
&lt;/h2&gt;

&lt;p&gt;The first line of defense is containing the development environment itself. Originally developed by Microsoft, &lt;a href="https://containers.dev/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Devcontainers&lt;/u&gt;&lt;/a&gt; is an open specification that defines the development environment for a project:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A development container (or dev container for short) allows you to use a container as a full-featured development environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Using a &lt;code&gt;.devcontainer/devcontainer.json&lt;/code&gt; file, you can define a container environment by specifying a base image from a public or private registry. Various &lt;a href="https://containers.dev/features?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;optional features&lt;/u&gt;&lt;/a&gt; can be added to install common tools, such as the GitHub or AWS CLIs, linters, formatters, and other language runtimes. Include recommended VS Code extensions and scripts to run after installation, and within a few seconds of launching the container you have a fully configured development environment.&lt;/p&gt;

&lt;p&gt;When you have a team of developers, getting them all running the same versions of the same tools can be a big challenge. Devcontainers solve this by defining a consistent environment as configuration, rather than having everyone set things up manually. The Devcontainers config &lt;a href="https://github.com/arcjet/arcjet-js/blob/main/.devcontainer/devcontainer.json?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;in our public JS SDK&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://github.com/arcjet/arcjet-docs/blob/main/.devcontainer/devcontainer.json?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;docs repos&lt;/u&gt;&lt;/a&gt; has made it easy for external contributors to get started.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/javascript-node
{
  "name": "arcjet-docs",
  "image": "mcr.microsoft.com/devcontainers/javascript-node:1-22-bookworm",
  // Features to add to the dev container. More info: https://containers.dev/features.
  "features": {
    "ghcr.io/devcontainers/features/common-utils:2.5.2": {},
    "ghcr.io/trunk-io/devcontainer-feature/trunk:1.1.0": {}
  },
  // Use 'forwardPorts' to make a list of ports inside the container available locally.
  // "forwardPorts": [],
  // Install trunk tools inside the container
  // Uses array syntax to skip the shell: https://containers.dev/implementors/json_reference/#formatting-string-vs-array-properties
  "updateContentCommand": ["trunk", "install"],
  // Install npm dependencies within the container
  // Uses array syntax to skip the shell: https://containers.dev/implementors/json_reference/#formatting-string-vs-array-properties
  "postCreateCommand": ["npm", "ci"],
  "customizations": {
    "vscode": {
      "extensions": [
        "astro-build.astro-vscode",
        "unifiedjs.vscode-mdx",
        "trunk.io"
      ]
    }
  }
  // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
  // "remoteUser": "root",
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Devcontainer config for the Arcjet docs repo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Using a Devcontainer isolates the dev environment from the host system (the developer laptop). Code is executed inside the container rather than directly on the host. This isolation mitigates most attack vectors such as malware executed via post-install scripts or from backdoored dependencies.&lt;/p&gt;

&lt;p&gt;The downside is a minor performance overhead, particularly with I/O-intensive operations on macOS due to the underlying virtualization layer. However, for most development workflows, the security benefits far outweigh the cost. &lt;a href="https://code.visualstudio.com/remote/advancedcontainers/improve-performance?ref=blog.arcjet.com#_use-clone-repository-in-container-volume" rel="noopener noreferrer"&gt;&lt;u&gt;Cloning a repository directly into a container volume&lt;/u&gt;&lt;/a&gt; rather than binding to the host filesystem mitigates most of the performance issues.&lt;/p&gt;

&lt;p&gt;While credentials and source code within the devcontainer could still be exfiltrated, the damage is constrained by the container's boundaries, making it easier to quarantine. It also denies code unrestricted access to the host - the keychain, password manager vaults, browser history databases, and so on.&lt;/p&gt;

&lt;p&gt;Containers are not designed for security or 100% isolation - they’re more of a convenient packaging and deployment format - so there is always the potential for container breakout. However, most attackers will assume that code is executed on the host system directly. All the code knows is that it’s running on a (pretty sparse) Linux machine. Devcontainers can therefore be a very effective layer of security for development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outbound firewall
&lt;/h2&gt;

&lt;p&gt;The next layer is controlling what the isolated environment can access. macOS has a good built-in firewall, but it is primarily designed to protect against inbound connections. Tracking outbound connections is just as important.&lt;/p&gt;

&lt;p&gt;Using an outbound firewall such as &lt;a href="https://objective-see.org/products/lulu.html?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;LuLu firewall&lt;/u&gt;&lt;/a&gt; (free, open source) or &lt;a href="https://obdev.at/products/littlesnitch/index.html?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Little Snitch&lt;/u&gt;&lt;/a&gt; (paid, or its free variant &lt;a href="https://obdev.at/products/littlesnitch-mini/index.html?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Little Snitch Mini&lt;/u&gt;&lt;/a&gt;) will alert you the first time any application attempts to make outbound connections. This is initially quite noisy, but you get a good baseline of common applications pretty quickly.&lt;/p&gt;

&lt;p&gt;Why is this important? A compromised dependency might try to "phone home" by sending exfiltrated secrets (like your &lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;) to a remote server over a standard port like DNS (53) or HTTPS (443). A default-deny firewall forces you to explicitly allow connections, making this anomalous traffic immediately obvious.&lt;/p&gt;
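&lt;p&gt;To make the pattern concrete, here's a harmless sketch of the kind of "phone home" a default-deny firewall would surface. The key is AWS's documented example value, the domain is invented, and the script only prints the lookup it would make - nothing is actually sent anywhere:&lt;/p&gt;

```shell
# Sketch of DNS exfiltration: encode a secret as a DNS-safe label.
# The key is a fake example value and exfil.example.com is invented.
AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"

# Base64-encode, then strip padding and map +/ to -_ so the result
# is valid inside a hostname
ENCODED=$(printf '%s' "$AWS_ACCESS_KEY_ID" | base64 | tr -d '=' | tr '+/' '-_')

# A real payload would trigger a DNS lookup of this name; here we just print it
echo "would resolve: ${ENCODED}.exfil.example.com"
```

&lt;p&gt;A single DNS query like this blends into normal traffic unless something is watching which processes are allowed to make outbound connections in the first place.&lt;/p&gt;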

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gn7azt9g2ykspt7fduu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gn7azt9g2ykspt7fduu.png" alt="Devcontainers, Little Snitch, macOS TCC - protecting developer laptops" width="800" height="579"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Little Snitch rules configuration.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Built-in macOS protections
&lt;/h2&gt;

&lt;p&gt;This layer focuses on using mechanisms built into macOS. Introduced in macOS Mojave (10.14), the &lt;a href="https://eclecticlight.co/2023/02/10/privacy-what-tcc-does-and-doesnt/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Transparency, Consent, and Control (TCC) framework&lt;/a&gt; restricts application access to sensitive user data and system resources.&lt;/p&gt;

&lt;p&gt;You’ll have seen this in action with the consent boxes appearing whenever applications try to access your microphone, camera, location, photos, contacts, and other areas of your system that macOS considers sensitive. &lt;/p&gt;

&lt;p&gt;This protection also extends to the &lt;code&gt;~/Downloads&lt;/code&gt;, &lt;code&gt;~/Documents&lt;/code&gt;, and &lt;code&gt;~/Desktop&lt;/code&gt; folders, so any process that tries to read or write files in these locations will be blocked until you approve access. macOS 10.15 &lt;a href="https://developer.apple.com/forums/thread/663889?answerId=640805022&amp;amp;ref=blog.arcjet.com#640805022" rel="noopener noreferrer"&gt;&lt;u&gt;introduced additional access controls&lt;/u&gt;&lt;/a&gt; for anything located in the &lt;code&gt;~/Desktop&lt;/code&gt; and &lt;code&gt;~/Documents&lt;/code&gt; directories that &lt;a href="https://github.com/h4llow3En/mac-notification-sys/issues/33?ref=blog.arcjet.com#issuecomment-1480294409" rel="noopener noreferrer"&gt;&lt;u&gt;lock down kernel access even further&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa23ksglnodr018vp3hlw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa23ksglnodr018vp3hlw.png" alt="Devcontainers, Little Snitch, macOS TCC - protecting developer laptops" width="800" height="874"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;macOS Privacy &amp;amp; Security controls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These restrictions apply per application, which includes any scripts or processes that might attempt to exfiltrate the contents of source code directories on disk. If you place all your code into one of these three directories, it will also benefit from the TCC protections (although &lt;a href="https://www.qt.io/blog/the-curious-case-of-the-responsible-process?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;the responsible process&lt;/u&gt;&lt;/a&gt; might show up as your editor or terminal).&lt;/p&gt;

&lt;p&gt;For example, if you check out your Git repository to &lt;code&gt;~/Documents/repo&lt;/code&gt; then any malware that attempts to scrape the contents of &lt;code&gt;~/Documents&lt;/code&gt; will trigger the consent popup.&lt;/p&gt;

&lt;p&gt;The role of TCC becomes more nuanced when using Devcontainers. This is because the container runtime itself (e.g., Docker Desktop or OrbStack) is the application that receives TCC authorization to access directories on the host. Consequently, malware executing within the container (e.g., via a post-install script) that accesses these mounted files will not trigger a new TCC prompt. The I/O request is proxied through the trusted runtime, effectively bypassing a direct TCC check on the malicious process.&lt;/p&gt;

&lt;p&gt;While this means TCC's file-access prompts offer less direct protection from threats inside the container, the container itself still provides a layer of isolation. TCC remains a useful defense against a potential container escape, where malware might try to break out and execute directly on the macOS host.&lt;/p&gt;

&lt;h2&gt;
  
  
  SSH agent for Git keys
&lt;/h2&gt;

&lt;p&gt;The final layer is ensuring the developer's identity and access are secure. Any keys stored directly on disk are easily accessible. This is one reason why AWS recommends using &lt;a href="https://aws.amazon.com/iam/identity-center/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;IAM Identity Center&lt;/u&gt;&lt;/a&gt; with the &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html?ref=blog.arcjet.com#cli-configure-sso-login" rel="noopener noreferrer"&gt;&lt;u&gt;SSO CLI flow&lt;/u&gt;&lt;/a&gt; for logging into AWS accounts - so static credentials aren’t stored on disk.&lt;/p&gt;

&lt;p&gt;The same problems arise with SSH keys, often &lt;a href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;used for GitHub authentication&lt;/u&gt;&lt;/a&gt;. Generated keys are stored in &lt;code&gt;~/.ssh&lt;/code&gt; by default, which makes them easy to exfiltrate. &lt;/p&gt;

&lt;p&gt;Balancing UX with security is always a challenge. One option is to set a passphrase for the key and store it in the macOS Keychain. This happens automatically if you access a passphrase-protected key stored at &lt;code&gt;.ssh/id_rsa&lt;/code&gt; or &lt;code&gt;.ssh/identity&lt;/code&gt; - macOS will manage access for you. If you have multiple keys or the key is stored somewhere else, you can &lt;a href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent?ref=blog.arcjet.com#adding-your-ssh-key-to-the-ssh-agent" rel="noopener noreferrer"&gt;&lt;u&gt;manually add it to ssh-agent and ask for the passphrase to be stored in Keychain&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;
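&lt;p&gt;On recent versions of macOS, the Keychain integration can be made persistent with an &lt;code&gt;~/.ssh/config&lt;/code&gt; entry along these lines (the key path is an example - substitute your own):&lt;/p&gt;

```
Host *
  AddKeysToAgent yes
  UseKeychain yes
  IdentityFile ~/.ssh/id_ed25519
```

&lt;p&gt;Running &lt;code&gt;ssh-add --apple-use-keychain ~/.ssh/id_ed25519&lt;/code&gt; once stores the passphrase in the Keychain so you aren't prompted on every connection.&lt;/p&gt;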

&lt;p&gt;An alternative to the macOS Keychain is the &lt;a href="https://developer.1password.com/docs/ssh/get-started/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;SSH key support in 1Password&lt;/u&gt;&lt;/a&gt;. This avoids storing any key files on disk and, in contrast to the Keychain (which unlocks with your login password), 1Password prompts whenever a new application wants to access the key. We have 1Password reporting to our logging infrastructure, which means we also get audit logs for all credential access.&lt;/p&gt;

&lt;p&gt;At Arcjet, we mandate signed commits, which requires developers to manage a &lt;a href="https://docs.github.com/en/authentication/managing-commit-signature-verification/telling-git-about-your-signing-key?ref=blog.arcjet.com#telling-git-about-your-ssh-key" rel="noopener noreferrer"&gt;&lt;u&gt;signing key&lt;/u&gt;&lt;/a&gt;. Enforcing signed commits is a foundational practice for securing the software supply chain. It provides verifiable attestation that code originates from a trusted developer (who has also authenticated recently), protecting the repository from unauthorized code injection even if a developer's GitHub access is compromised.&lt;/p&gt;
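&lt;p&gt;If you sign with an SSH key rather than GPG, the setup is a few &lt;code&gt;git config&lt;/code&gt; commands. The key path here is an example - with 1Password, you'd point it at the public key 1Password exposes:&lt;/p&gt;

```shell
# Sign commits with an SSH key instead of GPG
git config --global gpg.format ssh

# The signing key is the *public* half; adjust the path to your own key
git config --global user.signingkey "$HOME/.ssh/id_ed25519.pub"

# Sign every commit by default
git config --global commit.gpgsign true
```

&lt;p&gt;After uploading the same public key to GitHub as a signing key, commits show as "Verified" in the web UI; local verification with &lt;code&gt;git log --show-signature&lt;/code&gt; additionally requires an allowed signers file.&lt;/p&gt;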

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91ofqjda1h2p3clkay0f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F91ofqjda1h2p3clkay0f.png" alt="Devcontainers, Little Snitch, macOS TCC - protecting developer laptops" width="800" height="710"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;1Password Developer tools.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  MDM
&lt;/h2&gt;

&lt;p&gt;It’s only a matter of time before an incident happens. Ideally, one of the above layers will catch attacks or mistakes, but when something does happen we need to be able to detect it, understand how it happened, apply effective quarantine measures, and quickly remediate the situation.&lt;/p&gt;

&lt;p&gt;The focus is often on the fancy detection and response part of this, but logging is just as important because it helps you answer questions like: What happened? What data (if any) was extracted? How long has this been compromised? Were any other systems impacted?&lt;/p&gt;

&lt;p&gt;All Arcjet devices are provisioned with MDM tooling to help detect potential problems quickly, alert the right people, and protect our team &amp;amp; customers. We’ve partnered with &lt;a href="https://www.latacora.com/services/detection-and-response/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Latacora for 24/7 monitoring &amp;amp; response&lt;/u&gt;&lt;/a&gt; and their team acts like our internal security experts. Various detection rules notify us of any suspicious activity, we have escalation channels to trigger rapid response investigations, and we regularly run tabletop exercises to test our processes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Security is all about layers. Securing a developer workstation is not about achieving an impenetrable state; it's about creating layers of defense that systematically reduce the attack surface. By isolating development work in containers, controlling network egress, making use of built-in features on the host OS, and securing developer identity, we’re able to build a robust security posture without getting in the way of development.&lt;/p&gt;

</description>
      <category>engineering</category>
      <category>security</category>
    </item>
    <item>
      <title>How we run Arcjet like an open source project</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Mon, 23 Jun 2025 13:47:45 +0000</pubDate>
      <link>https://dev.to/arcjet/how-we-run-arcjet-like-an-open-source-project-128f</link>
      <guid>https://dev.to/arcjet/how-we-run-arcjet-like-an-open-source-project-128f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p42a8zhifzw3wqsz1of.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6p42a8zhifzw3wqsz1of.jpg" alt="How we run Arcjet like an open source project" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a startup culture that matches the users of the product is an advantage. We're a remote team, and since several of our team members have a long history of open source contributions, our practices have evolved from how many open source projects operate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt; is a remote-first company. We're developing SDK that streamlines bot detection, attack prevention, and spam protection for developers. Even though our core service is not open source &lt;a href="https://github.com/arcjet?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;our SDK and docs are&lt;/a&gt;, so we've adopted many of the workflows that are used in open source. We're not quite requiring contributions go through &lt;a href="https://git-send-email.io/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;git email&lt;/a&gt;, but remote working requires systems and tools to make it work well.&lt;/p&gt;

&lt;p&gt;For example, we track feature requests, ideas, bugs, and technical debates in GitHub issues. Code comments reference issue IDs (e.g., &lt;code&gt;// TODO(#123): something something&lt;/code&gt;), and we use draft PRs to solicit feedback before formal reviews.&lt;/p&gt;

&lt;p&gt;Over the last 16 years &lt;a href="https://davidmytton.blog/a-guide-to-remote-working-for-startups/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;I’ve led three startups remotely&lt;/a&gt; - Server Density (acquired 2018), Console.dev (a devtools newsletter) and now Arcjet. In this blog post I'll talk through some of the processes we're using to make remote engineering work at a devtools startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-time sometimes, asynchronous most of the time
&lt;/h2&gt;

&lt;p&gt;Slack aimed to replace email as a single workspace for all communication, but in practice it fragments attention and search is unreliable.&lt;/p&gt;

&lt;p&gt;Following the &lt;a href="https://basecamp.com/guides/how-we-communicate?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;37signals Guide to Internal Communication&lt;/u&gt;&lt;/a&gt; recommendation that unrestricted group chat can overload teams, we set clear boundaries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack for socializing, quick clarifications or discussions, and link-sharing.&lt;/li&gt;
&lt;li&gt;No expectation of immediate replies.&lt;/li&gt;
&lt;li&gt;If it will &lt;a href="https://critter.blog/2021/01/12/if-it-matters-after-today-stop-talking-about-it-in-a-chat-room/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;matter after today&lt;/u&gt;&lt;/a&gt;, it migrates to GitHub issues/PRs or Notion pages - lasting ideas, decisions, and conclusions all get a permanent home.&lt;/li&gt;
&lt;li&gt;We auto-delete Slack history after 7 days to enforce permanent records elsewhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Over the past two years, we shifted from GitHub Discussions to GitHub Issues due to notification limitations. Engineering questions, bugs, and feature ideas are all tracked via an issue. For code-in-progress, Pull Requests are used for context-specific discussions and PRs are often opened in draft first for early feedback. The principle here is that everything has a permanent link we can reference in future. This has worked well in general, but big PRs and extensive comments easily get lost in the GitHub web UI.&lt;/p&gt;

&lt;p&gt;Notion is for design notes, policies, research, and general information that doesn't belong in a PR. It works like a powerful wiki; however, version history and comment threading can be cumbersome, causing duplication between Notion pages and GitHub discussions (e.g., architecture feedback appearing in both).&lt;/p&gt;

&lt;p&gt;We’ve recently started capturing more structured &lt;a href="https://adr.github.io/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Architectural Decision Records&lt;/u&gt;&lt;/a&gt; as a way to improve this. The advantage of committing ADRs to the codebase is the ability to use PRs and commit history for tracking. However, discovery is more of a challenge and we've not yet solved the problem of information existing in multiple places e.g. design notes in Notion and decision records in GitHub.&lt;/p&gt;
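&lt;p&gt;A minimal ADR skeleton, loosely following Michael Nygard's widely used template (the title and content here are invented for illustration), looks like this:&lt;/p&gt;

```markdown
# ADR 7: Record architecture decisions as Markdown files

## Status
Accepted

## Context
What problem or constraint prompted the decision.

## Decision
What we chose, and the alternatives we rejected.

## Consequences
The trade-offs we accept as a result.
```

&lt;p&gt;Because each record is just a file in the repo, the PR that adds it captures the review discussion, and the commit history tracks any later amendments.&lt;/p&gt;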

&lt;p&gt;I am optimistic about using Notion’s AI integration with GitHub, Google Docs, and Slack, to answer questions and summarize discussions. There’s a lot of information embedded across several tools, so it seems the perfect AI use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stripe's internal email groups
&lt;/h2&gt;

&lt;p&gt;A decade ago, Stripe &lt;a href="https://stripe.com/blog/email-transparency?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;wrote about&lt;/u&gt;&lt;/a&gt; how they use internal email lists to improve transparency. They provided more detail on their methodology with &lt;a href="https://stripe.com/blog/scaling-email-transparency?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;a post a year later&lt;/u&gt;&lt;/a&gt; and even built &lt;a href="https://github.com/stripe-archive/gaps?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;a custom tool&lt;/u&gt;&lt;/a&gt; for managing Google Groups.&lt;/p&gt;

&lt;p&gt;Stripe was trying to solve the problem of email being directed to individuals. Using a group meant that anyone could search all past history and get a link to any specific thread. However, their tool was archived in 2019 and I couldn't find anything recent about whether the approach still works. I know they use Slack, but has that replaced email?&lt;/p&gt;

&lt;p&gt;We continue using email externally and use group addresses for finance, operations, hiring, and marketing. Almost all external emails are CC’d to the internal group to maintain transparency. This has proven useful as we hire people into roles that I was previously involved with - the handover is much easier when context is contained within a searchable group.&lt;/p&gt;

&lt;h2&gt;
  
  
  Internal weekly update emails
&lt;/h2&gt;

&lt;p&gt;We also introduced a weekly email to an internal-only updates mailing list. Every Friday everyone sends an email to the list answering three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What did you ship?&lt;/strong&gt; Briefly explain the main thing you shipped this week. “Shipped” = merged to &lt;code&gt;main&lt;/code&gt; and deployed to production. Dependency updates don’t count unless there was major migration work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What did you work on?&lt;/strong&gt; A couple of bullet points of what else you worked on, not including what you shipped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What did you learn or find interesting?&lt;/strong&gt; A couple of sentences explaining one or two things you found interesting, with links. Must be related to Arcjet in some way.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps everyone up to date and you can read the updates at your leisure. I particularly enjoy reading everyone's responses to the final question, which I adapted from &lt;a href="https://www.wsj.com/business/nvidia-jensen-huang-book-advice-b9794576?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Jensen Huang’s Top 5 Things (T5T) emails&lt;/u&gt;&lt;/a&gt;. Sometimes an update gets no replies, but there's often discussion with people replying to ask about or add to these points.&lt;/p&gt;

&lt;p&gt;So far Google Groups has worked well (it’s a shame GitHub Discussions wasn’t better). Each person can subscribe how they like (every email, summaries, digests), they’re searchable, and there’s a link to each message. However, the web interface is outdated, threading doesn’t work perfectly, and external spam protection seems poor compared to regular Gmail. &lt;a href="https://wordpress.com/p2/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;WordPress P2&lt;/u&gt;&lt;/a&gt; might be an option, but it feels too much like a hacked-together blog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remote-first, in-person regularly
&lt;/h2&gt;

&lt;p&gt;The remote-only purists are wrong. So are those who say everyone must be in the office all the time. There’s always been a balance to strike and there are plenty of examples of both styles working well. It comes down to who is setting principles and defining the culture.&lt;/p&gt;

&lt;p&gt;One thing I’ve learned is that you can create a remote-first culture and then add an in-person or in-office element later, but &lt;a href="https://davidmytton.blog/how-to-make-remote-working-work/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;not the other way around&lt;/a&gt;. The most important thing is to have the right tools and systems in place.&lt;/p&gt;

&lt;p&gt;If the default assumption is that everyone is in the office then these systems don’t get fully developed and important information gets lost. The result is the remote team being left out, the whole system failing, and then you get “return to office” mandates. The solution is to always assume that everyone is remote and record every discussion and decision.&lt;/p&gt;

&lt;p&gt;But working in-person works well for coming up with ideas and iterating quickly. I’ve always found a big difference before and after people meet for the first time. Once you’ve met in-person, it’s a lot easier to work together remotely.&lt;/p&gt;

&lt;p&gt;Arcjet is set up as a remote-first, distributed company (US and Western EU timezones to ease collaboration). However, we aim to do in-person meetups 2-3 times a year. It’s not cheap to organize, but remote working has never been about saving costs - the money you save on an office goes into travel. We’ve run these multiple times in London, New York, and Las Vegas (for Defcon) and are planning other varied locations for the next ones. The main challenge is finding somewhere to work from together - comfortable chairs and good wifi are hard to find!&lt;/p&gt;

&lt;h2&gt;
  
  
  Key principles
&lt;/h2&gt;

&lt;p&gt;Open-source-inspired workflows suit our developer-focused security product by improving transparency and feedback loops. Our team has more context to help make good decisions because we can search through past work to understand the current state of things.&lt;/p&gt;

&lt;p&gt;To make this work we follow a few key principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Document everything in linkable, permanent repositories.&lt;/li&gt;
&lt;li&gt;Favor async workflows; use real-time sparingly.&lt;/li&gt;
&lt;li&gt;Invest in periodic in-person meetups to build rapport.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tools have never been better, and I expect AI to make linking systems more effective. It'll be interesting to look back in another 10 years and see what else has changed.&lt;/p&gt;

</description>
      <category>engineering</category>
    </item>
    <item>
      <title>Bot detection techniques for developers</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Thu, 05 Jun 2025 19:50:00 +0000</pubDate>
      <link>https://dev.to/arcjet/bot-detection-techniques-for-developers-5524</link>
      <guid>https://dev.to/arcjet/bot-detection-techniques-for-developers-5524</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp5ehgsuvulkqvohxd51.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp5ehgsuvulkqvohxd51.jpg" alt="Bot detection techniques for developers" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;tl;dr: Bot traffic now dominates the web, and AI scrapers are making it worse. Blocking by user agent or IP isn’t enough. This post covers practical detection and enforcement strategies - including fingerprinting, rate limiting, and proof-of-work - plus how&lt;/em&gt; &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;em&gt;Arcjet&lt;/em&gt;&lt;/a&gt;&lt;em&gt;’s security as code product builds defenses directly into your app logic.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Bots have always been a part of the internet. Most site owners like good bots because they want to be indexed in search engines and they tend to follow the rules.&lt;/p&gt;

&lt;p&gt;Bad bots have also been a part of the internet for a long time. You can observe this within seconds of exposing a server on a public IP address. If it’s a web server you’ll quickly see scanners testing for known WordPress vulnerabilities, accidentally published .git directories, exposed config files, etc. Same for other servers: SSH brute forcing, SMTP relay attacks, SMB login attempts, etc.&lt;/p&gt;

&lt;p&gt;But recently things seem to have become worse. Depending on who you ask, bots made up &lt;a href="https://www.malwarebytes.com/blog/uncategorized/2025/04/hi-robot-half-of-all-internet-traffic-now-automated?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;37%&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://www.akamai.com/newsroom/press-release/bots-compose-42-percent-of-web-traffic-nearly-two-thirds-are-malicious?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;42%&lt;/u&gt;&lt;/a&gt; or almost &lt;a href="https://www.imperva.com/resources/resource-library/reports/2024-bad-bot-report/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;50%&lt;/u&gt;&lt;/a&gt; of all internet traffic in 2024. The share also varies by industry, from &lt;a href="https://www.statista.com/statistics/1264540/human-and-bot-web-traffic-share-industry/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;18% in marketing to 57% in gaming&lt;/u&gt;&lt;/a&gt; (2023).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcjet.com/bot-protection/concepts?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet’s bot detection&lt;/u&gt;&lt;/a&gt; is our most popular feature and we see millions of requests from bots every day with all sorts of abuse patterns. In this post we’ll look at the problem bots cause, the techniques they use to evade defenses, and how you can protect yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What problems do bots cause?
&lt;/h2&gt;

&lt;p&gt;A human using a web browser has certain expected behaviors. Even if many requests to the various page assets can be executed in parallel, they will still make requests at human speed. Their progress will be gradual (they won’t load every page on a site simultaneously), and caching mechanisms (local in-browser and/or through a CDN) work to reduce the number of requests and/or data transfer.&lt;/p&gt;

&lt;p&gt;Good bots mimic this behavior by progressively visiting site pages at a more human-like pace. They typically follow the rules posted by site owners (the &lt;a href="https://en.wikipedia.org/wiki/Robots.txt?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;robots.txt voluntary standard&lt;/u&gt;&lt;/a&gt; was first published in 1994 and became &lt;a href="https://doi.org/10.17487/RFC9309?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;a proposed standard in 2022&lt;/u&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Expensive requests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bots request resource-intensive pages - from static HTML to dynamic pages backed by costly database queries. These pages often can’t be cached or pre-rendered.&lt;/td&gt;
&lt;td&gt;Bots crawling every Git commit, blame, and history page on &lt;a href="https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;SourceHut&lt;/a&gt;. Dynamic content on &lt;a href="https://techcrunch.com/2025/01/10/how-openais-bot-crushed-this-seven-person-companys-web-site-like-a-ddos-attack/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;online stores&lt;/a&gt; and &lt;a href="https://fabulous.systems/posts/2025/05/anubis-saved-our-websites-from-a-ddos-attack/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;wikis&lt;/a&gt; being repeatedly requested.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large downloads&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Projects hosting big files - like ISOs or software archives - suffer when bots download at scale, straining bandwidth.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Fedora Linux&lt;/a&gt; mirrors overwhelmed by bot downloads. Open source projects struggle with abusive scraping of images, documentation, and archives. Even large vendors like Red Hat and Canonical have to manage these loads; smaller projects rely on limited infrastructure or donations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource exhaustion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every request has a cost - whether for dynamic or static content. Bots can saturate compute, bandwidth, or memory limits, degrading service or creating DoS-like conditions.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://mastodon.social/@AndresFreundTec/113868582630760229?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;LWN&lt;/a&gt; and &lt;a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; fighting traffic spikes. Brute-force login attempts on mail servers (e.g., &lt;a href="https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;this case&lt;/a&gt;) seeking spam relays. Even generous hosts like Hetzner can’t handle infinite abuse; serverless (per request) pricing makes this even riskier.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Are AI bots worse?
&lt;/h2&gt;

&lt;p&gt;Many recent scraping incidents have been attributed to AI crawlers. Attribution is tricky - user agents and IPs are easy to spoof - but detailed logs and traffic patterns from open-source platforms strongly suggest AI bots are a major contributor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://diasporafoundation.org/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Diaspora&lt;/u&gt;&lt;/a&gt; open source web infrastructure &lt;a href="https://pod.geraspora.de/posts/3d473600a616013da02e268acd52edbf?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;traffic logs&lt;/u&gt;&lt;/a&gt; show 24% of traffic from &lt;a href="https://openai.com/gptbot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;OpenAI’s GPTBot&lt;/u&gt;&lt;/a&gt; and 4.3% from Anthropic’s Claudebot. Around 16% comes from &lt;a href="https://developer.amazon.com/support/amazonbot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Amazonbot&lt;/u&gt;&lt;/a&gt;, although it’s not clear whether that traffic is for AI.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;ReadTheDocs posted examples&lt;/u&gt;&lt;/a&gt; of crawlers excessive download requests. The user agents weren’t listed, but applying Cloudflare’s AI crawler block list cut bandwidth by 75% - from 800GB/day to 200GB/day.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Discussions around bot traffic&lt;/u&gt;&lt;/a&gt; from KDE’s GitLab instance suggested traffic from “Chinese AI companies” did not include proper user agent identification whereas traffic from “Western LLM operators” did.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As with everything, incentives matter. Site owners are happy to serve Googlebot because its reasonable behavior means it doesn’t cost (much) and the site gets traffic from searches in return. Win-win. If you don’t want that, it’s easy to restrict or block with &lt;code&gt;robots.txt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Contrast this with AI where scraping is usually for training purposes with no guarantee that the source of that training data will ever be cited or receive traffic. Why would a site owner want to participate in this “trade”? They’re more likely to want to block the traffic, so the AI scrapers need to hide their identity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wildcard: agents acting on behalf of humans
&lt;/h2&gt;

&lt;p&gt;Most site owners want human traffic, so the definition of good vs bad bots comes down to whether that automated traffic is acceptable or not. There’s a spectrum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated API clients = good.&lt;/li&gt;
&lt;li&gt;Search engine indexing bots = usually good.&lt;/li&gt;
&lt;li&gt;AI crawlers = sometimes good or bad, depending on your philosophical stance.&lt;/li&gt;
&lt;li&gt;Scrapers = bad.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes more challenging when you introduce AI agents acting on behalf of humans. The difficulty is nicely illustrated by &lt;a href="https://platform.openai.com/docs/bots/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;the type of bots OpenAI operates&lt;/u&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bot&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Training?&lt;/th&gt;
&lt;th&gt;Citations?&lt;/th&gt;
&lt;th&gt;Identification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://platform.openai.com/docs/bots?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;OAI-SearchBot&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Crawls sites to power ChatGPT’s search index.&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ &lt;a href="https://www.linkedin.com/posts/zenorocha_chatgpt-is-now-the-top-3-source-of-traffic-activity-7329153728060538880-CeaA?utm_source=share&amp;amp;utm_medium=member_desktop" rel="noopener noreferrer"&gt;Drives referral traffic&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Clear user agent. Site owners can verify traffic sources. Generally considered beneficial.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://platform.openai.com/docs/bots?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;ChatGPT-User&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Real-time bot for ChatGPT Q&amp;amp;A sessions - reads live content to summarize responses.&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;⚠️ Sometimes cited, sometimes not. Requires monitoring traffic to assess impact.&lt;/td&gt;
&lt;td&gt;Uses a dedicated user agent. Behavior is passive until invoked by a user prompt.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://openai.com/gptbot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;GPTBot&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Crawler used to collect training data for foundation models.&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌ Provides no return value to the site owner.&lt;/td&gt;
&lt;td&gt;User agent is &lt;a href="https://openai.com/gptbot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;documented&lt;/a&gt; and can be blocked via robots.txt. High bandwidth and content costs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;(Operator)&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;Full browser agent (Chrome in a VM) used by OpenAI agents to interact with the web on user request.&lt;/td&gt;
&lt;td&gt;❓&lt;/td&gt;
&lt;td&gt;❓ Depends on use case - behaves like a human user.&lt;/td&gt;
&lt;td&gt;No public documentation on how to identify it. Mimics normal Chrome user traffic. Cannot be reliably blocked without false positives.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As AI tools become more popular, simply blocking all AI bots is probably not the right approach. For example, allowing AI bots that act more like search engines, such as OAI-SearchBot, means your site will receive traffic from users who are moving away from traditional search engines.&lt;/p&gt;

&lt;p&gt;Distinguishing between different areas of your site is also important. You should allow search indexing of your content but block automated bots from a signup page. This is what &lt;code&gt;robots.txt&lt;/code&gt; is supposed to be for, but a bot detection tool like &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet&lt;/u&gt;&lt;/a&gt; lets you enforce different rules for different pages of your site.&lt;/p&gt;

&lt;p&gt;Operators like OAI-SearchBot offer value (e.g., traffic, citations). Others, like GPTBot, provide no benefit and can impose high costs. Treat each agent class differently. Blocking all AI bots is a blunt instrument.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect and block bots
&lt;/h2&gt;

&lt;p&gt;The first step to detecting and managing bots is to create rules in your &lt;code&gt;robots.txt&lt;/code&gt; file. Good bots like Google will behave and follow these rules. It’s a good exercise to develop an understanding of how you want to control bots on your site. &lt;a href="https://developers.google.com/search/docs/crawling-indexing/robots/intro?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Use Google’s documentation&lt;/u&gt;&lt;/a&gt; to guide creating the rules.&lt;/p&gt;
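
&lt;p&gt;For example, a minimal &lt;code&gt;robots.txt&lt;/code&gt; might allow general indexing while keeping crawlers away from a signup page and opting out of GPTBot entirely (the paths here are placeholders - adjust them for your own site):&lt;/p&gt;

```text
# Allow well-behaved crawlers, but keep them away from the signup flow
User-agent: *
Disallow: /signup

# Opt out of OpenAI training data collection (GPTBot documents that it
# honors robots.txt)
User-agent: GPTBot
Disallow: /
```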

&lt;p&gt;But we have to assume that the bad bots won’t follow these rules, so this is where we start to build layers of defenses. Start with low-cost signals (headers), then move to harder-to-spoof data (IP reputation, TLS/HTTP fingerprinting), and finally consider active challenges (CAPTCHAs, PoW).&lt;/p&gt;

&lt;h3&gt;
  
  
  Blocking user agents
&lt;/h3&gt;

&lt;p&gt;A surprising number of bad bots actually identify themselves with the user agent HTTP header. We often see requests identified as curl, python-urllib, or Go-http-client from simplistic scrapers that haven’t changed the default user agent. We track many hundreds of known user agents in our open source &lt;a href="https://github.com/arcjet/well-known-bots?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;well-known-bots project&lt;/u&gt;&lt;/a&gt; (forked from &lt;a href="https://github.com/monperrus/crawler-user-agents?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;crawler-user-agents&lt;/u&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;You can use open source projects like &lt;a href="https://github.com/omrilotan/isbot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;isbot&lt;/u&gt;&lt;/a&gt; (Node.js), &lt;a href="https://nextjs.org/docs/app/api-reference/functions/userAgent?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;userAgent&lt;/u&gt;&lt;/a&gt; (Next.js), and &lt;a href="https://github.com/JayBizzle/Crawler-Detect?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;CrawlerDetect&lt;/u&gt;&lt;/a&gt; (PHP, &lt;a href="https://github.com/moskrc/CrawlerDetect?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;also available for Python&lt;/u&gt;&lt;/a&gt;) to write checks in your application web server or middleware.&lt;/p&gt;
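
&lt;p&gt;A minimal version of this check can be hand-rolled with a regular expression over the User-Agent header. This is an illustrative sketch, not the isbot API, and the pattern list is deliberately incomplete - real libraries track thousands of patterns:&lt;/p&gt;

```javascript
// Minimal bot check on the User-Agent header. Illustrative only:
// libraries like isbot maintain far larger, regularly updated pattern lists.
const BOT_PATTERN = /bot|crawler|spider|curl|python-urllib|go-http-client/i;

function looksLikeBot(userAgent) {
  // An empty or missing User-Agent is suspicious in itself
  if (!userAgent) return true;
  return BOT_PATTERN.test(userAgent);
}

console.log(looksLikeBot("curl/8.4.0")); // true
console.log(looksLikeBot("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)")); // false
```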

&lt;p&gt;There are two obvious downsides to this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;New crawlers and bots are released all the time. As we saw above, OpenAI has at least 3 bots, plus its Operator agent, and Anthropic &lt;a href="https://www.404media.co/websites-are-blocking-the-wrong-ai-scrapers-because-ai-companies-keep-making-new-ones/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;has changed the name of its bot multiple times&lt;/u&gt;&lt;/a&gt;. Keeping up with the latest user agent variants is time consuming.&lt;/li&gt;
&lt;li&gt;Clients can set the user agent header to whatever they like and can pretend to be something else. The User-Agent header &lt;a href="https://www.rfc-editor.org/rfc/rfc9110?ref=blog.arcjet.com#section-10.1.5" rel="noopener noreferrer"&gt;&lt;u&gt;should be set for every HTTP request and it has a specific format&lt;/u&gt;&lt;/a&gt;, but there is no enforcement of this. If you want to allow GoogleBot, a bad bot could pretend to be Google by using the same user agent header.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://docs.arcjet.com/bot-protection/quick-start?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet’s rule configuration&lt;/u&gt;&lt;/a&gt; allows you to choose specific bots to allow or deny and use categories of common bots, which get regularly updated. This means bot protection rules can be granular and easy to understand. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const aj = arcjet({
  key: process.env.ARCJET_KEY!, // Get your site key from https://app.arcjet.com
  rules: [
    detectBot({
      mode: "LIVE",
      // Block all bots except the following
      allow: [
        "CATEGORY:SEARCH_ENGINE", // Google, Bing, etc
        // Uncomment to allow these other common bot categories
        // See the full list at https://arcjet.com/bot-list
        //"CATEGORY:MONITOR", // Uptime monitoring services
        //"CATEGORY:PREVIEW", // Link previews e.g. Slack, Discord
      ],
    }),
  ],
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verifying user agents
&lt;/h3&gt;

&lt;p&gt;To mitigate spoofed requests where a client pretends to be a bot we want to allow, you need to verify the request. For example, if we see a request with a GoogleBot user agent, we need to verify that request is actually coming from Google.&lt;/p&gt;

&lt;p&gt;The big crawler operators all provide methods for verifying their bots, e.g. &lt;a href="https://support.apple.com/en-gb/119829?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Applebot&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://www.bing.com/webmasters/help/how-to-verify-bingbot-3905dc26?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Bing&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://docs.datadoghq.com/synthetics/guide/identify_synthetics_bots/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Datadog&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Google&lt;/u&gt;&lt;/a&gt;, and &lt;a href="https://platform.openai.com/docs/bots?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;OpenAI&lt;/u&gt;&lt;/a&gt;. This is usually through a reverse DNS lookup to verify that the source IP address belongs to the organization claimed in the user agent. Some instead provide a list of IPs that needs to be checked, e.g. &lt;a href="https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot?ref=blog.arcjet.com#automatic" rel="noopener noreferrer"&gt;&lt;u&gt;Google has a machine readable list of IPs&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: Verifying Googlebot
&lt;/h3&gt;

&lt;p&gt;To check if a request is coming from a Google Crawler:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run a reverse DNS lookup on the client source IP address - for example, using the &lt;code&gt;host&lt;/code&gt; command on macOS or Linux for a request from 66.249.66.1 claiming to be Google. Check that the resulting domain is googlebot.com, google.com, or googleusercontent.com:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~ host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Run a forward DNS lookup on the domain returned and check that the IP address matches the original source IP address:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~ host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that the IP address matches the source IP, so we know that this request is actually coming from Google.&lt;/p&gt;
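
&lt;p&gt;The two-step check can be automated in Node.js. Here is a sketch that takes the resolver functions as parameters so it can be tested offline; in production you would pass in &lt;code&gt;reverse&lt;/code&gt; and &lt;code&gt;lookup&lt;/code&gt; from &lt;code&gt;node:dns&lt;/code&gt; promises:&lt;/p&gt;

```javascript
// Verify a claimed Googlebot request: reverse DNS on the source IP,
// check the domain, then forward-resolve and compare with the source IP.
// reverseFn(ip) resolves to a list of hostnames; lookupFn(host) to an IP.
const GOOGLE_DOMAINS = [".googlebot.com", ".google.com", ".googleusercontent.com"];

async function verifyGooglebot(ip, reverseFn, lookupFn) {
  const hostnames = await reverseFn(ip); // e.g. ["crawl-66-249-66-1.googlebot.com"]
  for (const host of hostnames) {
    const trusted = GOOGLE_DOMAINS.some((domain) => host.endsWith(domain));
    if (!trusted) continue; // reverse DNS points somewhere else: not Google
    const resolved = await lookupFn(host); // forward lookup on the hostname
    if (resolved === ip) return true; // round trip matches: genuine Googlebot
  }
  return false; // spoofed user agent or mismatched DNS
}
```

&lt;p&gt;With the real resolvers this would be called as, roughly, &lt;code&gt;verifyGooglebot(ip, dns.reverse, (h) =&gt; dns.lookup(h).then((r) =&gt; r.address))&lt;/code&gt; where &lt;code&gt;dns&lt;/code&gt; is &lt;code&gt;require("node:dns").promises&lt;/code&gt;. Cache results - DNS lookups on every request add latency.&lt;/p&gt;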

&lt;h3&gt;
  
  
  IP address reputation
&lt;/h3&gt;

&lt;p&gt;As you receive requests from a variety of IP addresses, you can build up a picture of what normal traffic looks like. This technique has been used to prevent email spam for decades - the same principles apply when analyzing suspicious web traffic. If an IP address has recently been associated with bot traffic then it’s more likely that a new request is also a bot.&lt;/p&gt;

&lt;p&gt;Various commercial databases exist from providers like &lt;a href="https://www.maxmind.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;MaxMind&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://ipinfo.io/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;IPInfo&lt;/u&gt;&lt;/a&gt;, and &lt;a href="https://ipapi.is/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;IP API&lt;/u&gt;&lt;/a&gt;. They offer an API or downloadable database of IPs with associated metadata like location and fraud scoring.&lt;/p&gt;

&lt;p&gt;Looking up IP data like the network owner, IP address type, and geo-location all help to build a picture of whether the request is likely to be abusive or not. For example, requests coming from a data center or cloud provider are highly likely to be automated so you might want to block them from signup forms. &lt;a href="https://blog.cloudflare.com/radar-2024-year-in-review/?ref=blog.arcjet.com#the-united-states-was-responsible-for-over-a-third-of-global-bot-traffic-amazon-web-services-was-responsible-for-12-7-of-global-bot-traffic-and-7-8-came-from-google" rel="noopener noreferrer"&gt;&lt;u&gt;Cloudflare reported&lt;/u&gt;&lt;/a&gt; that AWS was responsible for 12.7% of global bot traffic in 2024 and Fedora Linux &lt;a href="https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;was forced to block all traffic from Brazil&lt;/u&gt;&lt;/a&gt; during a period of high abuse.&lt;/p&gt;
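
&lt;p&gt;To illustrate what such a lookup involves, here is a sketch that tests whether an IPv4 address falls inside a known datacenter CIDR range. The ranges below are placeholders - real lists come from the cloud providers or the commercial databases above:&lt;/p&gt;

```javascript
// Check whether an IPv4 address falls inside any known datacenter CIDR range.
// Placeholder ranges for illustration; use published provider lists in practice.
const DATACENTER_RANGES = ["3.0.0.0/9", "34.64.0.0/10"];

function ipToInt(ip) {
  // "3.5.1.1" -> 32-bit integer
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function inCidr(ip, cidr) {
  const [network, prefixStr] = cidr.split("/");
  const shift = 32 - Number(prefixStr);
  // Compare only the network portion of each address; >>> keeps it unsigned
  return (ipToInt(ip) >>> shift) === (ipToInt(network) >>> shift);
}

function isDatacenterIp(ip) {
  return DATACENTER_RANGES.some((range) => inCidr(ip, range));
}
```

&lt;p&gt;A hit doesn’t prove abuse - plenty of legitimate API clients run in the cloud - which is why this signal is best combined with others rather than used to block outright.&lt;/p&gt;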

&lt;p&gt;IP data isn’t perfect though. IP geo-location is notorious for inaccuracies, especially for IP addresses linked to mobile or satellite networks. Bot operators cycle through large numbers of IP addresses across disparate networks, and &lt;a href="https://jan.wildeboer.net/2025/02/Blocking-Stealthy-Botnets/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;are buying access to residential proxies&lt;/u&gt;&lt;/a&gt; to mask their requests.&lt;/p&gt;

&lt;p&gt;IP-based decisions often cause false positives, so blocking solely on IP reputation is risky. &lt;a href="https://docs.arcjet.com/bot-protection/concepts?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet’s bot detection&lt;/u&gt;&lt;/a&gt; provides all the signals back into your code so you can decide how to handle suspicious requests e.g. flagging an online order for human review rather than immediately accepting it.&lt;/p&gt;

&lt;p&gt;The usual approach is to trigger a challenge, like a CAPTCHA. Early versions relied on distorted text that was difficult for OCR (Optical Character Recognition) software to parse. More recent iterations include "no-CAPTCHA reCAPTCHA" (which analyzes user behavior like mouse movements before presenting a challenge) and invisible CAPTCHAs that work in the background or require showing “proof of work”.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proof of work
&lt;/h3&gt;

&lt;p&gt;Requiring all clients to spend some compute time completing a proof of work challenge introduces a cost to every request.&lt;/p&gt;

&lt;p&gt;This idea isn’t new. Bill Gates famously &lt;a href="https://www.cnet.com/tech/tech-industry/gates-reveals-his-magic-solution-to-spam/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;announced a similar idea back in 2004&lt;/u&gt;&lt;/a&gt; in response to huge volumes of email spam. The theory is that an individual human browser can afford to spend a micro-amount of time solving a challenge, whereas at crawling scale the cumulative cost makes mass scraping economically inefficient. The challenge difficulty can also increase with how suspicious the request looks, making it progressively more expensive for bots.&lt;/p&gt;

&lt;p&gt;There are several open source projects which implement this idea through a reverse proxy that will challenge all requests to your site: &lt;a href="https://github.com/TecharoHQ/anubis?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Anubis&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://github.com/vaxerski/checkpoint?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Checkpoint&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://git.gammaspectra.live/git/go-away?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;go-away&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://zadzmo.org/code/nepenthes/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Nepenthes&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://gitgud.io/fatchan/haproxy-protection/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;haproxy-protection&lt;/u&gt;&lt;/a&gt;, and &lt;a href="https://iocaine.madhouse-project.org/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Iocaine&lt;/u&gt;&lt;/a&gt; are all interesting implementations.&lt;/p&gt;

&lt;p&gt;AI crawlers are able to work around these though - &lt;a href="http://dx.doi.org/10.5281/zenodo.13318796?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;AI poisoning is a well-known technique, and many crawlers already evade such defenses&lt;/u&gt;&lt;/a&gt; - and there are downsides, &lt;a href="https://mastodon.social/@cks/114571090294492114?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;such as hurting accessibility&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The effectiveness of these types of proof of work or CAPTCHAs is an ongoing arms race. Modern AI can now solve many types of CAPTCHAs with increasing accuracy and speed. This means that while CAPTCHAs can deter simpler bots, determined attackers can bypass them, especially for high-value targets where the cost of solving the CAPTCHA is negligible compared to the potential profit. For example, paying a few cents (or even tens of dollars) to solve a challenge protecting sports or concert ticket purchases &lt;a href="https://behind.pretix.eu/2025/05/23/captchas-are-over/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;is worth it&lt;/u&gt;&lt;/a&gt; when the profits are in the hundreds of dollars.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP message signatures
&lt;/h3&gt;

&lt;p&gt;Cloudflare &lt;a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;has proposed a new signature verification technique&lt;/u&gt;&lt;/a&gt; where bots can identify themselves using request signing. Whilst there are some benefits like non-repudiation, it remains to be seen how this improves the existing approach to verifying bot IP addresses using reverse DNS.&lt;/p&gt;

&lt;p&gt;Whereas HTTP Message Signatures are focused on requests from automated bots, Apple introduced something similar with &lt;a href="https://developer.apple.com/news/?id=huqjyh7k&amp;amp;ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Private Access Tokens in 2022&lt;/u&gt;&lt;/a&gt; for browsers. Although &lt;a href="https://datatracker.ietf.org/doc/rfc9577/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;RFC 9577 (Privacy Pass HTTP Authentication Scheme)&lt;/u&gt;&lt;/a&gt; is progressing through the standardization process, it hasn’t had widespread adoption. It is &lt;a href="https://support.apple.com/en-us/102591?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;built into the Apple ecosystem&lt;/u&gt;&lt;/a&gt; and works in Safari, but no other browsers have adopted it (a &lt;a href="https://privacypass.github.io/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Chrome / Firefox extension&lt;/u&gt;&lt;/a&gt; is available).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81q75rza9qefsqeesr79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81q75rza9qefsqeesr79.png" alt="Bot detection techniques for developers" width="800" height="253"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;HTTP Message Signatures for automated traffic Architecture.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  JA3/JA4 fingerprint
&lt;/h3&gt;

&lt;p&gt;The JA3 fingerprint &lt;a href="https://github.com/salesforce/ja3?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;was invented in 2017 at Salesforce&lt;/u&gt;&lt;/a&gt;. It’s based on hashing various characteristics of the SSL/TLS client negotiation metadata. The idea is that the same client will have the same fingerprint even if it is making requests across IP addresses and networks. JA3 has mostly been deprecated because of how easy it is to cause the hash to change just by making slight changes to network traffic e.g. reordering cipher suites. It has been &lt;a href="https://github.com/FoxIO-LLC/ja4?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;replaced by JA4&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The challenge with JA4 hashing is that &lt;a href="https://github.com/FoxIO-LLC/ja4/blob/main/technical_details/README.md?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;it's based on the TLS handshake metadata&lt;/u&gt;&lt;/a&gt;, such as the protocol version and number of ciphers. This is available if you run your own web servers, but not on modern platforms like Vercel, Netlify, and Fly.io because they run reverse proxy edge gateways for you (although Vercel calculates the JA3 and JA4 fingerprints for you and adds headers with the data).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F280vs8cdzakxij41pw8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F280vs8cdzakxij41pw8c.png" alt="Bot detection techniques for developers" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;JA4: TLS Client Fingerprint&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An alternative is &lt;a href="https://github.com/FoxIO-LLC/ja4/blob/main/technical_details/JA4H.md?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;JA4H&lt;/u&gt;&lt;/a&gt;, which is calculated from HTTP request metadata instead. However, JA4H is a proprietary algorithm, whereas JA4 is open source.&lt;/p&gt;

&lt;p&gt;Once you have the hash, there is still manual work to decide which fingerprints to block, just as with IP addresses. Fingerprints are best combined with IP reputation data when making automated decisions, to minimize the risk of false positives.&lt;/p&gt;
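
&lt;p&gt;As an illustrative sketch of combining the two signals (the fingerprint hash, reputation scores, and threshold below are made up for the example, and this is not Arcjet's algorithm):&lt;/p&gt;

```javascript
// Sketch: combine a JA4 fingerprint blocklist with IP reputation
// before blocking. All data here is illustrative.
const blockedFingerprints = new Set([
  "t13d1516h2_8daaf6152771_b0da82dd1658", // example JA4 hash
]);

function ipReputationScore(ip) {
  // Hypothetical reputation data: higher means more suspicious.
  const known = { "203.0.113.7": 0.9, "198.51.100.2": 0.1 };
  return known[ip] ?? 0.5;
}

function shouldBlock(ja4, ip) {
  // Require both signals to agree, reducing false positives from a
  // fingerprint that is shared by many different clients.
  if (blockedFingerprints.has(ja4)) {
    return ipReputationScore(ip) >= 0.7;
  }
  return false;
}
```

&lt;p&gt;Requiring agreement between signals means a popular browser fingerprint alone never triggers a block.&lt;/p&gt;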

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1rgz6as3mmaty2ui6f2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1rgz6as3mmaty2ui6f2.png" alt="Bot detection techniques for developers" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;JA4H: HTTP Client Fingerprint&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Rate limiting
&lt;/h3&gt;

&lt;p&gt;IP address based rate limiting is a basic defense which, like user agent blocking, can help against some of the simpler attacks. However, it is easily bypassed by rotating IP addresses. Keying the rate limit on different characteristics can help - a session or user ID is the best option if your site requires a login.&lt;/p&gt;

&lt;p&gt;Otherwise, using a fingerprint (like JA3/JA4) will help manage anonymous clients. Using compound keys that include features such as the IP address, path, and fingerprint as supported by &lt;a href="https://docs.arcjet.com/rate-limiting/configuration?ref=blog.arcjet.com#characteristics" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet’s rate limiting functionality&lt;/u&gt;&lt;/a&gt; can help create sophisticated protections.&lt;/p&gt;
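
&lt;p&gt;To make the idea concrete, here is a minimal in-memory sketch (not Arcjet's implementation) of a fixed-window rate limiter keyed on a compound of IP address, path, and fingerprint:&lt;/p&gt;

```javascript
// Minimal fixed-window rate limiter keyed on a compound key of IP
// address, path, and fingerprint. Illustrative only: production systems
// use sliding windows and shared storage across instances.
const windows = new Map();

function rateLimit(ip, path, fingerprint, max, windowMs, now = Date.now()) {
  const key = `${ip}:${path}:${fingerprint}`;
  const entry = windows.get(key);
  // Start a fresh window if none exists or the current one has expired.
  if (!entry || now - entry.start >= windowMs) {
    windows.set(key, { start: now, count: 1 });
    return { allowed: true };
  }
  entry.count += 1;
  return { allowed: !(entry.count > max) };
}
```

&lt;p&gt;Because the key includes several characteristics, two different clients behind the same IP address get separate buckets, which reduces false positives from shared NATs.&lt;/p&gt;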

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;No one technique is enough because &lt;a href="https://blog.arcjet.com/bot-detection-isnt-perfect/" rel="noopener noreferrer"&gt;&lt;u&gt;bot detection isn’t perfect&lt;/u&gt;&lt;/a&gt;. Instead, a robust defense strategy relies on a multi-layered approach. Starting with &lt;code&gt;robots.txt&lt;/code&gt; to guide well-behaved bots is a good first step, but it must be augmented by more assertive techniques. These include verifying user agents, leveraging IP address reputation data, employing TLS and HTTP fingerprinting like JA3/JA4, implementing intelligent rate limiting, and considering proof-of-work challenges or CAPTCHAs where appropriate.&lt;/p&gt;

&lt;p&gt;The key is to remain vigilant and adapt. Understanding the different types of bots, their evasion techniques, and the array of available countermeasures allows site owners to make informed decisions. Or use a product like &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet&lt;/u&gt;&lt;/a&gt; which takes away a lot of the hassle and means the protections are built right into the logic of your application.&lt;/p&gt;

</description>
      <category>botdetection</category>
    </item>
    <item>
      <title>Low latency global routing with AWS Global Accelerator</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Tue, 27 May 2025 09:16:10 +0000</pubDate>
      <link>https://dev.to/arcjet/low-latency-global-routing-with-aws-global-accelerator-1n60</link>
      <guid>https://dev.to/arcjet/low-latency-global-routing-with-aws-global-accelerator-1n60</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t5ohv9g45euiund1kp3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t5ohv9g45euiund1kp3.jpg" alt="Low latency global routing with AWS Global Accelerator" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt; performs real-time security analysis in the critical path of API and authentication flows. That means latency isn’t just a consideration - it’s a core design constraint.&lt;/p&gt;

&lt;p&gt;To meet our end-to-end p50 latency SLA of 20–30ms, we deploy globally, use persistent HTTP/2 connections, and rely on AWS's network to route requests to the nearest healthy region.&lt;/p&gt;

&lt;p&gt;A core component of our architecture is &lt;a href="https://aws.amazon.com/global-accelerator/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;AWS Global Accelerator&lt;/a&gt;. This service routes traffic through a set of &lt;a href="https://en.wikipedia.org/wiki/Anycast?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Anycast&lt;/a&gt; IPs via distributed AWS edge locations (points of presence) to the closest healthy AWS region, utilizing AWS's private global network for more stable and lower-latency pathways compared to the public internet.&lt;/p&gt;

&lt;p&gt;This post explains the details of how we use AWS Global Accelerator and other AWS services to achieve our SLA targets. While our context is delivering a low-overhead security product for developers, the principles and benefits discussed are applicable to any mission-critical, latency-sensitive global application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The challenge - request analysis in the hot path
&lt;/h2&gt;

&lt;p&gt;Arcjet is delivered as an SDK so it can be tightly integrated into the logic of the application and benefit from the full request context. Security rules can be adjusted dynamically based on user and session characteristics, and the results can be incorporated into how the application behaves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.arcjet.com/minimizing-latency-for-humans-machines/" rel="noopener noreferrer"&gt;Everything is downstream of latency&lt;/a&gt;. That’s why Arcjet takes a local-first approach: rules are evaluated in-process via a WebAssembly module bundled in the SDK. Many decisions can complete within 1-2ms, but others require a network call, such as when our dynamic IP reputation database is used as one of the security signals.&lt;/p&gt;

&lt;p&gt;In cases where our cloud decision API is involved, we set ourselves a p50 response time goal of 20ms, which leaves 5-10ms for the network round trip so that we can hit our &lt;a href="https://blog.arcjet.com/how-we-achieve-our-25ms-p95-response-time-sla/" rel="noopener noreferrer"&gt;end-to-end latency goal of 20-30ms&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This goal applies globally. Developers deploy their applications everywhere, so centralizing our API is not an option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyisv3kpzuji95t7wmmyu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyisv3kpzuji95t7wmmyu.png" alt="Low latency global routing with AWS Global Accelerator" width="800" height="648"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Architecture diagram showing the request being processed by the Arcjet SDK within the application environment. An Arcjet API call may be used to enrich the decision. Read more about the Arcjet Architecture.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution - Global Accelerator + multi-region
&lt;/h2&gt;

&lt;p&gt;Arcjet’s cloud decision API is written in Go and provides both JSON and gRPC interfaces to support different environments, with gRPC preferred due to the more optimized Protocol Buffers message format. &lt;/p&gt;

&lt;p&gt;Our cloud decision API is containerized and deployed across multiple availability zones within each AWS region using AWS Elastic Kubernetes Service (EKS), fronted by an AWS Application Load Balancer (ALB). The ALB distributes incoming API traffic across our container instances; its cross-zone load balancing capability ensures that if one availability zone experiences an issue, traffic is automatically routed to instances in the other healthy zones within that region, enhancing resilience.&lt;/p&gt;

&lt;p&gt;From the developer’s perspective, Arcjet is a single endpoint. Behind the scenes, Global Accelerator ensures each request is routed to the closest healthy AWS region using latency-based routing and active health checks. This happens automatically - no configuration or region awareness required from the developer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ej1witjuda9ob9p6je3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ej1witjuda9ob9p6je3.png" alt="Low latency global routing with AWS Global Accelerator" width="800" height="453"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;AWS Global Accelerator uses a global network of 119 Points of Presence in 94 cities across 51 countries (source).&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Application to edge to region
&lt;/h2&gt;

&lt;p&gt;Setting up a new connection is often the slowest part of communicating with our API. Establishing a TCP connection and performing a TLS handshake takes multiple round trips, adding many milliseconds to each new request. This can quickly eat into our 5-10ms network round trip budget and is particularly problematic in serverless environments where cold starts are common.&lt;/p&gt;

&lt;p&gt;To mitigate this, our SDK establishes a persistent HTTP/2 connection to our API, allowing multiple requests to be multiplexed over a single, long-lived connection. Global Accelerator helps minimize the round trip time because the initial TLS handshake can be completed by the closest network edge, which is usually much closer than the AWS region serving the request.&lt;/p&gt;

&lt;p&gt;The network path between the edge location and the AWS region uses the AWS global network, which is much more optimized compared to routing over the public internet. Global Accelerator helps eliminate network jitter and unpredictable routing by using AWS’s private backbone between edge locations and regions. This stability is critical for maintaining consistent performance, especially in bursty serverless environments.&lt;/p&gt;

&lt;p&gt;We run in &amp;gt;10 AWS regions, launching more based on customer demand. Each regional deployment is completely independent, a key design choice for maximizing availability and fault isolation, orchestrated seamlessly by AWS Global Accelerator's intelligent routing. If traffic is re-routed, this happens within AWS’s network without requiring our SDK client to reconnect, which avoids a cold start.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance in practice
&lt;/h2&gt;

&lt;p&gt;We measure both internal processing latency and end-to-end cold start latency from the SDK’s perspective. Over the last 7 days, the median internal API response time across all regions was 12ms (p95: 20ms).&lt;/p&gt;

&lt;p&gt;In a full cold start scenario where a new connection is initiated by the Arcjet SDK, we recorded a p50 of 25ms from request initiation to response delivery - including 2ms for TCP and 6ms for TLS handshakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Low latency isn’t just a feature - it’s fundamental to Arcjet’s design for real-time application security. By leveraging AWS’s global edge network and services like Global Accelerator, we offload the hardest parts of distributed networking and stay focused on building developer-first security features.&lt;/p&gt;

&lt;p&gt;If you're building latency-sensitive APIs, especially in a multi-region or serverless world, Global Accelerator can probably help.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>cloud</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Next.js middleware bypasses: How to tell if you were affected?</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Fri, 11 Apr 2025 15:11:48 +0000</pubDate>
      <link>https://dev.to/arcjet/nextjs-middleware-bypasses-how-to-tell-if-you-were-affected-1ep6</link>
      <guid>https://dev.to/arcjet/nextjs-middleware-bypasses-how-to-tell-if-you-were-affected-1ep6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqy8k4c5ei13injwx6oe.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqy8k4c5ei13injwx6oe.jpg" alt="Next.js middleware bypasses: How to tell if you were affected?" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt;, we found the recent Next.js middleware bypass vulnerabilities (&lt;a href="https://vercel.com/blog/postmortem-on-next-js-middleware-bypass?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;CVE-2025-29927&lt;/a&gt; &amp;amp; &lt;a href="https://github.com/vercel/next.js/security/advisories/GHSA-7gfc-8cq8-jh5f?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;CVE‑2024‑51479&lt;/a&gt;) especially relevant - not only are we Next.js users ourselves, but we wanted to see how our own security SDK could help Next.js users manage incident response.&lt;/p&gt;

&lt;p&gt;Authorization bypasses rank among the most critical security threats because they allow attackers to enter areas of an application that should remain off-limits. Given that Next.js is one of the most popular JavaScript frameworks and using middleware for authentication is a common pattern, there is a high chance of users being affected by these vulnerabilities.&lt;/p&gt;

&lt;p&gt;In this post, we’ll examine both of these vulnerabilities and walk through how to identify suspicious patterns to determine if you were affected.&lt;/p&gt;

&lt;p&gt;For those already using Arcjet, we’ll also discuss how you can leverage our platform to confirm whether your applications were exploited. Our security SDK integrates with your application’s middleware (or directly inside routes), providing real-time inspection of every incoming HTTP request. This gives us the ability to search for attack signatures in past requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  CVE-2025-29927: Next.js Middleware bypass via &lt;code&gt;x‑middleware‑subrequest&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vercel/next.js/security/advisories/GHSA-f82v-jwr5-mffw?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;GHSA-f82v-jwr5-mffw&lt;/a&gt; (&lt;a href="https://vercel.com/blog/postmortem-on-next-js-middleware-bypass?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;CVE-2025-29927&lt;/a&gt;) is a critical Next.js vulnerability that lets attackers bypass middleware authorization checks by providing a crafted &lt;code&gt;x‑middleware‑subrequest&lt;/code&gt; header. It affects all Next.js versions released after 11.1.4, with patches now available in 12.3.5, 13.5.9, 14.2.25, and 15.2.3.&lt;/p&gt;

&lt;p&gt;While outright blocking &lt;code&gt;x‑middleware‑subrequest&lt;/code&gt; can stop the exploit, many services legitimately rely on the header for internal requests, as Cloudflare discovered when they attempted a blanket block that &lt;a href="https://x.com/elithrar/status/1903526240847331362?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;had to be rolled back&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Any mitigation that drops this header risks breaking valid functionality, underscoring the need for a more targeted fix, and highlighting a classic problem with generic, network-level WAFs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Was I affected?
&lt;/h3&gt;

&lt;p&gt;Once you apply the patch (or block requests with the header), the question turns to whether or not you were affected. Because the vulnerability dates back to code introduced in 2022 (Next.js &amp;gt; 11.1.4), it’s crucial to search historical logs for requests containing a suspicious &lt;code&gt;x‑middleware‑subrequest&lt;/code&gt; header.&lt;/p&gt;

&lt;p&gt;Even though CVE‑2025‑29927 was only recently disclosed, there was a lag in releasing patches (and then announcing the vulnerability). Patch adoption takes time and real-world exploitation has been possible for several years - zero-day vulnerabilities often circulate in underground markets long before they’re officially revealed. This means it's possible to have been affected for some time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to look for in logs
&lt;/h3&gt;

&lt;p&gt;The exploit signature for CVE-2025-29927 is very simple and depends on the version of Next.js you’re using. As explained in &lt;a href="https://zhero-web-sec.github.io/research-and-things/nextjs-and-the-corrupt-middleware?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;the disclosure writeup&lt;/a&gt;, we can look for specific payloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js prior to 12.2&lt;/strong&gt;: requests to known sensitive paths such as &lt;code&gt;/admin&lt;/code&gt; where the &lt;code&gt;x-middleware-subrequest&lt;/code&gt; header contains &lt;code&gt;pages/_middleware&lt;/code&gt; or &lt;code&gt;pages/admin/_middleware&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 12.2 onwards&lt;/strong&gt;: requests with an &lt;code&gt;x-middleware-subrequest&lt;/code&gt; header containing &lt;code&gt;middleware&lt;/code&gt; or &lt;code&gt;src/middleware&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 15 onwards&lt;/strong&gt;: requests with an &lt;code&gt;x-middleware-subrequest&lt;/code&gt; header containing repeated values, due to the introduction of a &lt;code&gt;MAX_RECURSION_DEPTH&lt;/code&gt; constant set to &lt;code&gt;5&lt;/code&gt;, e.g. &lt;code&gt;middleware:middleware:middleware:middleware:middleware&lt;/code&gt; or &lt;code&gt;src/middleware:src/middleware:src/middleware:src/middleware:src/middleware&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any requests matching these signatures could indicate compromise.&lt;/p&gt;
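
&lt;p&gt;A log-scanning sketch of these signatures might look like the following (the matching is illustrative and should be adapted to your Next.js version and log format):&lt;/p&gt;

```javascript
// Sketch: test a logged x-middleware-subrequest header value against
// the published CVE-2025-29927 exploit signatures.
const signatures = [
  /^pages\/_middleware$/,                          // prior to 12.2
  /^pages\/.*\/_middleware$/,                      // prior to 12.2, nested
  /^(src\/)?middleware$/,                          // 12.2 onwards
  /^((src\/)?middleware:){4}(src\/)?middleware$/,  // 15 onwards
];

function isSuspicious(headerValue) {
  if (!headerValue) return false;
  return signatures.some((re) => re.test(headerValue));
}
```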

&lt;h2&gt;
  
  
  CVE-2024-51479: Next.js middleware pathname-based authorization bypass
&lt;/h2&gt;

&lt;p&gt;Looking further back, another critical middleware flaw, &lt;a href="https://github.com/vercel/next.js/security/advisories/GHSA-7gfc-8cq8-jh5f?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;GHSA-7gfc-8cq8-jh5f&lt;/a&gt; / CVE-2024-51479, emerged in December 2024. It affected versions from 9.5.5 up to 14.2.14 and was patched in Next.js 14.2.15.&lt;/p&gt;

&lt;p&gt;This issue occurs when the middleware’s authorization logic depends on the pathname: the middleware inadvertently allows access to the top‑level route (&lt;code&gt;/admin&lt;/code&gt;) even when sub-paths like &lt;code&gt;/admin/users&lt;/code&gt; are correctly matched.&lt;/p&gt;

&lt;h3&gt;
  
  
  Was I affected?
&lt;/h3&gt;

&lt;p&gt;In the snippet below (adapted from &lt;a href="https://www.herodevs.com/vulnerability-directory/cve-2024-51479?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Herodevs&lt;/a&gt;), all requests to paths under &lt;code&gt;/admin&lt;/code&gt; that are missing the &lt;code&gt;authorization&lt;/code&gt; header should be blocked.&lt;/p&gt;

&lt;p&gt;However, this vulnerability meant that while requests to &lt;code&gt;/admin/users&lt;/code&gt; are denied, requests to &lt;code&gt;/admin&lt;/code&gt; would be allowed. If your middleware had similar logic, any sensitive pages at top-level URLs could be accessed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { NextResponse } from 'next/server';

export function middleware(request) {
  const { pathname } = request.nextUrl;
  // Simulate authorization check: block access to /admin unless a condition is met
  if (pathname.startsWith('/admin') &amp;amp;&amp;amp; !request.headers.get('authorization')) {
    return new Response('Unauthorized', { status: 401 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/admin/:path*', '/admin'],
};

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What to look for in logs
&lt;/h3&gt;

&lt;p&gt;This vulnerability hinges on two factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A request to a sensitive top-level path (e.g. &lt;code&gt;/admin&lt;/code&gt;) where the middleware is checking for the pathname. Only the exact root path is exposed, not its sub-routes (like &lt;code&gt;/admin/users&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;An authorization check in middleware alone. In the example, we rely on a header named &lt;code&gt;authorization&lt;/code&gt; (plus the pathname check above) but it could be any other header.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As with CVE‑2025‑29927, any application that relied solely on middleware for authorization checks would have been vulnerable. If you run the check (or additional checks) in the routes themselves, you have another layer of security. Middleware is good for running initial checks and redirecting users to an authentication flow, but it is a good idea to perform further authorization checks in each route.&lt;/p&gt;
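
&lt;p&gt;For example, a route handler can repeat the check so authorization does not depend on middleware alone. This is a sketch: the header name and response shape are illustrative, and in a real Next.js App Router application this function would be exported from a &lt;code&gt;route.js&lt;/code&gt; file:&lt;/p&gt;

```javascript
// Defense in depth: repeat the authorization check inside the route
// handler itself, so a middleware bypass alone is not enough.
async function GET(request) {
  const auth = request.headers.get("authorization");
  if (!auth) {
    // Middleware should have caught this already, but check again.
    return new Response("Unauthorized", { status: 401 });
  }
  return Response.json({ ok: true });
}
```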

&lt;p&gt;To confirm whether you were exposed, find any requests to top-level protected paths (e.g., &lt;code&gt;/admin&lt;/code&gt;) that lacked the expected authorization header (like &lt;code&gt;authorization&lt;/code&gt; or &lt;code&gt;x-auth-token&lt;/code&gt;). If those requests bypassed your middleware without triggering an error, you were likely vulnerable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Request lookup using Arcjet
&lt;/h2&gt;

&lt;p&gt;When conducting an incident analysis, it’s important to be able to answer questions like: was I affected? How long was I affected for? What data was accessed? The answers to these determine what kind of legal disclosures you need to make and how you inform customers.&lt;/p&gt;

&lt;p&gt;If you don’t have request logs then how do you know if you were impacted?&lt;/p&gt;

&lt;p&gt;Arcjet analyzes and reports granular request metadata such as paths and headers (though sensitive headers like &lt;code&gt;authorization&lt;/code&gt; are redacted and the request body is never sent outside your environment). With Arcjet installed, you’ll have a detailed request history to check for potential exploit attempts if (or when) a new vulnerability surfaces. Request logs are available to all Arcjet users, although retention time depends on your pricing plan.&lt;/p&gt;

&lt;p&gt;If you need help with determining if you are affected (or have ever been affected) by these or other vulnerabilities, &lt;a href="https://docs.arcjet.com/support?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;reach out to our support team&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>security</category>
    </item>
    <item>
      <title>Secure local Node.js dev servers with OrbStack</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Fri, 07 Feb 2025 15:05:03 +0000</pubDate>
      <link>https://dev.to/arcjet/secure-local-nodejs-dev-servers-with-orbstack-10lo</link>
      <guid>https://dev.to/arcjet/secure-local-nodejs-dev-servers-with-orbstack-10lo</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpks11a0y3ilvckwrqdaw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpks11a0y3ilvckwrqdaw.jpg" alt="Secure local Node.js dev servers with OrbStack" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt;, we use &lt;a href="https://orbstack.dev/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;OrbStack&lt;/a&gt; to manage our local development environment. We run various containers in production, so we use Docker Compose to mirror those services locally. This includes our &lt;a href="https://blog.arcjet.com/how-we-achieve-our-25ms-p95-response-time-sla/" rel="noopener noreferrer"&gt;low-latency security decision API&lt;/a&gt; used by our security as code SDKs, the dashboard webapp, website, docs, and backend processing components.&lt;/p&gt;

&lt;p&gt;OrbStack is a feature-rich alternative to (but compatible with) Docker Desktop. It has a significantly better UI, is much more performant, and comes with powerful features like &lt;a href="https://docs.orbstack.dev/docker/domains?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Custom Domains&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://orbstack.dev/blog/orbstack-1.1-https?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Automatic HTTPS&lt;/u&gt;&lt;/a&gt;. These allow us to mirror our production environment very closely, particularly ensuring we have full SSL configured locally so we can properly test things like &lt;a href="https://blog.arcjet.com/nosecone-a-library-for-setting-security-headers-in-next-js-sveltekit-node-js-bun-and-deno/" rel="noopener noreferrer"&gt;&lt;u&gt;security headers&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Recently, &lt;a href="https://docs.orbstack.dev/release-notes?ref=blog.arcjet.com#v1-9-0" rel="noopener noreferrer"&gt;&lt;u&gt;OrbStack 1.9&lt;/u&gt;&lt;/a&gt; made improvements so SSL certificates are now trusted between containers. I decided to spend some time improving our development experience by replacing the self-signed certificates we previously used between containers with this new OrbStack functionality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Working… Sometimes
&lt;/h2&gt;

&lt;p&gt;The first change I made was removing a custom HTTP transport workaround to allow the local certificates between our services written in Go. This was very straightforward and no other changes were needed—the Go services trusted the certificate and seamlessly communicated via HTTPS. This allowed me to remove the self-signed certificates generated by &lt;a href="https://github.com/FiloSottile/mkcert?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;mkcert&lt;/u&gt;&lt;/a&gt; and remove a setup step for our development environment.&lt;/p&gt;

&lt;p&gt;However, whenever I tried to make the same changes for our Node.js services, they would fail with an error: &lt;code&gt;self-signed certificate in certificate chain&lt;/code&gt;. This didn’t make sense to me because OrbStack claimed that these certificates were trusted between containers and the error went away in Go code with the 1.9 release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bundled certificates
&lt;/h2&gt;

&lt;p&gt;In scouring the Node.js documentation, I discovered the &lt;a href="https://nodejs.org/api/cli.html?ref=blog.arcjet.com#--use-bundled-ca---use-openssl-ca" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;--use-openssl-ca&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; command-line flag. The documentation explicitly calls out:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The bundled CA store, as supplied by Node.js, is a snapshot of Mozilla CA store that is fixed at release time. It is identical on all supported platforms.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means that Node.js bundles a static snapshot of &lt;a href="https://wiki.mozilla.org/CA?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Mozilla’s Certificate Authority&lt;/u&gt;&lt;/a&gt; store, which it uses by default to validate certificates. Once I understood this, it made sense that each Node.js service still flagged the OrbStack certificates as untrusted—they aren’t included in Mozilla’s CA store!&lt;/p&gt;

&lt;p&gt;Luckily, Node.js accepts the &lt;code&gt;--use-openssl-ca&lt;/code&gt; flag to opt out of the bundled CA store and instead rely on OpenSSL’s CA store. The documentation even explains that changes to the CA require this flag:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Using OpenSSL store allows for external modifications of the store. For most Linux and BSD distributions, this store is maintained by the distribution maintainers and system administrators. OpenSSL CA store location is dependent on configuration of the OpenSSL library but this can be altered at runtime using environment variables.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Custom OpenSSL CA store location in a container
&lt;/h2&gt;

&lt;p&gt;We want this to apply to any instance of the &lt;code&gt;node&lt;/code&gt; command run inside our containers without needing to specify the flag each time. We can apply this globally by setting the &lt;code&gt;NODE_OPTIONS&lt;/code&gt; environment variable in our container. I set this in our &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node-service:
  build:
    context: .
    dockerfile: services/node-service/Dockerfile
  environment:
    - NODE_OPTIONS="--use-openssl-ca"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Root certificate
&lt;/h2&gt;

&lt;p&gt;Even with this option set, communication from our Node.js services was still failing. It &lt;a href="https://github.com/orbstack/orbstack/issues/1159?ref=blog.arcjet.com#issuecomment-2482646337" rel="noopener noreferrer"&gt;&lt;u&gt;turns out&lt;/u&gt;&lt;/a&gt; that OrbStack mounts the root certificate at &lt;code&gt;/usr/local/share/ca-certificates/orbstack-root.crt&lt;/code&gt; but the OpenSSL CA doesn’t know about it.&lt;/p&gt;

&lt;p&gt;To make Node.js aware of the root certificate, we need to set the &lt;a href="https://nodejs.org/api/cli.html?ref=blog.arcjet.com#ssl_cert_filefile" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;SSL_CERT_FILE&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; environment variable. I updated our &lt;code&gt;docker-compose.yml&lt;/code&gt; file with this variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node-service:
  build:
    context: .
    dockerfile: services/node-service/Dockerfile
  environment:
    - NODE_OPTIONS="--use-openssl-ca"
    - SSL_CERT_FILE=/usr/local/share/ca-certificates/orbstack-root.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Perhaps we could apply some other commands to avoid setting the &lt;code&gt;SSL_CERT_FILE&lt;/code&gt; variable, but I wasn’t sure which magic incantation to apply.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSSL headers are not in slim containers
&lt;/h2&gt;

&lt;p&gt;Be aware of your base containers when applying this technique. Using a &lt;code&gt;*-slim&lt;/code&gt; container will fail with &lt;code&gt;ERR_SSL_WRONG_VERSION_NUMBER&lt;/code&gt; because the OpenSSL headers are removed to slim down the container. Instead, use the full container image or reinstall the OpenSSL headers that were removed.&lt;/p&gt;

&lt;h2&gt;
  
  
  HTTPS dev servers
&lt;/h2&gt;

&lt;p&gt;With all of the above applied, we have seamless HTTPS communication in our entire development stack—our browser communicates over HTTPS with our applications which communicate over HTTPS to various other services.&lt;/p&gt;

&lt;p&gt;Development certificates have always frustrated me, so I am delighted that none of this required manually generating a certificate or committing a development certificate into the repository. Finally, we have a streamlined onboarding process for our development environment with full HTTPS support!&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: Vite
&lt;/h2&gt;

&lt;p&gt;Recently, Vite &lt;a href="https://github.com/vitejs/vite/pull/19234?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;fixed&lt;/u&gt;&lt;/a&gt; a &lt;a href="https://github.com/advisories/GHSA-vg6x-rcgg-rjx6?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;CVE&lt;/u&gt;&lt;/a&gt; that would allow a malicious page to interact with the dev server. Their solution (more or less) was to disallow anything other than the localhost domain from communicating with the dev server.&lt;/p&gt;

&lt;p&gt;This is a problem with the &lt;a href="https://docs.orbstack.dev/docker/domains?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;OrbStack Custom Domains feature&lt;/u&gt;&lt;/a&gt; because you are accessing the Vite dev server via a domain such as &lt;a href="https://modest_bhaskara.orb.local/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;https://modest_bhaskara.orb.local&lt;/u&gt;&lt;/a&gt; instead of &lt;code&gt;localhost&lt;/code&gt;. You can solve this by setting your custom domain in &lt;a href="https://vite.dev/config/server-options.html?ref=blog.arcjet.com#server-allowedhosts" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;server.allowedHosts&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; in your &lt;code&gt;vite.config.js&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export default {
  server: {
    allowedHosts: ["modest_bhaskara.orb.local"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, OrbStack doesn’t provide this value inside the container. If it were available as an environment variable, we could specify it as a generalized &lt;code&gt;process.env&lt;/code&gt; property access. However, the custom domain names are based on the container name or &lt;a href="https://docs.orbstack.dev/docker/domains?ref=blog.arcjet.com#custom" rel="noopener noreferrer"&gt;&lt;u&gt;customizable with labels&lt;/u&gt;&lt;/a&gt;, so we know what the domain will be for each container.&lt;/p&gt;
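&lt;p&gt;If OrbStack ever does expose the domain inside the container, the config could read it instead of hardcoding it. This sketch assumes a hypothetical &lt;code&gt;ORB_DOMAIN&lt;/code&gt; environment variable, which OrbStack does not currently set:&lt;/p&gt;

```javascript
// vite.config.js sketch. ORB_DOMAIN is hypothetical: OrbStack does not
// currently provide the custom domain as an environment variable.
export default {
  server: {
    // Fall back to the hardcoded domain when the variable is absent.
    allowedHosts: [process.env.ORB_DOMAIN ?? "modest_bhaskara.orb.local"],
  },
};
```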

&lt;p&gt;It’s great to see these regular improvements to OrbStack, which is a core part of our development setup. It means we can remove dev-only workarounds so our development environment is as close as possible to mirroring production.&lt;/p&gt;

</description>
      <category>engineering</category>
    </item>
    <item>
      <title>Does Next.js need a WAF?</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Thu, 16 Jan 2025 12:05:11 +0000</pubDate>
      <link>https://dev.to/arcjet/does-nextjs-need-a-waf-13ab</link>
      <guid>https://dev.to/arcjet/does-nextjs-need-a-waf-13ab</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34euds47xqptx845i7cb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34euds47xqptx845i7cb.jpg" alt="Does Next.js need a WAF?" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fact that Vercel, the developers of Next.js, enable their &lt;a href="https://vercel.com/docs/security/vercel-waf?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Web Application Firewall&lt;/u&gt;&lt;/a&gt; by default, free for all accounts, suggests that yes, Next.js needs a WAF!&lt;/p&gt;

&lt;p&gt;Throwing a network-level WAF in front of your application is an easy way to defend against attacks. Next.js is no different from any other web application in that sense. Although React provides some out-of-the-box protections against things like Cross Site Scripting (XSS), Next.js itself has had past vulnerabilities and unsafe coding can introduce others. You still need to follow a &lt;a href="https://blog.arcjet.com/next-js-security-checklist/" rel="noopener noreferrer"&gt;&lt;u&gt;security checklist for Next.js&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But there are tradeoffs from using a WAF. Analyzing every request adds latency and different areas of your application may be higher risk than others. Arcjet tackles this differently through our &lt;a href="https://docs.arcjet.com/shield/quick-start?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Shield WAF&lt;/u&gt;&lt;/a&gt; feature - we still analyze requests for common threats, but do so in the background. This minimizes latency and helps avoid false positives because multiple requests can be analyzed.&lt;/p&gt;

&lt;p&gt;Arcjet is also more closely integrated with your code so you can dynamically adjust your application logic depending on the response. You might want to instantly block suspicious requests from anonymous users, but if you know the user is signed into an enterprise account and completed 2FA, you may just want to flag the request or trigger an alert.&lt;/p&gt;
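&lt;p&gt;As a conceptual sketch (not the Arcjet SDK API), that decision logic might look like this, where the session context picks between blocking outright and flagging for review:&lt;/p&gt;

```typescript
// Conceptual sketch only: choose a response to a suspicious request based on
// how much we trust the session. Not the actual Arcjet SDK API.
type Session = { signedIn: boolean; completedTwoFactor: boolean };

export function respondToSuspicion(session: Session): "block" | "flag" {
  const trusted = [session.signedIn, session.completedTwoFactor].every(Boolean);
  // Trusted enterprise users get flagged for review; anonymous users are blocked.
  return trusted ? "flag" : "block";
}
```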

&lt;p&gt;Network level proxy firewalls are a legacy architecture that doesn’t work well with modern applications. Choosing whether to add a WAF is now a more interesting discussion because it can be more deeply integrated with your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  How can a WAF protect Next.js?
&lt;/h2&gt;

&lt;p&gt;A WAF can help defend against attacks by scanning all incoming requests, looking for known attack patterns, and proactively blocking suspicious requests.&lt;/p&gt;

&lt;p&gt;Arcjet includes Shield WAF as one of the features you can enable in our SDK. Through processing millions of requests, we see two main types of attacks in our WAF logs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Passive scanning attacks
&lt;/h3&gt;

&lt;p&gt;A lot of malicious web traffic comes from passive scanners. This is usually high volume, low sophistication. For example, we see a lot of scanning for WordPress config files and Windows remote code execution, even though the application is clearly a Next.js app hosted on Vercel.&lt;/p&gt;

&lt;p&gt;We also see requests trying to take advantage of configuration mistakes, such as &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;.git&lt;/code&gt; files accidentally deployed, or plain text configuration files left behind.&lt;/p&gt;
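&lt;p&gt;A toy version of this kind of matching might look like the following. Real rulesets such as the OWASP Core Ruleset are far more extensive; these patterns are purely illustrative:&lt;/p&gt;

```typescript
// Illustrative only: a few probe paths commonly seen from passive scanners.
// Production WAF rulesets contain thousands of such signatures.
const SCANNER_PROBES = [
  /^\/wp-(login|admin)/, // WordPress probes against a non-WordPress app
  /\/\.env$/,            // accidentally deployed environment files
  /\/\.git(\/|$)/,       // exposed git repositories
];

export function looksLikeScannerProbe(path: string): boolean {
  return SCANNER_PROBES.some((pattern) => pattern.test(path));
}
```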

&lt;p&gt;If you’re using Next.js deployed to a modern platform like Vercel, Render, Railway, or Fly.io, then these requests are a minor annoyance. They can be easily blocked to reduce your infrastructure costs, but likely don’t represent much of a threat.&lt;/p&gt;

&lt;p&gt;More concerning are targeted, passive attacks, such as scanners enumerating your forms, login routes, and API endpoints for SQL injection or other input-based attacks. These rarely succeed on the first request because the attacker is really enumerating your application opportunistically to see if something strange happens. A WAF can detect those attempts and proactively block the client before the exploration discovers something.&lt;/p&gt;

&lt;p&gt;For these types of attacks, a WAF in front of Next.js is a nice bonus. It provides another layer of protection and can even reduce infrastructure costs because requests are blocked before your main code executes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Active direct attacks
&lt;/h3&gt;

&lt;p&gt;Like all popular software, Next.js has had past security vulnerabilities. Any dependencies you use may have security issues, and your own code can contain mistakes, such as incorrect input validation leading to &lt;a href="https://owasp.org/Top10/A03_2021-Injection/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Cross Site Scripting, SQL Injection&lt;/u&gt;&lt;/a&gt;, or &lt;a href="https://owasp.org/Top10/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Server Side Request Forgery&lt;/u&gt;&lt;/a&gt; type issues.&lt;/p&gt;

&lt;p&gt;These can be detected by more sophisticated scanning - looking for specific vulnerabilities because you know that the site is using Next.js.&lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://github.com/advisories/GHSA-fr5h-rqp8-mj6g?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;CVE-2024-34351&lt;/u&gt;&lt;/a&gt; affected Next.js &amp;lt;14.1.1 where a specific Host header could result in a Server Side Request Forgery attack with Server Actions. This only affected self-hosted installations of Next.js due to how Vercel works.&lt;/p&gt;

&lt;p&gt;Another issue from 2024 was &lt;a href="https://github.com/advisories/GHSA-7gfc-8cq8-jh5f?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;CVE-2024-51479&lt;/u&gt;&lt;/a&gt; where a middleware authorization bypass in Next.js &amp;lt;14.2.15 affected Next.js hosted everywhere. Vercel deployed mitigations for all applications on their platform, but others were vulnerable to attack.&lt;/p&gt;

&lt;p&gt;Here the answer is clear: a WAF as a security layer in front of Next.js helps detect emerging vulnerabilities. These attacks have signatures that can be detected and blocked if the WAF is regularly updated. That matters because you might not be able to upgrade to a patched version immediately, so protections applied through a managed WAF ruleset will mitigate active attacks in the meantime. This is where a WAF can really make a difference.&lt;/p&gt;
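&lt;p&gt;For instance, the Host header issue mentioned above can be mitigated generically by validating the header against an allowlist, which is the shape of check a WAF rule automates. A simplified sketch, not the actual fix shipped in Next.js:&lt;/p&gt;

```typescript
// Simplified sketch of a Host header allowlist check, the kind of signature
// a WAF rule can apply while you wait to upgrade. Not the official Next.js fix.
export function isAllowedHost(hostHeader: string, allowedHosts: string[]): boolean {
  // Strip any port before comparing, e.g. "example.com:3000" to "example.com".
  const host = hostHeader.split(":")[0].toLowerCase();
  return allowedHosts.includes(host);
}
```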

&lt;h2&gt;
  
  
  PCI v4 makes WAFs mandatory
&lt;/h2&gt;

&lt;p&gt;If you are running any service that needs to comply with the PCI payment processing standards, then you will soon &lt;strong&gt;need&lt;/strong&gt; to enable a WAF in front of Next.js.&lt;/p&gt;

&lt;p&gt;On March 31, 2025, &lt;a href="https://east.pcisecuritystandards.org/document_library?category=pcidss&amp;amp;document=pci_dss&amp;amp;ref=blog.arcjet.com" rel="noopener noreferrer"&gt;PCI DSS 4.0&lt;/a&gt; will take effect. This includes changes that mean a WAF moves from recommended (in PCI DSS 3) to required. Section 6.4.2 says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For public-facing web applications, an automated technical solution is deployed that continually detects and prevents web-based attacks&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Using a WAF is given as a specific example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A web application firewall (WAF), which can be either on-premise or cloud-based, installed in front of public-facing web applications to check all traffic, is an example of an automated technical solution that detects and prevents web-based attacks&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Making a WAF for Next.js
&lt;/h2&gt;

&lt;p&gt;The big downside of network-level WAFs is they are generic. They have no understanding of your application so can’t customize rules based on things like tech stack (no point applying PHP detections for a Next.js application), the routes (should this endpoint return JSON or is it an HTML response?), or the session context (is a user logged into a paying account vs an anonymous user on your website?).&lt;/p&gt;

&lt;p&gt;This is why we’ve designed the &lt;a href="https://docs.arcjet.com/shield/quick-start?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet Shield WAF&lt;/u&gt;&lt;/a&gt; to run as part of our native SDK. When you add a Shield rule, we analyze the request in the background with rules customized based on an understanding of the tech stack being protected.&lt;/p&gt;

&lt;p&gt;As requests are processed, suspicious activity is monitored until a certain threshold is reached. This helps avoid false positives where legitimate traffic accidentally triggers a rule. &lt;/p&gt;
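&lt;p&gt;In pseudocode terms, this thresholding works like a per-client counter. A conceptual sketch of the idea, not Arcjet’s implementation:&lt;/p&gt;

```typescript
// Conceptual sketch: count suspicious signals per client and only deny once
// a threshold is crossed, so a single odd request is not enough to block.
export class SuspicionTracker {
  private counts = new Map();

  constructor(private threshold: number) {}

  record(clientId: string): "allow" | "deny" {
    const next = (this.counts.get(clientId) ?? 0) + 1;
    this.counts.set(clientId, next);
    return next >= this.threshold ? "deny" : "allow";
  }
}
```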

&lt;p&gt;We see this regularly when applying the rules from the open source &lt;a href="https://github.com/coreruleset/coreruleset?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;OWASP Core Ruleset&lt;/u&gt;&lt;/a&gt;, which is integrated into Arcjet Shield for Pro plan users. This is a great repository of mitigations against common attacks, but has to be broad to cover all types of web applications.&lt;/p&gt;

&lt;p&gt;We regularly monitor the results of Shield analysis and apply customizations based on our understanding of the type of traffic we expect to see. We already know which SDK and framework you’re using, so we can be much more targeted with the rules that get applied.&lt;/p&gt;

&lt;p&gt;And of course since Arcjet rules are all just code, and you can see the reasons behind our decisions, your application can adjust in real time. The flexibility is there to decide what’s best for your situation.&lt;/p&gt;

&lt;p&gt;Arcjet Shield is free for all users and the managed OWASP Core Ruleset is applied to Pro &amp;amp; Enterprise accounts. Enable it with just a few lines of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    // Shield protects your app from common attacks e.g. SQL injection
    // DRY_RUN mode logs only. Use "LIVE" to block
    shield({
      mode: "DRY_RUN",
    }),
  ],
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  So do you need a WAF for Next.js?
&lt;/h2&gt;

&lt;p&gt;If you need to be PCI compliant, then from March 31, 2025 the answer is yes; you need a WAF for Next.js.&lt;/p&gt;

&lt;p&gt;For everyone else, enabling a WAF will give you another layer of security. Defense in depth is important because no layer can ever be 100% secure, so you probably should put a WAF in front of Next.js.&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>security</category>
    </item>
    <item>
      <title>Test security rules without breaking production: Arcjet's DRY_RUN mode</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Tue, 07 Jan 2025 09:36:04 +0000</pubDate>
      <link>https://dev.to/arcjet/test-security-rules-without-breaking-production-arcjets-dryrun-mode-5cpk</link>
      <guid>https://dev.to/arcjet/test-security-rules-without-breaking-production-arcjets-dryrun-mode-5cpk</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2qn93wqiha3r0vohpcq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh2qn93wqiha3r0vohpcq.jpg" alt="Test security rules without breaking production: Arcjet's DRY_RUN mode" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Picture this: it’s well into the evening in the office, and you sit at your computer, moments away from altering the security configurations of your company’s critical software. You were urgently asked to tighten some things up, but right now the only thing on your mind is receiving an emergency call as soon as you’re about to go to bed. Who knew making changes could be this stressful?&lt;/p&gt;

&lt;p&gt;With the right tools, security changes don’t have to be this intimidating. It’s fear of the unknown that is the biggest cause for hesitation. The best thing you can do is to build confidence in your changes through a data-driven approach to the implementation - an approach that uses &lt;strong&gt;real environments&lt;/strong&gt;, &lt;strong&gt;detailed context&lt;/strong&gt;, and &lt;strong&gt;live activity&lt;/strong&gt; to build evidence that your change will not be disruptive.&lt;/p&gt;

&lt;p&gt;In this article, we’ll cover the challenges of configuring legacy WAFs and CDN security services, and how &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet&lt;/a&gt;’s different approach takes the nerves out of security rule changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Approach: &lt;em&gt;Cumbersome logging&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;When it comes to building confidence in our changes, traditional CDN security tools have limited options. Typically, you can turn on WAF logging and evaluate potential rule changes against those logs, but there are a few downsides to this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The WAF is a separate system, meaning different configuration and unusual log formats. This level of logging usually has to be explicitly enabled and may not even be available as a feature of the product by default.&lt;/li&gt;
&lt;li&gt;Using a separate system means finding where the logs are and figuring out how to query them. Then when you get a result, you need to understand how to correlate it to the requests that triggered the log entries.&lt;/li&gt;
&lt;li&gt;Logging is often billed separately and by volume, so if you can only test in production your log volume may suddenly explode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With so much context switching, it’s easy for something to slip through the cracks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Old Approach: &lt;em&gt;Cross-environment limitations&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;Another way to build confidence in a change is to test it in a non-production environment first, but this is still a challenge for traditional web application security tools. Traditional WAFs operate as a reverse proxy and are usually only deployed to production, never in the development environment.&lt;/p&gt;

&lt;p&gt;The software development lifecycle begins on a developer’s workstation, but there isn’t a realistic way to test WAF rules on a workstation. You typically need to wait until you have deployed your application to a dedicated testing environment.&lt;/p&gt;

&lt;p&gt;And even if you do have security set up in your testing environment, it can be cumbersome to manage as an external system and your testing won’t include normal user traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Arcjet Way: &lt;em&gt;Test everywhere&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;At this point, it’s clear that there’s some toil and doubt involved when making security rule changes with traditional WAFs. Now, let’s explore how &lt;a href="https://docs.arcjet.com/architecture?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet’s architecture&lt;/a&gt; allows you to take an approach that removes doubt through simplicity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test locally
&lt;/h2&gt;

&lt;p&gt;In “&lt;em&gt;the old approach&lt;/em&gt;”, we discussed how it’s nearly impossible to evaluate your security rules while developing on your workstation because the security engine lives on another system.&lt;/p&gt;

&lt;p&gt;A benefit of &lt;a href="https://docs.arcjet.com/architecture?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet’s architecture&lt;/u&gt;&lt;/a&gt; is that your security functionality lives inside your application. That means you can build your application on your development workstation, and your security rules will act the same locally as if you were running them in production.&lt;/p&gt;

&lt;p&gt;Let’s say you are developing a Next.js application and want to add &lt;a href="https://docs.arcjet.com/shield/quick-start?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Arcjet Shield WAF&lt;/a&gt; to one of your routes. Once you have Arcjet added to your &lt;code&gt;route.ts&lt;/code&gt; file with a Shield rule, you start your application locally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import arcjet, { shield } from "@arcjet/next";
import { NextResponse } from "next/server";

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    shield({
      mode: "LIVE",
    }),
  ],
});

export async function GET(req: Request) {
  const decision = await aj.protect(req);

  for (const result of decision.results) {
    console.log("Rule Result", result);
  }

  console.log("Conclusion", decision.conclusion);

  if (decision.isDenied() &amp;amp;&amp;amp; decision.reason.isShield()) {
    return NextResponse.json(
      {
        error: "You are suspicious!",
      },
      { status: 403 },
    );
  }

  return NextResponse.json({
    message: "Hello world",
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;/app/api/arcjet/route.ts&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;All you need to do to make sure the rule is working is send 5 &lt;code&gt;curl&lt;/code&gt; requests with the special header to cross the test threshold for malicious activity. You can do this with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in {1..5}; do curl -v -H "x-arcjet-suspicious: true" http://localhost:3000; done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running the 5th curl request, you should receive a 403 error and see a blocked request in your Arcjet logs.&lt;/p&gt;

&lt;p&gt;Since the security engine is part of your application, you can do these simple sanity-checks for your rules anywhere your application can run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test in CI/CD
&lt;/h2&gt;

&lt;p&gt;Another area where it’s traditionally hard to test security rules is in your CI pipeline. Arcjet’s security-as-code architecture makes it easy to do automated testing, such as with the &lt;a href="https://github.com/postmanlabs/newman?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Newman&lt;/u&gt;&lt;/a&gt; framework. Let’s take a look at the following Express app example from &lt;a href="https://github.com/arcjet/arcjet-js/tree/main/examples/express-newman?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;GitHub&lt;/u&gt;&lt;/a&gt; that illustrates how this works:&lt;/p&gt;

&lt;p&gt;In our Express app, we have an API endpoint that is very sensitive to performance issues, so we’ll add an Arcjet rate limit rule to only allow 1 request per second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import express from "express";
import arcjet, { fixedWindow } from "@arcjet/node";

const aj = arcjet({
  key: process.env.ARCJET_KEY,
  rules: [],
});

const app = express();

app.get("/api/low-rate-limit", async (req, res) =&amp;gt; {
  const decision = await aj
    // Only inline to self-contain the sample code.
    // Static rules should be defined outside the handler for performance.
    .withRule(fixedWindow({ mode: "LIVE", window: "1s", max: 1 }))
    .protect(req);

  if (decision.isDenied()) {
    res.status(429).json({ error: "rate limited" });
  } else {
    res.json({ hello: "world" });
  }
});

//...

const server = app.listen(8080);

// Export the server close function so we can shut it down in our tests
export const close = server.close.bind(server);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;index.js&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, we’ll define our test collection for Newman. To test the rate limiting, we will have Newman send two requests. We will expect the first request to succeed, and the second request should be denied by our Arcjet rate limit rule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "variable": [{ "key": "baseUrl", "value": "http://localhost:8080" }],
  "item": [
    {
      "name": "/api/low-rate-limit",
      "item": [
        {
          "name": "Allowed",
          "request": {
            "url": "{{baseUrl}}/api/low-rate-limit",
            "header": [
              {
                "key": "Accept",
                "value": "application/json"
              }
            ],
            "method": "GET",
            "body": {},
            "auth": null
          },
          "event": [
            {
              "listen": "test",
              "script": {
                "type": "text/javascript",
                "exec": [
                  "pm.test('should be allowed', () =&amp;gt; pm.response.to.have.status(200))"
                ]
              }
            }
          ]
        },
        {
          "name": "Denied",
          "request": {
            "url": "{{baseUrl}}/api/low-rate-limit",
            "header": [
              {
                "key": "Accept",
                "value": "application/json"
              }
            ],
            "method": "GET",
            "body": {},
            "auth": null
          },
          "event": [
            {
              "listen": "test",
              "script": {
                "type": "text/javascript",
                "exec": [
                  "pm.test('should be rate limited', () =&amp;gt; pm.response.to.have.status(429))"
                ]
              }
            }
          ]
        }
      ]
    }
  ],
  "event": []
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;tests/low-rate-limit.json&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Lastly, we’ll create the JavaScript file to run our test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { after, before, describe, test } from "node:test";
import assert from "node:assert";
import { fileURLToPath } from "node:url";
import { promisify } from "node:util";

import { run } from "newman";

// Promisify the `newman.run` API as `newmanRun` in the tests
const newmanRun = promisify(run);

describe("API Tests", async () =&amp;gt; {
  // Importing the server also starts it listening on port 8080
  const server = await import("../index.js");

  after((done) =&amp;gt; server.close(done));

  test("/api/low-rate-limit", async () =&amp;gt; {
    const summary = await newmanRun({
      collection: fileURLToPath(
        new URL("./low-rate-limit.json", import.meta.url),
      ),
    });

    assert.strictEqual(
      summary.run.failures.length,
      0,
      "expected suite to run without error",
    );
  });

//...

});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;tests/api.test.js&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, you just need to set up a workflow to execute the automated tests within your CI environment and you can run the security rules as part of your test suite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test in staging or preview environments
&lt;/h2&gt;

&lt;p&gt;With traditional WAFs, staging environments are typically set up to simulate production and run integration tests that behave like they would for your users. Even though Arcjet’s ability to simulate production is similar to WAFs in dedicated environments, there are still two major benefits at this point.&lt;/p&gt;

&lt;p&gt;The first benefit is that &lt;strong&gt;you’ve already simulated production before you even got to the staging environment&lt;/strong&gt;. This means you may have run most of your security-specific tests in CI already and caught issues earlier in the development lifecycle.&lt;/p&gt;

&lt;p&gt;The second benefit is that &lt;strong&gt;setting up additional environments is less work with Arcjet&lt;/strong&gt;. You don’t need to configure a reverse proxy, and all of your security rules were configured when you wrote the code.&lt;/p&gt;

&lt;p&gt;When it comes to dedicated environments, you can stick to the old ways and run tests in staging or preview. However, it’s even more effective to test Arcjet rules in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test in production
&lt;/h2&gt;

&lt;p&gt;Admit it - you read the words “test in production” and cringed a little. With Arcjet rules, testing in production isn’t a bad thing. Any rule you create in Arcjet can be run in &lt;code&gt;DRY_RUN&lt;/code&gt; mode without affecting your users. Let’s break down what that looks like.&lt;/p&gt;

&lt;p&gt;When you are defining Arcjet security rules, each rule is deployed in either &lt;code&gt;LIVE&lt;/code&gt; or &lt;code&gt;DRY_RUN&lt;/code&gt; mode. &lt;code&gt;LIVE&lt;/code&gt; rules will actively block a request that matches the security rule, but &lt;code&gt;DRY_RUN&lt;/code&gt; rules will simply log the would-be block action in Arcjet. Here’s an example:&lt;/p&gt;
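&lt;p&gt;Conceptually, the difference between the two modes comes down to whether a matched rule blocks or only records. A minimal sketch of the idea, not the SDK internals:&lt;/p&gt;

```typescript
// Conceptual sketch of rule modes, not the Arcjet SDK internals.
type Mode = "LIVE" | "DRY_RUN";

export function applyRule(mode: Mode, ruleMatched: boolean) {
  if (!ruleMatched) {
    return { blocked: false, logged: false };
  }
  // A LIVE rule blocks the request; a DRY_RUN rule only logs the
  // would-be block so you can evaluate the rule against real traffic.
  return { blocked: mode === "LIVE", logged: true };
}
```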

&lt;p&gt;Let’s say you have an existing Next.js application with &lt;a href="https://docs.arcjet.com/shield/concepts?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet Shield&lt;/u&gt;&lt;/a&gt; for blocking attacks like SQL injection, but you’d also like to start &lt;a href="https://docs.arcjet.com/bot-protection/concepts?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;blocking automated bots&lt;/u&gt;&lt;/a&gt;. You simply add the &lt;code&gt;detectBot&lt;/code&gt; rule to your Arcjet object with &lt;code&gt;mode: "DRY_RUN"&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import arcjet, { createMiddleware, detectBot } from "@arcjet/next";
export const config = {
  // Matcher tells Next.js which routes to run the middleware on.
  // This runs the middleware on all routes except for static assets.
  matcher: ["/((?!_next/static|_next/image|favicon.ico).*)"],
};
const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    shield({
      mode: "LIVE", // Will block requests.
    }),
    detectBot({
      mode: "DRY_RUN", // New rule, log only for evaluation.
      allow: [
        "CATEGORY:SEARCH_ENGINE", // Google, Bing, etc.
      ],
    }),
  ],
});
// Pass any existing middleware with the optional existingMiddleware prop.
export default createMiddleware(aj);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying the new rule, your application’s traffic keeps flowing the same way it did before. After some time, you can check Arcjet’s logs and notice that your uptime monitor’s requests are being logged as would-be blocks. You add another category to your detectBot rule and redeploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  rules: [
    shield({
      mode: "LIVE",
    }),
    detectBot({
      mode: "DRY_RUN",
      allow: [
        "CATEGORY:SEARCH_ENGINE",
        "CATEGORY:MONITOR", // Uptime monitoring services.
      ],
    }),
  ],
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the &lt;code&gt;DRY_RUN&lt;/code&gt; capability is built into the rule definition, the process of evaluating rule changes is as easy as actually making the change.&lt;/p&gt;

&lt;p&gt;With Arcjet, all your rules are just code, so you can do things like &lt;a href="https://docs.arcjet.com/blueprints/sampling?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;selectively sampling requests&lt;/a&gt; and applying rules to a subset of traffic. For example, if you wanted to trigger Arcjet Shield and bot detection rules in live mode on 10% of your traffic then you could write a sample function like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import arcjet, { detectBot, shield } from "@arcjet/next";
import { NextRequest, NextResponse } from "next/server";

export const config = {
  // matcher tells Next.js which routes to run the middleware on. This runs
  // the middleware on all routes except for static assets.
  matcher: ["/((?!_next/static|_next/image|favicon.ico).*)"],
};

const sampleRate = 0.1; // 10% of requests

const aj = arcjet({
  key: process.env.ARCJET_KEY!,
  // You could include one or more base rules to apply to all requests
  rules: [],
});

function shouldSampleRequest(sampleRate: number) {
  // sampleRate should be between 0 and 1, e.g., 0.1 for 10%, 0.5 for 50%
  return Math.random() &amp;lt; sampleRate;
}

// Shield and bot rules will be configured with live mode if the request is
// sampled, otherwise only Shield will be configured with dry run mode
function sampleSecurity() {
  if (shouldSampleRequest(sampleRate)) {
    console.log("Rule is LIVE");
    return aj
      .withRule(
        shield(
          { mode: "LIVE" }, // will block requests if triggered
        ),
      )
      .withRule(
        detectBot({
          mode: "LIVE",
          allow: [], // "allow none" will block all detected bots
        }),
      );
  } else {
    console.log("Rule is DRY_RUN");
    return aj.withRule(
      shield({
        mode: "DRY_RUN", // Only logs the result
      }),
    );
  }
}

export default async function middleware(request: NextRequest) {
  const decision = await sampleSecurity().protect(request);

  if (decision.isDenied()) {
    if (decision.reason.isBot()) {
      return NextResponse.json({ error: "You are a bot" }, { status: 403 });
    } else if (decision.reason.isShield()) {
      return NextResponse.json({ error: "Shields up!" }, { status: 403 });
    } else {
      return NextResponse.json({ error: "Forbidden" }, { status: 403 });
    }
  } else {
    return NextResponse.next();
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Next.js middleware.ts&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Maybe “evaluate in production” is the more accurate term for what Arcjet allows you to do, but the benefits are clear: &lt;em&gt;you can push a new rule to production with no worries and see what happens&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Making changes to security rules for your application doesn’t have to be intimidating or uncertain. We covered some of the ways traditional web application security tools fall short for change management, and how Arcjet provides a solution.&lt;/p&gt;

&lt;p&gt;Arcjet’s architecture delivers a lot of benefits for developer experience, and testing changes is one of them. By making use of &lt;code&gt;DRY_RUN&lt;/code&gt; mode, you can build confidence in your changes with no added complexity. With early sanity-checks and simple evidence from real traffic, you will have no fear of breaking production when using Arcjet to protect your application.&lt;/p&gt;

</description>
      <category>javascript</category>
    </item>
    <item>
      <title>Building a minimalist web server using the Go standard library + Tailwind CSS</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Thu, 02 Jan 2025 12:12:40 +0000</pubDate>
      <link>https://dev.to/arcjet/building-a-minimalist-web-server-using-the-go-standard-library-tailwind-css-39gj</link>
      <guid>https://dev.to/arcjet/building-a-minimalist-web-server-using-the-go-standard-library-tailwind-css-39gj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4denxdu0k17kg0z4uxxe.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4denxdu0k17kg0z4uxxe.jpg" alt="Building a minimalist web server using the Go standard library + Tailwind CSS" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dependencies pose a significant maintenance burden on software projects. Every package introduces risks by adding code outside your control, making them a &lt;a href="https://blog.arcjet.com/security-concepts-for-developers-dependency-confusion-attacks/" rel="noopener noreferrer"&gt;&lt;u&gt;common attack vector&lt;/u&gt;&lt;/a&gt;. Failing to stay up to date can force stressful upgrades when security patches are released. This is especially challenging in ecosystems like JavaScript, where breaking changes and dependency churn are common.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet&lt;/u&gt;&lt;/a&gt;, we’re on a path to zero dependencies for our developer security SDK. &lt;a href="https://github.com/arcjet/arcjet-js/issues/44?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Our goal for the 1.0.0 release&lt;/u&gt;&lt;/a&gt; (coming soon!) is that when you install Arcjet, you only have to trust us because the only code you bring into your project is our SDK.&lt;/p&gt;

&lt;p&gt;I’ve also been thinking about how we can achieve this with our server-side Go code. Our API is &lt;a href="https://blog.arcjet.com/how-we-achieve-our-25ms-p95-response-time-sla/" rel="noopener noreferrer"&gt;&lt;u&gt;built for low-latency high-throughput&lt;/u&gt;&lt;/a&gt; security decisions, so performance is crucial. While the Go ecosystem experiences less dependency churn than JavaScript, where keeping up to date has become a serious chore, minimizing dependencies remains a goal across our entire codebase.&lt;/p&gt;

&lt;p&gt;Luckily, Go has an extensive standard library. Over the holidays I was inspired by &lt;a href="https://matthewsanabria.dev/posts/start-with-the-go-standard-library/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;several&lt;/u&gt;&lt;/a&gt; recent &lt;a href="https://threedots.tech/post/common-anti-patterns-in-go-web-applications/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;blog posts&lt;/u&gt;&lt;/a&gt; about &lt;a href="https://www.youtube.com/watch?v=H7tbjKFSg58&amp;amp;ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;using the standard library first&lt;/u&gt;&lt;/a&gt; before reaching for third-party modules. I decided to experiment with building a website using only the Go standard library, plus HTML and &lt;a href="https://tailwindcss.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Tailwind CSS&lt;/u&gt;&lt;/a&gt; for styling.&lt;/p&gt;

&lt;p&gt;In this post I’ll discuss how I built a minimalist web server using the Go standard library which dynamically generates HTML and CSS using Tailwind CSS. The only external dependency is the Tailwind CLI and optional use of &lt;a href="https://github.com/air-verse/air?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Air&lt;/a&gt; for live reload, neither of which are in the production build artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency churn
&lt;/h2&gt;

&lt;p&gt;How often do you &lt;a href="https://abdisalan.com/posts/tragedy-running-old-node-project?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;come back to an old project only to find it won’t build&lt;/u&gt;&lt;/a&gt; because some key dependencies have changed? Or you want to build a new feature only to find out there have been breaking changes to the core dependencies which must be upgraded first? Or a major API you relied on has been deprecated and the migration path is incomplete?&lt;/p&gt;

&lt;p&gt;Whether it’s a side project or a major application you work on, I bet every developer has experienced this. It’s frustrating because you then have to spend time on rebuilding, refactoring, and/or migrating to the “new” way of doing things.&lt;/p&gt;

&lt;p&gt;Using third-party libraries speeds up development because you don’t need to reinvent the wheel. However, they always introduce maintenance overhead and usually come without any guarantee of continued updates or backwards compatibility. Multiply this by every dependency you include and you have a real maintenance burden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why rely on the Go standard library?
&lt;/h2&gt;

&lt;p&gt;The standard library of whichever programming language you’re using might not come with such guarantees either, but mature languages know that developers rely on it. There is an implied contract that things should rarely break.&lt;/p&gt;

&lt;p&gt;But in Go, there is an explicit contract:&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://go.dev/doc/go1compat?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Go 1 and the Future of Go Programs&lt;/u&gt;&lt;/a&gt; (from 2012) the Go team states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Go 1 defines two things: first, the specification of the language; and second, the specification of a set of core APIs, the "standard packages" of the Go library. The Go 1 release includes their implementation in the form of two compiler suites (gc and gccgo), and the core libraries themselves.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And in 2023 this was followed up by &lt;a href="https://go.dev/blog/compat?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Backward Compatibility, Go 1.21, and Go 2&lt;/u&gt;&lt;/a&gt;: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;when should we expect the Go 2 specification that breaks old Go 1 programs?   &lt;/p&gt;

&lt;p&gt;The answer is never. Go 2, in the sense of breaking with the past and no longer compiling old programs, is never going to happen. Go 2 in the sense of being the major revision of Go 1 we started toward in 2017 has already happened.   &lt;/p&gt;

&lt;p&gt;There will not be a Go 2 that breaks Go 1 programs. Instead, we are going to double down on compatibility, which is far more valuable than any possible break with the past. In fact, we believe that prioritizing compatibility was the most important design decision we made for Go 1.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By relying solely on the Go standard library, we can effectively guarantee long-term compatibility and minimal breakage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does Go have everything we need?
&lt;/h2&gt;

&lt;p&gt;Yes! Go 1.22 introduced some improvements to the built-in web server to &lt;a href="https://jvns.ca/blog/2024/09/27/some-go-web-dev-notes/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;make defining routes a lot easier&lt;/u&gt;&lt;/a&gt;, negating many of the ergonomic benefits of frameworks like Gin (and others &lt;a href="https://www.alexedwards.net/blog/which-go-router-should-i-use?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;to choose from&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;And if we combine this with the existing support for serving &lt;a href="https://pkg.go.dev/embed?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;embedded static files&lt;/u&gt;&lt;/a&gt;, compiling &lt;a href="https://go.dev/doc/articles/wiki/?ref=blog.arcjet.com#tmp_6" rel="noopener noreferrer"&gt;&lt;u&gt;dynamic HTML templates&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://go.dev/blog/slog?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;structured logging&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://go.dev/doc/tutorial/database-access?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;SQL drivers for common databases&lt;/u&gt;&lt;/a&gt;, &lt;a href="https://go.dev/blog/execution-traces-2024?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;runtime traces&lt;/u&gt;&lt;/a&gt;, and the ability to &lt;a href="https://go.dev/blog/generate?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;execute commands before the Go build step&lt;/u&gt;&lt;/a&gt;, we can easily build a single binary with no external dependencies ready to ship to production.&lt;/p&gt;

&lt;p&gt;Of course you can swap things out later - using an ORM like &lt;a href="https://gorm.io/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Gorm&lt;/u&gt;&lt;/a&gt; or &lt;a href="https://opentelemetry.io/docs/languages/go/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;exporting telemetry to Otel&lt;/u&gt;&lt;/a&gt;, for example - but Go has everything we need to get started in the standard library.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the web server
&lt;/h2&gt;

&lt;p&gt;Developers familiar with frameworks like Next.js or Remix are accustomed to automatic static asset management and filesystem-based route definitions. With our minimalist Go server this is more manual, but can be implemented in a way that feels idiomatic with &lt;a href="https://go.dev/blog/routing-enhancements?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;the routing enhancements in Go 1.22&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’ll follow the commonly used &lt;a href="https://github.com/golang-standards/project-layout?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Go Project Layout&lt;/u&gt;&lt;/a&gt; by defining the routes and server in main.go which will load the route handlers from an &lt;code&gt;internal/handlers&lt;/code&gt; package and keep the web content and templates in a &lt;code&gt;web/templates&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;For a basic website we want a static directory of assets like CSS and images (at &lt;code&gt;web/static&lt;/code&gt;), plus a favicon and &lt;code&gt;robots.txt&lt;/code&gt; hosted at the root. These are embedded in the Go binary. We also include a simple health check to indicate the server is running when it’s deployed.&lt;/p&gt;

&lt;p&gt;The server will be containerized and shipped to a modern hosting platform like Railway, Fly.io, Render, or one of the larger cloud providers. It’s standard to route requests through a proxy or load balancer which can handle SSL, so that’s another dependency avoided.&lt;/p&gt;

&lt;p&gt;However, if you wanted to just host this on a single VM then you could use something like &lt;a href="https://github.com/caddyserver/certmagic?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Certmagic&lt;/u&gt;&lt;/a&gt; to automatically generate a Let’s Encrypt certificate for you. It goes against our zero dependency philosophy, but dealing with issuing SSL certificates might not be something you want to write from scratch! This becomes more challenging when you have to sync certificates across multiple servers, which is why it's often delegated to the proxy frontend.&lt;/p&gt;

&lt;p&gt;Finally, we also set up &lt;a href="https://go.dev/blog/slog?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;structured logging using &lt;code&gt;log/slog&lt;/code&gt;&lt;/a&gt; and use an environment variable to configure plain text (default, for development) and JSON (for production).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "embed"
    "fmt"
    "io/fs"
    "log/slog"
    "net/http"
    "os"
    "time"
)

//go:embed web/static/*
var static embed.FS

func init() {
    _, jsonLogger := os.LookupEnv("JSON_LOGGER")
    _, debug := os.LookupEnv("DEBUG")

    var programLevel slog.Level
    if debug {
        programLevel = slog.LevelDebug
    }

    if jsonLogger {
        jsonHandler := slog.NewJSONHandler(os.Stdout, &amp;amp;slog.HandlerOptions{
            Level: programLevel,
        })
        slog.SetDefault(slog.New(jsonHandler))
    } else {
        textHandler := slog.NewTextHandler(os.Stdout, &amp;amp;slog.HandlerOptions{
            Level: programLevel,
        })
        slog.SetDefault(slog.New(textHandler))
    }

    slog.Info("Logger initialized", slog.Bool("debug", debug))
}

func main() {
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }
    addr := ":" + port

    mux := http.NewServeMux()

    // Use an embedded filesystem rooted at "web/static"
    fs, err := fs.Sub(static, "web/static")
    if err != nil {
        slog.Error("Failed to create sub filesystem", "error", err)
        return
    }

    // Serve files from the embedded /web/static directory at /static
    fileServer := http.FileServer(http.FS(fs))
    mux.Handle("GET /static/", http.StripPrefix("/static/", fileServer))

    mux.HandleFunc("GET /favicon.ico", func(w http.ResponseWriter, r *http.Request) {
        data, err := static.ReadFile("web/static/img/favicon.ico")
        if err != nil {
            http.NotFound(w, r)
            return
        }
        w.Header().Set("Content-Type", "image/x-icon")
        w.Write(data)
    })
    mux.HandleFunc("GET /robots.txt", func(w http.ResponseWriter, r *http.Request) {
        data, err := static.ReadFile("web/static/robots.txt")
        if err != nil {
            http.NotFound(w, r)
            return
        }
        w.Header().Set("Content-Type", "text/plain")
        w.Write(data)
    })

    mux.HandleFunc("GET /health", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        w.Write([]byte(`OK`))
    })

    server := &amp;amp;http.Server{
        Addr: fmt.Sprintf(":%s", port),
        Handler: mux,
        // Recommended timeouts from
        // https://blog.cloudflare.com/exposing-go-on-the-internet/
        ReadTimeout: 5 * time.Second,
        WriteTimeout: 10 * time.Second,
        IdleTimeout: 120 * time.Second,
    }

    slog.Info("Server listening", "addr", addr)

    if err := server.ListenAndServe(); err != nil {
        slog.Error("Server failed to start", "error", err)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;main.go web server.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Assuming &lt;code&gt;web/static/robots.txt&lt;/code&gt; exists, when you &lt;code&gt;go run main.go&lt;/code&gt; and &lt;code&gt;curl http://localhost:8080/robots.txt&lt;/code&gt; you’ll get the contents of that file served. The same goes for the favicon and the health check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Middleware
&lt;/h2&gt;

&lt;p&gt;Running code on every request is useful for logging, error handling, authentication, etc. Go web frameworks like &lt;a href="https://github.com/gin-gonic/gin?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Gin&lt;/u&gt;&lt;/a&gt; offer &lt;a href="https://github.com/gin-contrib?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;a rich middleware ecosystem&lt;/a&gt;, which is an advantage. Figuring out &lt;a href="https://github.com/gin-contrib/cors?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;CORS&lt;/a&gt; is a pain! However, writing custom middleware for Go &lt;code&gt;net/http&lt;/code&gt; is straightforward.&lt;/p&gt;

&lt;p&gt;To make this web server more robust, we’ll include a simple panic handler so that if any of the routes panic, we don’t crash the server.&lt;/p&gt;

&lt;p&gt;Create a new package in &lt;code&gt;internal/middleware&lt;/code&gt; with &lt;code&gt;middleware.go&lt;/code&gt; defining the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package middleware

import (
    "net/http"
)

// Middleware is a function that wraps an http.Handler with custom logic.
type Middleware func(http.Handler) http.Handler

// Chain is a helper to build up a pipeline of middlewares, then apply them to a
// final handler.
type Chain struct {
    middlewares []Middleware
}

// Use appends a middleware to the chain.
func (c *Chain) Use(m Middleware) {
    c.middlewares = append(c.middlewares, m)
}

// Then applies the entire chain of middlewares to the final handler in reverse
// order.
func (c *Chain) Then(h http.Handler) http.Handler {
    for i := len(c.middlewares) - 1; i &amp;gt;= 0; i-- {
        h = c.middlewares[i](h)
    }
    return h
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The middleware definition in internal/middleware/middleware.go&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then &lt;code&gt;internal/middleware/recover.go&lt;/code&gt; can be a new HTTP handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package middleware

import (
    "log/slog"
    "net/http"
)

func RecoverMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        defer func() {
            if err := recover(); err != nil {
                slog.Error("Recovered from panic", "error", err)
                http.Error(w, "Internal Server Error", http.StatusInternalServerError)
            }
        }()
        next.ServeHTTP(w, r)
    })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The panic recovery middleware in internal/middleware/recover.go&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;main.go&lt;/code&gt; we can wrap &lt;code&gt;mux&lt;/code&gt; with the new middleware and update the &lt;code&gt;http.Server&lt;/code&gt; to use the wrapped &lt;code&gt;mux&lt;/code&gt; as the &lt;code&gt;Handler&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chain := &amp;amp;middleware.Chain{}
chain.Use(middleware.RecoverMiddleware)
wrappedMux := chain.Then(mux)

server := &amp;amp;http.Server{
    Addr: fmt.Sprintf(":%s", port),
    Handler: wrappedMux,
    // Recommended timeouts from
    // https://blog.cloudflare.com/exposing-go-on-the-internet/
    ReadTimeout: 5 * time.Second,
    WriteTimeout: 10 * time.Second,
    IdleTimeout: 120 * time.Second,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Updated main.go with the new middleware.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating CSS with Tailwind
&lt;/h2&gt;

&lt;p&gt;Tailwind uses HTML class attributes to automatically generate the CSS needed to create the layout, but that means it needs a build step. It has to parse the HTML and then build the CSS, which we want to ship with the Go binary so everything is self-contained.&lt;/p&gt;

&lt;p&gt;I followed Xe Iaso’s blog &lt;a href="https://xeiaso.net/blog/using-tailwind-go/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;How to use Tailwind CSS in your Go programs&lt;/u&gt;&lt;/a&gt; to get this working. You have to set up a basic npm package in your root so that we can include the Tailwind CLI. The &lt;code&gt;build&lt;/code&gt; script in &lt;code&gt;package.json&lt;/code&gt; runs the Tailwind CLI, taking &lt;code&gt;web/static/css/main.css&lt;/code&gt; (any custom CSS you want included) as input and writing the generated output to &lt;code&gt;web/static/css/styles.css&lt;/code&gt;. You can remove the &lt;code&gt;main.css&lt;/code&gt; file if you don’t have anything extra to add.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "example.com",
  "version": "1.0.0",
  "scripts": {
    "build": "tailwindcss build -i web/static/css/main.css -o web/static/css/styles.css"
  },
  "dependencies": {
    "tailwindcss": "3.4.17"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;package.json&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tailwind.config.js&lt;/code&gt; file looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/** @type {import('tailwindcss').Config} */
module.exports = {
  darkMode: "media",
  content: ["./web/templates/*.html"],
  plugins: [],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;tailwind.config.js&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Then in &lt;code&gt;main.go&lt;/code&gt; in the root we add a generate command so that we can trigger the npm build script as part of the Go build process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "embed"
    "fmt"
    "io/fs"
    "log/slog"
    "net/http"
    "os"
    "time"
)

//go:generate npm run build

//go:embed web/static/*
var static embed.FS

func init() {
    // ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Update top of the main.go file&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;go generate &amp;amp;&amp;amp; go run main.go&lt;/code&gt; triggers Tailwind to generate CSS in the &lt;code&gt;web/static&lt;/code&gt; directory, which Go then embeds into the build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Web templates
&lt;/h2&gt;

&lt;p&gt;The final thing to do is set up a simple index page as our first route. Create an HTML file at &lt;code&gt;web/templates/index.html&lt;/code&gt; with anything you like. Then at &lt;code&gt;web/templates.go&lt;/code&gt; set up this file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package web

import (
    "embed"
)

//go:embed templates
var Templates embed.FS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;web/templates.go&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This exposes the embedded templates as a Go package so they can be imported elsewhere in our code.&lt;/p&gt;

&lt;p&gt;In a new package at &lt;code&gt;internal/handlers/root.go&lt;/code&gt; we can define a root handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package handlers

import (
    "net/http"

    "html/template"
    "log/slog"

    "github.com/davidmytton/example/web"
)

type PageData struct {
    Title string
}

func RootHandler() http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        file, err := web.Templates.ReadFile("templates/index.html")
        if err != nil {
            http.Error(w, "Internal Server Error", http.StatusInternalServerError)
            slog.Error("Error reading template", "error", err)
            return
        }

        tmpl := template.Must(template.New("index.html").Parse(string(file)))

        data := PageData{
            Title: "Home",
        }
        if err := tmpl.Execute(w, data); err != nil {
            http.Error(w, "Internal Server Error", http.StatusInternalServerError)
            slog.Error("Error executing template", "error", err)
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;internal/handlers/root.go handler&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This reads the template from the embedded filesystem, parses it, and executes it with the page data. Our panic recovery middleware will ensure that any template compilation errors are handled gracefully. The template &lt;a href="https://pkg.go.dev/text/template?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;can include variables and other structures&lt;/a&gt;, such as the title we pass in.&lt;/p&gt;
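
&lt;p&gt;Re-parsing the template on every request works, but repeats the parse each time. A minimal alternative sketch parses once at startup with &lt;code&gt;template.ParseFS&lt;/code&gt; (an in-memory &lt;code&gt;testing/fstest&lt;/code&gt; filesystem stands in here so the sketch runs standalone; in the real handler you’d pass &lt;code&gt;web.Templates&lt;/code&gt; instead):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
	"testing/fstest"
)

// In the real project this would be web.Templates (the embed.FS); an
// in-memory filesystem is used here so the sketch is self-contained.
var templatesFS = fstest.MapFS{
	"templates/index.html": {Data: []byte("Title: {{.Title}}")},
}

// Parse once at package load instead of on every request. template.Must
// panics at boot if the template is invalid, surfacing errors early.
var indexTmpl = template.Must(template.ParseFS(templatesFS, "templates/index.html"))

type PageData struct {
	Title string
}

// renderIndex executes the pre-parsed template; the handler body shrinks
// to a single Execute call.
func renderIndex() string {
	b := new(strings.Builder)
	indexTmpl.Execute(b, PageData{Title: "Home"})
	return b.String()
}

func main() {
	fmt.Println(renderIndex()) // prints "Title: Home"
}
```

&lt;p&gt;Parsing at startup also means an invalid template fails the deploy at boot rather than on the first request.&lt;/p&gt;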

&lt;p&gt;By using &lt;a href="https://pkg.go.dev/html/template?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Go's html/template we get injection-safety&lt;/a&gt; - the templates themselves are assumed to be safe, but the data injected is not, so Go handles it appropriately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html lang="en" class="h-screen"&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;meta charset="utf-8" /&amp;gt;
    &amp;lt;meta name="viewport" content="width=device-width,initial-scale=1" /&amp;gt;
    &amp;lt;title&amp;gt;{{.Title}}&amp;lt;/title&amp;gt;
    &amp;lt;!-- ... --&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;web/templates/index.html template file&lt;/em&gt;&lt;/p&gt;
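
&lt;p&gt;A quick standalone demonstration of that escaping (the &lt;code&gt;renderGreeting&lt;/code&gt; helper is hypothetical, purely for illustration):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// renderGreeting runs untrusted data through an html/template. The package
// escapes the data automatically, so markup in it cannot inject HTML.
func renderGreeting(data string) string {
	tmpl := template.Must(template.New("t").Parse("Hello, {{.}}!"))
	b := new(strings.Builder)
	tmpl.Execute(b, data)
	return b.String()
}

func main() {
	// Build a hostile input without typing angle brackets literally:
	// rune 60 is the less-than sign and rune 62 the greater-than sign.
	unsafe := string(rune(60)) + "script" + string(rune(62))

	// Prints the entity-escaped form of the tag, never raw markup.
	fmt.Println(renderGreeting(unsafe))
}
```

&lt;p&gt;The injected tag comes out entity-escaped (&lt;code&gt;&amp;amp;lt;script&amp;amp;gt;&lt;/code&gt;) rather than as a live element, which is exactly the behaviour we want for untrusted page data.&lt;/p&gt;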

&lt;p&gt;Finally, in &lt;code&gt;main.go&lt;/code&gt; we can set up the handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mux.HandleFunc("GET /", handlers.RootHandler())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Add to the main.go route definitions&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Live reload with Air
&lt;/h2&gt;

&lt;p&gt;One nice feature of web frameworks like Next.js is the instant reload whenever you make code changes. To implement live reload for our Go web server we can launch it with &lt;a href="https://github.com/air-verse/air?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Air&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After you have installed Air and generated the default config with &lt;code&gt;air init&lt;/code&gt; then you can adjust the build configuration with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[build]
  args_bin = []
  bin = "./tmp/main"
  cmd = "go generate &amp;amp;&amp;amp; go build -o ./tmp/main ."
  delay = 1000
  exclude_dir = ["node_modules", "assets", "tmp", "vendor", "testdata"]
  exclude_file = []
  exclude_regex = ["_test.go"]
  exclude_unchanged = false
  follow_symlink = false
  full_bin = ""
  include_dir = []
  include_ext = ["go", "tpl", "tmpl", "html", "css"]
  include_file = []
  kill_delay = "0s"
  log = "build-errors.log"
  poll = false
  poll_interval = 0
  post_cmd = []
  pre_cmd = []
  rerun = false
  rerun_delay = 500
  send_interrupt = false
  stop_on_error = false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;.air.toml&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Launching the dev environment
&lt;/h2&gt;

&lt;p&gt;How many times have you come back to a side project only to forget how to actually run it? &lt;code&gt;Make&lt;/code&gt; tends to be installed by default on most systems so if we always write a &lt;code&gt;make dev&lt;/code&gt; command then we don’t need to remember anything!&lt;/p&gt;

&lt;p&gt;Following the example of &lt;a href="https://github.com/Melkeydev/go-blueprint?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;go-blueprint&lt;/u&gt;&lt;/a&gt;, I also set up a &lt;code&gt;Makefile&lt;/code&gt; to easily start the web server. Here’s the contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.PHONY: dev build

dev:
    @if command -v $(HOME)/go/bin/air &amp;gt; /dev/null; then \
        AIR_CMD="$(HOME)/go/bin/air"; \
    elif command -v air &amp;gt; /dev/null; then \
        AIR_CMD="air"; \
    else \
        read -p "air is not installed. Install it? [Y/n] " choice; \
        if [ "$$choice" != "n" ] &amp;amp;&amp;amp; [ "$$choice" != "N" ]; then \
            echo "Installing..."; \
            go install github.com/air-verse/air@latest; \
            AIR_CMD="$(HOME)/go/bin/air"; \
        else \
            echo "Exiting..."; \
            exit 1; \
        fi; \
    fi; \
    echo "Starting Air..."; \
    $$AIR_CMD

build:
    @echo "Installing Tailwind..."
    npm ci
    @echo "Generate Tailwind CSS..."
    go generate
    @echo "Building Go server..."
    go build -o tmp/server main.go
    @echo "Build complete."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The project Makefile&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Running &lt;code&gt;make dev&lt;/code&gt; will launch &lt;code&gt;air&lt;/code&gt; watching files for any changes. If you edit a template or any of the Go files, the server will relaunch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;We can extend our minimalist philosophy to the production builds as well. I like the &lt;a href="https://github.com/GoogleContainerTools/distroless?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Distroless&lt;/u&gt;&lt;/a&gt; project from Google which provides bare-minimum container images without any operating system and without running as root. &lt;a href="https://edu.chainguard.dev/open-source/wolfi/wolfi-with-dockerfiles/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Wolfi&lt;/u&gt;&lt;/a&gt; is an alternative if you need more choice over what's installed, but still want to go with a minimalist approach.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Dockerfile&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM golang:1.23 AS builder

WORKDIR /app
COPY go.mod ./

ENV CGO_ENABLED=0
RUN go mod download

COPY . .
RUN go build -o server main.go

# Copy the server binary into a distroless container
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /

CMD ["/server"]

USER nonroot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This assumes the CSS has already been generated with &lt;code&gt;go generate&lt;/code&gt;, after which you can build the container with &lt;code&gt;docker build -t website --load .&lt;/code&gt; and run it with &lt;code&gt;docker run -t website&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We’ve now built a self-contained website using Tailwind CSS and dynamic HTML templates, ready to deploy as a tiny ~10 MB Docker image. Contrast this with a typical Node.js container, which could easily exceed hundreds of MB. A smaller image isn’t a particularly important goal by itself, but it’s indicative of all the extra bloat you’re shipping.&lt;/p&gt;

&lt;p&gt;Unfortunately, we still have to rely on the Tailwind CSS CLI. To illustrate how crazy things have become even with just that single external dependency, take a look inside the &lt;code&gt;node_modules&lt;/code&gt; directory and see how many packages it requires! Thankfully, &lt;a href="https://tailwindcss.com/docs/v4-beta?ref=blog.arcjet.com#installing-the-cli" rel="noopener noreferrer"&gt;&lt;u&gt;the Tailwind v4 beta includes standalone CLI binaries&lt;/u&gt;&lt;/a&gt;, so hopefully we’ll be able to use that in the future.&lt;/p&gt;

&lt;p&gt;However, the server itself has zero external dependencies and since the default &lt;code&gt;go.mod&lt;/code&gt; contains the toolchain version, even if we come back to this in 10 years it should still build and run without any changes!&lt;/p&gt;

</description>
      <category>go</category>
    </item>
    <item>
      <title>Remix Security Checklist</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Mon, 23 Dec 2024 11:08:57 +0000</pubDate>
      <link>https://dev.to/arcjet/remix-security-checklist-2mi8</link>
      <guid>https://dev.to/arcjet/remix-security-checklist-2mi8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d5pamnfh0rxzuegt4x2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d5pamnfh0rxzuegt4x2.jpg" alt="Remix Security Checklist" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://remix.run/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Remix&lt;/a&gt; has been growing in popularity as a more lightweight framework that closely follows web standards. As an alternative to Next.js, it has tried to take a more minimalist path. Perhaps that’s why &lt;a href="https://www.youtube.com/watch?v=hHWgGfZpk00&amp;amp;ref=blog.arcjet.com" rel="noopener noreferrer"&gt;OpenAI recently migrated the ChatGPT UI from Next.js to Remix&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.arcjet.com/next-js-security-checklist/" rel="noopener noreferrer"&gt;&lt;u&gt;Like Next.js&lt;/u&gt;&lt;/a&gt;, building frontend UI with React includes basic security out of the box - vulnerabilities like cross site scripting are much less likely due to the design of the framework. However, you still need to think about security.&lt;/p&gt;

&lt;p&gt;Good security is built in layers, creating additional walls behind those that may be breached. Although total security is impossible, there are several measures you can take to mitigate the risk of attack. Defense in depth ensures that if one mechanism fails, others still protect the application.&lt;/p&gt;

&lt;p&gt;We recently released the &lt;a href="https://blog.arcjet.com/announcing-the-arcjet-nestjs-remix-adapters/" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet security as code SDK for Remix&lt;/u&gt;&lt;/a&gt; to bring bot detection, PII redaction, signup form spam protection and rate limiting to Remix. This article will cover some of the important areas from our research about how to improve your Remix app security.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Dependencies &amp;amp; Updates
&lt;/h2&gt;

&lt;p&gt;With the frequency and severity of supply chain attacks on the rise, one of the easiest ways to keep your website and user base safe is to stay current on the latest patches and updates. Though this vulnerability class has been gaining more attention and registries are making changes to minimize the threat level, third-party integrations will always pose a risk.&lt;/p&gt;

&lt;p&gt;At Arcjet we review non-critical dependency updates every week using &lt;a href="https://docs.github.com/en/code-security/dependabot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Dependabot&lt;/u&gt;&lt;/a&gt; with &lt;a href="https://socket.dev/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Socket&lt;/u&gt;&lt;/a&gt; pull request analysis to help us keep an eye on what’s included in those updates. Alternatively, &lt;a href="https://docs.npmjs.com/cli/v9/commands/npm-audit?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;npm audit&lt;/u&gt;&lt;/a&gt; can be used to assist you in keeping up-to-date with the latest releases that have addressed publicly known vulnerabilities.&lt;/p&gt;

&lt;p&gt;Attacks such as &lt;a href="https://blog.arcjet.com/security-concepts-for-developers-dependency-confusion-attacks/" rel="noopener noreferrer"&gt;&lt;u&gt;dependency confusion&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://blog.arcjet.com/security-concepts-for-developers-package-hijacking/" rel="noopener noreferrer"&gt;&lt;u&gt;package hijacking&lt;/u&gt;&lt;/a&gt; have been responsible for major disruptions. Socket helps us ensure we mitigate those risks by highlighting unusual updates. To minimize your attack surface, consider implementing the functionality provided by &lt;a href="https://blog.arcjet.com/security-concepts-for-developers-trivial-packages/" rel="noopener noreferrer"&gt;&lt;u&gt;trivial packages&lt;/u&gt;&lt;/a&gt; yourself.&lt;/p&gt;

&lt;p&gt;The JavaScript ecosystem has a relatively high churn rate, which can be challenging to keep up with, particularly when there are breaking changes. This is a pain, but it’s more painful to be forced through several major version changes if you don’t keep up and then a critical vulnerability is only addressed in the latest release!&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Module constraints
&lt;/h2&gt;

&lt;p&gt;Server-only code will be automatically removed from what gets sent to the browser by the Remix compiler. However, to ensure this works properly, you must avoid &lt;a href="https://remix.run/docs/en/main/guides/constraints?ref=blog.arcjet.com#no-module-side-effects" rel="noopener noreferrer"&gt;&lt;u&gt;module side effects&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Certain operations, such as logging or API calls, that occur immediately when a module is imported can expose sensitive information or cause errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { auth } from "../auth.server";
import { useLoaderData } from "@remix-run/react";
import type { LoaderFunctionArgs } from "@remix-run/node";

// DANGEROUS: Immediately tries to verify auth on module import.
const authStatus = auth.verifySession();
// DANGEROUS: Potential side effect, may expose auth logic.
console.log("Auth Status:", authStatus);

export async function loader({ request }: LoaderFunctionArgs) {
  return Response.json({
    users: await auth.getAuthorizedUsers(),
    status: authStatus // Using the problematic module-level variable.
  });
}

export default function Users() {
  const data = useLoaderData&amp;lt;typeof loader&amp;gt;();
  return &amp;lt;div&amp;gt;{/* render users */}&amp;lt;/div&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead, the code responsible for the side effect should be wrapped in the &lt;code&gt;loader&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { auth } from "../auth.server";
import { useLoaderData } from "@remix-run/react";
import type { LoaderFunctionArgs } from "@remix-run/node";

export async function loader({ request }: LoaderFunctionArgs) {
  // Side effect properly contained within the loader.
  const authStatus = await auth.verifySession();
  console.log("Auth Status:", authStatus); // Safe implementation.

  return Response.json({
    users: await auth.getAuthorizedUsers(),
    status: authStatus
  });
}

export default function Users() {
  const data = useLoaderData&amp;lt;typeof loader&amp;gt;();
  return &amp;lt;div&amp;gt;{/* render users */}&amp;lt;/div&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Placing sensitive operations (e.g., database queries, verifying sessions) directly in an imported module can unintentionally leak this logic to the client bundle, or execute it too early.&lt;/p&gt;

&lt;p&gt;Always wrap such operations in a &lt;code&gt;loader&lt;/code&gt; or &lt;code&gt;action&lt;/code&gt; so that Remix’s server-only compilation can properly exclude them from client code. This approach also helps ensure that any credentials or personally identifiable information (PII) are only accessed by server functions, further reducing the likelihood of a security breach.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Environment variables
&lt;/h2&gt;

&lt;p&gt;All your environment variables, such as API keys or database URLs, should be kept server-side to avoid accidental exposure. Any file with the &lt;code&gt;.server.ts&lt;/code&gt; suffix will only ever be executed on the server.&lt;/p&gt;

&lt;p&gt;You can access server-side environment variables inside your loader because loaders only ever run on the server. However, any value that is returned by the loader will be available on the client. Use sensitive environment variables inside the loader, but do not return them.&lt;/p&gt;

&lt;p&gt;Anything placed into &lt;code&gt;window.ENV&lt;/code&gt; will be exposed to the browser. While it’s convenient to pass environment variables to the client via a context like &lt;code&gt;window.ENV&lt;/code&gt;, you must ensure only non-sensitive values are exposed - like a public-facing API base URL.&lt;/p&gt;
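&lt;p&gt;One way to enforce that distinction is an explicit allowlist in a &lt;code&gt;.server.ts&lt;/code&gt; file. The sketch below is illustrative (the module path, key names, and the &lt;code&gt;publicEnv&lt;/code&gt; helper are not from any particular codebase) but shows how a secret can only leak if it is deliberately added to the list:&lt;/p&gt;

```typescript
// app/env.server.ts -- hypothetical module; the .server suffix keeps it out
// of the client bundle. Only keys named here can ever reach window.ENV.
const PUBLIC_KEYS = ["PUBLIC_API_BASE_URL"] as const;

export function publicEnv(
  env: Record<string, string | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of PUBLIC_KEYS) {
    const value = env[key];
    if (value !== undefined) out[key] = value;
  }
  // Secrets such as DATABASE_URL are simply never copied.
  return out;
}
```

&lt;p&gt;A root loader can then return &lt;code&gt;publicEnv(process.env)&lt;/code&gt; and assign the result to &lt;code&gt;window.ENV&lt;/code&gt;, knowing nothing outside the allowlist can be serialized.&lt;/p&gt;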

&lt;p&gt;The ideal situation is to avoid using environment variables for any secrets - this is &lt;a href="https://blog.arcjet.com/storing-secrets-in-env-vars-considered-harmful/" rel="noopener noreferrer"&gt;a common anti-pattern that should be avoided&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Authentication &amp;amp; Authorization
&lt;/h2&gt;

&lt;p&gt;Securing routes in Remix can be accomplished with its native utilities. Session cookies can be created either with the &lt;code&gt;createCookie&lt;/code&gt; utility or with a session storage object, which will be checked in a loader or action when reading or writing data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;createCookie&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This utility creates a logical container to manage a browser cookie issued by the server. Any &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie?ref=blog.arcjet.com#attributes" rel="noopener noreferrer"&gt;&lt;u&gt;attributes&lt;/u&gt;&lt;/a&gt; can be set by adding to the options object or during &lt;code&gt;serialize()&lt;/code&gt; when the &lt;code&gt;Set-Cookie&lt;/code&gt; response header is generated.&lt;/p&gt;

&lt;p&gt;Since cookies can be easily tampered with, &lt;a href="https://remix.run/docs/en/main/utils/cookies?ref=blog.arcjet.com#signing-cookies" rel="noopener noreferrer"&gt;Remix will automatically sign a cookie&lt;/a&gt; to verify its contents and ensure its integrity. The secrets used for signing are sourced from the &lt;code&gt;secrets&lt;/code&gt; property, which stores an array of string values.&lt;/p&gt;

&lt;p&gt;If multiple secrets are provided, the one at index position &lt;code&gt;0&lt;/code&gt; will be used to sign all outgoing cookies. However, any cookies that were signed with older secrets will still successfully decode. This is useful when you want to rotate secrets. It is critical that any secrets used are complex enough to be unguessable, as cookies could be forged if a malicious attacker is aware of their correct value.&lt;/p&gt;

&lt;p&gt;It is recommended that all created cookies are stored in a &lt;code&gt;*.server.ts&lt;/code&gt; file and then imported into your route modules. Files with the &lt;code&gt;.server.ts&lt;/code&gt; suffix are never sent to the client.&lt;/p&gt;
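&lt;p&gt;For example, an &lt;code&gt;app/cookies.server.ts&lt;/code&gt; file might define a signed cookie like this (the secret environment variable names and attribute values are illustrative):&lt;/p&gt;

```typescript
// app/cookies.server.ts -- the .server suffix keeps this out of the client bundle.
import { createCookie } from "@remix-run/node";

export const userPrefs = createCookie("user-prefs", {
  // Index 0 signs all new cookies; older secrets still verify, enabling rotation.
  secrets: [process.env.COOKIE_SECRET_NEW!, process.env.COOKIE_SECRET_OLD!],
  httpOnly: true, // not readable from client-side JavaScript
  sameSite: "lax", // the Remix default; mitigates CSRF
  secure: process.env.NODE_ENV === "production",
  maxAge: 60 * 60 * 24 * 30, // 30 days
});
```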

&lt;h3&gt;
  
  
  Session Storage
&lt;/h3&gt;

&lt;p&gt;There are a variety of session storage strategies available in Remix, as well as the ability to create a custom one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://remix.run/docs/zh/main/utils/sessions?ref=blog.arcjet.com#createsessionstorage" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;createSessionStorage()&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; - Creates a custom strategy. Requires a cookie and CRUD methods to manage session data.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://remix.run/docs/zh/main/utils/sessions?ref=blog.arcjet.com#createcookiesessionstorage" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;createCookieSessionStorage()&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; - This strategy stores all session data into the session cookie. With this method, additional backend services or databases are not required. A major drawback to this strategy is that every time the session changes due to a loader or action, it must be committed.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://remix.run/docs/zh/main/utils/sessions?ref=blog.arcjet.com#createfilesessionstorage-node" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;createFileSessionStorage()&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; - Used with persistent file backed sessions.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://remix.run/docs/zh/main/utils/sessions?ref=blog.arcjet.com#createworkerskvsessionstorage-cloudflare-workers" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;createWorkersKVSessionStorage()&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; - Used with Cloudflare Workers KV backed sessions.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://remix.run/docs/zh/main/utils/sessions?ref=blog.arcjet.com#createarctablesessionstorage-architect-amazon-dynamodb" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;createArcTableSessionStorage()&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; - Used with Amazon DynamoDB backed sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Define your session storage object in &lt;code&gt;app/session.ts&lt;/code&gt; to act as a centralized location for routes to access session data.&lt;/p&gt;
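&lt;p&gt;A minimal &lt;code&gt;app/session.ts&lt;/code&gt; using the cookie strategy might look like the sketch below (the cookie name and secret variable are illustrative):&lt;/p&gt;

```typescript
// app/session.ts -- central session storage, imported by loaders and actions.
import { createCookieSessionStorage } from "@remix-run/node";

export const { getSession, commitSession, destroySession } =
  createCookieSessionStorage({
    cookie: {
      name: "__session",
      httpOnly: true,
      sameSite: "lax",
      secure: true,
      secrets: [process.env.SESSION_SECRET!],
    },
  });
```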

&lt;h3&gt;
  
  
  Using Cookies
&lt;/h3&gt;

&lt;p&gt;Once a cookie is generated the &lt;code&gt;.parse()&lt;/code&gt; method can be used to extract and return its value. Then conditionals can be defined in your &lt;code&gt;loader&lt;/code&gt; and &lt;code&gt;action&lt;/code&gt; functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/routes/_index.tsx
// Remember the user's preference for banner visibility using a cookie.

import { json, redirect } from "@remix-run/node";
import { useLoaderData, Form } from "@remix-run/react";
import { userPrefs } from "~/cookies.server";

// Return showBanner value from userPrefs cookie.
export async function loader({ request }) {
  const cookieHeader = request.headers.get("Cookie");
  const cookie = (await userPrefs.parse(cookieHeader)) || {};
  return json({ showBanner: cookie.showBanner });
}

// Update showBanner value to false if bannerVisibility is set to hidden.
export async function action({ request }) {
  const cookieHeader = request.headers.get("Cookie");
  const cookie = (await userPrefs.parse(cookieHeader)) || {};
  const bodyParams = await request.formData();

  if (bodyParams.get("bannerVisibility") === "hidden") {
    cookie.showBanner = false;
  }

  // Serialize updated cookie and set in redirect response.
  return redirect("/", {
    headers: {
      "Set-Cookie": await userPrefs.serialize(cookie),
    },
  });
}

export default function Home() {
  const { showBanner } = useLoaderData();

  // Form to hide the banner.
  return (
    &amp;lt;div&amp;gt;
      {showBanner &amp;amp;&amp;amp; (
        &amp;lt;div&amp;gt;
          &amp;lt;Form method="post"&amp;gt;
            &amp;lt;input type="hidden" name="bannerVisibility" value="hidden" /&amp;gt;
            &amp;lt;button type="submit"&amp;gt;Hide&amp;lt;/button&amp;gt;
          &amp;lt;/Form&amp;gt;
        &amp;lt;/div&amp;gt;
      )}
      &amp;lt;h1&amp;gt;Welcome!&amp;lt;/h1&amp;gt;
    &amp;lt;/div&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be aware that any data returned from a loader will be exposed to the client, even if it is not rendered in a component. So treat these with the same care as you would give a public API endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  External Solutions
&lt;/h3&gt;

&lt;p&gt;Alternatively, third-party authentication libraries such as &lt;a href="https://clerk.com/docs/quickstarts/remix?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Clerk&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://www.better-auth.com/docs/integrations/remix?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Better Auth&lt;/u&gt;&lt;/a&gt; provide an easy way to integrate robust identity management.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Cross-Site Request Forgery
&lt;/h2&gt;

&lt;p&gt;Cross-Site Request Forgery (CSRF) attacks trick victims into submitting requests using their authenticated session. Any functionality that can be executed by authenticated users can be exploited. This includes functionality such as updating the account password, changing the email address associated with the account, deleting the account, etc.&lt;/p&gt;

&lt;p&gt;The majority of browsers have built-in protection against CSRF attacks as they support the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie?ref=blog.arcjet.com#strict" rel="noopener noreferrer"&gt;&lt;u&gt;SameSite&lt;/u&gt;&lt;/a&gt; cookie attribute which restricts the inclusion of cookies in requests initiated by another website. Remix cookies are set to use &lt;code&gt;SameSite=Lax&lt;/code&gt; by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WARNING&lt;/strong&gt;: It is important that logout functions or any mutations are performed in an &lt;code&gt;action&lt;/code&gt; and not a &lt;code&gt;loader&lt;/code&gt;, or you will put users at risk of a CSRF attack. View the &lt;a href="https://remix.run/docs/en/main/utils/sessions?ref=blog.arcjet.com#using-sessions" rel="noopener noreferrer"&gt;&lt;u&gt;official documentation&lt;/u&gt;&lt;/a&gt; for more information.&lt;/p&gt;
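&lt;p&gt;A logout route following that advice might look like this sketch (it assumes a session module exposing &lt;code&gt;getSession&lt;/code&gt; and &lt;code&gt;destroySession&lt;/code&gt;, as in the Remix session utilities):&lt;/p&gt;

```typescript
// app/routes/logout.tsx -- the mutation lives in an action, never a loader.
import { redirect } from "@remix-run/node";
import type { ActionFunctionArgs } from "@remix-run/node";
import { destroySession, getSession } from "~/session.server";

// Only a POST to /logout destroys the session; with SameSite=Lax, a cross-site
// page cannot trigger this with the victim's cookie attached.
export async function action({ request }: ActionFunctionArgs) {
  const session = await getSession(request.headers.get("Cookie"));
  return redirect("/login", {
    headers: { "Set-Cookie": await destroySession(session) },
  });
}

// A plain GET to /logout mutates nothing -- it just redirects.
export async function loader() {
  return redirect("/");
}
```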

&lt;h3&gt;
  
  
  remix-utils
&lt;/h3&gt;

&lt;p&gt;To add an extra layer of protection against CSRF attacks, you can use the &lt;a href="https://github.com/sergiodxa/remix-utils?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;remix-utils&lt;/u&gt;&lt;/a&gt; library. This library also offers other security features to compensate for the lack of native security in Remix, including safe redirects and &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;CORS&lt;/u&gt;&lt;/a&gt; implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Security Headers
&lt;/h2&gt;

&lt;p&gt;Each route in Remix can set its own HTTP headers with the &lt;code&gt;HeadersFunction&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import type { HeadersFunction } from "@remix-run/node";

export const headers: HeadersFunction = () =&amp;gt; ({
  "header-name-a": "value",
  "header-name-b": "value"
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, the headers returned depend on the nesting level of the route:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When a route defines headers, only those headers are used by default.&lt;/li&gt;
&lt;li&gt;Parent headers are only included if they are explicitly merged. If a child and parent share the same header, the value of the child's header overwrites the parent.&lt;/li&gt;
&lt;li&gt;If a child's &lt;code&gt;loader&lt;/code&gt; function throws an error and that error is handled by a parent route, then the parent's headers are used.&lt;/li&gt;
&lt;li&gt;When a route doesn't define headers, Remix will traverse up the route hierarchy one parent at a time until it finds headers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is important to be aware of, as a more aggressive &lt;code&gt;Cache-Control&lt;/code&gt; header on a child route may cache content for longer than intended. You can avoid any overwriting by only defining headers in your childless routes.&lt;/p&gt;
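&lt;p&gt;When a child route does need its own headers, it can merge the parent's explicitly. The sketch below shows the pattern; the &lt;code&gt;Cache-Control&lt;/code&gt; override is an illustrative choice, and in a real route you would type the function with &lt;code&gt;HeadersFunction&lt;/code&gt; from &lt;code&gt;@remix-run/node&lt;/code&gt;:&lt;/p&gt;

```typescript
// Hypothetical child route: start from the parent's headers so security
// headers survive, then override only what this route needs.
export const headers = ({ parentHeaders }: { parentHeaders: Headers }) => {
  const merged = new Headers(parentHeaders);
  merged.set("Cache-Control", "no-store"); // the child's value wins on conflict
  return merged;
};
```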

&lt;p&gt;As such, security headers in Remix should be set using the &lt;code&gt;entry.server.ts&lt;/code&gt; file so they can be applied to every request. For data requests, you must use the &lt;a href="https://remix.run/docs/en/main/file-conventions/entry.server?ref=blog.arcjet.com#handledatarequest" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;handleDataRequest&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export function handleDataRequest(
  response: Response,
  {
    request,
    params,
    context,
  }: LoaderFunctionArgs | ActionFunctionArgs
) {
  response.headers.set("X-Custom-Header", "value");
  return response;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this knowledge, there are several &lt;a href="https://www.darkrelay.com/post/http-security-headers?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;security headers and attributes&lt;/a&gt; that you should use to protect your application.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/entry.server.tsx

import type { AppLoadContext, EntryContext } from "@remix-run/node";
import { RemixServer } from "@remix-run/react";
import { renderToString } from "react-dom/server";

export default function handleRequest(
  request: Request,
  responseStatusCode: number,
  responseHeaders: Headers,
  remixContext: EntryContext,
  loadContext: AppLoadContext
) {
  const markup = renderToString(
    &amp;lt;RemixServer context={remixContext} url={request.url} /&amp;gt;
  );

  // Set security headers.
  // Interpret response as HTML.
  responseHeaders.set("Content-Type", "text/html");
  // Prevent clickjacking attacks.
  responseHeaders.set("X-Frame-Options", "SAMEORIGIN");
  // Enforces Content-Type.
  responseHeaders.set("X-Content-Type-Options", "nosniff");
  // Only include path for same-origin requests.
  responseHeaders.set("Referrer-Policy", "strict-origin-when-cross-origin");
  // Only allow same-origin resources.
  responseHeaders.set(
    "Content-Security-Policy",
    "default-src 'self'; script-src 'self'; style-src 'self';"
  );
  // Only use HTTPS.
  responseHeaders.set(
    "Strict-Transport-Security",
    "max-age=31536000; includeSubDomains"
  );
  // User device protection.
  responseHeaders.set("Permissions-Policy", "camera=(), microphone=(), geolocation=()");
  // Block cross-origin window access.
  responseHeaders.set("Cross-Origin-Opener-Policy", "same-origin");
  // Block cross-origin resource embedding.
  responseHeaders.set("Cross-Origin-Resource-Policy", "same-origin");
  // Enable origin-keyed agent clustering.
  responseHeaders.set("Origin-Agent-Cluster", "?1");
  // Prevent browsers from DNS prefetching.
  responseHeaders.set("X-DNS-Prefetch-Control", "off");
  // Block Adobe from loading domain data.
  responseHeaders.set("X-Permitted-Cross-Domain-Policies", "none");

  return new Response("&amp;lt;!DOCTYPE html&amp;gt;" + markup, {
    headers: responseHeaders,
    status: responseStatusCode,
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition to setting them manually, you can also use &lt;a href="http://helmet.js/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Helmet.js&lt;/u&gt;&lt;/a&gt; as &lt;a href="https://remix.run/docs/en/main/start/quickstart?ref=blog.arcjet.com#bring-your-own-server" rel="noopener noreferrer"&gt;&lt;u&gt;Remix can be integrated with Express&lt;/u&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import express from "express";
import helmet from "helmet";
import { createRequestHandler } from "@remix-run/express";

const app = express();

// Use Helmet to set security headers.
app.use(helmet());

// Serve static files from the public directory.
app.use(express.static("public"));

// Handle all requests with Remix.
app.all(
  "*",
  createRequestHandler({
    getLoadContext() {
      // Whatever you return here will be passed as `context` to your loaders.
    },
  })
);

const port = process.env.PORT || 3000;
app.listen(port, () =&amp;gt; {
  console.log(`Server is listening on port ${port}`);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  7. Validation
&lt;/h2&gt;

&lt;p&gt;React, and by extension Remix, automatically escapes strings used in dynamic content by HTML-encoding the characters used in injection attacks. This provides a base level of protection unless the input is used within an anchor tag's &lt;code&gt;href&lt;/code&gt; attribute, a &lt;code&gt;style&lt;/code&gt; attribute, or &lt;code&gt;dangerouslySetInnerHTML&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Additional bypasses have also been found when parsing &lt;a href="https://medium.com/dailyjs/exploiting-script-injection-flaws-in-reactjs-883fb1fe36c1?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;JSON-objects&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://medium.com/javascript-security/avoiding-xss-via-markdown-in-react-91665479900?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;converting Markdown&lt;/u&gt;&lt;/a&gt; to HTML.&lt;/p&gt;
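&lt;p&gt;If you do have to render user-supplied HTML (rendered Markdown, for example), sanitize it first. One common approach, sketched here, is the DOMPurify library via the &lt;code&gt;isomorphic-dompurify&lt;/code&gt; package so the same code runs during server rendering; the component name is illustrative:&lt;/p&gt;

```typescript
// Sketch: strip scripts and event handlers before the HTML ever reaches
// dangerouslySetInnerHTML.
import DOMPurify from "isomorphic-dompurify";

export function SafeHtml({ html }: { html: string }) {
  const clean = DOMPurify.sanitize(html); // removes <script>, onerror=, etc.
  return <div dangerouslySetInnerHTML={{ __html: clean }} />;
}
```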

&lt;p&gt;At Arcjet, we recommend using &lt;a href="https://zod.dev/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Zod&lt;/u&gt;&lt;/a&gt; as it is designed to work seamlessly with TypeScript, allowing you to declare your schema once and use it for both static checking and runtime validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/schemas/auth.ts

import { z } from 'zod';
import isAlphanumeric from 'validator/lib/isAlphanumeric';

export const loginSchema = z.object({
  username: z.string()
    .min(3, "Username must be at least 3 characters long.")
    .max(20, "Username cannot exceed 20 characters.")
    .refine((val) =&amp;gt; isAlphanumeric(val, "en-US"), // Sets language locale.
      "Username can only contain letters and numbers."),

  email: z.string()
    .min(1, "Email is required.")
    .email("Please enter a valid email address."),

  password: z.string()
    .min(8, "Password must be at least 8 characters long.")
    .regex(/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)/, 
      "Password must contain at least one uppercase letter, one lowercase letter, and one number."),

  confirmPassword: z.string()
    .min(8, "Please confirm your password."),
}).refine(
  (data) =&amp;gt; data.password === data.confirmPassword,
  {
    message: "Passwords do not match.",
    path: ["confirmPassword"], // Associates error with field.
  }
);

export type LoginInput = z.infer&amp;lt;typeof loginSchema&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This schema can then be imported and used in an &lt;code&gt;action&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/routes/login.tsx

// ...imports

export async function action({ request }: ActionFunctionArgs) {
  const formData = await request.formData();
  const data = Object.fromEntries(formData);

  const result = loginSchema.safeParse(data);
  if (!result.success) {
    return Response.json(
      { errors: result.error.flatten().fieldErrors },
      { status: 400 }
    );
  }

  const { email, password } = result.data;
  const user = await login(email, password);
  if (!user) {
    return Response.json(
      { errors: { form: "Invalid credentials." } },
      { status: 401 }
    );
  }

  // Valid credentials create a session and redirects user to dashboard.
  return createUserSession(user.id, "/dashboard");
}

export default function Login() {
  const actionData = useActionData&amp;lt;typeof action&amp;gt;();
  const navigation = useNavigation();
  const { errors, validate } = useFormValidation(loginSchema);
  // Route action being called due to non GET form submission.
  const isSubmitting = navigation.state === "submitting";
  // ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the form, any fields that fail validation will display an error that is sent from the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/routes/login.tsx

// Within &amp;lt;Form method="post"&amp;gt;.
&amp;lt;div&amp;gt;
  &amp;lt;input name="email" type="email" required /&amp;gt;
  {actionData?.errors?.email &amp;amp;&amp;amp; (
    &amp;lt;div&amp;gt;{actionData.errors.email[0]}&amp;lt;/div&amp;gt;
  )}
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same Zod schema can also be used on the client via a hook, providing immediate validation before the form submission reaches the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// app/hooks/useFormValidation.ts

import { useState } from "react";
import type { z } from "zod";

export function useFormValidation&amp;lt;T extends z.ZodType&amp;gt;(schema: T) {
  const [errors, setErrors] = useState&amp;lt;z.inferFlattenedErrors&amp;lt;T&amp;gt;['fieldErrors']&amp;gt;({});

  const validate = (formData: FormData) =&amp;gt; {
    const data = Object.fromEntries(formData);
    const result = schema.safeParse(data);

    if (!result.success) {
      setErrors(result.error.flatten().fieldErrors);
      return false;
    }

    setErrors({});
    return true;
  };

  return { errors, validate };
}

// app/routes/login.tsx

// Added to the Login() function.
const handleSubmit = (event: React.FormEvent&amp;lt;HTMLFormElement&amp;gt;) =&amp;gt; {
  const form = event.currentTarget;
  const formData = new FormData(form);

  if (!validate(formData)) {
    event.preventDefault();
  }
};

// app/routes/login.tsx

&amp;lt;Form method="post" onSubmit={handleSubmit}&amp;gt;
  &amp;lt;div&amp;gt;
    &amp;lt;input name="email" type="email" required /&amp;gt;
    // Client-side OR server-side errors.
    {(errors.email || actionData?.errors?.email) &amp;amp;&amp;amp; (
      &amp;lt;div&amp;gt;
        {errors.email?.[0] || actionData?.errors?.email?.[0]}
      &amp;lt;/div&amp;gt;
    )}
  &amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Validation can also be performed on &lt;code&gt;GET&lt;/code&gt; query parameters and cookies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import type { LoaderFunctionArgs } from "@remix-run/node";
import { z } from "zod";
import { getSession } from "~/session.server";

// Example query: ?page=2&amp;amp;sort=desc
const querySchema = z.object({
  // .coerce converts the page parameter string "2" to the number 2.
  // page must be &amp;gt;= 1.
  page: z.coerce.number().min(1).default(1),
  // sort must be either "asc" or "desc", defaults to "asc" if missing.
  sort: z.enum(["asc", "desc"]).default("asc"),
});

// userId must be non-empty string.
const sessionSchema = z.object({ userId: z.string().min(1) });

export async function loader({ request }: LoaderFunctionArgs) {
  try {
    const url = new URL(request.url);
    const queryParams = Object.fromEntries(url.searchParams);
    const validatedQuery = querySchema.parse(queryParams);

    const session = await getSession(request.headers.get("Cookie"));
    const sessionData = { userId: session.get("userId") };
    const validatedSession = sessionSchema.parse(sessionData);

    return Response.json({ query: validatedQuery });
  } catch (error) {
    if (error instanceof z.ZodError) {
      return Response.json(
        { error: error.flatten().fieldErrors },
        { status: 400 }
      );
    }
    throw error;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  8. File Uploads
&lt;/h2&gt;

&lt;p&gt;If your application allows users to upload files, it is essential to implement security measures to mitigate the risk of attackers uploading and executing malicious code in the context of your application.&lt;/p&gt;

&lt;p&gt;In Remix, you can use the &lt;a href="https://remix.run/docs/en/main/utils/unstable-create-file-upload-handler?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;unstable_createFileUploadHandler&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; and &lt;a href="https://remix.run/docs/en/main/utils/unstable-create-memory-upload-handler?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;&lt;code&gt;unstable_createMemoryUploadHandler&lt;/code&gt;&lt;/u&gt;&lt;/a&gt; utilities to filter uploads based on file characteristics. However, configuring these restrictions to be both practical and secure can be overly complex.&lt;/p&gt;
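&lt;p&gt;As a minimal sketch of the kind of restriction logic involved, here is a dependency-free content-type allowlist and size cap. The specific types and limit are illustrative choices, not Remix defaults:&lt;/p&gt;

```typescript
// Hypothetical allowlist and size limit; tune these for your application.
const ALLOWED_TYPES = new Set(["image/png", "image/jpeg", "image/webp"]);
const MAX_SIZE_BYTES = 5 * 1024 * 1024; // 5 MB

// A predicate like this can back an upload handler's filter logic.
function isUploadAllowed(contentType: string, sizeBytes: number): boolean {
  return (
    ALLOWED_TYPES.has(contentType) &&
    sizeBytes > 0 &&
    sizeBytes <= MAX_SIZE_BYTES
  );
}

console.log(isUploadAllowed("image/png", 1024)); // true
console.log(isUploadAllowed("text/html", 1024)); // false
```

&lt;p&gt;Remember that the declared &lt;code&gt;Content-Type&lt;/code&gt; is client-controlled, so an allowlist like this reduces accidental misuse but is not a substitute for server-side content inspection.&lt;/p&gt;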

&lt;p&gt;Instead of dealing with securing upload functionality yourself, consider using services such as &lt;a href="https://docs.uploadthing.com/getting-started/remix?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;UploadThing&lt;/u&gt;&lt;/a&gt; or sending files directly to a cloud object storage service. Uploading files to disk on a server you operate is bad practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;These general guidelines will help you create a secure application for both you and your users. However, the specific configurations and implementations will need to be tailored to meet the needs of your specific application. There is no one-size-fits-all approach to security.&lt;/p&gt;

</description>
      <category>remix</category>
      <category>security</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Bot spoofing and how to detect it with Arcjet</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Fri, 20 Dec 2024 14:01:10 +0000</pubDate>
      <link>https://dev.to/arcjet/bot-spoofing-and-how-to-detect-it-with-arcjet-1oeo</link>
      <guid>https://dev.to/arcjet/bot-spoofing-and-how-to-detect-it-with-arcjet-1oeo</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu4wairf61yreuwk0riu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu4wairf61yreuwk0riu.jpg" alt="Bot spoofing and how to detect it with Arcjet" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/User-Agent_header?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;code&gt;User-Agent&lt;/code&gt; header&lt;/a&gt; is the name badge for web requests. Although it's&lt;a href="https://www.chromium.org/updates/ua-reduction/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;been deprecated by some browsers&lt;/a&gt;, it's still sent by well behaving clients and is commonly used to identify automated clients. It's &lt;a href="https://datatracker.ietf.org/doc/html/rfc9309?ref=blog.arcjet.com#name-the-user-agent-line" rel="noopener noreferrer"&gt;what &lt;code&gt;robots.txt&lt;/code&gt; is based on&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But just like a name badge, clients can write whatever they like in the &lt;code&gt;User-Agent&lt;/code&gt; header. This is a problem if it's the only thing you use to set up rules for managing bots, and is one reason why Arcjet uses other fingerprinting techniques like IP address analysis as part of our bot detection features.&lt;/p&gt;

&lt;p&gt;Now we're giving developers more detailed verification options: every request is checked behind the scenes against published IP and reverse DNS data for common bots.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.arcjet.com/bot-detection-isnt-perfect/" rel="noopener noreferrer"&gt;Bot detection is never perfect&lt;/a&gt;, but this improvement helps protect against spoofed bots where clients pretend to be someone else. For example, we can detect if a client is really Googlebot by checking if the request IP is within &lt;a href="https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Google’s published IP ranges&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The analysis happens automatically for all Arcjet Pro plan users. If we detect a spoofed bot (or successfully verify a bot), additional metadata will be added to the response decision so you can decide how to handle it.&lt;/p&gt;

&lt;p&gt;For example, to check for spoofed bots:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (decision.reason.isBot() &amp;amp;&amp;amp; decision.reason.isSpoofed()) {
  console.log("Detected spoofed bot", decision.reason.spoofed);
  // Return a 403 or similar response
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to confirm whether a bot has been verified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (decision.reason.isBot() &amp;amp;&amp;amp; decision.reason.isVerified()) {
  console.log("Verified bot", decision.reason.verified);
  // Allow the request
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Right now we support verification for Google, Bing, ChatGPT, and Datadog. &lt;a href="https://github.com/arcjet/well-known-bots?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;Our bot list&lt;/a&gt; is open source and we'll be adding more over time.&lt;/p&gt;

&lt;p&gt;So if you're having trouble with bot traffic, try out &lt;a href="https://docs.arcjet.com/bot-protection/reference?ref=blog.arcjet.com#bot-verification" rel="noopener noreferrer"&gt;verified bot detection in Arcjet&lt;/a&gt; by &lt;a href="https://app.arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;signing up for free today&lt;/a&gt;. When you're ready to go to production, &lt;a href="mailto:sales@arcjet.com"&gt;reach out&lt;/a&gt; to upgrade to Pro (&lt;a href="https://arcjet.com/pricing?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;pricing&lt;/a&gt;).&lt;/p&gt;

</description>
      <category>changelog</category>
      <category>botdetection</category>
    </item>
    <item>
      <title>The Wasm Component Model and idiomatic codegen</title>
      <dc:creator>David Mytton</dc:creator>
      <pubDate>Tue, 17 Dec 2024 11:08:32 +0000</pubDate>
      <link>https://dev.to/arcjet/the-wasm-component-model-and-idiomatic-codegen-54ml</link>
      <guid>https://dev.to/arcjet/the-wasm-component-model-and-idiomatic-codegen-54ml</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbr78c6q1g42jhsd96fa.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbr78c6q1g42jhsd96fa.jpg" alt="The Wasm Component Model and idiomatic codegen" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arcjet.com/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;Arcjet&lt;/u&gt;&lt;/a&gt; bundles WebAssembly with our security as code SDK. This helps developers implement common security functionality like PII detection and bot detection directly in their code. Much of the logic is embedded in Wasm, which gives us a secure sandbox with near-native performance and is part of our philosophy around &lt;a href="https://blog.arcjet.com/how-we-achieve-our-25ms-p95-response-time-sla/" rel="noopener noreferrer"&gt;&lt;u&gt;local-first security&lt;/u&gt;&lt;/a&gt;&lt;u&gt;.&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;The ability to run the same code across platforms is also helpful as we expand support beyond JavaScript to other tech stacks, but it requires an important abstraction to translate between languages (our Wasm is compiled from Rust).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://component-model.bytecodealliance.org/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;WebAssembly Component Model&lt;/u&gt;&lt;/a&gt; is the powerful construct which enables this, but a construct can only be as good as the implementations and tooling surrounding it. For the Component Model, this is most evident in the code generation for Hosts (environments that execute WebAssembly Component Model) and Guests (WebAssembly modules written in any language and compiled to the Component Model; Rust in our case).&lt;/p&gt;

&lt;p&gt;The Component Model defines a language for communication between Hosts and Guests which is primarily composed of types, functions, imports and exports. It tries to define a broad language, but some types, such as variants, tuples, and resources, might not exist in a given general purpose programming language.&lt;/p&gt;
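&lt;p&gt;For example, a short WIT definition (the package and type names here are invented for illustration) can declare a variant and a result type, neither of which maps directly onto many general purpose languages:&lt;/p&gt;

```wit
package arcjet:example;

interface detection {
  // A variant: exactly one case is set, like a tagged union.
  variant bot-config {
    allowed(list<string>),
    denied(list<string>)
  }

  // result carries either a success value or an error value.
  detect: func(request: string, config: bot-config) -> result<string, string>;
}
```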

&lt;p&gt;When a tool tries to generate code for one of these languages, the authors often need to get creative to map Component Model types to that general purpose language. For example, we use &lt;a href="https://github.com/bytecodealliance/jco?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;jco&lt;/u&gt;&lt;/a&gt; for generating JS bindings and this implements variants using a JavaScript object in the shape of &lt;code&gt;{ tag: string, value: string }&lt;/code&gt;. It even has a special case for the &lt;code&gt;result&amp;lt;_, _&amp;gt;&lt;/code&gt; type where the error variant is turned into an &lt;code&gt;Error&lt;/code&gt; and thrown.&lt;/p&gt;
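&lt;p&gt;As a sketch, a jco-style variant can be modeled in TypeScript as a discriminated union. The names below are illustrative rather than Arcjet's actual bindings:&lt;/p&gt;

```typescript
// Hypothetical shape of a jco-generated variant (names are illustrative).
type BotConfig =
  | { tag: "allowed"; value: string[] }
  | { tag: "denied"; value: string[] };

function describe(config: BotConfig): string {
  // Narrowing on `tag` recovers the concrete case.
  return config.tag === "allowed"
    ? `allow ${config.value.length} entities`
    : `deny ${config.value.length} entities`;
}

console.log(describe({ tag: "allowed", value: ["GOOGLE_CRAWLER"] }));
// "allow 1 entities"
```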

&lt;p&gt;This post explores how the Wasm Component Model enables cross-language integrations, the complexities of code generation for Hosts and Guests, and the trade-offs we make to achieve idiomatic code in languages like Go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Host code generation for Go
&lt;/h2&gt;

&lt;p&gt;At Arcjet, we have had to build a tool to generate code for Hosts written in the Go programming language. Although our SDK attempts to analyze everything locally, that is not always possible and so we have &lt;a href="https://blog.arcjet.com/how-we-achieve-our-25ms-p95-response-time-sla/" rel="noopener noreferrer"&gt;&lt;u&gt;an API written in Go&lt;/u&gt;&lt;/a&gt; which augments local decisions with additional metadata.&lt;/p&gt;

&lt;p&gt;Go has a very minimal syntax and type system by design. It didn’t even have generics until recently (added in Go 1.18) and they still have significant limitations. This makes codegen from the Component Model to Go complex in various ways. &lt;/p&gt;

&lt;p&gt;For example, we could generate a &lt;code&gt;result&amp;lt;_, _&amp;gt;&lt;/code&gt; as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type Result[V any] struct {
    value V
    err error
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, this limits the type that can be provided in the error position. So we’d need to codegen it as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type Result[V any, E any] struct {
    value V
    err E
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works but becomes cumbersome to use with other idiomatic Go, which often uses the &lt;code&gt;val, err := doSomething()&lt;/code&gt; convention to indicate the same semantics as the &lt;code&gt;Result&lt;/code&gt; type we’ve defined above. &lt;/p&gt;

&lt;p&gt;Additionally, constructing this &lt;code&gt;Result&lt;/code&gt; is cumbersome: &lt;code&gt;Result[int, string]{value: 1, err: ""}&lt;/code&gt;. Instead of providing the &lt;code&gt;Result&lt;/code&gt; type, we probably want to match idiomatic patterns so Go users feel natural consuming our generated bindings.&lt;/p&gt;
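&lt;p&gt;For contrast, the idiomatic mapping surfaces &lt;code&gt;result&amp;lt;_, _&amp;gt;&lt;/code&gt; as Go's familiar two-value return. The function below is a hypothetical stand-in for a generated binding:&lt;/p&gt;

```go
package main

import (
    "errors"
    "fmt"
)

// Hypothetical generated binding: the result type maps onto Go's
// conventional (value, error) return instead of a Result wrapper struct.
func detect(request string) (string, error) {
    if request == "" {
        return "", errors.New("empty request")
    }
    return "NOT_A_BOT", nil
}

func main() {
    // Callers consume it like any other Go function.
    verdict, err := detect("GET / HTTP/1.1")
    if err != nil {
        panic(err)
    }
    fmt.Println(verdict)
}
```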

&lt;h2&gt;
  
  
  Idiomatic vs Direct Mapping
&lt;/h2&gt;

&lt;p&gt;Code can be generated to feel more natural to the language or it can be a more direct mapping to the Component Model types. Neither option fits 100% of use cases so it is up to the tool authors to decide which makes the most sense.&lt;/p&gt;

&lt;p&gt;For the Arcjet tooling, we chose the idiomatic Go approach for &lt;code&gt;option&amp;lt;_&amp;gt;&lt;/code&gt; and &lt;code&gt;result&amp;lt;_, _&amp;gt;&lt;/code&gt; types, which map to &lt;code&gt;val, ok := doSomething()&lt;/code&gt; and &lt;code&gt;val, err := doSomething()&lt;/code&gt; respectively. For variants, we create an interface that each variant needs to implement, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type BotConfig interface {
    isBotConfig()
}

func (AllowedBotConfig) isBotConfig() {}

func (DeniedBotConfig) isBotConfig() {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This strikes a good balance between type safety and unnecessary wrapping. Of course, there are situations where the wrapping is required, but those can be handled as edge cases.&lt;/p&gt;
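&lt;p&gt;Consumers recover the concrete case with a standard type switch. Here is a self-contained sketch (the struct fields are illustrative, not our generated code):&lt;/p&gt;

```go
package main

import "fmt"

// Variant cases from the post; the Entities field is illustrative.
type BotConfig interface{ isBotConfig() }

type AllowedBotConfig struct{ Entities []string }
type DeniedBotConfig struct{ Entities []string }

func (AllowedBotConfig) isBotConfig() {}
func (DeniedBotConfig) isBotConfig() {}

// describe recovers the concrete variant case with a type switch.
func describe(config BotConfig) string {
    switch c := config.(type) {
    case AllowedBotConfig:
        return fmt.Sprintf("allow %d entities", len(c.Entities))
    case DeniedBotConfig:
        return fmt.Sprintf("deny %d entities", len(c.Entities))
    default:
        return "unknown"
    }
}

func main() {
    fmt.Println(describe(AllowedBotConfig{Entities: []string{"GOOGLE_CRAWLER"}}))
}
```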

&lt;p&gt;Developers may struggle with non-idiomatic patterns, leading to verbose, less maintainable code. Using established conventions makes the code feel more familiar, but does require some additional effort to implement.&lt;/p&gt;

&lt;p&gt;We decided to take the idiomatic path to minimize friction and make it easier for our team to know what to expect when moving around the codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Calling conventions
&lt;/h2&gt;

&lt;p&gt;One of the biggest decisions tooling authors need to make is the calling convention of the bindings. This includes deciding how and when imports are compiled, whether the Wasm module is compiled during setup or at instantiation, and how cleanup is handled.&lt;/p&gt;

&lt;p&gt;In the Arcjet codebase, we chose the factory/instance pattern to optimize performance. Compiling a WebAssembly module is expensive, so we do it once in the &lt;code&gt;NewBotFactory()&lt;/code&gt; constructor. Subsequent &lt;code&gt;Instantiate()&lt;/code&gt; calls are then fast and cheap, allowing for high throughput in production workloads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func NewBotFactory(
    ctx context.Context,
) (*BotFactory, error) {
    runtime := wazero.NewRuntime(ctx)

    // ... Imports are compiled here if there are any

    // Compiling the module takes a LONG time, so we want to do it once and hold
    // onto it with the Runtime
    module, err := runtime.CompileModule(ctx, wasmFileBot)
    if err != nil {
            return nil, err
    }

    return &amp;amp;BotFactory{runtime, module}, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consumers construct this &lt;code&gt;BotFactory&lt;/code&gt; once by calling &lt;code&gt;NewBotFactory(ctx)&lt;/code&gt; and use it to create multiple instances via the &lt;code&gt;Instantiate&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (f *BotFactory) Instantiate(ctx context.Context) (*BotInstance, error) {
    if module, err := f.runtime.InstantiateModule(ctx, f.module, wazero.NewModuleConfig()); err != nil {
            return nil, err
    } else {
            return &amp;amp;BotInstance{module}, nil
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instantiation is very fast if the module has already been compiled, like we do with &lt;code&gt;runtime.CompileModule()&lt;/code&gt; when constructing the factory.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;BotInstance&lt;/code&gt; has functions which were exported from the Component Model definition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (i *BotInstance) Detect(
    ctx context.Context,
    request string,
    options BotConfig,
) (BotResult, error) {
   // ... Lots of generated code for binding to Wazero
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generally, after using a &lt;code&gt;BotInstance&lt;/code&gt;, we want to clean it up to ensure we’re not leaking memory. For this we provide the &lt;code&gt;Close&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (i *BotInstance) Close(ctx context.Context) error {
    if err := i.module.Close(ctx); err != nil {
            return err
    }

    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to clean up the entire &lt;code&gt;BotFactory&lt;/code&gt;, that can be closed too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func (f *BotFactory) Close(ctx context.Context) {
    f.runtime.Close(ctx)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can put all these APIs together to call functions on this WebAssembly module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ctx := context.Background()
factory, err := NewBotFactory(ctx)
if err != nil {
  panic(err)
}
defer factory.Close(ctx)

instance, err := factory.Instantiate(ctx)
if err != nil {
    panic(err)
}
defer instance.Close(ctx)

result, err := instance.Detect(
  ctx,
  request,
  AllowedBotConfig{
    Entities:         []BotEntity{"GOOGLE_CRAWLER"},
    SkipCustomDetect: true,
  },
)
if err != nil {
    panic(err)
}
fmt.Printf("%+v", result)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern of factory and instance construction takes more code to use, but it was chosen to achieve as much performance as possible in the hot paths of the Arcjet service.&lt;/p&gt;

&lt;p&gt;By front-loading &lt;a href="https://wazero.io/docs/how_the_optimizing_compiler_works/?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;the compilation cost&lt;/u&gt;&lt;/a&gt;, we ensure that in the hot paths of the Arcjet service - where latency matters most - request handling is as efficient as possible. This trade-off does add some complexity to initialization code, but it pays off with substantially lower overhead per request - &lt;a href="https://blog.arcjet.com/lessons-from-running-webassembly-in-production-with-go-wazero/" rel="noopener noreferrer"&gt;&lt;u&gt;see our discussion of the tradeoffs&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;Integrating two or more languages is fraught with trade-offs, whether &lt;a href="https://blog.arcjet.com/calling-rust-ffi-libraries-from-go/" rel="noopener noreferrer"&gt;&lt;u&gt;using native FFI&lt;/u&gt;&lt;/a&gt; or the Component Model. &lt;/p&gt;

&lt;p&gt;This post discussed a few of the challenges we’ve encountered at Arcjet and the reasoning behind our decisions. If we all build on the same primitives, such as the Component Model and WIT, we can all leverage the same high-quality tooling, such as &lt;a href="https://crates.io/crates/wit-bindgen?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;wit-bindgen&lt;/u&gt;&lt;/a&gt; or &lt;a href="https://crates.io/crates/wit-component?ref=blog.arcjet.com" rel="noopener noreferrer"&gt;&lt;u&gt;wit-component&lt;/u&gt;&lt;/a&gt;, and build tooling to suit every use case. This is why working towards standards helps everyone. &lt;/p&gt;

&lt;p&gt;The WebAssembly Component Model offers a powerful abstraction for cross-language integration, but translating its types into languages like Go introduces subtle design challenges. By choosing idiomatic patterns and selectively optimizing for performance - such as using a factory/instance pattern - we can provide a natural developer experience while maintaining efficiency. &lt;/p&gt;

&lt;p&gt;As tooling around the Component Model evolves, we can look forward to more refined codegen approaches that further simplify these integrations.&lt;/p&gt;

</description>
      <category>webassembly</category>
      <category>go</category>
      <category>javascript</category>
      <category>engineering</category>
    </item>
  </channel>
</rss>
