Generating a robots.txt in Next.js

#javascript #nextjs #react #webdev

A robots.txt file tells search engine crawlers which pages or files the crawler can or can’t request from your site.

Next.js does not generate a robots.txt out-of-the-box.

You can manually create one in the /public directory, but by doing so it will be used in all environments where you deploy your Next.js website — which might be problematic if you want to avoid indexing preview/testing environments.

To generate a robots.txt conditionally, based on the current environment, you can either generate it on the server side or through a build script.

Here are the two options in detail.

Rendering a robots.txt from a Next.js page

This is probably the “proper” Next.js way of handling this use case.

Just create a new page in pages/robots.txt that dynamically returns the robots.txt content and Next.js will take care of making it available on the right path:

import React from 'react';

const crawlableRobotsTxt = `User-agent: *\nAllow: /`;

const uncrawlableRobotsTxt = `User-agent: *\nDisallow: /`;

class Robots extends React.Component {
  public static async getInitialProps({ res }) {
    res.setHeader('Content-Type', 'text/plain');
    // Return a non-crawlable robots.txt in non-production environments
    res.write(process.env.VERCEL_ENV === "production"
      ? crawlableRobotsTxt
      : uncrawlableRobotsTxt
    );
    res.end();
  }
}

export default Robots;

One drawback of this approach is that using getInitialProps (or getServerSideProps) disables Automatic Static Optimization and doesn’t allow generating a static page (it works only on dynamic pages using Server Side Rendering).

Generating a robots.txt in the build process

Alternatively, we can generate the robots.txt directly in the build process with a small Node.js script (e.g.: scripts/generate-robots-txt.js):

const fs = require("fs");

const crawlableRobotsTxt = `User-agent: *\nAllow: /`;

const uncrawlableRobotsTxt = `User-agent: *\nDisallow: /`;

function generateRobotsTxt() {
  // Create a non-crawlable robots.txt in non-production environments
  const robotsTxt =
    process.env.VERCEL_ENV === "production"
      ? crawlableRobotsTxt
      : uncrawlableRobotsTxt;

  // Create robots.txt file
  fs.writeFileSync("public/robots.txt", robotsTxt);

  console.log(
    `Generated a ${
      process.env.VERCEL_ENV === "production" ? "crawlable" : "non-crawlable"
    } public/robots.txt`
  );
}

module.exports = generateRobotsTxt;

You can then run scripts/generate-robots-txt.js by invoking it in a prebuild script from your package.json:

{
  "scripts": {
    "prebuild": "scripts/generate-robots-txt",
    "build": "next build",
  },
}

Or by invoking it during the Webpack build step in next.config.js:

module.exports = {
  webpack(config, { isServer }) {
    if (isServer) {
      generateRobotsTxt();
    }
    return config;
  },
};

For what is worth, ☝ this is the approach I’m currently using in Remotebear (source code here).

You’ll probably want to add public/robots.txt to your .gitignore: since this is generated on each build, there’s no need to commit it anymore.

DEV Community

Generating a robots.txt in Next.js

Rendering a robots.txt from a Next.js page

Generating a robots.txt in the build process

Top comments (0)

Read next

Logging System with Proxy and Fetch

Create an Interactive Eraser Tool with HTML5 Canvas 🚀

Build a Single Page Application (SPA) Using HTML, CSS & JavaScript - No Frameworks Needed

Server-Side Rendering (SSR) vs. Client-Side Rendering (CSR) in Web Applications: A Complete Guide