DEV Community

Cesar Varela
Cesar Varela

Posted on • Originally published at cesarvarela.com

Improve your SEO by adding a sitemap to your Gatsby MDX blog

Our blog is getting better and better. Now it's time to add a sitemap.

Strictly speaking, it is not clear if a sitemap is required since Google's page rank algorithm is a black box. BUT Google recommends adding one so let's do it:

Adding gatsby-plugin-sitemap

First, we need to add the official Gatsby sitemap plugin:

npm i gatsby-plugin-sitemap
Enter fullscreen mode Exit fullscreen mode

or

yarn add gatsby-plugin-sitemap
Enter fullscreen mode Exit fullscreen mode

Next, as always, we have to add the newly installed plugin to our config file:

...
{
      resolve: `gatsby-plugin-sitemap`,
      options: {      
      },
...
Enter fullscreen mode Exit fullscreen mode

This will get the plugin loaded, but we need to specify its options because the defaults will do more harm than good otherwise.

1. Set the query option:

...
  query: `
        {
          site {
            siteMetadata {
              url
            }
          }
          allMdx(sort: {fields: frontmatter___date, order: DESC}, filter: {slug: {glob: "!*wip*"}}) {
            nodes {
              frontmatter {
                date(formatString: "YYYY-MM-DD")
              }
              slug
            }
          }
        }
      `,
...
Enter fullscreen mode Exit fullscreen mode

This option lets us get a list of all the pages we'll want to add to our site. We can add more entries manually too, which I'll do next.

2. Set the resolveSiteUrl

...
        resolveSiteUrl: ({ site: { siteMetadata: { url } } }) => url,
...
Enter fullscreen mode Exit fullscreen mode

The plugin uses this setting to set an absolute URL in the sitemap, and as you can see, it references the GraphQL result from the previous query setting.

3. Set the resolvePages option

        resolvePages: ({ allMdx: { nodes: mdxs } }) => {

          const posts = mdxs.map(mdx => {
            return {
              path: `/blog/${mdx.slug}`,
              lastmod: mdx.frontmatter.date,
              changefreq: 'weekly',
              priority: 0.7,
            }
          })

    // I manually add home and blog entries here

          const home = {
            path: '/',
            lastmod: posts[0].lastmod,
            changefreq: 'daily',
            priority: 0.3,
          }
          const blog = {
            path: '/blog',
            lastmod: posts[0].lastmod,
            changefreq: 'daily',
            priority: 0.3,
          }
          return [...posts, home, blog]
        },
Enter fullscreen mode Exit fullscreen mode

We build an array of entries that will form the sitemap.xml file. Because posts are added constantly, I'm making a list dynamically from the results of the previous GraphQL query. And then, because the Homepage and the Blog index page do not change their attributes often, I manually add them to the list of entries.'

4. Set the serialize option

...
        serialize: ({ path, lastmod, changefreq, priority }) => {
          return {
            url: path,
            lastmod,
            changefreq,
            priority,
          }
        },
...
Enter fullscreen mode Exit fullscreen mode

Finally, we need to map the array we built in the resolvePages option to the sitemap entries themselves. This will be only a passthrough of properties most of the time because resolvePages does most of the work.

For your copy/pasting pleasure, the final code looks like this:

 {
      resolve: `gatsby-plugin-sitemap`,
      options: {
        query: `
        {
          site {
            siteMetadata {
              url
            }
          }
          allMdx(sort: {fields: frontmatter___date, order: DESC}, filter: {slug: {glob: "!*wip*"}}) {
            nodes {
              frontmatter {
                date(formatString: "YYYY-MM-DD")
              }
              slug
            }
          }
        }
      `,
        resolveSiteUrl: ({ site: { siteMetadata: { url } } }) => url,
        resolvePages: ({ allMdx: { nodes: mdxs } }) => {

          const posts = mdxs.map(mdx => {
            return {
              path: `/blog/${mdx.slug}`,
              lastmod: mdx.frontmatter.date,
              changefreq: 'weekly',
              priority: 0.7,
            }
          })

          const home = {
            path: '/',
            lastmod: posts[0].lastmod,
            changefreq: 'daily',
            priority: 0.3,
          }

          const blog = {
            path: '/blog',
            lastmod: posts[0].lastmod,
            changefreq: 'daily',
            priority: 0.3,
          }

          return [...posts, home, blog]
        },
        serialize: ({ path, lastmod, changefreq, priority }) => {
          return {
            url: path,
            lastmod,
            changefreq,
            priority,
          }
        },
      },
    }

Enter fullscreen mode Exit fullscreen mode

5. View your sitemap

If you want to test your newly added sitemap, you have to run gatsby build (because the plugin only gets executed in the build phase) and then gatsby serve, which will run a web server usually at http://localhost:9000.

You can then navigate to localhost:9000/sitemap/sitemap-index.xml and it should look something like this:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://cesarvarela.com/sitemap/sitemap-0.xml</loc>
</sitemap>
</sitemapindex>
Enter fullscreen mode Exit fullscreen mode

There you can follow the first link where you will finally see all your blog entries:

...
<url>
<loc>https://cesarvarela.com/blog/add-microdata-to-your-gastby-mdx-blog</loc>
<lastmod>2022-10-08T00:00:00.000Z</lastmod>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
</url>
<url>
<loc>https://cesarvarela.com/blog/fix-source-file-requires-different-compiler-version</loc>
<lastmod>2021-08-18T00:00:00.000Z</lastmod>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
</url>
<url>
<loc>https://cesarvarela.com/blog/how-to-include-javascript-files-in-gatsby</loc>
<lastmod>2021-08-18T00:00:00.000Z</lastmod>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
</url>
...
Enter fullscreen mode Exit fullscreen mode

Bonus: WTF are those Sitemap fields?

If you are new to sitemaps, it will feel a bit cryptic. Let's go over the most relevant options:

url

Must be the canonical URL of the page, in other words, the original source. Avoid having multiple entries on your sitemap pointing to the same page.

lastmod

As you probably guessed, this is the last modified date.
** If you mess up this value, Google will simply ignore it** so be careful, but don't worry much about it either since Google's crawler is smart enough to detect page changes all by itself.

priority

It is a value from 0.0 to 1. Lets you tell Google which pages are more critical relative to other pages on your site.

changefreq

specifies how often the contents of the page change, some possible values are always, hourly, daily, weekly.

*Again, these are hints, Google's crawler will do as it pleases, and you might as well have 1# rank with no sitemap. *

Top comments (0)