Steven Woodson

Posted on Feb 6, 2023 • Originally published at stevenwoodson.com on Dec 17, 2022

Pulling WordPress Content into Eleventy

#wordpress #eleventy #cms #headless

This is a follow up on my previous post Composable Architecture Powered by WordPress, focused specifically on how to pull content from WordPress via the built-in RESTful API into an Eleventy-based static site. If you still need to get WordPress set up and ready to act as a composable content source I’d recommend reading that one first.

Getting Started

There are some initial hurdles to pulling dynamic content into a static site generator like Eleventy, but the performance and portability of the end result is so worth it. I do believe – now that I’m a few posts in with this new setup – that this is going to be an easier authoring experience (WordPress GUI vs Markdown files) without sacrificing the user experience (more performant and portable static pages vs dynamic) I had curated previously.

Here’s an overview of the most critical parts of transitioning from Markdown-based Eleventy collections to dynamic data pulled from WordPress. I go into further detail in dedicated sections below:

Pulling data using Fetch – This is the real bulk of the work, getting data dynamically from an outside source.
Computed Data rather than Collections – There’s some setup and changes needed to go from one content paradigm to the other. Including rebuilding the blog pages, updating the sitemap and meta, and some style modifications.
Triggering a new build on publish – Now that content is separated from the Eleventy codebase, we need a new way to trigger a rebuild when new content is published.

Pulling data using Fetch

To get all the benefits of a static site build, you need to collect and process as much content at build time as possible so that it can be served as static content. Sounds pretty straightforward, and it is when all your content is local files in your repo, but when you start loading dynamic data it gets a bit tricky.

By far the biggest paradigm shift is in moving from a collection of posts to computed data from a dynamic data source using a JavaScript data file. That necessitates pulling and processing all of the content at build time.

When pulling content from WordPress you need to be mindful of the 100 items per page limit imposed by the API. There are ways around this by resetting that limit, but in my opinion that’s not necessary because we can accommodate it in how we fetch the data instead. This makes our method of gathering all data more resilient to change, and we can reuse these concepts to pull content from other data sources that may not let you change the limits. That was the architectural decision behind getAllPosts and requestPosts in the code below: the former coordinates pulling any number of pages of content, calling the latter once per page and running those calls in parallel.

Finally, some code!

I’m going to start with the full source code of the JavaScript data file I’m using, and then explain each section/method within it in the same order. This file is in my /site/_data directory, saved as blogposts.js.

const { AssetCache } = require("@11ty/eleventy-fetch");
const axios = require("axios");
const jsdom = require("jsdom");
const loadLanguages = require("prismjs/components/");
const Prism = require("prismjs");

const { JSDOM } = jsdom;
loadLanguages(["php"]);

// Config
const ITEMS_PER_REQUEST = 10;
const API_BASE = "https://mysite.com/wp-json/wp/v2/posts";

/**
 * Blog post API call by page
 *
 * @param {Int} page - Page number to fetch, defaults to 1
 * @return {Object} - Total, Pages, and full API data
 */
async function requestPosts(page = 1) {
  try {
    // https://developer.wordpress.org/rest-api/using-the-rest-api/pagination/
    const url = API_BASE;
    const params = {
      params: {
        page: page,
        per_page: ITEMS_PER_REQUEST,
        _embed: "wp:featuredmedia",
        order: "desc",
      },
    };
    const response = await axios.get(url, params);

    return {
      total: parseInt(response.headers["x-wp-total"], 10),
      pages: parseInt(response.headers["x-wp-totalpages"], 10),
      data: response.data,
    };
  } catch (err) {
    console.error("API not responding, no data returned", err);
    return {
      total: 0,
      pages: 0,
      data: [],
    };
  }
}

/**
 * Get all blog posts from the API
 * Use cached values if available, pull from API if not.
 *
 * @return {Array} - array of blog posts
 */
async function getAllPosts() {
  const cache = new AssetCache("blogposts");
  let requests = [];
  let apiData = [];

  if (cache.isCacheValid("2h")) {
    console.log("Using cached blogposts");
    return cache.getCachedValue();
  }

  // make first request and merge results with array
  const request = await requestPosts();
  console.log(
    "Using API blogposts, retrieving " +
      request.pages +
      " pages, " +
      request.total +
      " total posts."
  );
  apiData.push(...request.data);

  if (request.pages > 1) {
    // create additional requests
    for (let page = 2; page <= request.pages; page++) {
      const request = requestPosts(page);
      requests.push(request);
    }

    // resolve all additional requests in parallel
    const allResponses = await Promise.all(requests);
    allResponses.forEach((response) => {
      apiData.push(...response.data);
    });
  }

  // return data
  await cache.save(apiData, "json");
  return apiData;
}

/**
 * Clean up and convert the API response for our needs
 */
async function processPosts(blogposts) {
  return Promise.all(
    blogposts.map(async (post) => {
      // remove HTML-Tags from the excerpt for meta description
      let metaDescription = post.excerpt.rendered.replace(/(<([^>]+)>)/gi, "");
      // use a global regex so every newline is removed, not just the first
      metaDescription = metaDescription.replace(/\n/g, "");

      // Code highlighting with Eleventy Syntax Highlighting
      // https://www.11ty.dev/docs/plugins/syntaxhighlight/
      let content = highlightCode(post.content.rendered);

      // Return only the data that is needed for the actual output
      return {
        content: content,
        date: post.date,
        modifiedDate: post.modified,
        excerpt: post.excerpt.rendered,
        formattedDate: new Date(post.date).toLocaleDateString("en-US", {
          year: "numeric",
          month: "long",
          day: "numeric",
        }),
        // optional chaining avoids a crash when a post has no featured
        // image or the requested image size was never generated
        heroImageFull:
          post._embedded?.["wp:featuredmedia"]?.[0]?.media_details?.sizes?.full
            ?.source_url ?? null,
        heroImageThumb:
          post._embedded?.["wp:featuredmedia"]?.[0]?.media_details?.sizes
            ?.medium_large?.source_url ?? null,

        metaDescription: metaDescription,
        slug: post.slug,
        title: post.title.rendered,
      };
    })
  );
}

/**
 * Use Prism.js to highlight embedded code
 */
function highlightCode(content) {
  // since Prism.js works on the DOM,
  // we need an instance of JSDOM in the build
  const dom = new JSDOM(content);

  let preElements = dom.window.document.querySelectorAll("pre");

  // WordPress delivers a `code`-tag that is wrapped in a `pre`
  // the used language is specified by a CSS class
  if (preElements.length) {
    preElements.forEach((pre) => {
      let code = pre.querySelector("code");

      if (code) {
        // get specified language from css-classname
        let codeLanguage = "html";
        const preClass = pre.className;

        // \S+ stops at the next space so trailing class names are not captured
        const matches = preClass.match(/language-(\S+)/);
        if (matches !== null) {
          codeLanguage = matches[1];
        }

        // save the language for later use in CSS
        pre.dataset.language = codeLanguage;

        // set grammar that prism should use for highlighting
        let prismGrammar = Prism.languages.html;

        if (
          codeLanguage === "javascript" ||
          codeLanguage === "js" ||
          codeLanguage === "json"
        ) {
          prismGrammar = Prism.languages.javascript;
        }

        if (codeLanguage === "css") {
          prismGrammar = Prism.languages.css;
        }

        if (codeLanguage === "php") {
          prismGrammar = Prism.languages.php;
        }
        // highlight code
        code.innerHTML = Prism.highlight(
          code.textContent,
          prismGrammar,
          codeLanguage
        );

        code.classList.add(`language-${codeLanguage}`);
      }
    });

    content = dom.window.document.body.innerHTML;
  }

  return content;
}

// export for 11ty
module.exports = async () => {
  const blogposts = await getAllPosts();
  const processedPosts = await processPosts(blogposts);
  return processedPosts;
};


I’m using:

Axios – a promise based HTTP client for the browser and node.js – to get the API data
AssetCache from Fetch to cache that response so I don’t have to hit my API quite as often especially when I’m working locally
Prism – a lightweight syntax highlighter built with modern web standards – to preprocess code block formatting
jsdom since Prism works on the DOM

I’m also setting some configuration to load the php Prism language, and setting some defaults for items I want per page and the API base URL.

`requestPosts`

I’ve split out requestPosts as its own method with a page parameter so I can gather individual pages of content at a time. This method will return the resulting data along with the total number of items (total) and the total number of pages (pages) for this post type.

This is where you can customize your request to perform any filtering, including extra embedded data, setting an order, etc. See Using the REST API for more details on these options. I’m setting the page based on the provided page parameter, setting per_page to the config value, and am requesting the wp:featuredmedia data so I can grab the featured image URLs I need.
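
As a rough sketch of that kind of customization, the helper below builds the query parameters for a filtered request. The name buildPostParams and its options are hypothetical and not part of my data file, and the category ID is made up. The categories, orderby, and order keys are standard WordPress REST API query parameters.

```javascript
// Hypothetical helper: builds query parameters for the WordPress posts endpoint.
function buildPostParams({ page = 1, perPage = 10, categoryIds = [] } = {}) {
  const params = {
    page,
    per_page: perPage,
    _embed: "wp:featuredmedia",
    orderby: "date",
    order: "desc",
  };

  // Limit results to the given category term IDs (comma-separated)
  if (categoryIds.length > 0) {
    params.categories = categoryIds.join(",");
  }

  return params;
}

// Example: second page of posts in category 12 (the ID is illustrative)
const exampleParams = buildPostParams({ page: 2, categoryIds: [12] });
console.log(exampleParams);
```

The result could be passed to axios in place of the inline object, for example axios.get(API_BASE, { params: exampleParams }).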

`getAllPosts`

I also needed a method that could gather multiple pages of post content and combine them, hence getAllPosts. It checks the cache first, and if a valid cached response exists, that is returned instead of kicking off a new API request. Otherwise it makes one requestPosts call to fetch the first page and learn the total page count, then runs the calls for any remaining pages in parallel. The combined result is then cached and returned.
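
Stripped of caching and logging, the pattern is small enough to reuse with other paginated sources. Here is a minimal sketch of it, assuming a requestPage function that resolves to an object with pages and data, the same shape requestPosts returns. Both fetchAllPages and fakeRequestPage are illustrative names, not part of my data file.

```javascript
// Generic "first page, then the rest in parallel" pattern.
// Assumes requestPage(page) resolves to { pages, data }.
async function fetchAllPages(requestPage) {
  // The first request tells us how many pages exist in total
  const first = await requestPage(1);

  // Start the remaining requests without awaiting each one individually
  const remaining = [];
  for (let page = 2; page <= first.pages; page++) {
    remaining.push(requestPage(page));
  }

  // Promise.all keeps results in request order, so items stay sorted by page
  const rest = await Promise.all(remaining);
  return [first, ...rest].flatMap((response) => response.data);
}

// Stand-in data source with 3 pages of 2 items each, for illustration only
async function fakeRequestPage(page) {
  const all = ["a", "b", "c", "d", "e", "f"];
  return { pages: 3, data: all.slice((page - 1) * 2, page * 2) };
}

// Logs the six items in page order
fetchAllPages(fakeRequestPage).then((items) => console.log(items));
```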

`processPosts`

Next, I do some light processing of the post content to simplify it for my needs here. This includes things like reducing post.excerpt.rendered to post.excerpt and drilling down into the wp:featuredmedia object to get the full and thumbnail versions of the featured image for each post. I’m also filtering the post content through highlightCode, more on that next.
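
Drilling into wp:featuredmedia is the part most likely to break, since a post may have no featured image or WordPress may not have generated a given size. One way to guard against that is a small helper like the sketch below. The name featuredImageUrl is hypothetical, and falling back to the media item's top-level source_url is my assumption about what you would want, not something my data file does.

```javascript
// Hypothetical helper: safely read a featured image URL from an embedded post.
// Returns null when the post has no featured media at all.
function featuredImageUrl(post, size = "full") {
  const media = post?._embedded?.["wp:featuredmedia"]?.[0];
  if (!media) return null;

  // Prefer the requested size, then fall back to the original upload URL
  return media.media_details?.sizes?.[size]?.source_url ?? media.source_url ?? null;
}

// Illustrative post objects, trimmed to the fields the helper reads
const withImage = {
  _embedded: {
    "wp:featuredmedia": [
      {
        source_url: "https://example.com/original.jpg",
        media_details: { sizes: { full: { source_url: "https://example.com/full.jpg" } } },
      },
    ],
  },
};
const withoutImage = {};

console.log(featuredImageUrl(withImage, "full"));
console.log(featuredImageUrl(withImage, "medium_large"));
console.log(featuredImageUrl(withoutImage));
```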

`highlightCode`

I tend to have a good deal of code blocks in my posts and I want them to be at least somewhat readable with proper formatting and highlighting. That’s what this method does. highlightCode will:

run through the incoming content and find all <pre> tags that denote a code block
for each code block identified, look for a class name that starts with language- and set the codeLanguage and prismGrammar accordingly, defaulting to HTML
trigger a Prism.highlight based on what was determined above

This is where you’d need to add other languages you plan to support. There’s also a Prism autoloader that I opted against using, but it may be helpful if you regularly post in a lot of languages.
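
If the chain of if statements gets long, a lookup table is one alternative. The sketch below only resolves names and does not call Prism, so it runs on its own. GRAMMARS and resolveLanguage are hypothetical names, and the table mirrors the languages my data file handles.

```javascript
// Maps a language name from the CSS class to the Prism grammar key to use.
// A Map avoids accidental hits on inherited object properties.
const GRAMMARS = new Map([
  ["html", "html"],
  ["js", "javascript"],
  ["javascript", "javascript"],
  ["json", "javascript"],
  ["css", "css"],
  ["php", "php"],
]);

// Hypothetical helper: pick the language and grammar key for a <pre> class string.
function resolveLanguage(className, fallback = "html") {
  const match = className.match(/language-(\S+)/);
  const language = match ? match[1] : fallback;
  // Unknown languages fall back to the default grammar
  const grammar = GRAMMARS.get(language) ?? GRAMMARS.get(fallback);
  return { language, grammar };
}

console.log(resolveLanguage("wp-block-code language-php"));
console.log(resolveLanguage("wp-block-code"));
```

Inside highlightCode the grammar key would then be looked up with Prism.languages[grammar]. Supporting a new language means one new Map entry plus adding it to the loadLanguages call.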

Recap

Shout out to Martin Schneider’s Building a Blog with 11ty and WordPress and Jérôme Coupé’s Performant data fetching with promises and Eleventy, these excellent articles gave me a fairly substantial boost on the format of this data file and in performing the API calls in parallel.

I need to gather all post content. To avoid per-page limits, I’m determining how many pages of content I need to pull and running those API calls – one per page – in parallel. The results are then combined, processed, and parsed for code block formatting before being returned.

Computed Data rather than Collections

Now that we have the computed data ready to go, we can move on to rebuilding the blog section, which includes the main paginated blog index as well as the layout of individual posts. Let’s dive right in!

Rebuilding the blog

Rebuilding the blog section of my Eleventy site to use computed data rather than a collection was surprisingly simple. Here are the major steps:

Remove the static folder of blog posts and accompanying images
Update/replace the main blog index page to use computed data
Update/replace the blog post template to use computed data
Other tie ins you may have, for example showing latest posts on the homepage

The biggest change for me was in shifting from a blogpost layout at /site/_includes/layouts/blogpost.njk to a top-level page at /site/blogpost.njk. This new top-level page needed some additional front matter data to define where the data is coming from and what the permalink should be. Here’s what I ended up with:

---
layout: layouts/base.njk
templateClass: tmpl-post
pagination:
  data: blogposts
  size: 1
  alias: blogpost
permalink: blog/{{ blogpost.slug }}/
---

This is pulling all blog posts from the /site/_data/blogposts.js computed data and paginating it based on the blog post slug, effectively creating a unique page per post. Now I can go to https://stevenwoodson.com/blog/composable-architecture-powered-by-wordpress/ and see that post.

Sitemap and Metadata adjustments

It’s starting to look like a blog again! But there are a few other items you’re likely going to need to address that may not be immediately apparent. The following are what I had to address.

Meta tag tweaks in the <head> section

Now that we’re dealing with separate data for blog posts, we can’t rely on basic data like title. For example, I had to swap from

<title>{{ title or metadata.title }}</title>
<meta name="description" content="{{ description or metadata.description }}">

to

<title>{{ title or blogpost.title | safe or metadata.title }}</title>
<meta name="description" content="{{ description or blogpost.metaDescription or metadata.description }}">

for the title and description. Note that I had to add that | safe after the blogpost.title because of the potential for HTML special characters. WordPress automatically changes straight quotes ("…") into curly quotation marks (“…”), for example, which arrive in the API response as HTML entities.

Changes to how the sitemap is generated

You can’t rely on collections.all to grab all pages of content anymore. I had to add the following to my sitemap to ensure the blog posts were still included:

{%- for blog in blogposts %}
  {% set absoluteUrl %}{{ metadata.url }}/blog/{{ blog.slug }}{% endset %}
  <url>
    <loc>{{ absoluteUrl }}</loc>
    <lastmod>{{ blog.modifiedDate | htmlDateString }}</lastmod>
  </url>
{%- endfor %}

Updates to XML/JSON blog post feeds

Similarly, I had to rebuild the XML- and JSON-based feeds to use the new data. Following is my code update for the XML feed:

{%- for blog in blogposts %}
{% set absoluteUrl %}{{ metadata.url }}/blog/{{ blog.slug }}{% endset %}
<entry>
    <title>{{ blog.title | safe }}</title>
    <link href="{{ absoluteUrl }}"/>
    <published>{{ blog.date }}</published>
    <updated>{{ blog.modifiedDate }}</updated>
    <id>{{ absoluteUrl }}</id>
    {% if blog.heroImageFull %}<image>{{ blog.heroImageFull }}</image>{% endif %}
    {% if blog.excerpt %}<summary>{{ blog.excerpt | striptags(true) }}</summary>{% endif %}
    <content type="html">{{ blog.content | htmlToAbsoluteUrls(absoluteUrl) }}</content>
</entry>
{%- endfor %}

Here it is for the JSON version:

"items": [
    {%- for blog in blogposts %}
    {% set absoluteUrl %}{{ metadata.url }}/blog/{{ blog.slug }}{% endset %}
    {
        "id": "{{ absoluteUrl }}",
        "url": "{{ absoluteUrl }}",
        "title": "{{ blog.title | safe }}",
        {% if blog.excerpt %}"summary": "{{ blog.excerpt | striptags(true) }}",{% endif %}
        {% if blog.heroImageFull %}"image": "{{ blog.heroImageFull }}",{% endif %}
        "content_html": {% if blog.content %}{{ blog.content | dump | safe }}{% else %}""{% endif %},
        "date_published": "{{ blog.date }}",
        "date_modified": "{{ blog.modifiedDate }}"
    }
    {%- if not loop.last -%}
    ,
    {%- endif -%}
    {%- endfor %}
]

Rewrite rules added to .htaccess if your blog URLs are changing

This last one is dependent on the changes you’re making. In my case I moved from “/posts” to “/blog” as part of this switch, so I made sure to add rewrite rules to 301 redirect users to the right location. Here’s my /site/static/.htaccess file you can start from. The first RewriteRule redirects the main blog page and the second redirects individual blog posts that used to be in subdirectories based on year and month:

Options +FollowSymLinks
RewriteEngine On
RewriteRule ^posts/$ /blog/ [R=301,NC,L]
RewriteRule ^posts/(.*)/(.*)/$ /blog/$2 [R=301,NC,L]
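
Before deploying rules like these, it can help to check what a handful of old URLs would map to. The sketch below mirrors the two patterns as JavaScript regular expressions. It is an approximation for sanity checking, not a substitute for testing against Apache itself. The redirectFor name and the sample paths are made up, and the paths omit the leading slash because that is how Apache presents them to rules in an .htaccess file.

```javascript
// JavaScript stand-ins for the two RewriteRule patterns above.
// The "i" flag plays the role of the NC (no case) flag.
const rules = [
  { pattern: /^posts\/$/i, target: "/blog/" },
  { pattern: /^posts\/(.*)\/(.*)\/$/i, target: "/blog/$2" },
];

// Hypothetical helper: returns the redirect target for a path,
// or null when no rule matches. First matching rule wins, like the L flag.
function redirectFor(path) {
  for (const { pattern, target } of rules) {
    if (pattern.test(path)) {
      return path.replace(pattern, target);
    }
  }
  return null;
}

// Sample paths are illustrative
for (const path of ["posts/", "posts/2022/12/my-post/", "about/"]) {
  console.log(path, "->", redirectFor(path));
}
```

Because the first capture group is greedy, it swallows both the year and month segments, which leaves only the slug in the second group.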

Structure & Style modifications

If you made use of any shortcodes specific to your blog posts, it’s probably a good idea to get rid of those to keep your code clean. For example I had ones for custom blockquotes and asides that are now no longer necessary.

I also found that WordPress wraps content in slightly different ways than I had with my CSS styles, so I needed to adjust my stylesheets to follow suit. This part is going to be rather dependent on how much of the baked-in styles you’re going to want to utilize from WordPress. It might be easiest to import everything wholesale but I opted to keep it cleaner by only defining what I knew for sure I’d need. Here’s my /site/assets/scss/layout/_wordpress.scss file in case you wanted to use that as a starting point for your own. I like having it separate in its own file as a reminder that this is CSS specifically for WordPress-provided content.

/* ==========================================================================
   CSS to support content generated by WordPress
   ========================================================================== */
.cp_embed_wrapper {
  margin: $global-spacing calc($global-spacing * -1);

  @media (min-width: $bp_small_desktop_min) {
    margin: $global-spacing calc($global-spacing-double * -1);
  }
}

/* WordPress Blockquote
   ========================================================================== */
.wp-block-quote {
  @extend .quote;

  cite {
    @extend .font-secondary;
    @extend .quote__attribution;
    display: block;
    font-style: normal;
    font-weight: $fw-bold;
    text-align: center;
  }

  p {
    @extend .quote__content;
  }
}

/* WordPress Column layouts
   ========================================================================== */
.wp-block-columns {
  display: flex;
  margin-bottom: 1.75em;
  box-sizing: border-box;
  flex-wrap: wrap !important;
  align-items: initial !important;
  /**
  * All Columns Alignment
  */
}
@media (min-width: 782px) {
  .wp-block-columns {
    flex-wrap: nowrap !important;
  }
}
.wp-block-columns.are-vertically-aligned-top {
  align-items: flex-start;
}
.wp-block-columns.are-vertically-aligned-center {
  align-items: center;
}
.wp-block-columns.are-vertically-aligned-bottom {
  align-items: flex-end;
}
@media (max-width: 781px) {
  .wp-block-columns:not(.is-not-stacked-on-mobile) > .wp-block-column {
    flex-basis: 100% !important;
  }
}
@media (min-width: 782px) {
  .wp-block-columns:not(.is-not-stacked-on-mobile) > .wp-block-column {
    flex-basis: 0;
    flex-grow: 1;
  }
  .wp-block-columns:not(.is-not-stacked-on-mobile)
    > .wp-block-column[style*="flex-basis"] {
    flex-grow: 0;
  }
}
.wp-block-columns.is-not-stacked-on-mobile {
  flex-wrap: nowrap !important;
}
.wp-block-columns.is-not-stacked-on-mobile > .wp-block-column {
  flex-basis: 0;
  flex-grow: 1;
}
.wp-block-columns.is-not-stacked-on-mobile
  > .wp-block-column[style*="flex-basis"] {
  flex-grow: 0;
}

:where(.wp-block-columns.has-background) {
  padding: 1.25em 2.375em;
}

.wp-block-column {
  flex-grow: 1;
  min-width: 0;
  word-break: break-word;
  overflow-wrap: break-word;
  /**
  * Individual Column Alignment
  */
}
.wp-block-column.is-vertically-aligned-top {
  align-self: flex-start;
}
.wp-block-column.is-vertically-aligned-center {
  align-self: center;
}
.wp-block-column.is-vertically-aligned-bottom {
  align-self: flex-end;
}
.wp-block-column.is-vertically-aligned-top,
.wp-block-column.is-vertically-aligned-center,
.wp-block-column.is-vertically-aligned-bottom {
  width: 100%;
}

Triggering a new build on publish

Last but certainly not least, I needed a way to trigger a fresh build as I published new content. I wanted to make this as easy as possible on myself so I put some time into figuring out how to automate these builds rather than having to do it manually every time. There are two parts to this: a build pipeline that gets triggered and a way to automatically trigger it.

CI/CD Build Pipelines

This is only going to work if you have some sort of build pipeline that you can tap into. If you’re manually triggering new static site builds within your Eleventy setup, then you’ll have to continue to do that after publishing new content to your WordPress blog.

Otherwise, there are options for GitHub Actions, Azure Pipelines, and Bitbucket Pipelines, but your unique setup is going to dictate how this works for you. Be sure, as part of this pipeline, that you’re removing the cache so it’s pulling fresh blog content each time. Here’s the npm part of my pipeline:

rm -rf .cache/
npm ci
npm run build
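
If deleting the cache folder isn’t an option in your pipeline, another approach might be to pick the cache duration from the environment inside the data file. This is a sketch under two assumptions: that your CI system sets a CI environment variable (many do, but check yours), and that AssetCache treats a "0s" duration as always stale, which is how Eleventy Fetch documents its duration option. The cacheDuration name is hypothetical.

```javascript
// Hypothetical helper: choose a cache duration based on the environment.
// Assumption: CI is set to a non-empty value in the build pipeline.
function cacheDuration(env = process.env) {
  // "0s" is intended to force a fresh fetch, "2h" matches the local default
  return env.CI ? "0s" : "2h";
}

console.log(cacheDuration({ CI: "true" }));
console.log(cacheDuration({}));
```

In getAllPosts this would replace the hard-coded value, as in cache.isCacheValid(cacheDuration()).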

Triggering a build on publish

After a bit of research I opted for the WP Webhooks WordPress plugin for this. It does a whole lot more than I need right now, but it is fairly straightforward. Once you’ve got it installed, go to the settings and click on the “Send Data” menu item. From there you’ll get a list of available webhook triggers. The two I ended up sticking with were “Post updated” and “Post deleted”. I didn’t keep “Post created” because – as far as I could tell – “Post updated” triggered for that too, so it wasn’t necessary.

For both of those triggers, I clicked on the “Add Webhook URL” button and added my unique CI/CD pipeline URL that triggers a new build.
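
It’s worth confirming that the pipeline URL works on its own before wiring it into the plugin. Here is a minimal sketch for doing that from Node, assuming your hook accepts a plain POST with no body or authentication. Some providers require a token or a JSON payload, so check their documentation. The triggerBuild name and the URL in the usage comment are placeholders.

```javascript
// Sketch: POST to a build hook URL to start a pipeline run.
// fetchImpl defaults to the global fetch available in Node 18 and later;
// it is a parameter so the function can be exercised without a network call.
async function triggerBuild(hookUrl, fetchImpl = fetch) {
  const response = await fetchImpl(hookUrl, { method: "POST" });

  if (!response.ok) {
    throw new Error(`Build hook responded with status ${response.status}`);
  }

  return response.status;
}

// Usage (placeholder URL, replace with your own pipeline trigger):
// triggerBuild("https://example.com/my-build-hook")
//   .then((status) => console.log("Build triggered, status", status))
//   .catch((err) => console.error(err.message));
```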

Wrapping up

This post ended up being a lot longer than I anticipated and I hope it doesn’t make this process seem especially daunting. I went into a lot more detail than usual to try to help you avoid some of the mistakes that I made the first time around. I got so excited to launch this that I initially forgot about proper redirects, updating the sitemap, and all the XML/RSS feeds – essentially cutting the site off from feed readers, backlinks, and search engines. Don’t be like me!

If you give this a shot and run into any trouble (or just want to say thanks!) please do feel free to get in touch or start a conversation at one of the links below. Thanks!
