DEV Community

Alex Edwards

Posted on • Originally published at alexedwards.co

Parsing Markdown from a CMS

Parsing is a heavy topic, so this post is a bit code-heavy. I'll do my best to explain as I go.

I'll save you some time here: I spent half a day trying to get MDX working with Contentful, and I couldn't get it to work.

Instead, I figured out that with Remark-Rehype I can kill two birds with one stone. I get Markdown working, which covers 90% of my use case, and I can add custom components for rendering whenever I need them.

Rehype-React is the package that takes in trusted HAST and returns rendered React components. HAST, as I had to learn, is an acronym for HTML abstract syntax tree. It represents HTML as a tree of objects, where each element holds its properties and an array of child nodes.
Sounds complicated, huh?
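To make that less abstract, here is a rough sketch of what a small HAST tree looks like as plain data. The types are simplified stand-ins I've written for illustration (the real definitions live in @types/hast), the whitespace text nodes a real parse produces are left out, and collectTagNames is a hypothetical helper, not part of any library.

```typescript
// Simplified stand-ins for the HAST node types.
type HastText = { type: 'text'; value: string };
type HastElement = {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: HastNode[];
};
type HastRoot = { type: 'root'; children: HastNode[] };
type HastNode = HastRoot | HastElement | HastText;

// Roughly what the Markdown "# Hello" followed by "Read [more](/about)" turns into.
const tree: HastRoot = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'text', value: 'Read ' },
        {
          type: 'element',
          tagName: 'a',
          properties: { href: '/about' },
          children: [{ type: 'text', value: 'more' }],
        },
      ],
    },
  ],
};

// Hypothetical helper: walk the tree and collect tag names in document order.
function collectTagNames(node: HastNode): string[] {
  if (node.type === 'text') return [];
  const names: string[] = node.type === 'element' ? [node.tagName] : [];
  for (const child of node.children) names.push(...collectTagNames(child));
  return names;
}
```

Every element carries a tagName, and that name is what a renderer can use to decide which component to output.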

You don't need to worry about it until you're tempted to write a plugin. Until then, this is the place to change how that HAST turns into rendered code:

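import React from 'react'
import rehypeReact from 'rehype-react'

// Heading, Body and Link are the site's own components, imported elsewhere.
// Each key is an HTML tag name in the HAST; each value is the component that renders it.
const renderAst = new rehypeReact({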
  createElement: React.createElement,
  Fragment: React.Fragment,
  components: {
    h1: Heading.H1,
    h2: Heading.H2,
    h3: Heading.H3,
    h4: Heading.H4,
    h5: Heading.H5,
    h6: Heading.H6,
    p: Body.P,
    a: Link,
    ol: Body.OL,
    ul: Body.UL,
    li: Body.LI,
    blockquote: Body.BQ,
    table: Body.Table,
    th: Body.TH,
    td: Body.TD,
    hr: Body.HR,
  }
}).Compiler;

The thing to understand here is that, unlike the Gatsby plugin for Rehype-React, this lets custom components render the content of the given HAST. By passing our Contentful HAST into renderAst(), we can ensure that each HTML heading, paragraph, and everything else follows suit with the rest of the site. That even includes anchor links, which can take full advantage of Gatsby's Link component.
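If the idea of a components map still feels fuzzy, the sketch below imitates it without React. It is not how Rehype-React is implemented; it's a simplified stand-in that renders to strings, and the class names are made up for illustration. The point is only the lookup: a tag name either has a custom renderer or falls back to a default.

```typescript
type AstNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tagName: string; properties: Record<string, string>; children: AstNode[] };

// A "component" here is just a function from props and rendered children to a string.
type Component = (props: Record<string, string>, children: string) => string;

function render(node: AstNode, components: Record<string, Component>): string {
  if (node.type === 'text') return node.value;
  const children = node.children.map((child) => render(child, components)).join('');
  const custom = components[node.tagName];
  // Use the custom component when one is registered for this tag.
  if (custom) return custom(node.properties, children);
  // Default fallback: a bare tag (properties are ignored to keep the sketch short).
  return `<${node.tagName}>${children}</${node.tagName}>`;
}

// Hypothetical overrides, standing in for Heading.H1 and Link.
const components: Record<string, Component> = {
  h1: (_props, children) => `<h1 class="heading-1">${children}</h1>`,
  a: (props, children) => `<a class="site-link" href="${props.href}">${children}</a>`,
};

const ast: AstNode = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    { type: 'text', value: 'See ' },
    {
      type: 'element',
      tagName: 'a',
      properties: { href: '/about' },
      children: [{ type: 'text', value: 'about' }],
    },
  ],
};

const html = render(ast, components);
// html is '<p>See <a class="site-link" href="/about">about</a></p>'
```

The p tag has no override, so it renders as-is, while the a tag goes through the custom renderer.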

Custom Link logic

To do that, we need to implement our own Link component. I know, it sounds counter-intuitive. However, if done correctly, we can swap in a different import and be done.

First, we need to check the to or href attribute. Why? Because in our own code we'll pass in the to prop, whereas our HAST will pass in href. Everything else comes down to whatever props we give it, such as activeClassName. All other logic comes from regular expressions tested against the link.

First, let's see if it's an internal link by matching it with:

const internal = /^\/(?!\/)/

This regex looks at the front of the string for /, but not //, so protocol-relative URLs such as //example.com are excluded.
If it's an internal link, it could be a download, so let's also test the end of the string. That's relatively straightforward too: the link starts with / and ends with .(something), which in regex translates to:

const file = /\.[0-9a-z]+$/i

The $ anchors the match to the end of the string, the + means one or more characters from 0-9 and a-z, and the i flag makes the match case-insensitive.

The other tricky one is a local navigation link, something that begins with a # or ?. It's tricky only because it doesn't start with a /, and it won't work for accessibility if we handle it with an onClick prop.

const local = /^[#?]/

If none of those match, then as a fallback, return a bog-standard anchor link that covers the usual accessibility and security concerns. All of it combined looks something like this:

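import React from 'react'
import { Link as GLink } from 'gatsby'

function Link({ children, to, href, activeClassName, partiallyActive, ...rest }: LinkComponent): React.ReactElement {
  const internal = /^\/(?!\/)/ // starts with a single "/"
  const file = /\.[0-9a-z]+$/i // ends with a file extension
  const local = /^[#?]/ // starts with "#" or "?"
  // Our own code passes `to`; the HAST passes `href`
  const ref = to || href

  // No destination: render a plain anchor, keeping its children
  if (!ref) return <a {...rest}>{children}</a>

  if (internal.test(ref)) {
    // Internal files are offered as downloads
    if (file.test(ref)) return <a href={ref} download {...rest}>{children}</a>
    // Internal pages go through Gatsby's Link, with exactly one trailing slash
    return (
      <GLink to={ref.endsWith('/') ? ref : `${ref}/`} activeClassName={activeClassName} partiallyActive={partiallyActive} {...rest}>
        {children}
      </GLink>
    )
  }

  // Same-page anchors and query strings stay as plain links
  if (local.test(ref)) return <a href={ref} {...rest}>{children}</a>
  // All else, this is an external link
  return <a href={ref} target="_blank" rel="noreferrer" {...rest}>{children}</a>
}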
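The branching is easier to check without any JSX involved. Here is a minimal sketch that pulls the same three regexes into a standalone function; classifyLink and the LinkKind names are my own for illustration and don't appear in the component.

```typescript
type LinkKind = 'none' | 'download' | 'internal' | 'local' | 'external';

const internal = /^\/(?!\/)/; // starts with "/" but not "//"
const file = /\.[0-9a-z]+$/i; // ends with ".something"
const local = /^[#?]/; // starts with "#" or "?"

// Hypothetical helper mirroring the order of checks in the Link component.
function classifyLink(ref?: string): LinkKind {
  if (!ref) return 'none';
  if (internal.test(ref)) return file.test(ref) ? 'download' : 'internal';
  if (local.test(ref)) return 'local';
  return 'external';
}

const samples = ['/about', '/files/cv.pdf', '#top', '?page=2', '//cdn.example.com/app.js', 'https://example.com'];
for (const sample of samples) {
  console.log(sample, '->', classifyLink(sample));
}
```

One thing this makes visible: anything that isn't internal or local, including a mailto: link, lands in the external branch and would open in a new tab, so that case may deserve its own check.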

Back to Rehype

We are dropping this logic into Rehype so that within Markdown we can link to pages, downloads, local anchors, and external sites, and everything will work as expected. Everything else so far is purely stylistic (using styled-components). However, this option that Rehype exposes is key to quickly and easily integrating stylistic choices and translating Markdown into React components.

To fully utilise Rehype, we now query the htmlAst field on any long text field:

query Homepage {
  page: contentfulPage(title: {regex: "/homepage/i"}) {
    title
    blocks {
      ... on ContentfulPageHome {
        title
        subTitle
        authorBio {
          blurbTitle
          blurbDesc {
            childMarkdownRemark {
              htmlAst
            }
          }
          blurbLead
        }
      }
    }
  }
}

The resulting tree then goes straight into renderAst():

renderAst(blocks?.authorBio?.blurbDesc?.childMarkdownRemark?.htmlAst)

Using renderAst() keeps components, pages, and templates clean, as there aren't any further imports needed beyond our function. Because the output is generated code, writing integration tests or end-to-end tests helps ensure it plays well with the code we write. Using styled-components keeps everything aligned when themes change or typography adjusts, both here and later in other components.

The last thing to address is images.

Gatsby Image with Contentful API

Thankfully, just as with sourcing from Contentful, someone has already created a plugin for this. gatsby-remark-images-contentful will pull in any asset hosted on Contentful and allow Sharp to process images into fixed or fluid types to be used by gatsby-image. Under the hood, it works in a similar way to querying an image normally. Something like this:

profilePicture {
  fluid(maxWidth: 260) {
    ...GatsbyContentfulFluid_tracedSVG
  }
}

<Img fluid={blocks?.authorBio?.profilePicture?.fluid} />

Only images hosted on Contentful itself will be processed by images-contentful. Remark and Rehype will parse images from third-party hosts into standard image nodes.

Gatsby-Remark-Images is another option; however, at the time of writing, installing it alongside or instead of images-contentful causes problems. Namely, it doesn't process the images from Contentful accurately, if at all. Since I'm already utilising Contentful, I'm doubling down on it for image handling.
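For reference, wiring the plugin up looks roughly like the sketch below. It's written as a gatsby-config.ts; in a gatsby-config.js the same object would be assigned to module.exports. The maxWidth value is an assumption to adjust to your own layout, and the gatsby-source-contentful entry with its credentials is left out.

```typescript
// gatsby-config.ts: a minimal sketch, not the full config for this site.
const config = {
  plugins: [
    'gatsby-plugin-sharp',
    {
      resolve: 'gatsby-transformer-remark',
      options: {
        plugins: [
          {
            resolve: 'gatsby-remark-images-contentful',
            options: {
              // Assumed value: the widest the content column gets, in pixels.
              maxWidth: 590,
            },
          },
        ],
      },
    },
  ],
};

export default config;
```

The plugin sits inside gatsby-transformer-remark's own plugins list, since it works on the Markdown that Remark parses rather than on the site as a whole.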


This post is part of an ongoing series that follows my site's development. All current progress on my website and in the open-source repo is open for review. As I publish new features (Posts and Projects templates, changes in style), I'll continue to add to this series with what I've learnt and my reflections on the process.
