Mohamad Harith

Improving SEO of Single Page Applications using Rendertron

The Problem

In this era, single-page applications (SPAs) dominate the Internet due to their better user experience compared to traditional server-side rendered (SSR) pages. However, the added benefits of SPAs come with several costs, one of which is poor search engine optimization (SEO).

Because SPAs are typically rendered on the client side, also known as client-side rendering (CSR), the contents of a website might not be visible until the JavaScript code is executed on the client. This is a problem for search engine and social media crawlers because the meta tags these crawlers require might not exist in the initial HTML file. As a result, the SPA might not get indexed properly on search engines and might not be displayed properly when its link is shared on social media.

For example, pasting a link of a client-side rendered page on Facebook might look like this:

[Screenshot: link preview of a client-side rendered page]

In contrast, pasting a link of a server-side rendered page on Facebook might look like this:


How can we solve the issues related to SEO and link sharing of SPAs?

The Solution

There are several workarounds to overcome the shortcomings related to link sharing and SEO of SPAs, among which are pre-rendering and dynamic rendering. In this article, we will look at dynamic rendering because it is easier to implement than pre-rendering. In fact, Google itself recommends dynamic rendering:

Dynamic rendering is good for indexable, public JavaScript-generated content that changes rapidly, or content that uses JavaScript features that aren't supported by the crawlers you care about. Not all sites need to use dynamic rendering, and it's worth noting that dynamic rendering is a workaround for crawlers.

What is Dynamic Rendering?

[Diagram: dynamic rendering flow]

Dynamic rendering is a technique where your server serves different content depending on the requesting user agent. If the request comes from a crawler, the server routes it to a renderer, which renders the requested page and returns the fully rendered, flat HTML to the crawler. If the request comes from a regular user, the server serves the page as usual.
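To make the "is this a crawler?" decision concrete, here is a minimal sketch of classifying a request by checking its User-Agent header against a list of known bot tokens. This is illustrative only, not Rendertron's middleware, and the token list is far from exhaustive:

```go
package main

import (
	"fmt"
	"strings"
)

// botTokens is an illustrative (not exhaustive) list of substrings found in
// the User-Agent headers of common search engine and social media crawlers.
var botTokens = []string{
	"googlebot", "bingbot", "yandex", "duckduckbot",
	"facebookexternalhit", "twitterbot", "linkedinbot",
}

// isBot reports whether a User-Agent string looks like a known crawler.
func isBot(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, token := range botTokens {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")) // true
	fmt.Println(isBot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"))             // false
}
```

In production you would use a maintained user-agent parsing library instead of a hand-rolled list, as we do later in this article.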

What is Rendertron?

Rendertron is a tool built by the Google Chrome team that can be used for dynamic rendering. Rendertron runs as an HTTP server and renders requested pages using headless Chrome. It is built on top of Puppeteer. Rendertron is highly configurable and offers a lot of features out of the box. One particularly useful feature is caching, which lets the renderer cache a page on the first request and serve it faster on subsequent requests.

The Implementation

Rendertron can be Dockerised and deployed to your cloud provider. Your web server can then identify requests coming from crawlers using the User-Agent header and route them to your deployed Rendertron service. Below is a sample Dockerfile and config file for Rendertron. More configuration options can be found in their GitHub repository.

Dockerfile:

FROM node:14.11.0-stretch

# Install stable Google Chrome, which Rendertron's headless browser needs
# (this follows the standard Puppeteer Debian setup).
RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
    --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

# This directory will store cached files as specified in config.json.
# If you haven't defined the cacheConfig.snapshotDir property, you can remove the following line.
RUN mkdir /cache

RUN git clone https://github.com/GoogleChrome/rendertron.git

WORKDIR /rendertron

RUN npm install && npm run build

# If you aren't using a custom config.json file, remove the following line.
ADD config.json .

CMD ["npm", "run", "start"]

Config file:

    "cache": "filesystem",
    "cacheConfig": {
        "cacheDurationMinutes": 7200,
        "cacheMaxEntries": 1000,
        "snapshotDir": "/cache"
Enter fullscreen mode Exit fullscreen mode

Once deployed, you can call the /render/<url> endpoint to get the flat HTML page of the requested URL. If you are using GoFiber for your web server, crawler requests can be routed as follows:

package main

import (
	"io/ioutil"
	"net/http"

	"github.com/gofiber/fiber/v2"
	ua "github.com/mileusna/useragent" // user-agent parser providing Parse() and the .Bot flag
)

func dynamicServer(c *fiber.Ctx) error {
	userAgent := string(c.Context().UserAgent())
	reqUrl := c.Request().URI().String()
	ua := ua.Parse(userAgent)

	switch {
	case ua.Bot:
		// Crawler request: fetch the fully rendered page from Rendertron.
		resp, err := http.Get("https://<renderer url>/render/" + reqUrl)
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		body, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		return c.SendString(string(body))
	default:
		// Regular user: serve the SPA entry point as usual.
		return c.SendFile("dist/client/store/index.html")
	}
}

func main() {
	app := fiber.New()

	app.Get("/store*", dynamicServer)

	app.Listen(":3000")
}

We are using this package to parse and identify the user agent in Golang. However, Rendertron also provides middleware for other languages and tools; you may refer to their GitHub repository page.


The web was not initially designed for SPAs; it was only meant to serve static HTML. However, as computers got better, the way websites are served to us has also changed to improve user experience. As discussed earlier, these changes come with several costs, but the tech community always develops ways to overcome them, one of which is dynamic rendering, a very useful workaround for the SEO issues that come with building SPAs.

