gophobic

Posted on Dec 17, 2018 • Edited on Dec 19, 2018

Solving the Eternal SEO Problem and Providing SSR for Modern JavaScript Websites without Writing a Single Line of Code

#go #javascript #react #webdev

What is the problem anyway?

Whenever you develop a website with a modern frontend javascript framework such as React.js, Vue.js, Angular.js, etc... sooner or later you have to deal with the painful eternal SEO problem. Since most search engines don't even execute javascript to generate the final DOM that contains most of the valuable page content, your website will definitely be hurt in SEO rankings as search engines see almost nothing of value in your HTML body. Native framework SSR (server-side rendering) and/or developing your website as isomorphic can be the ideal solutions but it needs to be taken care of as early as your first line of code and its complexity grows with your webapp and also becomes instantly invalid with a single non-conformant dependency. Simpler websites (small commercial websites, technical documentation websites, etc..) may just use a static site generation framework such as gatsby.js or Docusaurus to solve this problem. But if you're dealing with a more complex webapp, such frameworks will never be a good choice. Also if you have a big project that's already in production, native framework SSR might be too complex and too late. And that is how SEO became an eternal problem for modern webapps.

However, something happened a year ago, Google announced shipping "headless" Chrome starting from version 59. Along with Chrome Devtools Protocol, this has opened a new world for developers to remotely control Chrome. Headless Chrome is mainly used for automated testing. But most interestingly, headless Chrome became an unlikely complete solution for the eternal SEO problem, a solution that is totally independent of whatever frontend frameworks, stacks, versions, dependencies or backend stacks you might use! Sounds too good to be true, right?

Rendora?

Rendora is a new FOSS golang project that has been trending in GitHub for the past few days and deserves some highlight. Rendora is a dynamic renderer that uses headless Chrome to effortlessly provide SSR to web crawlers and thus improving SEO. Dynamic rendering simply means that the server provides server-side rendered HTML to web crawlers such as GoogleBot and BingBot and at the same time provides the typical initial HTML to normal users in order to be rendered at the client side. Dynamic rendering has been recommended lately by both Google and Bing and also has been talked about in Google I/O' 18.

rendora / rendora

dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern javascript websites

Rendora

Rendora is a dynamic renderer to provide zero-configuration server-side rendering mainly to web crawlers in order to effortlessly improve SEO for websites developed in modern Javascript frameworks such as React.js, Vue.js, Angular.js, etc... Rendora works totally independently of your frontend and backend stacks

Main Features

Zero change needed in frontend and backend code
Filters based on user agents and paths
Single fast binary written in Golang
Multiple Caching strategies
Support for asynchronous pages
Prometheus metrics
Choose your configuration system (YAML, TOML or JSON)
Container ready

What is Rendora?

Rendora can be seen as a reverse HTTP proxy server sitting between your backend server (e.g. Node.js/Express.js, Python/Django, etc...) and potentially your frontend proxy server (e.g. nginx, traefik, apache, etc...) or even directly to the outside world that does actually nothing but transporting requests and responses as they are except when it detects whitelisted requests according to the config. In that…

View on GitHub

Rendora works by acting as a reverse HTTP proxy in front of your backend server (e.g. Node.js, Golang, Django, etc...) and checking incoming requests according to the configuration file; if it detects a "whitelisted" request for server-side rendering, it commands headless Chrome to request and render the corresponding page and then return the final SSR'ed HTML response back to the client. If the request is blacklisted, Rendora simply acts as a useless reverse HTTP proxy and returns the response coming from the backend as is. Rendora differs from the other great project in the same area, rendertron, in that not only it offers better performance by using golang instead of Node.js, using caching to store SSR'ed pages and skipping fetching unnecessary assets such as fonts and images which slows down rendering on headless Chrome but also it doesn't require any change in both backend and frontend code at all! Let's see Rendora in action to understand how it works.

Rendora in action

Let's write the simplest React.js application

import * as React from "react"
import * as ReactDOM from "react-dom"

class App extends React.Component {
    render() {
        return (
            <div>
                <h1>Hello World!</h1>
            </div>
        )
    }
}

ReactDOM.render(
    <App />,
    document.getElementById("app")
)

Now let's build it to canonical javascript using webpack and babel. This will produce the final javascript file bundle.js. Then let's write a simple index.html file.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8" />
</head>

<body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
</body>

</html>

Now let's serve index.html using any simple HTTP server, I wrote one in golang that's listening to the port 8000. Now whenever you address the server http://127.0.0.1:8000 using your browser and view the page source, you will simple see exactly the same as the above HTML code. That's expected since the Hello World header of our React app is generated and added to the DOM after bundle.js gets executed by the browser javascript engine. Now let's put Rendora into use and write a simple config file in YAML

listen:
    port: 3001

backend:
    url: http://127.0.0.1:8000

target:
    url: http://127.0.0.1:8000

filters:
    userAgent:
        defaultPolicy: whitelist

What does this config file mean? We told rendora to listen to the port 3001, our backend can be addressed on http://127.0.0.1:8000 so that rendora proxies requests to and from it, and that our headless Chrome instance should use it as the target url for whitelisted requests, but since we whitelisted all user agents for the sake of this tutorial, all requests are then valid for server-side rendering. Now let's run headless Chrome and Rendora. I will use Rendora's provided docker images:

docker run --tmpfs /tmp --net=host rendora/chrome-headless
docker run --net=host -v ~/config.yaml:/etc/rendora/config.yaml rendora/rendora

Now comes the big moment, let's try to address our server again but through rendora this time using the address http://127.0.0.1:3001. If we check the page source this time, it will be:

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8" />
</head>

<body>
    <div id="app"><div><h1>Hello World!</h1></div></div>
    <script src="/bundle.js"></script>
</body>

</html>

did you see the difference? the content inside the <div id="app"></div> is now part of the HTML sent by the server. It's that easy! whether you use React, Vue, Angular, Preact with whatever versions and dependencies, and also no matter what your backend stack is (e.g. Node.js, Golang, Django, etc...), whether you have a very complex website with complex components or just a "Hello World" app, writing that YAML configuration file is all what you need to provide SSR to search engines. I's worth mentioning that you normally don't want to whitelist all requests, you just want to whitelist certain user agent keywords corresponding to web crawlers (e.g. googlebot, bingbot, etc...) while keeping the default policy as blacklist.

Rendora also provides Prometheus metrics so that you can get a histogram of the SSR latencies and other important counters such as the total number of requests, total number of SSR'ed requests and total number of cached SSR'ed requests.

Are you required to use Rendora as reverse HTTP proxy in front of your backend server in order to get it working? The answer is fortunately NO! Rendora provides another optional HTTP API server listening to the port 9242 by default to provide a rendering endpoint. So you may implement your own filtering logic and just ask Rendora to get you the SSR'ed page. Let's try it and ask Rendora to render the above page again but using the API rendering endpoint with curl this time:

curl --header "Content-Type: application/json" --data '{"uri": "/"}' -X POST 127.0.0.1:9242/render

you simply get a JSON response

{
    "status":200,
    "content":"<!DOCTYPE html><html lang=\"en\"><head>\n    <meta charset=\"UTF-8\">\n</head>\n\n<body>\n    <div id=\"app\"><div><h1>Hello World!</h1></div></div>\n    <script src=\"/bundle.js\"></script>\n\n\n</body></html>",
    "headers":{"Content-Length":"173","Content-Type":"text/html; charset=utf-8","Date":"Sun, 16 Dec 2018 20:28:23 GMT"},
    "latency":15.307418
}

You may have noticed the latency to render this "Hello World" React app took only around 15ms on my very busy and old machine without using caching! So Headless Chrome and Rendora are really that fast.

Other uses

While rendora is mainly meant to be used for server-side rendering or SSR, you can easily use its API for scraping websites whose DOM is mostly generated by javascript.

Top comments (2)

Murrough Foley • Jul 18 '20

This project stalled. What are your thoughts on how Nuxt tackles the problem?

اب.را.هیم • May 25 '22

now, i have a question. could anybody validate the ssr on flightio.com/ ? i mean it's equiped with it, but i still get the sense there is something wrong with the rendering.