How search engines & social media crawlers render JavaScript

Rachel Costello • Originally published at deepcrawl.com

JavaScript is a widely discussed topic in the SEO community, because it can cause significant issues for search engines and other crawlers that are trying to access the pages on our sites.

The information that SEOs are gathering on the topic of JavaScript rendering should be more widely shared, as these findings will impact everyone who has a JavaScript-heavy website that they want to be visible to new users.

That’s why I’ve put together this guide to explain some of the key considerations to be aware of.

How search engines render JavaScript

Looking at this example code, a search engine like Google won’t have any idea what the page is meant to be about:

<body>
<app-root></app-root>
<script src="runtime.js"></script>
<script src="polyfills.js"></script>
<script src="main.js"></script>
</body>

The JavaScript contained within this code needs to be processed and executed so that the output code can be displayed for the client. For the contents of a JavaScript-heavy page to mean anything to a search engine or social media crawler, that crawler needs to render the page.
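
To illustrate (this is a simplified, hypothetical sketch; the endpoint and data shapes are made up), a script like main.js might only produce the page’s real content once it executes in a browser:

// A simplified, hypothetical sketch of what main.js might do.
// Until this runs, the page body contains nothing but an empty <app-root>.
document.addEventListener('DOMContentLoaded', async () => {
  const root = document.querySelector('app-root');

  // Fetch the page's content from an API (hypothetical endpoint)
  const response = await fetch('/api/pages/home');
  const data = await response.json();

  // Only now does meaningful, indexable markup exist in the DOM
  root.innerHTML = `
    <h1>${data.title}</h1>
    <p>${data.description}</p>
  `;
});

A crawler that can’t execute this script never sees anything beyond the empty <app-root> element.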

However, rendering is an expensive, resource-intensive process that the majority of search engine and social media bots struggle with, so it’s important to understand their rendering capabilities and be aware of what they will struggle to see on your site.

It’s important to bear in mind that most search engines can’t render at all, and those that do have their own rendering limitations, as I’ll explain later in this article.

If your website relies on JavaScript to power its content and navigation, search engines could end up seeing a blank screen with nothing of value to crawl or index.

I’ve put together the latest updates on how the main search engines are currently equipped for rendering, as well as some key considerations for building sites that can be crawled and indexed.

Google’s rendering capabilities

Google is one of the few search engines that currently renders JavaScript, and provides a lot of documentation and resources on JavaScript best practice for search.

This means we’re able to build a pretty clear picture of what we need to do to get our websites indexed in Google’s SERPs (Search Engine Results Pages).

When Google renders, it generates markup from templates and the data available from a database or an API. The key step in this process is to get this fully generated markup, because this is what’s readable for Google’s web crawler, Googlebot.

The key stage of rendering for Google

Source: Martin Splitt, AngularUP Conference

To carry out this process, Googlebot uses a headless browser for its Web Rendering Service (WRS). Google’s WRS used to be based on Chrome 41, an outdated version launched in 2015.

However, Google have now made their WRS ‘evergreen’, meaning that it will be regularly updated to run the latest version of Chrome on an ongoing basis.

This change allows Googlebot to process features that it was previously unable to, such as ES6, IntersectionObserver and Web Components.
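
For example, lazy-loading images with IntersectionObserver is a pattern the old Chrome 41-based WRS couldn’t execute, but which the evergreen WRS can now process. A simplified sketch:

// Lazy-loading images with IntersectionObserver.
// Images start with a data-src attribute and no src, so they only
// load (and become visible to the renderer) when scrolled into view.
const observer = new IntersectionObserver((entries, obs) => {
  entries.forEach((entry) => {
    if (entry.isIntersecting) {
      const img = entry.target;
      img.src = img.dataset.src; // swap in the real image URL
      obs.unobserve(img);        // stop watching once loaded
    }
  });
});

document.querySelectorAll('img[data-src]').forEach((img) => observer.observe(img));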

The crawling and indexing process is usually very quick for sites that don’t rely on JavaScript. However, Google can’t crawl, render and index in one instantaneous process, due to the scale of the internet and the processing power that would be required to do so.

“The internet is gigantic, that’s the problem. We see over 160 trillion documents on the web daily, so Googlebot is very busy. Computing resources, even in the cloud age, are pretty tricky to come by. This process takes a lot of time, especially if your pages are really large, we have to render a lot of images, or we have to process a lot of megabytes of JavaScript.”

-Martin Splitt, Webmaster Trends Analyst at Google

This is why Google has a two-wave indexing process. In the first wave of indexing, HTML pages are crawled and indexed, and Googlebot uses a classifier to determine which pages contain JavaScript that needs to be rendered.

These pages will be added to a queue to be rendered at a later date when enough resources become available, in the second wave of indexing. A page will only be added to the index in the second wave after it has been rendered.

“Google determines if pages need to be rendered by comparing content found in the initial HTML and the rendered DOM.”

-Martin Splitt, Google Webmaster Hangout
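
You can approximate this comparison yourself to see how much of your content depends on rendering. Below is a minimal diagnostic sketch (not Google’s actual classifier) using Puppeteer and Node 18+, with a placeholder URL:

// compare-render.js — a minimal sketch using Puppeteer (headless Chrome).
// It contrasts the server's raw HTML with the DOM after JavaScript runs.
const puppeteer = require('puppeteer');

(async () => {
  const url = 'https://example.com/'; // placeholder URL

  // 1. The "first wave" view: raw HTML as returned by the server
  const initialHtml = await (await fetch(url)).text();

  // 2. The "second wave" view: the DOM after scripts have executed
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const renderedHtml = await page.content();
  await browser.close();

  console.log('Initial HTML length:', initialHtml.length);
  console.log('Rendered DOM length:', renderedHtml.length);
  // A large difference suggests the page relies on rendering to expose content.
})();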

Google's two waves of indexing

Source: Google I/O 2018

When resources do become available, there isn’t a specific way of prioritising the pages that will be rendered first, which means that there are no guarantees on when pages will actually be rendered after they are initially discovered by Googlebot.

What is the gap between the first and second wave of indexing then? According to Google’s Tom Greenaway and Martin Splitt during Chrome Dev Summit 2018, it could take “minutes, an hour, a day or up to a week” for Google to render content after a page has been crawled.

If your website gets stuck between these two waves of indexing, any new content you add or any changes you make to your website won’t be seen or indexed for an undetermined amount of time.

This will have the biggest impact on sites that rely on fresh search results, such as ecommerce or news sites.

“Ecommerce websites should avoid serving product page content via JavaScript.”

-John Mueller, Google Webmaster Hangout

“News sites should avoid content that requires JavaScript to load.”

-John Mueller, Google Webmaster Hangout

Bing’s rendering capabilities

Bing’s crawler reportedly does render JavaScript, but it is limited in its ability to process the latest browser features and to render at scale.

The team at Bing recommended implementing dynamic rendering to make sure Bingbot is able to crawl and index your JavaScript-powered content and links.

“In general, Bing does not have crawling and indexing issues with web sites using JavaScript, but occasionally Bingbot encounters websites heavily relying on JavaScript to render their content, especially in the past few years. Some of these sites require far more than one HTTP request per web page to render the whole page, meaning that it is difficult for Bingbot, like other search engines, to process at scale on every page of every large website.

Therefore, in order to increase the predictability of crawling and indexing of websites relying heavily on JavaScript by Bing, we recommend dynamic rendering as a great alternative. Dynamic rendering is about detecting the search engine’s bot by parsing the HTTP request user agent, prerendering the content on the server-side and outputting static HTML, helping to minimize the number of HTTP requests needed per web page and ensure we get the best and most complete version of your web pages every time Bingbot visits your site.

When it comes to rendering content specifically for search engine crawlers, we inevitably get asked whether this is considered cloaking... and there is nothing scarier for the SEO community than getting penalized for cloaking. The good news is that as long as you make a good faith effort to return the same content to all visitors, with the only difference being that the content is rendered on the server for bots and on the client for real users, this is acceptable and not considered cloaking.”

-Fabrice Canel, Principal Program Manager at Bing
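
To make this concrete, here is a minimal sketch of the dynamic rendering setup Bing describes, assuming an Express server; the bot list and the prerenderPage() helper are illustrative placeholders (for example, one backed by a headless browser or a prerendering service), not an official implementation:

// server.js — a minimal dynamic rendering sketch using Express.
// The bot pattern and prerenderPage() helper are illustrative placeholders.
const express = require('express');
const app = express();

const BOT_PATTERN = /bingbot|googlebot|twitterbot|facebookexternalhit/i;

app.get('*', async (req, res) => {
  const userAgent = req.get('User-Agent') || '';

  if (BOT_PATTERN.test(userAgent)) {
    // Crawlers get static HTML prerendered on the server
    const html = await prerenderPage(req.path); // hypothetical helper
    res.send(html);
  } else {
    // Real users get the normal client-side rendered app shell
    res.sendFile('index.html', { root: 'dist' });
  }
});

app.listen(3000);

Real users still receive the client-side app, while recognised bots receive static HTML, which also cuts the number of HTTP requests a crawler needs to make per page.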

Even though Bing can render in some capacity, it isn’t able to extract and follow URLs that are contained within JavaScript.

“Don’t bury links to content inside JavaScript.”

-Bing Webmaster Guidelines

Yahoo’s rendering capabilities

Yahoo cannot currently render at all. It’s recommended to make sure that content isn’t ‘hidden’ behind JavaScript, as the search engine won’t be able to render the page to find any content generated by scripts. Only content that is served within the HTML will be picked up.

You can get around this by using the <noscript> element.

“Unhide content that’s behind JavaScript. Content that’s available only through JavaScript should be presented to non-JavaScript user agents and crawlers with noscript HTML elements.”

-Yahoo Webmaster Resources
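
A simplified sketch of what that fallback could look like:

<body>
<app-root></app-root>

<!-- Fallback for crawlers and user agents that can't execute JavaScript -->
<noscript>
<h1>Page title</h1>
<p>The same key content that the JavaScript would otherwise generate.</p>
</noscript>

<script src="main.js"></script>
</body>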

Yandex’s rendering capabilities

Yandex’s documentation explains that their search engine doesn’t render JavaScript and can’t index any content that is generated by it. If you want your site to appear in Yandex, make sure your key content is returned in the HTML upon the initial request for the page.

“Make sure that the pages return the full content to the robot. If they use JavaScript code, the robot will not be able to index the content generated by the script. The content you want to include in the search should be available in the HTML code immediately after requesting the page, without using JavaScript code. To do this, use HTML copies.”

-Yandex Support

Other search engines’ rendering capabilities

DuckDuckGo, Baidu, AOL and Ask are much less open about their rendering capabilities and lack official documentation as reference guides. The only way to find this out currently is to run tests ourselves.

In 2017, Bartosz Góralewicz ran some experiments using a test site that used different JavaScript frameworks to serve content and analysed which search engines were able to render and index the content they generated.

We can never make definitive conclusions based on the indexing of test sites alone, but the results showed that only Google and, surprisingly, Ask were able to index rendered content.

Search engine JavaScript rendering comparison chart

Source: Moz

“Bing, Yahoo, AOL, DuckDuckGo, and Yandex are completely JavaScript-blind and won’t see your content if it isn’t in the HTML.”

-Bartosz Góralewicz, CEO of Onely

Take a look at the full article covering the experiment and results to learn more about Bartosz’s conclusions.

How social media platforms render JavaScript

It’s important to know that the crawlers used by social media and sharing platforms generally can’t render client-side JavaScript at all.

“Facebook and Twitter’s crawlers can’t render JavaScript.”

-Martin Splitt, Google Webmaster Hangout

If you rely on JavaScript to serve content that feeds into Open Graph tags, Twitter Cards or even the meta descriptions that show when you share an article on Slack, for example, that content won’t be shown.

URL sharing content dropdown example in Slack

Make sure you pre-render, server-side render or dynamically render content like featured images, titles and descriptions for crawlers like Twitterbot and Facebot, so they can display your site and its content properly.
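
For example, tags like these need to be present in the initial HTML response rather than injected by JavaScript afterwards (all values here are placeholders):

<head>
<title>Article title</title>
<meta name="description" content="A short summary of the article.">

<!-- Open Graph tags, read by Facebook's crawler (Facebot) and others -->
<meta property="og:title" content="Article title">
<meta property="og:description" content="A short summary of the article.">
<meta property="og:image" content="https://example.com/featured-image.jpg">

<!-- Twitter Card tags, read by Twitterbot -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Article title">
<meta name="twitter:description" content="A short summary of the article.">
</head>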

Latest comments (6)

Nicolás D'Andrade

Great read! Thank you, as always, Rachel! Insightful JS data.

Rachel Costello

Thanks so much, Nicolás! Really pleased you found the post useful!

Mateus Schmitz

Great article. Thanks!

Sandeep Balachandran

See those continuous cross marks for Angular 2+? That’s where things get interesting for a Google product. Why, Google? Why? Let the other ones suffer, not us. I think we can make Angular 2+ SEO-friendly by applying server-side rendering.

Chris Achard

Nice dive into SEO and JS. I always worry a bit when I use React that it won’t be indexed very well, and it looks like Google will handle it OK now (though not as well as straight HTML). But based on that Moz image, they’re basically the only search engine that does handle it well... hm.

Thanks! I'll have to keep that in mind when working on sites that should be indexed :)

Rachel Costello

It can always be a worry regardless of framework, so that's understandable! The main thing to consider is whether your crucial content and internal links are being served via JavaScript.

If you're dynamically rendering, pre-rendering or server-side rendering then this should be fine for search engines, but it's always worthwhile to test what they can actually see. Glad you found the article useful :)