While developing a client's website, I keep a simple coming-soon page at the root of their domain. I also set up a subdomain to use as a development environment and to send to the client as a link, so they can see a close representation of the progress and actually interact with the website.

One challenge with this: while I want the root domain with the coming-soon page to be indexed by Google, I don't want the subdomain indexed, because once the site is done I'll probably delete the subdomain.
According to Google, including a meta tag with a `name` value of `robots` and a `content` value of `noindex` will cause Googlebot to completely drop the page from Google Search results the next time it crawls that page.

This is what the `noindex` meta tag looks like in the head of your web page:
```html
<head>
  <meta name="robots" content="noindex">
  <title>Your cool website</title>
</head>
```
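If you want to sanity-check that the tag actually made it into a page's markup, a small script can scan the HTML for it. Here's a rough sketch using Python's standard-library `html.parser` (the `NoindexChecker` class name and the sample HTML string are my own, just for illustration):

```python
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    """Flags pages containing <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            # Both attribute values are case-insensitive as far as Google cares.
            if d.get("name", "").lower() == "robots" and \
               "noindex" in d.get("content", "").lower():
                self.noindex = True

html = ('<head><meta name="robots" content="noindex">'
        '<title>Your cool website</title></head>')
checker = NoindexChecker()
checker.feed(html)
print(checker.noindex)  # True
```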
The meta tag will need to be included on every single page you don't want Googlebot to index.

The other method is to block all search engine crawler bots from crawling your site. To do this, you'll create a `robots.txt` file and place it at the root of the domain. This method assumes you have file upload access to your server.
The contents of `robots.txt` will be:

```
User-agent: *
Disallow: /
```
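If you want to confirm what those two lines actually permit, Python's standard-library `urllib.robotparser` can evaluate the rules locally without fetching anything. A quick sketch (the `dev.example-url.com` subdomain here is just an example name):

```python
from urllib.robotparser import RobotFileParser

# Feed the two rules above directly to the parser, no network needed.
rules = ["User-agent: *", "Disallow: /"]
parser = RobotFileParser()
parser.parse(rules)

# With a blanket Disallow, no crawler may fetch any path on the domain.
print(parser.can_fetch("Googlebot", "https://dev.example-url.com/"))          # False
print(parser.can_fetch("Googlebot", "https://dev.example-url.com/any/page"))  # False
```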
This tells all crawlers not to crawl any part of the domain. So, for example, if I've got a subdomain at `dev.example-url.com` and I want just the `dev` subdomain to be blocked, I'll place the `robots.txt` file at the root of that subdomain.
Do I Need Both?
Nope, you only need one method. But remember: with the `noindex` tag, you'll need to add it to every page you don't want indexed, while `robots.txt` will tell crawlers to stay away from the entire subdomain. One caveat: `robots.txt` only blocks crawling, so a blocked page that's linked from elsewhere can still appear in search results (just without a description); the `noindex` tag is the more reliable way to keep a specific page out of results entirely.
Originally posted on Michael Lee