Writing robots.txt manually is straightforward when you understand the syntax. The challenge is testing it correctly before it goes live. A rule that looks right can behave differently than expected when applied to specific URL patterns - and the feedback loop is slow if your only testing option is waiting for Google Search Console to report problems after deployment.
This guide walks through building a correct robots.txt file using a visual generator that validates rules in real time and lets you test paths before pushing anything live. Each step corresponds to a real decision you need to make during setup.
Step 1: Identify What You Need to Block
Before opening any tool, list the paths you have a specific reason to restrict. The most common candidates:
- `/admin/` or `/wp-admin/` - CMS admin area
- `/staging/` or `/dev/` - internal preview environments
- `/api/` - back-end API endpoints
- `/internal-search/` - search result pages that generate duplicate content
- `/?sort=` or `/?filter=` - parameterized URL variants
The goal is a minimal list. Every path you Disallow reduces what gets crawled, which affects how quickly new and updated content is discovered. Over-blocking is a real risk, especially when wildcards are involved.
If you're not sure whether a path should be blocked, the default is to leave it open. Restricting it later is easier than discovering you've blocked something important after the fact.
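Expressed as directives, that shortlist translates into something like the following sketch (the User-agent line is explained in the next step, and the exact paths depend on your platform):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /staging/
Disallow: /api/
Disallow: /internal-search/
```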
Step 2: Open the Generator and Add Your User-Agent Block
Open the EvvyTools Robots.txt Generator. The tool opens with a template pre-populated based on your selected platform (WordPress, Next.js, Shopify, Laravel, or generic).
Start with a universal block using User-agent: * to apply rules to all compliant crawlers. If you have crawler-specific requirements - for example, a Crawl-delay directive for Bingbot only - you can add a second block for User-agent: Bingbot after the universal rules.
Most sites need only a single user-agent block. Multiple blocks are useful when you have different access policies for different bots, which is rare in practice.
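As a sketch, a file with one crawler-specific block might look like this (the Crawl-delay value is illustrative; Bing honors the directive, Google ignores it):

```
User-agent: *
Disallow: /admin/

User-agent: Bingbot
Disallow: /admin/
Crawl-delay: 10
```

Note that a crawler follows only the single block that matches it most specifically, so any universal rules that should also apply to Bingbot must be repeated in its block.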
Step 3: Add Disallow Rules for Each Path
For each path on your list, add a Disallow rule. In the generator, this is a form field that accepts path strings and validates them as you type.
A few syntax details to confirm while you're adding rules:
- `Disallow: /admin/` (with trailing slash) blocks everything under `/admin/`, but not `/admin` itself or other paths that merely begin with those characters.
- `Disallow: /admin` (without slash) is a prefix match: it blocks `/admin`, everything under `/admin/`, and any other path starting with `/admin`, such as `/administrator`.
- `Disallow: /` blocks the entire site. This is almost never what you want unless you're staging a site that shouldn't be crawled at all.
- An empty `Disallow:` value means "allow everything" for that user-agent block - a block containing only an empty `Disallow:` is how you exempt a specific bot from the restrictions in the universal block.
The generator flags trailing slash issues and empty-value edge cases as you add rules, which catches most syntax errors before they become deployment problems.
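A sketch contrasting the two forms (paths are illustrative; `#` starts a comment in robots.txt):

```
User-agent: *
# Prefix match: blocks /admin, /admin/users, and also /administrator
Disallow: /admin
# Directory match: blocks /private/notes/ but not /private itself
Disallow: /private/
```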

Step 4: Add Allow Rules Where Needed
If you're blocking a broad path but need specific sub-paths to remain crawlable, add Allow rules before the Disallow in the same block.
The most common example with WordPress:
```
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/
```
This blocks the WordPress content directory while keeping the uploads folder (which contains images and downloadable assets) crawlable. Keep in mind that theme and plugin JavaScript and CSS also live under /wp-content/ (in /wp-content/themes/ and /wp-content/plugins/), so blocking the directory without Allow exceptions for those assets can prevent Google from rendering pages properly.
By convention, place Allow rules before the Disallow they override. RFC 9309-compliant crawlers, including Googlebot, apply the most specific (longest) matching rule regardless of order, but some older parsers evaluate rules top to bottom, so Allow-first is the safe ordering. The generator handles this automatically when you use the form fields - but if you're editing the raw output, keep the ordering in mind.
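A quick way to see the ordering effect locally is Python's standard-library robotparser, which happens to use first-match evaluation (stricter than Googlebot's longest-match). A minimal sketch:

```python
from urllib.robotparser import RobotFileParser

def can_fetch(rules: str, path: str) -> bool:
    """Parse a robots.txt string and test one path for all user-agents."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", path)

allow_first = "User-agent: *\nAllow: /wp-content/uploads/\nDisallow: /wp-content/\n"
disallow_first = "User-agent: *\nDisallow: /wp-content/\nAllow: /wp-content/uploads/\n"

path = "/wp-content/uploads/logo.png"
print(can_fetch(allow_first, path))     # True:  the Allow rule matches first
print(can_fetch(disallow_first, path))  # False: the Disallow rule matches first
```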
Step 5: Add a Sitemap Directive
The Sitemap directive is the most underused line in a robots.txt file. One line:
```
Sitemap: https://yourdomain.com/sitemap_index.xml
```
This tells all compliant crawlers where to find your sitemap, not just the bots you've manually submitted to via Search Console. You can include multiple Sitemap lines if you have separate sitemap files for different content types.
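For example, with separate sitemaps per content type (filenames here are hypothetical):

```
Sitemap: https://yourdomain.com/sitemap-posts.xml
Sitemap: https://yourdomain.com/sitemap-pages.xml
Sitemap: https://yourdomain.com/sitemap-products.xml
```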
The generator adds this line in the output section. If your sitemap URL isn't standard (for example, a plugin puts it at /sitemap.xml instead of /sitemap_index.xml), update it in the generator before copying the output.
Step 6: Test Your Rules Against Real Paths
This is the step most developers skip, and it's where most issues originate. The generator includes a URL tester: enter a path and it shows you whether the current rules allow or block that path for each user-agent block.
Test at minimum:
- The homepage (`/`)
- A representative product or content page
- Your admin path, to confirm it's blocked
- A parameterized URL variant, if you added wildcard rules
- A static asset path (`/js/main.js`, `/css/style.css`), to confirm assets aren't accidentally blocked
If any static asset path shows as blocked, find the rule that's blocking it and either narrow the Disallow pattern or add an Allow exception. Blocked CSS and JavaScript files prevent proper page rendering in Google's crawl.
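If you also want to script this checklist, Python's standard-library robotparser gives a rough local check. Two caveats: it treats wildcards literally rather than as patterns, and it uses first-match evaluation, so treat it as a sanity check rather than a substitute for the generator's tester. The rules and paths below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Draft rules to test before deployment (illustrative).
draft = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-admin/
Disallow: /staging/
"""

# The minimum checklist from above.
paths = [
    "/",                      # homepage
    "/blog/sample-post/",     # representative content page
    "/wp-admin/options.php",  # admin path - should be blocked
    "/js/main.js",            # static assets - should stay open
    "/css/style.css",
]

parser = RobotFileParser()
parser.parse(draft.splitlines())
for path in paths:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "BLOCKED"
    print(f"{verdict:7}  {path}")
```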
"The URL tester is the highest-value feature of any robots.txt tool. Writing rules is easy. Knowing which paths those rules actually affect - including paths you didn't intend to restrict - is what prevents the kind of silent blocking that only shows up in Search Console three weeks later." - Dennis Traina, founder of 137Foundry
Step 7: Copy and Deploy
Once your rules look correct and the URL tests pass, copy the generated output. The file should contain only plain text with no special characters, no quotes around values, and no HTML formatting.
Deploy it to the root of your domain at exactly /robots.txt. Test the URL directly in a browser after uploading to confirm the file is accessible and correctly formatted.
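A quick scripted version of that check (the domain is a placeholder - substitute your own):

```python
from urllib.request import urlopen

# Hypothetical URL - replace with your deployed domain.
with urlopen("https://yourdomain.com/robots.txt") as response:
    body = response.read().decode("utf-8")
    print("Status:", response.status)                      # expect 200
    print("Type:  ", response.headers.get_content_type())  # expect text/plain
    print(body)
```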
Step 8: Verify in Google Search Console
After deploying, Google Search Console has two useful verification paths:
First, the URL Inspection tool - enter a URL you intended to remain accessible and confirm Googlebot has access. It will specifically flag if a robots.txt rule is blocking the URL.
Second, check the robots.txt report (under your property's Settings; it replaced the legacy robots.txt Tester) to see the version of the file Google currently has cached. This confirms the deployed file is what Google is reading.
It can take up to 24 hours for Google to refresh its cached copy after a change. Don't expect immediate results.
What Stays the Same Over Time
A robots.txt file that's correct on deploy day should need minimal changes unless your URL structure changes. The maintenance tasks are:
- Review after any major URL restructure or platform migration
- Update the Sitemap line if your sitemap URL changes
- Add new Disallow rules if new admin or utility paths are introduced
- Remove outdated rules when old paths no longer exist
How to Write a robots.txt File That Actually Works covers the full theory behind these decisions - the directives, platform-specific configurations, and what Google actually does with the file. The free robots.txt tools by EvvyTools handle the generation and validation workflow in one browser-based tool without any setup required.
The robots.txt standard is documented in RFC 9309 for anyone who wants the authoritative reference on how Googlebot and Bingbot parse the file. Ahrefs includes a robots.txt checker as part of its site audit tool, useful for post-deployment validation at scale.
