
Robots.txt Generator Guide: Syntax, Common Mistakes & Best Practices

Updated March 19, 2026

A robots.txt file is one of the most misunderstood files on the web. Many site owners either ignore it entirely or misconfigure it in ways that unintentionally block important content from search engines. This guide covers what robots.txt actually does (and, critically, what it does not do), the complete syntax, practical configuration examples, and the mistakes that cause the most SEO damage.

What Robots.txt Does — and What It Doesn't

Robots.txt is a plain text file placed at the root of your domain (https://www.example.com/robots.txt) that communicates crawling preferences to web crawlers — search engine spiders, link checkers, AI training crawlers, and other automated bots.

The key phrase is "communicates preferences," not "enforces." Robots.txt is a polite request, not a security barrier. Reputable crawlers like Googlebot, Bingbot, and most established web indexers honor the file. Malicious crawlers, scrapers, and vulnerability scanners typically do not. Never rely on robots.txt to protect sensitive content — anyone can read your robots.txt file and use it as a map to find the very pages you're trying to hide.

For actually protecting sensitive content, use proper authentication (login requirements), server-level access controls, or the noindex meta tag combined with authentication.
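
For reference, a noindex directive is a single meta tag in the page's <head>; a crawler has to be able to fetch the page in order to see it:

<meta name="robots" content="noindex">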

The Robots.txt Syntax Explained

User-agent

Specifies which crawler the following rules apply to. The wildcard * matches all crawlers not otherwise specified:

User-agent: *          # applies to all crawlers
User-agent: Googlebot  # applies only to Google's crawler
User-agent: Bingbot    # applies only to Bing's crawler

Disallow

Tells the crawler not to request the specified URL path or pattern. An empty Disallow: value means nothing is disallowed (allows everything):

Disallow: /admin/       # block the /admin/ directory and all sub-paths
Disallow: /private.html # block a specific page
Disallow: /search?      # block URLs that begin with /search?
Disallow:               # allow everything (empty = no restriction)

Allow

Explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. More specific rules take precedence:

User-agent: *
Disallow: /private/
Allow: /private/public-doc.html  # this specific page is still allowed

Sitemap

Tells crawlers the location of your XML sitemap. This is optional but recommended — it helps crawlers discover all indexable URLs efficiently:

Sitemap: https://www.example.com/sitemap.xml

Crawl-delay

Requests that the crawler wait a specified number of seconds between requests. Note: Googlebot does not respect Crawl-delay — use Google Search Console's crawl rate settings for Googlebot instead. Other crawlers like Bingbot do honor it:

Crawl-delay: 2   # request 2-second delay between requests

Generate Your Robots.txt File

Configure your crawl rules visually and download a ready-to-deploy robots.txt file in seconds.

Open Robots.txt Generator

Common Configuration Examples

Allow All Crawlers (Default Behavior)

If you want all crawlers to access everything — which is the default when robots.txt is absent — you can either omit the file entirely or use this minimal configuration:

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml

Block Admin and Internal Pages

The most common legitimate use of robots.txt: keeping crawlers out of admin interfaces, login pages, shopping carts, and other internal pages that have no value in search.

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml

Block All Crawlers (Staging Environment)

Staging environments should be completely blocked from indexing to prevent duplicate content issues and accidental indexing of unfinished pages:

User-agent: *
Disallow: /

This is one of the few situations where blocking everything is correct. Crucially, remember to change this before deploying to production — a production site with Disallow: / can lose essentially all of its organic search visibility within days.

Allow Google but Block Other Bots

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Block Specific AI Training Crawlers

Several AI companies operate crawlers to collect training data. You can block specific crawlers by user-agent name:

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: *
Allow: /

What Happens if You Block Googlebot

If you add Disallow: / for Googlebot or block specific important pages, those pages will not be crawled or indexed. But there's a subtlety: Google may still show blocked pages in search results if other sites link to them. When Googlebot is blocked from crawling a URL but the URL is linked from elsewhere, Google can list it in search results with a message like "no information available about this page." The page may appear in search results even though Google couldn't read it.

To fully remove a page from search results, you need the noindex meta tag on the page itself — which requires Googlebot to be able to crawl the page in order to see that tag. This means you generally cannot use both Disallow and noindex for the same page effectively — they conflict with each other.

The Biggest Robots.txt Mistakes

Blocking CSS and JavaScript Files

This is one of the most damaging robots.txt mistakes and is surprisingly common, particularly on WordPress sites. If Googlebot cannot access your CSS and JavaScript files, it cannot fully render your pages. Google uses rendering to evaluate page quality, user experience, and Core Web Vitals. A page that looks complete to users but cannot be rendered by Google is evaluated as a poorly performing, potentially low-quality page.

Check: if your robots.txt contains rules like Disallow: /wp-content/ or Disallow: /*.css, you may be blocking the files Google needs to render your site. Remove these rules.
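
If you still want to keep the WordPress admin area blocked, one common pattern (a sketch, assuming the default WordPress paths) is to disallow /wp-admin/ while explicitly allowing the asset files; the * and $ wildcards used here are supported by Googlebot and Bingbot:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php   # front-end AJAX requests route through this file
Allow: /*.css$                    # stylesheets anywhere on the site
Allow: /*.js$                     # scripts anywhere on the site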

Accidentally Disallowing Everything on Production

The classic disaster: a staging robots.txt (Disallow: /) is accidentally deployed to production, or a developer adds a blanket disallow during a migration and forgets to revert it. A site can lose essentially all its search traffic within days. If you ever see a sudden dramatic traffic drop, checking robots.txt should be among the first diagnostics.

Expecting robots.txt to Secure Sensitive Content

As noted earlier, robots.txt is public and readable by anyone. A rule like Disallow: /confidential-reports/ literally advertises the existence of that directory to anyone reading the file. Malicious bots will specifically target paths listed in robots.txt. Use authentication and server-level access controls for anything genuinely sensitive.
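
As an illustration of server-level protection, here is a minimal sketch assuming an nginx server; the /confidential-reports/ path is hypothetical and the password file must already exist:

# Require HTTP basic authentication for a sensitive directory
location /confidential-reports/ {
    auth_basic "Restricted";                     # turn on basic auth with this realm name
    auth_basic_user_file /etc/nginx/.htpasswd;   # path to a username:password-hash file
}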

Using Robots.txt When noindex Is More Appropriate

Robots.txt controls crawling; the noindex meta tag controls indexing. If you want a page crawled (for link equity to flow through it) but not shown in search results, use noindex. If you want to reduce crawl budget consumption on low-value pages (like filtered product pages with thousands of variations), use Disallow. Using Disallow when you actually want noindex is a common confusion that produces unexpected results.

How to Test Your Robots.txt File

Google Search Console

Google Search Console includes a robots.txt tester that shows exactly how Googlebot interprets your robots.txt file. You can test specific URLs to see whether they would be allowed or blocked. Access it at: Settings > Crawling > robots.txt. This is the most authoritative tool for Googlebot-specific validation.

Direct URL Check

Visit https://www.yourdomain.com/robots.txt in a browser. If the file loads correctly, you'll see the plain text content. If you get a 404 or any other error, the file isn't configured correctly at the server level.
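
The same check from the command line also shows the HTTP status code and headers, which helps rule out redirects or server errors:

curl -i https://www.yourdomain.com/robots.txt   # -i prints response headers along with the file body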

Third-Party Validators

Tools like Screaming Frog (desktop SEO crawler) and various online robots.txt validators can parse your file and flag syntax errors, conflicting rules, and paths that may be accidentally blocked.

Robots.txt vs. noindex: Choosing the Right Tool

Goal                                        | Use Robots.txt                  | Use noindex Tag
Reduce crawl budget                         | Yes — Disallow blocks crawling  | No — page still gets crawled
Remove page from search results             | No — can still appear           | Yes — definitively removes
Block staging from indexing                 | Yes — Disallow: /               | Also valid per-page
Pass link equity through non-indexed page   | No — blocks link discovery      | Yes — noindex + crawlable
Prevent image indexing                      | Can block image crawlers        | Use X-Robots-Tag header
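
The X-Robots-Tag in the last row is an HTTP response header, which is the way to apply noindex to non-HTML resources such as images or PDFs. A sketch, assuming an Apache server with mod_headers enabled:

# Send a noindex header with image and PDF responses (.htaccess or vhost config)
<FilesMatch "\.(png|jpe?g|gif|pdf)$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>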

How to Use the Robots.txt Generator

  1. Open the Robots.txt Generator.
  2. Choose whether to allow or block all crawlers as the baseline.
  3. Add specific user-agent rules for crawlers you want to treat differently (Googlebot, Bingbot, AI crawlers).
  4. Enter paths you want to Disallow, one per line.
  5. Add your sitemap URL.
  6. Set a Crawl-delay if appropriate for your server capacity.
  7. Preview the generated file, then copy or download it.
  8. Place the file at the root of your domain: yourdomain.com/robots.txt.