Free Tool

Robots.txt Generator

Generate, preview, and download a valid robots.txt file for your website. Set crawl rules, block bots, add sitemaps — all in your browser.

Free·No signup·Client-side·Live preview·Validation

What Is robots.txt?

The Robots Exclusion Protocol explained.

robots.txt is a plain text file placed in the root directory of a website. It tells compliant web crawlers which URLs they may or may not access. The file follows the Robots Exclusion Protocol (REP), a voluntary standard that most search engines respect.

When a crawler visits a site, it first checks for /robots.txt at the root of the domain. If the file exists, the crawler reads the rules and adjusts its behavior accordingly. If no file exists, the crawler assumes all URLs are accessible.
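
The lookup described above can be sketched in a few lines of Python. The helper name robots_url is made up for this illustration; it only shows that the file is always resolved against the scheme and host of the page, never against a subdirectory.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL (illustrative helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post?id=7"))
print(robots_url("https://shop.example.com/cart"))
```

Because the host is part of the result, shop.example.com and example.com resolve to two different files.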

Every website should have a robots.txt file — even if it allows all crawlers. A valid file prevents crawl warnings in search consoles and gives you a place to reference your XML Sitemap.

What robots.txt Does

Understand the specific role robots.txt plays in technical SEO.

Controls Crawling Behavior

robots.txt tells crawlers which parts of your site they may access. By setting Allow and Disallow rules, you guide crawlers toward important content and away from low-value pages. Crawlers that respect the protocol will follow these rules.

Manages Crawl Budget

Search engines allocate a limited crawl budget to each site. By blocking unnecessary URLs — such as admin panels, search results, or staging environments — you help search engines spend their crawl budget on pages that matter for SEO.

Reduces Server Load

Aggressive crawlers can consume significant server resources. robots.txt can block or reduce the crawl rate of resource-intensive bots. The Crawl-delay directive (supported by Bing and Yandex) lets you specify a pause between requests.

References Your XML Sitemap

The Sitemap directive in robots.txt tells crawlers where to find your XML sitemap files. It lets search engines discover your sitemap without a manual submission, and it is supported by Google, Bing, Yahoo, and Yandex. You can reference multiple sitemaps in the same file.

What robots.txt Does NOT Do

These are the most common misconceptions about robots.txt.

Does NOT Prevent Indexing

This is the most common misunderstanding. robots.txt controls crawling, not indexing. If a page is disallowed in robots.txt but linked to from external sites, search engines may still index the URL — they just cannot crawl its content. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header.

Does NOT Protect Private Content

robots.txt is a publicly visible file. Anyone can read it and see which URLs you are trying to block. Malicious users and non-compliant crawlers will ignore robots.txt entirely. Never use robots.txt as a security mechanism — always use proper authentication and access controls.

Does NOT Remove Pages from Search Results

Pages already indexed before a robots.txt block was applied will remain in search results. Google may show them without a snippet because the content cannot be crawled, but the URL will still appear. To remove indexed pages, use a noindex directive instead, and lift the robots.txt block on those URLs so crawlers can fetch the pages and see the directive.

Does NOT Replace Authentication

Blocking a URL in robots.txt does not prevent users from accessing it directly. Anyone with the URL can still visit the page. If you need to restrict access to specific users, implement server-side authentication, password protection, or IP whitelisting.

When Should You Use robots.txt?

Practical scenarios where robots.txt is the right solution.

Admin and CMS Areas

Block crawlers from /wp-admin/, /admin/, and similar backend paths. These pages have no SEO value and should not consume crawl budget.

Faceted Navigation

Ecommerce filter and sort URLs create thousands of near-duplicate pages. Block parameter-based URLs to prevent crawl waste on low-value variations.

Internal Search Pages

Search result pages (e.g. /search/?q=...) produce infinite crawlable URLs with thin content. Blocking them prevents search engines from wasting resources.

Duplicate Content

Printer-friendly versions, paginated archives, or URL parameter variants can create duplicate content issues. Use robots.txt to guide crawlers to canonical versions.

Staging Environments

Development or staging sites should always block all crawlers with Disallow: / to prevent test content from appearing in search results.

Specific Crawler Control

Block individual crawlers that consume excessive bandwidth, ignore crawl-rate directives, or are not needed for your SEO strategy. Examples include certain AI crawlers or SEO tool crawlers.
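
As a rough check of how a dedicated block behaves, the sketch below feeds a small file to Python's standard urllib.robotparser module. GPTBot is used only as an example of a crawler name, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Example file: one crawler is blocked entirely, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/articles/"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/articles/"))  # True
```

This only models a crawler that chooses to obey the file; a crawler that ignores robots.txt is unaffected.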

When NOT to Use robots.txt

Situations where robots.txt is the wrong tool for the job.

Blocking Pages You Want Indexed

Never block high-value content pages in robots.txt. Search engines cannot crawl blocked pages, which means they cannot evaluate their quality, extract content, or rank them. If you want pages indexed, they must be crawlable.

Protecting Sensitive Files

robots.txt does not prevent humans or malicious bots from accessing URLs. It is not a security tool. Never rely on it to protect private data, user information, or confidential documents. Use server-side authentication and encryption instead.

Replacing the noindex Directive

If you want a page to be accessible but not indexed, use the noindex meta tag or X-Robots-Tag: noindex HTTP header. Unlike robots.txt, these directives explicitly tell search engines not to include the page in their index.
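
For illustration, here is a minimal WSGI application in Python that serves a page crawlers may fetch but should not index. The app function and the page body are invented for this sketch; the point is only that the noindex signal travels in the HTTP response, not in robots.txt.

```python
def app(environ, start_response):
    """Minimal WSGI app: the page is crawlable, but the header asks not to index it."""
    body = b"<h1>Internal report</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Index control lives in the response, so the URL must stay crawlable.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can run it, for example the standard library's wsgiref.simple_server. If the same URL were also disallowed in robots.txt, crawlers would never see this header.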

Blocking CSS, JavaScript, or Images

Blocking stylesheets, scripts, and image files can harm your SEO. Google needs to render pages to evaluate them, and blocking essential resources prevents proper rendering. Only block resources that are truly unnecessary for indexing.

Best Practices for robots.txt

Follow these recommendations to get the most out of your robots.txt file.

Always Include Your XML Sitemap

Add a Sitemap directive pointing to your XML sitemap. This is the most reliable way for search engines to discover all your important pages.

Block Only Low-Value Paths

Be specific about what you block. Broad blocks like Disallow: / prevent all crawling. Only disallow paths that provide no SEO value.

Do Not Block CSS, JS, or Fonts

Google needs to render your pages to understand their layout and content. Blocking essential resources prevents proper rendering and can negatively impact SEO.

Test Changes Before Deploying

Use the robots.txt report in Google Search Console (the replacement for the retired robots.txt Tester) or this generator's live preview to validate your file before uploading it to your server. A typo in robots.txt can accidentally block important content.
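
A pre-deployment check can also be scripted. The sketch below uses Python's standard urllib.robotparser to list which must-stay-crawlable URLs a draft file would block. The function name, the draft rules, and the URLs are placeholders. Note that this parser does plain prefix matching and does not understand * or $, so treat it as an approximation of how search engines read the file.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "*") -> list[str]:
    """Return the URLs that the draft rules would block for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# The author meant to block /blog-drafts/ but typed a shorter prefix.
draft = """\
User-agent: *
Disallow: /blog
"""

must_stay_crawlable = [
    "https://example.com/",
    "https://example.com/blog/launch-post",
    "https://example.com/pricing",
]

print(blocked_urls(draft, must_stay_crawlable))
# ['https://example.com/blog/launch-post']
```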

Keep the File in the Site Root

robots.txt must be placed at the root of the origin server. It is not read from subdirectories. The correct URL is always https://example.com/robots.txt.

Review After Major Site Changes

When you redesign your site, change your CMS, or restructure URLs, review your robots.txt to ensure it still reflects your current architecture. Stale rules can inadvertently block new content.

Use Specific User-Agent Rules

When you need to block a specific crawler, create a dedicated group for it instead of adding rules to the catch-all * group. This keeps your configuration maintainable.

Common robots.txt Mistakes

Avoid these frequent errors that can harm your SEO.

Disallow: /

Blocking the entire site is rarely the right choice. This tells all crawlers they cannot access any page on your domain. Use this only on staging or development environments. On production, it stops compliant crawlers from fetching any page, so search engines cannot read or rank your content.

Blocking Important Content

Accidentally blocking CSS, JavaScript, images, or high-value content pages prevents search engines from properly evaluating your site. Always verify that your disallow rules only target low-value paths.

Forgetting to Add Your Sitemap

Without a Sitemap directive, search engines must discover your content through links or a manual sitemap submission. Adding your sitemap URL to robots.txt is a one-line change that helps crawlers find every page you list.

Using robots.txt Instead of noindex

If you want a page to not appear in search results, use noindex. robots.txt only prevents crawling, not indexing. A page blocked by robots.txt can still be indexed if it has external links pointing to it.

Incorrect Wildcard Usage

Wildcards (*) and end-of-URL markers ($) are not supported by every crawler. Google and Bing support them, but simpler crawlers may only do prefix matching. Test wildcard patterns thoroughly.

Duplicate or Conflicting Rules

Having multiple groups for the same user-agent or conflicting Allow/Disallow directives for the same path can produce unexpected behavior. Keep your file clean and well-organized to avoid ambiguity.

robots.txt Examples by Use Case

Real-world robots.txt configurations for common website types.

Basic Website

A standard configuration that allows all crawlers to access the entire site while referencing the sitemap. Suitable for most content-driven websites.

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Blog

Blocks admin and search paths that are common in CMS platforms. The /search/ path is blocked because search result pages have thin content and create infinite crawlable URLs.

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search/

Sitemap: https://example.com/sitemap.xml

Ecommerce Store

Ecommerce sites generate thousands of URLs through faceted navigation and filtering. This configuration blocks admin panels, cart, checkout, user accounts, and parameter-based URLs that create duplicate content.

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
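# Match each parameter whether it comes first (?) or later (&) in the query string
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?filter=
Disallow: /*&filter=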

Sitemap: https://example.com/sitemap.xml

WordPress

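WordPress generates many system URLs that should not be crawled. This configuration blocks wp-admin, login, RSS feeds, and trackbacks while keeping the main content accessible. It keeps /wp-admin/admin-ajax.php crawlable because front-end features call it, and it leaves /wp-includes/ open because that directory holds scripts and styles needed for rendering.

User-agent: *
Allow: /
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /feed/
Disallow: /trackback/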

Sitemap: https://example.com/sitemap.xml

Shopify

Shopify stores use specific URL patterns for cart, checkout, and administrative functions. These paths have no SEO value and should be blocked to preserve crawl budget for product and collection pages.

User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?variant=

Sitemap: https://example.com/sitemap.xml

Astro

An Astro-powered static site typically has no server-side routes that need blocking. The standard configuration allows all crawling and references the sitemap for optimal search engine discovery.

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Next.js

Next.js sites may include API routes and internal paths that should not be crawled. Blocking /api/ is standard practice, but ensure your static content and pages remain accessible.

User-agent: *
Allow: /
Disallow: /api/

Sitemap: https://example.com/sitemap.xml

Documentation

Technical details about robots.txt directives, patterns, and best practices.

User-agent

The User-agent directive specifies which web crawler the following rules apply to. It is required at the start of every rule group. The value * matches all crawlers. Specific names like Googlebot or Bingbot apply only to that crawler.

Crawlers use the most specific matching group. If you have both User-agent: * and User-agent: Googlebot, Googlebot uses the Googlebot-specific rules, and other crawlers use the general rules.

Syntax: User-agent: [crawler-name], placed at the start of each group. A group may begin with several User-agent lines, in which case its rules apply to every crawler named. Common mistake: placing Allow or Disallow directives before the first User-agent line, where they belong to no group and are ignored.
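
One consequence of group selection is easy to miss: a crawler with its own group does not inherit rules from the * group. The sketch below shows this with Python's standard urllib.robotparser. The paths and the crawler name Bingbot are examples only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot follows only its own group, so /private/ is open to it.
print(parser.can_fetch("Googlebot", "https://example.com/private/report"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/drafts/post"))     # False

# Any other crawler falls back to the * group.
print(parser.can_fetch("Bingbot", "https://example.com/private/report"))    # False
print(parser.can_fetch("Bingbot", "https://example.com/drafts/post"))       # True
```

If Googlebot should also stay out of /private/, the rule has to be repeated inside the Googlebot group.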

Allow and Disallow

Allow and Disallow control which URL paths a crawler may access. Both take a path value starting with /. Allow: / permits access to the entire site. Disallow: / blocks the entire site.

When both Allow and Disallow match the same URL, the most specific rule wins, meaning the one with the longest matching path. If the matching rules are equally long, Allow takes precedence. This lets you whitelist specific files inside a disallowed directory.

Syntax: Allow: /path or Disallow: /path. Common mistake: forgetting the leading / on paths, or using Disallow: with no value, which blocks nothing and therefore allows everything.
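
The precedence rule can be written out as a short Python function. This is a simplified sketch that handles plain path prefixes only (no wildcards), and the name is_allowed and the rule format are assumptions made for the example.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access with longest-match precedence (plain prefixes only).

    rules is a list of ("allow" | "disallow", path_prefix) pairs from one group.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = kind == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

rules = [
    ("disallow", "/downloads/"),
    ("allow", "/downloads/catalog.pdf"),
]

print(is_allowed("/downloads/catalog.pdf", rules))   # True: the Allow rule is longer
print(is_allowed("/downloads/internal.zip", rules))  # False: only Disallow matches
print(is_allowed("/about/", rules))                  # True: no rule matches
```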

Sitemap

The Sitemap directive points crawlers to the location of your XML sitemap file. It must be a full absolute URL. You can include multiple Sitemap directives to reference multiple sitemap files.

Supported by Google, Bing, Yahoo, and Yandex. The Sitemap directive does not require a User-agent and can appear anywhere in the file.

Syntax: Sitemap: https://example.com/sitemap.xml. Common mistake: using a relative URL instead of an absolute URL, or forgetting to update the sitemap URL after changing domains.
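
To see that Sitemap lines are independent of user-agent groups, the sketch below reads them back with Python's standard urllib.robotparser (the site_maps method requires Python 3.8 or later). The sitemap URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Sitemap lines may appear before, between, or after the groups.
ROBOTS_TXT = """\
Sitemap: https://example.com/sitemap-pages.xml

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap-posts.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.site_maps())
# ['https://example.com/sitemap-pages.xml', 'https://example.com/sitemap-posts.xml']
```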

Host

The Host directive specifies the preferred domain for your website. It was introduced by Yandex to resolve canonical hostname issues between www and non-www versions. Yandex announced in 2018 that it no longer uses the directive and relies on 301 redirects instead, and Google has never supported it.

If used, include only one Host directive with the full domain including protocol: Host: https://example.com.

Syntax: Host: https://example.com. Common mistake: including multiple Host directives (only the first is used by Yandex; others are ignored) or omitting the protocol.

Crawl-delay

The Crawl-delay directive sets a delay in seconds between consecutive requests from a crawler. It was originally created to reduce server load from aggressive crawlers.

Google does not support Crawl-delay, and the crawl rate setting that Google Search Console once offered has been retired; Googlebot adjusts its rate automatically based on how your server responds. Bing and Yandex may honor Crawl-delay if specified. The value should be a positive number (e.g. Crawl-delay: 10 for 10 seconds).

Syntax: Crawl-delay: [seconds]. Common mistake: assuming Google respects this directive (it does not), or setting an excessively high delay that prevents efficient crawling.
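
Crawl-delay belongs to the group it is written in, which the sketch below shows using Python's standard urllib.robotparser. The crawler names and the 10-second value are examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None: the * group sets no delay
```

The parser only reports what the file says. Whether a given crawler honors the value is up to that crawler.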

Wildcards and Pattern Matching

Google and some other crawlers support limited wildcard pattern matching in Allow and Disallow paths:

* — Matches any sequence of characters. For example, Disallow: /private/* blocks all URLs under /private/.
$ — Marks the end of a URL. For example, Disallow: /*.pdf$ blocks all PDF files.

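Not all crawlers support wildcards. The original 1994 convention only specified prefix matching; * and $ were added when the Robots Exclusion Protocol was standardized as RFC 9309, so older or simpler crawlers may treat them as literal characters.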
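
One way to reason about these patterns is to translate them into regular expressions. The Python function below is a sketch of that idea, not the matcher any particular search engine uses; the name matches is made up for the example.

```python
import re

def matches(pattern: str, path: str) -> bool:
    """Check a URL path against a robots.txt pattern with * and $ support."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then let every * stand for "any characters".
    regex = re.compile(".*".join(re.escape(piece) for piece in body.split("*")))
    # With $, the whole path must match; without it, a prefix match is enough.
    found = regex.fullmatch(path) if anchored else regex.match(path)
    return found is not None

print(matches("/*.pdf$", "/files/report.pdf"))        # True
print(matches("/*.pdf$", "/files/report.pdf?v=2"))    # False: $ requires the URL to end in .pdf
print(matches("/private/*", "/private/notes.html"))   # True
print(matches("/private/", "/private/notes.html"))    # True: a plain prefix behaves the same
```

The last two lines show that a trailing * adds nothing, since rules already match by prefix.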

Placement and File Location

The robots.txt file must be placed in the root directory of the origin server. For a site at https://www.example.com, the robots.txt must be accessible at https://www.example.com/robots.txt.

The file should be a plain UTF-8 text file, with each directive on its own line. Directive names are not case-sensitive, though path values are. The file should return 200 OK. Crawlers treat a 404 as if no robots.txt exists and crawl everything, while a server error (5xx) can cause them to treat the whole site as disallowed until the file is reachable again.

Syntax: One directive per line. Common mistake: placing robots.txt in a subdirectory, behind a login wall, or serving it with an incorrect HTTP status code.
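
How the HTTP status of the robots.txt request shapes crawling can be summarized in a small Python function. The mapping follows the behavior described in RFC 9309; individual crawlers may differ in the details, and the function name and return labels are invented for this sketch.

```python
def policy_for_status(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy (after RFC 9309)."""
    if 200 <= status < 300:
        return "parse rules"
    if 300 <= status < 400:
        return "follow redirect"  # crawlers follow a limited number of hops
    if 400 <= status < 500:
        return "allow all"        # file unavailable: nothing is disallowed
    if 500 <= status < 600:
        return "disallow all"     # file unreachable: assume everything is blocked
    return "undefined"

for code in (200, 301, 404, 503):
    print(code, policy_for_status(code))
```

The 5xx case is the one to watch: a misconfigured server that returns errors for /robots.txt can pause crawling of the whole site.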

Comments

Comments in robots.txt start with #. They are ignored by crawlers and are used for human-readable notes. Comments can appear on their own line or at the end of a directive line.

Use comments to document why certain rules exist, note the date of changes, or explain complex configurations. They are a best practice for team-maintained websites.

Syntax: # This is a comment. Common mistake: adding a note on the same line as a directive without the # symbol, which makes the note part of the directive's value and breaks the rule.
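
A parser typically discards everything from the # to the end of the line before reading the directive. A minimal Python sketch of that step, with a made-up helper name:

```python
def strip_comment(line: str) -> str:
    """Return the directive part of a robots.txt line, without any # comment."""
    return line.split("#", 1)[0].strip()

print(repr(strip_comment("Disallow: /tmp/  # scratch files")))  # 'Disallow: /tmp/'
print(repr(strip_comment("# Updated after the redesign")))      # ''
```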

robots.txt vs Meta Robots vs XML Sitemap

These three mechanisms serve different purposes in search engine optimization:

robots.txt — Controls crawling. Tells crawlers which URL paths they may access. Applied at the domain level. Does not prevent indexing.
Meta Robots — Controls indexing on individual pages. The <meta name="robots" content="noindex"> tag tells search engines not to index a specific page.
XML Sitemap — Tells search engines about all the pages on your site. Provides metadata like last modified dates, change frequency, and priority.

Best practice is to use all three together: robots.txt to manage crawl budget, meta robots to prevent indexing of specific pages, and XML sitemaps to highlight important content.

Frequently Asked Questions

What is robots.txt?

robots.txt is a plain text file placed in the root of a website that tells compliant web crawlers which parts of the site they may or may not access. It follows the Robots Exclusion Protocol (REP) and is voluntarily followed by search engines like Google, Bing, and others.

Where should robots.txt be uploaded?

The robots.txt file must be placed in the root directory of your domain. For example, if your site is example.com, the file should be accessible at https://example.com/robots.txt. It cannot be read from subdirectories.

Does robots.txt prevent indexing?

No. robots.txt controls crawling, not indexing. If a page is disallowed in robots.txt but linked from external sites, search engines may still index the URL without crawling its content. To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header.

Can I block Googlebot?

Yes. You can block Googlebot by adding a User-agent: Googlebot group with Disallow directives. Googlebot respects robots.txt and will not crawl the specified paths.

Should every website have a robots.txt file?

It is good practice to have one even if you want all crawlers to have full access. An empty or permissive robots.txt (User-agent: * followed by Allow: / on the next line) prevents crawl errors in search consoles and gives you a place to add your sitemap URL.

Can I have multiple Sitemap directives?

Yes. You can include multiple Sitemap directives pointing to different sitemap files. Google and other major search engines support multiple sitemap references in robots.txt.

What happens if no robots.txt exists?

If no robots.txt file exists, crawlers assume they are allowed to crawl every URL on the site. Your pages will still be indexed normally, but you miss the opportunity to provide sitemap references or block unwanted crawlers.

Does Google support Crawl-delay?

No. Google does not support the Crawl-delay directive, and the crawl rate setting in Google Search Console has been retired; Googlebot adjusts its crawl rate automatically based on how your server responds. Some other crawlers including Bing and Yandex may honor Crawl-delay.

Can robots.txt protect private information?

No. robots.txt is a publicly visible file that only blocks compliant crawlers. Malicious bots and humans can still access blocked URLs. Never rely on robots.txt for security — use proper authentication and access controls instead.

What is the difference between robots.txt and meta robots?

robots.txt controls crawling at the directory or file level across your entire site. Meta robots tags control indexing on a per-page basis in the HTML head. They serve different purposes: robots.txt = crawl control, meta robots = index control.

What is the difference between robots.txt and XML Sitemap?

robots.txt tells crawlers what not to crawl. An XML Sitemap tells crawlers what content is available and how it is organized. They work together: robots.txt excludes unimportant content, the sitemap highlights important content.

Can I block specific file types?

Yes. You can use pattern matching to block specific file types. For example, Disallow: /*.pdf$ blocks all PDF files. However, support for pattern matching varies by crawler.

How do I test my robots.txt file?

You can test your robots.txt by visiting the URL directly (yoursite.com/robots.txt), checking the robots.txt report in Google Search Console, or using the live preview in this generator to verify the syntax.

What are wildcards in robots.txt?

Google and some other crawlers support wildcards: * matches any sequence of characters, and $ marks the end of a URL. For example, Disallow: /*.pdf$ blocks all PDF files while Disallow: /private/* blocks everything under /private/.

Can I use robots.txt on subdomains?

Yes. Each subdomain requires its own robots.txt file. The file at subdomain.example.com/robots.txt only applies to subdomain.example.com, not example.com or other subdomains.

Can robots.txt block AI crawlers?

Yes. You can block known AI crawlers like GPTBot, CCBot, and others by adding dedicated User-agent groups with Disallow: /. However, new AI crawlers appear frequently, and not all respect robots.txt. Blocking AI crawlers is a valid use of robots.txt, but it is not a guaranteed solution.

Does robots.txt affect SEO?

Indirectly. robots.txt does not directly influence search rankings. However, it affects which pages search engines can crawl. Blocking important pages can prevent them from being indexed and ranked. Properly configured, robots.txt helps search engines discover and crawl your most valuable content.

How do I create a robots.txt file?

Use a text editor or a robots.txt generator like this tool. Start with a User-agent directive, then add Allow or Disallow rules for specific paths. Include a Sitemap directive pointing to your XML sitemap. Save the file as robots.txt and upload it to the root directory of your domain.
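
If you prefer to script these steps, the Python sketch below assembles the file from a dictionary of groups. The function name and input shape are assumptions for the example and are not how this generator works internally.

```python
def build_robots_txt(groups: dict[str, list[str]], sitemaps: list[str]) -> str:
    """Render user-agent groups and sitemap URLs as robots.txt text.

    groups maps a user-agent name to its rule lines, e.g. "Disallow: /admin/".
    """
    blocks = []
    for agent, rules in groups.items():
        blocks.append("\n".join([f"User-agent: {agent}", *rules]))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(blocks) + "\n"

text = build_robots_txt(
    {"*": ["Disallow: /admin/", "Disallow: /search/"]},
    ["https://example.com/sitemap.xml"],
)
print(text)
```

Write the returned text to a file named robots.txt and upload it to the root of your domain.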

What happens if my robots.txt contains errors?

Crawlers handle errors differently. Some may ignore the entire file and crawl everything. Others may stop crawling certain sections. A syntax error in one directive can cause the rest of the file to be misinterpreted. Always validate your robots.txt before deploying.

Is robots.txt case-sensitive?

By convention, directives are case-insensitive (User-agent and user-agent are both valid). However, path values are typically case-sensitive on most web servers. Disallow: /Admin may not match /admin. It is safest to use consistent casing.
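
Both halves of this answer can be checked with Python's standard urllib.robotparser. In the sketch below the directive names are deliberately written in odd casing, and AnyBot is a placeholder crawler name.

```python
from urllib.robotparser import RobotFileParser

# Directive names in mixed case still parse; the path keeps its exact casing.
ROBOTS_TXT = """\
user-agent: *
DISALLOW: /Admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AnyBot", "https://example.com/Admin/users"))  # False
print(parser.can_fetch("AnyBot", "https://example.com/admin/users"))  # True: different case, no match
```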

What is the Robots Exclusion Protocol?

The Robots Exclusion Protocol (REP) is the standard that defines how robots.txt works, published as RFC 9309 in 2022. It specifies the format and meaning of the User-agent, Allow, and Disallow directives; Sitemap and Crawl-delay are widely used extensions rather than part of the core standard. The REP is voluntary: compliant crawlers choose to follow it, but malicious bots may ignore it entirely.

100% Client-Side

All robots.txt generation happens locally in your browser. No data is uploaded to any server.

No Tracking

We do not collect analytics on the robots.txt configurations you create.

No Storage

Your configuration is saved locally in your browser for convenience only.
