What Is robots.txt?
The robots.txt file is a plain text file placed at the root of a website
(https://example.com/robots.txt) that tells web crawlers which pages or
sections they may or may not crawl. It follows the Robots Exclusion Protocol
(REP), standardized as RFC 9309, which all major search engines respect.
robots.txt Syntax
# Allow all crawlers to access everything
User-agent: *
Allow: /

# Block all crawlers from the admin and private areas
User-agent: *
Disallow: /admin/
Disallow: /private/

# Block only Googlebot from a specific path
User-agent: Googlebot
Disallow: /no-google/

# Block all crawlers entirely
User-agent: *
Disallow: /

# Sitemap location
Sitemap: https://example.com/sitemap.xml
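To see how a well-behaved crawler might interpret rules like these, the sketch below feeds a small robots.txt into Python's standard `urllib.robotparser` module. The sample file combines two of the snippets above into one file, and the crawler name `OtherBot` is a made-up placeholder; both are assumptions for illustration only. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining two of the example groups (an assumption).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /no-google/

User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so only that group's rules apply to it.
google_blocked = parser.can_fetch("Googlebot", "https://example.com/no-google/page.html")  # False
google_admin = parser.can_fetch("Googlebot", "https://example.com/admin/")  # True

# "OtherBot" (hypothetical) has no group of its own and falls back to "*".
other_admin = parser.can_fetch("OtherBot", "https://example.com/admin/login")  # False
other_home = parser.can_fetch("OtherBot", "https://example.com/index.html")  # True

# Sitemap lines are independent of any User-agent group.
sitemaps = parser.site_maps()  # ['https://example.com/sitemap.xml']
```

Note that a crawler with its own group ignores the `*` group entirely, which is why Googlebot may still fetch `/admin/` in this sample.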
Directive Reference
| Directive | Description |
|---|---|
| User-agent | The crawler the following rules apply to. * matches all crawlers. |
| Disallow | Path prefix the crawler should not access. An empty value allows everything. |
| Allow | Explicitly allows a path. It overrides a Disallow when it is the more specific (longer) match. |
| Sitemap | Full URL of an XML sitemap. Can appear multiple times. |
| Crawl-delay | Seconds to wait between requests. Non-standard; not supported by Google. |
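The interplay between Allow and Disallow can be sketched as a longest-match lookup, which is the precedence described in RFC 9309: the most specific matching rule wins, and Allow wins a tie. The function name `path_allowed` and the rule tuples are hypothetical, and this simplified version handles plain path prefixes only (no `*` or `$` wildcards), so treat it as an illustration and not a complete matcher.

```python
def path_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) tuples for a single
    User-agent group, where directive is "allow" or "disallow".
    Simplified sketch: prefix matching only, no wildcard support.
    """
    best_len = -1
    allowed = True  # with no matching rule, the path may be crawled
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value matches nothing
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = directive == "allow"
    return allowed


# Hypothetical group: block /private/ but re-open one subdirectory.
rules = [("disallow", "/private/"), ("allow", "/private/press/")]

path_allowed("/private/press/kit.pdf", rules)  # True: the Allow rule is longer
path_allowed("/private/notes.txt", rules)      # False: only Disallow matches
path_allowed("/index.html", rules)             # True: no rule matches
```

Individual crawlers and parsing libraries may resolve conflicts differently, so it is worth testing important rules against the crawler you care about.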
Important Limitations
robots.txt is advisory, not enforced: well-behaved crawlers follow it, but malicious bots can ignore it entirely.
It also does not keep a page out of search indexes if other sites link to it;
use the noindex meta tag for that. Note that disallowing a URL in robots.txt
can prevent Google from crawling the page and ever seeing its noindex tag, so a page that relies on noindex must stay crawlable.
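One way to catch this conflict is to check whether any page that relies on noindex is also disallowed for the crawler in question. The sketch below is a minimal illustration using Python's standard `urllib.robotparser`; the helper name `blocked_noindex_urls`, the sample rules, and the URLs are all hypothetical, and the list of noindex pages is assumed to come from your own site inventory.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_urls(robots_lines, noindex_urls, agent="Googlebot"):
    """Return the noindex URLs that `agent` is not allowed to crawl.

    A crawler that cannot fetch a page cannot read its noindex tag,
    so every URL returned here is a potential conflict.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines, no network access
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


# Hypothetical inputs for illustration.
robots_lines = ["User-agent: *", "Disallow: /private/"]
noindex_urls = [
    "https://example.com/private/old.html",  # disallowed, tag is never seen
    "https://example.com/drafts/a.html",     # crawlable, tag can be read
]

conflicts = blocked_noindex_urls(robots_lines, noindex_urls)
# conflicts == ['https://example.com/private/old.html']
```

For each URL reported, the usual fix is to remove the Disallow rule so the crawler can reach the page and see its noindex tag.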