How robots.txt Validation Works
A robots.txt validator parses the file according to the Robots Exclusion Protocol
and checks for syntax errors, conflicting rules, and common misconfigurations.
Google's implementation of the protocol extends the original 1994 specification,
including support for the Allow directive and wildcard patterns (* and $), both of which were later standardized in RFC 9309.
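To make the wildcard extension concrete, here is a minimal sketch of how a rule path containing * (any run of characters) and a trailing $ (end of URL) could be tested against a URL path. The function name `matchesPattern` is hypothetical, and translating the pattern to a regular expression is one possible approach, not a description of how any particular crawler does it.

```javascript
// Hypothetical helper: tests a URL path against a robots.txt path pattern.
// Assumes '*' matches any sequence of characters and a trailing '$' anchors
// the pattern to the end of the path; otherwise the pattern is a prefix match.
function matchesPattern(pattern, path) {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  // Escape regex metacharacters in each literal piece, then join pieces with '.*'.
  const source = body
    .split('*')
    .map(piece => piece.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + source + (anchored ? '$' : '')).test(path);
}

console.log(matchesPattern('/private/*.pdf$', '/private/docs/report.pdf')); // true
console.log(matchesPattern('/private/*.pdf$', '/private/report.pdf?x=1'));  // false
console.log(matchesPattern('/admin', '/administrator'));                    // true (prefix match)
```

Without the trailing $, every pattern behaves as a prefix, which is why the last call matches.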
Common robots.txt Errors
| Error | Example | Fix |
|---|---|---|
| Blocking all crawlers | Disallow: / under User-agent: * | Change to Allow: / or remove the rule |
| Missing User-agent | Disallow: /admin without User-agent | Add User-agent: * before Disallow |
| Relative sitemap URL | Sitemap: /sitemap.xml | Use full URL: Sitemap: https://example.com/sitemap.xml |
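| Trailing slash missing | Disallow: /admin | Disallow: /admin also matches /administrator and /admin.html; use Disallow: /admin/ to block only the directory |
| Case sensitivity | Disallow: /Admin/ | Paths are case-sensitive, so /Admin/ does not match /admin/; match the URL's case exactly |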
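Several of the checks in the table above can be automated. The following is a rough sketch of a linter covering three of them (rules before any User-agent line, a blanket Disallow: / under User-agent: *, and a relative Sitemap URL). The function name `lintRobotsTxt` and the message wording are assumptions made for illustration; a production validator would check considerably more.

```javascript
// Hypothetical linter: returns an array of { line, message } objects.
// Line numbers are 1-based. Only three checks from the table are implemented.
function lintRobotsTxt(robotsTxt) {
  const issues = [];
  let seenUserAgent = false;
  let currentAgent = null;
  robotsTxt.split(/\r?\n/).forEach((raw, i) => {
    const line = raw.split('#')[0].trim(); // drop comments and whitespace
    if (!line) return;
    const sep = line.indexOf(':');
    if (sep === -1) {
      issues.push({ line: i + 1, message: 'Missing ":" separator' });
      return;
    }
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      seenUserAgent = true;
      currentAgent = value;
    } else if (field === 'allow' || field === 'disallow') {
      if (!seenUserAgent) {
        issues.push({ line: i + 1, message: 'Rule appears before any User-agent line' });
      } else if (field === 'disallow' && value === '/' && currentAgent === '*') {
        issues.push({ line: i + 1, message: 'Disallow: / under User-agent: * blocks all crawlers' });
      }
    } else if (field === 'sitemap' && !/^https?:\/\//i.test(value)) {
      issues.push({ line: i + 1, message: 'Sitemap should be an absolute URL' });
    }
  });
  return issues;
}

const sample = 'Disallow: /admin\nUser-agent: *\nDisallow: /\nSitemap: /sitemap.xml';
for (const issue of lintRobotsTxt(sample)) {
  console.log(`line ${issue.line}: ${issue.message}`);
}
```

For the sample input, the sketch reports issues on lines 1, 3, and 4. The blanket-block check is reported as a warning-style message because blocking all crawlers is sometimes intentional, for example on a staging site.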
Testing with Google's Tool
Google Search Console provides a robots.txt tester that shows exactly how Googlebot interprets your file and lets you test specific URLs against the rules. Access it at Search Console → Settings → robots.txt.
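// Simple robots.txt parser: returns true if `url` (a path such as "/admin/x")
// may be crawled by `userAgent`. Uses prefix matching only (no wildcards) and
// treats "User-agent: *" groups and agent-specific groups as equally applicable.
function isAllowed(robotsTxt, userAgent, url) {
  // Split on line breaks, strip comments and surrounding whitespace.
  const lines = robotsTxt.split(/\r?\n/).map(l => l.split('#')[0].trim());
  const ua = userAgent.toLowerCase();
  let applicable = false; // does the current group apply to this user agent?
  let inAgentRun = false; // true while reading consecutive User-agent lines
  let best = null;        // longest matching rule seen so far
  for (const line of lines) {
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line
    // Directive names are case-insensitive; paths may themselves contain ':'.
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      const matches = value === '*' || value.toLowerCase() === ua;
      applicable = inAgentRun ? applicable || matches : matches;
      inAgentRun = true;
    } else if (field === 'allow' || field === 'disallow') {
      inAgentRun = false;
      // An empty path matches nothing; skip rules that do not apply.
      if (!applicable || !value || !url.startsWith(value)) continue;
      const allow = field === 'allow';
      // The longest matching path wins; on a tie, Allow wins.
      if (!best || value.length > best.path.length ||
          (value.length === best.path.length && allow)) {
        best = { path: value, allow };
      }
    }
  }
  return best ? best.allow : true; // default: allow
}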