Robots + Sitemap Inspector — User Guide
This tool fetches a site's robots.txt, the page itself, and its first sitemap server-side, then reports indexing signals and crawler rules — with special attention to AI crawlers.
Quick start
- Enter a domain or a full page URL and click Inspect.
- Review the page signals (canonical, indexability, meta robots, X-Robots-Tag).
- Review the robots.txt parse, including AI/LLM crawler verdicts.
- Review the sitemap type and URL count.
Page signals
For the URL you enter, the tool reads:
- Canonical: the rel="canonical" link, i.e. the page's preferred URL.
- Indexability: whether noindex appears in the meta robots tag or the X-Robots-Tag header.
- Meta robots: the <meta name="robots"> content (defaults to index, follow if absent).
- X-Robots-Tag: the response header equivalent, often used for non-HTML files.
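The four signals above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tool's actual implementation; the names SignalParser and page_signals are hypothetical, and the check only looks for a plain noindex directive (it ignores user-agent-scoped X-Robots-Tag values such as "googlebot: noindex").

```python
from html.parser import HTMLParser


class SignalParser(HTMLParser):
    """Collects the first canonical link and meta robots tag from HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta_robots = None

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            if self.meta_robots is None:
                self.meta_robots = a.get("content", "")


def page_signals(html, headers):
    """Return canonical, meta robots, X-Robots-Tag, and an indexable flag."""
    parser = SignalParser()
    parser.feed(html)
    # Absent meta robots is treated as "index, follow", as described above.
    meta = parser.meta_robots or "index, follow"
    # Header names are case-insensitive, so compare in lowercase.
    x_robots = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), None
    )
    combined = meta + "," + (x_robots or "")
    directives = {d.strip().lower() for d in combined.split(",")}
    return {
        "canonical": parser.canonical,
        "meta_robots": meta,
        "x_robots_tag": x_robots,
        "indexable": "noindex" not in directives,
    }
```

Calling page_signals(html_text, response_headers) with the fetched body and a dict of response headers returns one dict per page, which maps directly onto the four bullets above.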
robots.txt
The robots.txt is fetched from the site origin and parsed into user-agent groups (allow/disallow rules) and declared sitemaps.
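As a rough illustration of that parse, the sketch below splits a robots.txt body into user-agent groups and a list of declared sitemaps. The function name parse_robots and the shape of the returned data are assumptions made for this example; the tool's internal representation is not documented here.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups and sitemap URLs."""
    groups = []      # each group: {"agents": [...], "rules": [(kind, path), ...]}
    sitemaps = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if current is not None:
                current["rules"].append((field, value))
            last_was_agent = False
        elif field == "sitemap":
            # Sitemap lines are global and belong to no group.
            sitemaps.append(value)
    return groups, sitemaps
```

Other fields, such as Crawl-delay, are skipped in this sketch because the guide only mentions allow/disallow rules and sitemaps.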
AI / LLM crawler rules
The tool evaluates well-known AI crawlers — GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, Bytespider, Applebot-Extended, and more — and labels each:
- blocked: fully disallowed (Disallow: /).
- limited: some paths disallowed.
- allowed: no specific rule (falls back to *), or explicitly allowed.
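This is a quick way to check whether your robots.txt opts your content in or out of AI training crawlers and answer engines. robots.txt is advisory, so the verdict shows what the site requests, not what each crawler actually does.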
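The three labels can be expressed as a small decision function. The sketch below is one possible reading of the label descriptions, not the tool's confirmed logic: crawler_verdict and the group structure are hypothetical, a crawler with no group of its own is reported as allowed (the guide does not say whether the * group is evaluated further), and user-agent names are matched exactly rather than by substring.

```python
def crawler_verdict(agent, groups):
    """Label one crawler as 'blocked', 'limited', or 'allowed'.

    groups is a list of dicts such as
    {"agents": ["gptbot"], "rules": [("disallow", "/")]}
    with agent names in lowercase (an assumed shape for this example).
    """
    agent = agent.lower()
    specific = [g for g in groups if agent in g["agents"]]
    if not specific:
        # No group names this crawler, so it falls back to "*".
        # Assumption: the guide labels this case "allowed".
        return "allowed"
    rules = [rule for g in specific for rule in g["rules"]]
    # An empty Disallow value means "nothing is disallowed", so skip it.
    disallows = [path for kind, path in rules if kind == "disallow" and path]
    allows = [path for kind, path in rules if kind == "allow" and path]
    if "/" in disallows and not allows:
        return "blocked"
    if disallows:
        return "limited"
    return "allowed"


AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
               "PerplexityBot", "Bytespider", "Applebot-Extended"]


def ai_verdicts(groups):
    """Map each crawler named in this guide to its verdict."""
    return {name: crawler_verdict(name, groups) for name in AI_CRAWLERS}
```

Treating "Disallow: /" combined with any Allow rule as limited rather than blocked is a design choice of this sketch; a stricter check would compare rule specificity path by path.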
Sitemap
The first sitemap (from robots.txt, or /sitemap.xml as a fallback) is fetched and classified as a sitemap index (links to child sitemaps) or a urlset (links to pages), with a count and a few sample entries.
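The classification step can be illustrated with the standard library's XML parser. This is a hedged sketch: classify_sitemap, its return shape, and the sample_size parameter are hypothetical names chosen for the example, and it assumes the sitemap is plain (not gzipped) XML already decoded to text.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a namespace prefix such as '{http://...}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def classify_sitemap(xml_text, sample_size=3):
    """Classify sitemap XML and collect its <loc> entries."""
    root = ET.fromstring(xml_text)
    kind = _local(root.tag)
    if kind == "sitemapindex":
        label = "sitemap index"   # entries point to child sitemaps
    elif kind == "urlset":
        label = "urlset"          # entries point to pages
    else:
        label = "unknown"
    locs = []
    # Only read <loc> directly under each <url>/<sitemap> entry, so nested
    # extension elements (for example image locations) are not counted.
    for entry in root:
        for child in entry:
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return {"type": label, "count": len(locs), "samples": locs[:sample_size]}
```

For a sitemap index, the count is the number of child sitemaps, not the number of pages they contain.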
Caveats
- Results reflect what the server returned to this tool; sites may serve different robots.txt or headers by user-agent or region.
- Only the first sitemap is fetched. Large sites split URLs across many child sitemaps under an index, so the reported count can cover only part of the site.
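- A missing robots.txt (a 404 or other 4xx response) means all crawlers are allowed by default. A server error (5xx) is different: RFC 9309 tells crawlers to treat an unreachable robots.txt as a full disallow.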