Robots + Sitemap Inspector — User Guide

This tool fetches a site's robots.txt, the page itself, and its first sitemap server-side, then reports indexing signals and crawler rules — with special attention to AI crawlers.

Quick start

  1. Enter a domain or a full page URL and click Inspect.
  2. Review the page signals (canonical, indexability, meta robots, X-Robots-Tag).
  3. Review the robots.txt parse, including AI/LLM crawler verdicts.
  4. Review the sitemap type and URL count.

Page signals

For the URL you enter, the tool reads:

  • Canonical — the rel="canonical" link, i.e. the page's preferred URL.
  • Indexability — whether noindex appears in the meta robots tag or the X-Robots-Tag header.
  • Meta robots — the <meta name="robots"> content (defaults to index, follow if absent).
  • X-Robots-Tag — the response header equivalent, often used for non-HTML files.
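
As an illustration only, the four signals above could be derived from a fetched page roughly as follows. This is a minimal sketch, not the tool's actual code: the names SignalParser and page_signals are hypothetical, and the check assumes plain comma-separated directives (a bot-prefixed header such as "googlebot: noindex" is not handled).

```python
from html.parser import HTMLParser


class SignalParser(HTMLParser):
    """Collects the first canonical link and meta robots content (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.meta_robots = None

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = self.canonical or a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            self.meta_robots = self.meta_robots or a.get("content", "")


def page_signals(html, headers):
    """Return canonical, meta robots, X-Robots-Tag, and an indexability flag."""
    parser = SignalParser()
    parser.feed(html)
    # Header names are case-insensitive, so compare in lowercase.
    x_robots = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), None)
    meta = parser.meta_robots or "index, follow"  # default when the tag is absent
    # Assumption: directives are comma-separated with no user-agent prefix.
    directives = {d.strip().lower() for d in (meta + "," + (x_robots or "")).split(",")}
    return {
        "canonical": parser.canonical,
        "meta_robots": meta,
        "x_robots_tag": x_robots,
        "indexable": "noindex" not in directives,
    }
```

Calling page_signals(html, response_headers) with a page body and its response headers returns a dictionary with one entry per signal; a noindex in either source is enough to mark the page as not indexable.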

robots.txt

The robots.txt is fetched from the site origin and parsed into user-agent groups (allow/disallow rules) and declared sitemaps.
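
To make the parse concrete, here is a minimal sketch of splitting robots.txt text into user-agent groups and declared sitemaps. The function name parse_robots and the shape of the returned data are assumptions for illustration; the tool's own parser may differ, and fields other than User-agent, Allow, Disallow, and Sitemap are simply skipped here.

```python
def parse_robots(text):
    """Parse robots.txt text into (groups, sitemaps).

    Each group is {"agents": [...], "rules": [(kind, path), ...]},
    where kind is "allow" or "disallow". Hypothetical data shape.
    """
    groups = []
    sitemaps = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; otherwise start a new one.
            if not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if current is not None:  # ignore rules that precede any User-agent line
                current["rules"].append((field, value))
            last_was_agent = False
        elif field == "sitemap":
            sitemaps.append(value)  # Sitemap lines are independent of groups
    return groups, sitemaps
```

Agent names are lowercased because user-agent matching in robots.txt is case-insensitive, while rule paths are kept as written because paths are case-sensitive.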

AI / LLM crawler rules

The tool evaluates well-known AI crawlers — GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot, Bytespider, Applebot-Extended, and more — and labels each:

  • blocked — fully disallowed (Disallow: /).
  • limited — some paths disallowed.
  • allowed — no specific rule (falls back to *), or explicitly allowed.

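This is a quick way to check whether your robots.txt opts your content in or out of AI training crawlers and answer engines. Keep in mind that robots.txt is a request that crawlers follow voluntarily, not an access control.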
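
The three labels can be sketched as a small function. This is an assumption-laden illustration, not the tool's implementation: verdict and AI_CRAWLERS are hypothetical names, groups are assumed to be dictionaries with lowercased "agents" and a list of (kind, path) "rules", and a crawler with no group of its own is reported as allowed, following the wording above. Whether the tool also weighs the * group in that case is not stated, so this sketch does not.

```python
# Crawler names taken from the list in this guide.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
               "PerplexityBot", "Bytespider", "Applebot-Extended"]


def verdict(groups, crawler):
    """Label one crawler as "blocked", "limited", or "allowed" (hypothetical helper)."""
    name = crawler.lower()
    # Collect rules only from groups that name this crawler explicitly.
    rules = [rule for g in groups if name in g["agents"] for rule in g["rules"]]
    # An empty Disallow value disallows nothing, so empty paths are dropped.
    disallowed = [path for kind, path in rules if kind == "disallow" and path]
    allowed = [path for kind, path in rules if kind == "allow" and path]
    if "/" in disallowed and not allowed:
        return "blocked"   # Disallow: / with no Allow exceptions
    if disallowed:
        return "limited"   # some paths disallowed
    return "allowed"       # no specific rule, or nothing disallowed
```

A full report is then one line: {bot: verdict(groups, bot) for bot in AI_CRAWLERS}.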

Sitemap

The first sitemap (from robots.txt, or /sitemap.xml as a fallback) is fetched and classified as a sitemap index (links to child sitemaps) or a urlset (links to pages), with a count and a few sample entries.
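
For illustration, the sitemap choice and classification could look like the sketch below. The function names first_sitemap_url and classify_sitemap, the result dictionary, and the default of three samples are assumptions; the sketch works on already-fetched XML text and does no network access.

```python
import xml.etree.ElementTree as ET


def first_sitemap_url(origin, declared):
    """Pick the first sitemap declared in robots.txt, else fall back to /sitemap.xml."""
    return declared[0] if declared else origin.rstrip("/") + "/sitemap.xml"


def _local(tag):
    """Strip the XML namespace from a tag such as '{ns}urlset'."""
    return tag.rsplit("}", 1)[-1]


def classify_sitemap(xml_text, samples=3):
    """Classify sitemap XML as 'sitemapindex' or 'urlset' and count its entries."""
    root = ET.fromstring(xml_text)
    kind = _local(root.tag)
    if kind not in ("sitemapindex", "urlset"):
        return {"type": "unknown", "count": 0, "samples": []}
    # An index holds <sitemap> entries; a urlset holds <url> entries.
    child = "sitemap" if kind == "sitemapindex" else "url"
    locs = []
    for entry in root:
        if _local(entry.tag) != child:
            continue
        for node in entry:
            if _local(node.tag) == "loc" and node.text:
                locs.append(node.text.strip())
                break
    return {"type": kind, "count": len(locs), "samples": locs[:samples]}
```

For a sitemap index, the count is the number of child sitemaps, not the number of pages they contain.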

Caveats

  • Results reflect what the server returned to this tool; sites may serve different robots.txt or headers by user-agent or region.
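  • Only the first sitemap is fetched. Large sites often split URLs across many child sitemaps under an index, so the reported count may cover only part of the site.
  • A missing robots.txt (for example, a 404 response) means all crawlers are allowed by default. A server error (5xx) is different: crawlers that follow RFC 9309 treat an unreachable robots.txt as a full disallow.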