AI crawlers list: user-agents and robots.txt control
AI assistants fetch the web using named crawlers such as GPTBot, ClaudeBot and PerplexityBot. This page lists the main ones and shows exactly how to allow or block each in robots.txt.
Last updated: 18 May 2026
The list
These are the 13 AI user-agents and opt-out tokens XEOscan checks, the same set named in the open rubric. The exact, current behaviour of each is defined by its operator's own documentation, linked below the table; treat the "what it is for" column as a general category, not a guarantee of behaviour.
| User-agent / token | Operator | What it is for |
|---|---|---|
| GPTBot | OpenAI | Crawls pages for OpenAI |
| ChatGPT-User | OpenAI | Fetches a page in response to a user action in ChatGPT |
| OAI-SearchBot | OpenAI | Supports OpenAI search features |
| ClaudeBot | Anthropic | Crawls pages for Anthropic |
| Claude-User | Anthropic | Fetches a page in response to a user action in Claude |
| Claude-SearchBot | Anthropic | Supports Claude search features |
| PerplexityBot | Perplexity | Crawls pages for Perplexity |
| Perplexity-User | Perplexity | Fetches a page in response to a user action in Perplexity |
| CCBot | Common Crawl | Builds the public Common Crawl dataset, widely reused for AI training |
| Amazonbot | Amazon | Crawls pages for Amazon services |
| Meta-ExternalAgent | Meta | Crawls pages for Meta |
| Google-Extended | Google | robots.txt opt-out token: controls use of content for AI training, not crawling |
| Applebot-Extended | Apple | robots.txt opt-out token: controls use of content for AI training, not crawling |
Authoritative references for current behaviour: OpenAI's bot documentation and Google's list of crawlers. robots.txt itself follows the Robots Exclusion Protocol (RFC 9309).
How to allow AI crawlers
The simplest welcoming pattern allows everything for all agents with User-agent: * and Allow: /, then names the AI agents you specifically want to be explicit about. It does not need to enumerate all 13 tokens XEOscan checks: any agent without its own group falls back to the wildcard group, so a clean robots.txt like the one below is enough:
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
Sitemap: https://example.com/sitemap.xml
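To see the wildcard fallback in action, here is a minimal sketch that parses the robots.txt above with Python's standard urllib.robotparser and asks about agents that have no group of their own. The helper name can_fetch, the WELCOMING constant and the example.com URL are illustrative assumptions. Python's parser applies its own matching rules, so treat the result as an approximation of how a given operator's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The welcoming robots.txt from this page, embedded as a string for the sketch.
WELCOMING = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def can_fetch(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`.

    Hypothetical helper: parses the text locally, no network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Agents that are not named in the file fall back to the `*` group.
unnamed = ["CCBot", "Amazonbot", "Meta-ExternalAgent"]
fallback = {agent: can_fetch(WELCOMING, agent) for agent in unnamed}

for agent, ok in fallback.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```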
How to block specific AI crawlers
To ask a specific crawler not to fetch your pages, disallow its user-agent:
User-agent: GPTBot
Disallow: /
Be deliberate. Blocking a crawler reduces the content that operator can fetch to ground its answers, which is rarely what a site that wants citations intends. Note that Google-Extended and Applebot-Extended are training opt-out tokens, not crawl blocks: listing them does not stop normal crawling, it signals that content should not be used for AI training.
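The difference between a crawl block and a training opt-out token can be sketched the same way. The sample below is an assumption for illustration: it blocks GPTBot, sets both opt-out tokens, and then checks that Googlebot and Applebot, the ordinary crawlers of the same operators, are untouched because robots.txt groups apply per user-agent. It uses Python's standard urllib.robotparser, whose matching may differ in detail from each operator's own parser.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one crawl block plus two training opt-out tokens.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

URL = "https://example.com/"  # placeholder URL, only the path is evaluated
agents = ["GPTBot", "Google-Extended", "Applebot-Extended", "Googlebot", "Applebot"]

# True means the agent's group allows the URL, False means it is disallowed.
results = {agent: parser.can_fetch(agent, URL) for agent in agents}

for agent, ok in results.items():
    print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```

In this sketch only the three named groups come back as disallowed. For GPTBot that is a request not to fetch pages; for Google-Extended and Applebot-Extended it is the opt-out signal, while Googlebot and Applebot still fall under the wildcard group.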
A note on noai and noimageai
You may see noai and noimageai suggested as page-level signals. These are unofficial AI opt-out signals with no W3C or IETF standard behind them, so support is not guaranteed and they should not be relied on as a control.
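If you want to find out whether a page already carries these signals, the sketch below scans HTML for them. It assumes the commonly suggested form, a robots meta tag such as <meta name="robots" content="noai, noimageai">; since there is no standard, other placements may exist and this check would miss them. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

AI_SIGNALS = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def ai_opt_out_signals(html: str) -> set[str]:
    """Return whichever of noai / noimageai appear in the page's robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives & AI_SIGNALS


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
found = ai_opt_out_signals(page)
print(sorted(found))
```

Finding the signal only tells you it is present on the page; it says nothing about whether any crawler honours it.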
How to verify your robots.txt
XEOscan checks all 13 tokens above against your robots.txt and reports which AI agents you allow, which you block, and which training opt-out tokens are set. It can also suggest a corrected robots.txt when something is off. Run a free scan to see exactly where you stand.
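For a quick local approximation of this kind of report, the sketch below runs the 13 tokens from the table through Python's standard urllib.robotparser. It is not XEOscan's implementation: the function name audit_robots, the report keys and the example.com root are assumptions, only the site root is tested, and Python's matching rules may differ from those of the operators' crawlers.

```python
from urllib.robotparser import RobotFileParser

# The 11 crawler user-agents and 2 training opt-out tokens listed in the table.
CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Amazonbot", "Meta-ExternalAgent",
]
OPT_OUT_TOKENS = ["Google-Extended", "Applebot-Extended"]


def audit_robots(robots_txt: str, site_root: str = "https://example.com/") -> dict:
    """Classify each token as allowed, blocked, or a training opt-out that is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    report = {"allowed": [], "blocked": [], "training_opt_out": []}
    for agent in CRAWLERS:
        key = "allowed" if parser.can_fetch(agent, site_root) else "blocked"
        report[key].append(agent)
    for token in OPT_OUT_TOKENS:
        # A disallowed opt-out token is a training signal, not a crawl block.
        if not parser.can_fetch(token, site_root):
            report["training_opt_out"].append(token)
    return report


sample = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

report = audit_robots(sample)
for category, tokens in report.items():
    print(f"{category}: {', '.join(tokens) or 'none'}")
```

To check a live site you could download its robots.txt yourself and pass the text to audit_robots; keeping the parsing separate from the fetching makes the function easy to test offline.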
FAQ
Should I block AI crawlers?
If you want to be cited by AI assistants, generally no: blocking their crawlers removes the content they can fetch to ground answers. Block only if you have a specific reason not to be used by a given operator.
Does blocking GPTBot remove me from ChatGPT?
Blocking GPTBot in robots.txt asks OpenAI's GPTBot not to fetch your pages, which reduces the content available to ground answers. The precise effect on any specific product is defined by the operator's documentation, not by this page.
What is the difference between a crawl block and a training opt-out?
A crawl block (disallowing a crawler user-agent) asks a bot not to fetch your pages at all. A training opt-out token such as Google-Extended or Applebot-Extended does not block crawling; it signals that already-crawled content should not be used for AI training.
Published by XEOscan, a free tool operated by Constantin Ungureanu.