XEOscan

AI crawlers list: user-agents and robots.txt control

AI assistants fetch the web using named crawlers such as GPTBot, ClaudeBot and PerplexityBot. This page lists the main ones and shows exactly how to allow or block each in robots.txt.

Last updated: 18 May 2026

The list

These are the 13 AI user-agents and opt-out tokens XEOscan checks, the same set named in the open rubric. The exact, current behaviour of each is defined by its operator's own documentation, linked below the table; treat the "what it is for" column as a general category, not a guarantee of behaviour.

| User-agent / token | Operator | What it is for |
| --- | --- | --- |
| GPTBot | OpenAI | Crawls pages for OpenAI |
| ChatGPT-User | OpenAI | Fetches a page in response to a user action in ChatGPT |
| OAI-SearchBot | OpenAI | Supports OpenAI search features |
| ClaudeBot | Anthropic | Crawls pages for Anthropic |
| Claude-User | Anthropic | Fetches a page in response to a user action in Claude |
| Claude-SearchBot | Anthropic | Supports Claude search features |
| PerplexityBot | Perplexity | Crawls pages for Perplexity |
| Perplexity-User | Perplexity | Fetches a page in response to a user action in Perplexity |
| CCBot | Common Crawl | Builds the public Common Crawl dataset, widely reused for AI training |
| Amazonbot | Amazon | Crawls pages for Amazon services |
| Meta-ExternalAgent | Meta | Crawls pages for Meta |
| Google-Extended | Google | robots.txt opt-out token: controls use of content for AI training, not crawling |
| Applebot-Extended | Apple | robots.txt opt-out token: controls use of content for AI training, not crawling |

Authoritative references for current behaviour: OpenAI's bot documentation and Google's list of crawlers. robots.txt itself follows the Robots Exclusion Protocol (RFC 9309).

How to allow AI crawlers

The simplest welcoming pattern allows everything for all agents with User-agent: * and Allow: /, then names the AI agents you want to be explicit about. You do not need to enumerate all 13 tokens XEOscan checks, because the wildcard group already permits any agent that has no group of its own:

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
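
To see the wildcard fallback in action, the file above can be fed to a parser. The following is a minimal sketch using Python's standard-library urllib.robotparser. The example.com URL is a placeholder, the UNNAMED list is just a sample of table entries that the file does not name, and the standard-library parser only approximates how each operator's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The welcoming robots.txt shown above, verbatim.
WELCOMING = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# Sample of agents from the table that have no group of their own in the file.
UNNAMED = ["ChatGPT-User", "CCBot", "Amazonbot", "Meta-ExternalAgent"]

parser = RobotFileParser()
parser.parse(WELCOMING.splitlines())

URL = "https://example.com/"  # placeholder URL, not a real target

# Agents without their own group fall back to the "User-agent: *" group.
fallback = {agent: parser.can_fetch(agent, URL) for agent in UNNAMED}
# A named agent is answered by its own group instead.
named = parser.can_fetch("GPTBot", URL)

for agent, ok in fallback.items():
    print(f"{agent}: {'allowed' if ok else 'disallowed'} (via wildcard)")
print(f"GPTBot: {'allowed' if named else 'disallowed'} (via its own group)")
```

Every agent in the sample comes back allowed, which is why the named groups in this file are a matter of explicitness rather than necessity.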

How to block specific AI crawlers

To ask a specific crawler not to fetch your pages, disallow its user-agent:

User-agent: GPTBot
Disallow: /

Be deliberate. Blocking a crawler reduces the content its operator can fetch to ground answers, which is rarely what a site that wants citations intends. Note that Google-Extended and Applebot-Extended are training opt-out tokens, not crawlers: a Disallow rule under either token does not stop normal crawling; it signals that content should not be used for AI training.
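
The scoping of these rules can be sketched with Python's standard-library urllib.robotparser. The file below is illustrative rather than a recommendation, example.com is a placeholder, and Googlebot (Google's search crawler, which is not one of the 13 tokens above) is included only to show that a Google-Extended group does not match it. The sketch demonstrates group matching only; what an operator does with an opt-out token is defined by its own documentation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: one crawl block and one training opt-out token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://example.com/"  # placeholder URL

# Each group applies only to the user-agent token it names.
verdicts = {
    agent: parser.can_fetch(agent, URL)
    for agent in ["GPTBot", "ClaudeBot", "Googlebot", "Google-Extended"]
}

for agent, ok in verdicts.items():
    print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```

GPTBot and Google-Extended are reported as disallowed, while ClaudeBot and Googlebot remain allowed because no group in the file names them.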

A note on noai and noimageai

You may see noai and noimageai suggested as page-level signals. These are unofficial AI opt-out signals with no W3C or IETF standard behind them, so support is not guaranteed and they should not be relied on as a control.

How to verify your robots.txt

XEOscan checks all 13 tokens above against your robots.txt and reports which AI agents you allow, which you block, and which training opt-out tokens are set, and it can suggest a corrected robots.txt when something is off. Run a free scan to see exactly where you stand.
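
For a rough local check of the same three questions, a short script is enough. This is a minimal sketch using Python's standard-library urllib.robotparser, not XEOscan's implementation: the function name audit, the sample file and the example.com URL are assumptions made for illustration, and the token lists are copied from the table above. urllib.robotparser does not implement every detail of RFC 9309, so treat its verdict as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Crawler user-agents from the table above.
CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Amazonbot", "Meta-ExternalAgent",
]
# Training opt-out tokens from the table above.
OPT_OUT_TOKENS = ["Google-Extended", "Applebot-Extended"]


def audit(robots_txt, url="https://example.com/"):
    """Classify the 13 tokens for one URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": [a for a in CRAWLERS if parser.can_fetch(a, url)],
        "blocked": [a for a in CRAWLERS if not parser.can_fetch(a, url)],
        # An opt-out token counts as set when the URL is disallowed for it.
        "training_opt_out": [
            t for t in OPT_OUT_TOKENS if not parser.can_fetch(t, url)
        ],
    }


# Sample input: allow everyone, block CCBot, opt out of Apple's AI training.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

report = audit(SAMPLE)
for section, tokens in report.items():
    print(f"{section}: {', '.join(tokens) or 'none'}")
```

To check a live site instead of a string, RobotFileParser's set_url and read methods can fetch the file before can_fetch is called. The sketch tests a single URL, so path-specific rules would need one call per path of interest.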

See which AI crawlers your site allows. Run a free scan, no signup. Every token is graded against the same open rubric. For the file that highlights your best pages to those crawlers, see what llms.txt is, and for the full method, the AI SEO guide.

FAQ

Should I block AI crawlers?

If you want to be cited by AI assistants, generally no: blocking their crawlers removes the content they can fetch to ground answers. Block only if you have a specific reason not to be used by a given operator.

Does blocking GPTBot remove me from ChatGPT?

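Blocking GPTBot in robots.txt asks OpenAI's GPTBot not to fetch your pages, which reduces the content available to ground answers. The rule applies only to that user-agent: ChatGPT-User and OAI-SearchBot, listed separately in the table above, are not covered by it. The precise effect on any specific product is defined by the operator's documentation, not by this page.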

What is the difference between a crawl block and a training opt-out?

A crawl block (disallowing a crawler user-agent) asks a bot not to fetch your pages at all. A training opt-out token such as Google-Extended or Applebot-Extended does not block crawling; it signals that already-crawled content should not be used for AI training.

Published by XEOscan, a free tool operated by Constantin Ungureanu.
