AI crawlers list: user-agents and robots.txt control

AI assistants fetch the web using named crawlers such as GPTBot, ClaudeBot and PerplexityBot. This page lists the main ones and shows exactly how to allow or block each in robots.txt.

Last updated: 18 May 2026

The list

These are the 13 AI user-agents and opt-out tokens XEOscan checks, the same set named in the open rubric. The exact, current behaviour of each is defined by its operator's own documentation, linked below the table; treat the "what it is for" column as a general category, not a guarantee of behaviour.

User-agent / token	Operator	What it is for
GPTBot	OpenAI	Crawls pages for OpenAI
ChatGPT-User	OpenAI	Fetches a page in response to a user action in ChatGPT
OAI-SearchBot	OpenAI	Supports OpenAI search features
ClaudeBot	Anthropic	Crawls pages for Anthropic
Claude-User	Anthropic	Fetches a page in response to a user action in Claude
Claude-SearchBot	Anthropic	Supports Claude search features
PerplexityBot	Perplexity	Crawls pages for Perplexity
Perplexity-User	Perplexity	Fetches a page in response to a user action in Perplexity
CCBot	Common Crawl	Builds the public Common Crawl dataset, widely reused for AI training
Amazonbot	Amazon	Crawls pages for Amazon services
Meta-ExternalAgent	Meta	Crawls pages for Meta
Google-Extended	Google	robots.txt opt-out token: controls use of content for AI training, not crawling
Applebot-Extended	Apple	robots.txt opt-out token: controls use of content for AI training, not crawling

Authoritative references for current behaviour include OpenAI's bot documentation and Google's list of crawlers; for the other operators, consult each one's own crawler documentation. robots.txt itself follows the Robots Exclusion Protocol (RFC 9309).

How to allow AI crawlers

The simplest welcoming pattern allows everything for all agents with User-agent: * and Allow: /, then optionally names the AI agents you want to be explicit about. You do not need to list all 13 tokens XEOscan checks: an agent without a group of its own falls back to the wildcard group, so a clean file like the one below is enough:

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
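To sanity-check that fallback, here is a minimal sketch using Python's standard-library urllib.robotparser. It parses the file above and asks whether agents that have no group of their own (CCBot and Amazonbot) may fetch a page. The page URL is a placeholder on the example.com domain. This parser is a simplified reader that does not implement every RFC 9309 rule (for example, it applies the first matching Allow or Disallow line in a group rather than the most specific one), so treat it as a quick check, not a reference implementation.

```python
from urllib.robotparser import RobotFileParser

# The welcoming robots.txt from above, as a string.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/some-page"  # placeholder URL
for agent in ["GPTBot", "ClaudeBot", "CCBot", "Amazonbot"]:
    # CCBot and Amazonbot have no group of their own, so the parser
    # falls back to the "User-agent: *" group, which allows everything.
    print(agent, parser.can_fetch(agent, page))

# Sitemap lines are collected separately from the user-agent groups.
print(parser.site_maps())
```

All four agents should come back as allowed, which is why the named groups in a welcoming file are optional: they make intent explicit rather than change the outcome.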

How to block specific AI crawlers

To ask a specific crawler not to fetch your pages, disallow its user-agent:

User-agent: GPTBot
Disallow: /
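A Disallow group applies only to the token it names. The sketch below uses Python's urllib.robotparser and a placeholder example.com URL; it adds this GPTBot block to a wildcard Allow and checks several agents. Other OpenAI tokens such as ChatGPT-User and OAI-SearchBot have their own names, so the GPTBot rule does not cover them. robots.txt is a request, so this only shows what a compliant crawler should conclude from the file.

```python
from urllib.robotparser import RobotFileParser

# GPTBot is disallowed; every other agent falls back to the wildcard group.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://example.com/article"  # placeholder URL
for agent in ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot"]:
    status = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {status}")
```

To block several operators, repeat the User-agent and Disallow pair for each token you mean to exclude.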

Be deliberate. Blocking a crawler reduces the content that operator can fetch to ground its answers, which is rarely what a site that wants citations intends. Note that Google-Extended and Applebot-Extended are training opt-out tokens, not crawl blocks: disallowing them does not stop normal crawling but signals that content should not be used for AI training.

A note on noai and noimageai

You may see noai and noimageai suggested as page-level signals. These are unofficial AI opt-out signals with no W3C or IETF standard behind them, so support is not guaranteed and they should not be relied on as a control.

How to verify your robots.txt

XEOscan checks all 13 tokens above against your robots.txt and reports which AI agents you allow, which you block and which training opt-out tokens are set. When something is off, it can suggest a corrected robots.txt. Run a free scan to see exactly where you stand.
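For a rough local check, the same kind of report can be sketched in a few lines. The function audit_ai_agents below is hypothetical and is not XEOscan's implementation: it runs the 13 tokens through Python's urllib.robotparser for one placeholder URL and sorts them into allowed crawlers, blocked crawlers and training opt-out tokens that are disallowed. It inherits that parser's simplifications (first-match rule order, substring matching of user-agent names), so a real checker would need a fuller RFC 9309 implementation.

```python
from urllib.robotparser import RobotFileParser

# The 13 tokens from the table above, split by kind.
CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Amazonbot", "Meta-ExternalAgent",
]
OPT_OUT_TOKENS = ["Google-Extended", "Applebot-Extended"]


def audit_ai_agents(robots_txt: str, url: str = "https://example.com/") -> dict[str, list[str]]:
    """Hypothetical helper: group the 13 tokens by what robots.txt asks of them for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report: dict[str, list[str]] = {"allowed": [], "blocked": [], "training_opt_out_set": []}
    for agent in CRAWLERS:
        key = "allowed" if parser.can_fetch(agent, url) else "blocked"
        report[key].append(agent)
    for token in OPT_OUT_TOKENS:
        # A Disallow for an opt-out token is a training signal, not a crawl block,
        # so it is reported separately from the crawlers.
        if not parser.can_fetch(token, url):
            report["training_opt_out_set"].append(token)
    return report


# Example input: everything allowed except CCBot, plus a Google-Extended opt-out.
example = """\
User-agent: *
Allow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

for status, agents in audit_ai_agents(example).items():
    print(f"{status}: {', '.join(agents) or '(none)'}")
```

With this input, CCBot lands in "blocked", Google-Extended in "training_opt_out_set", and the remaining ten crawlers in "allowed".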

Scans are free and need no signup, and every token is graded against the same open rubric. For the file that highlights your best pages to those crawlers, see what llms.txt is; for the full method, see the AI SEO guide.

FAQ

Should I block AI crawlers?

If you want to be cited by AI assistants, generally no: blocking their crawlers removes the content they can fetch to ground answers. Block only if you have a specific reason not to want your content used by a given operator.

Does blocking GPTBot remove me from ChatGPT?

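Blocking GPTBot in robots.txt asks OpenAI's GPTBot not to fetch your pages, which reduces the content available to ground answers. It does not, by itself, cover ChatGPT-User or OAI-SearchBot, which are separate tokens that robots.txt addresses independently. The precise effect on any specific product is defined by the operator's documentation, not by this page.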

What is the difference between a crawl block and a training opt-out?

A crawl block (disallowing a crawler user-agent) asks a bot not to fetch your pages at all. A training opt-out token such as Google-Extended or Applebot-Extended does not block crawling; it signals that already-crawled content should not be used for AI training.

Published by XEOscan, a free tool operated by Constantin Ungureanu.
