The AI SEO guide for 2026
AI SEO is the work of making your website easy for AI assistants to fetch, understand and cite, without losing the classic search signals that still bring traffic. This guide covers what changed, what to fix first, and how to verify it.
Last updated: 18 May 2026
You can check any site against everything in this guide with a free scan, no signup. If you only read one section, read "What to fix first".
What AI SEO is (and is not)
AI SEO is the practice of making a site readable and citable by AI assistants such as ChatGPT, Claude, Perplexity and Google AI Overviews, while keeping the classic search signals that still drive clicks. It is sometimes called AI search optimization. It is not a separate project bolted onto SEO and it is not a trick: it is the same discipline of clear, crawlable, trustworthy content, judged by a new kind of reader.
The difference from classic SEO is the outcome you optimise for. Classic SEO aims to rank a link in a list. AI SEO also aims to be the source an assistant quotes inside the answer it generates, often before the user ever sees a list of links. The two are complementary: the foundations overlap almost entirely, so the work compounds. For the specific vocabulary, see what GEO means and how GEO, AEO, AIO and SEO differ.
Why AI crawlers change the rules
AI assistants do not browse the web the way a person does. They send named crawlers to fetch pages, then use what those crawlers retrieved to ground their answers. The single most important consequence: most major AI crawlers fetch raw HTML and do not reliably execute JavaScript, so they reward content that is already present in the page source and shaped as clear answers.
JavaScript and the raw-HTML test
If your headline, your main copy, your prices or your answers only appear after a client-side framework runs in a browser, an AI crawler that does not run that JavaScript sees an almost empty page. The fix is not to remove JavaScript; it is to make sure the substantive content exists in the HTML the server sends, before any script runs. A quick test: fetch your page with JavaScript disabled, or view source, and check that the real content and headings are there. This is exactly what XEOscan evaluates, because it deliberately does not execute JavaScript, mirroring how those crawlers actually read you.
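A minimal sketch of that test in Python, using only the standard library. It fetches a page with a plain GET, so no JavaScript runs, then lists the headings and counts the words of visible text outside script, style and template elements. The helper names and the User-Agent string are placeholders chosen for illustration; this is not how XEOscan or any particular crawler is implemented.

```python
import urllib.request
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
NON_VISIBLE_TAGS = {"script", "style", "template"}


class RawHTMLAudit(HTMLParser):
    """Collects headings and visible text from HTML as served, without running scripts."""

    def __init__(self):
        super().__init__()
        self.headings = []       # (tag, text) pairs in document order
        self.text_chunks = []    # text outside <script>, <style> and <template>
        self._hidden_depth = 0   # > 0 while inside a non-visible element
        self._open_heading = None
        self._heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE_TAGS:
            self._hidden_depth += 1
        elif tag in HEADING_TAGS:
            self._open_heading, self._heading_text = tag, []

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE_TAGS and self._hidden_depth:
            self._hidden_depth -= 1
        elif tag == self._open_heading:
            self.headings.append((tag, " ".join(self._heading_text)))
            self._open_heading = None

    def handle_data(self, data):
        text = data.strip()
        if self._hidden_depth or not text:
            return
        self.text_chunks.append(text)
        if self._open_heading:
            self._heading_text.append(text)


def audit_html(html: str) -> dict:
    """Summarise what a crawler that does not run JavaScript would see."""
    parser = RawHTMLAudit()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.text_chunks).split()
    return {"headings": parser.headings, "word_count": len(words)}


def fetch_raw_html(url: str) -> str:
    """Plain HTTP GET: nothing here executes JavaScript."""
    # Placeholder User-Agent string, not any real crawler's.
    request = urllib.request.Request(url, headers={"User-Agent": "raw-html-check/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Run audit_html(fetch_raw_html("https://example.com/pricing")) against your own URL. An empty heading list and a word count near zero on a page that looks full in the browser usually means the content is rendered client-side.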
How AI decides what to cite
No assistant publishes a ranking formula, so treat specifics with caution. What is consistent and defensible: assistants prefer sources that are easy to fetch, easy to parse, unambiguous, and visibly trustworthy. In practice that means content that is present in raw HTML, structured with real headings and lists, written so a single sentence answers a single question without back-references, and attributable to a clear author and publisher with a visible date. You cannot guarantee a citation. You can make your page the easiest correct source to quote.
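One of those properties, sentences that make sense without the sentences around them, can be spot-checked mechanically. The Python sketch below flags sentences that open with a likely back-reference such as "It" or "This". The opener list and the naive sentence splitter are assumptions; expect false positives, and treat the output as a prompt to reread, not a score.

```python
import re

# Hypothetical list of sentence openers that usually lean on an earlier sentence.
BACK_REFERENCE = re.compile(
    r"^(?:this|that|these|those|it|they|as (?:mentioned|noted) above)\b",
    re.IGNORECASE,
)
# Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def flag_back_references(paragraph: str) -> list[str]:
    """Return sentences that open with a likely back-reference.

    Such sentences may lose their meaning when quoted on their own.
    This is a crude heuristic, not a model of how any assistant picks quotes.
    """
    sentences = SENTENCE_END.split(paragraph.strip())
    return [s for s in sentences if BACK_REFERENCE.match(s)]
```

For example, in "Our Pro plan costs 9 EUR a month. It includes API access." the second sentence is flagged; rewriting it as "The Pro plan includes API access." lets it be quoted on its own.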
What to fix first
Work in this order, because each step is worthless if the one above it fails.
- Let AI crawlers in. Confirm your robots.txt does not block the assistants you want to be cited by. A welcoming robots.txt typically names a few agents explicitly and relies on "User-agent: *" with "Allow: /" for the rest. XEOscan checks 13 AI agents and opt-out tokens: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, CCBot, Amazonbot, Meta-ExternalAgent, plus the Google-Extended and Applebot-Extended robots.txt opt-out tokens, which control AI training rather than crawling.
- Serve content in raw HTML. The body text, headings and answers must be in the server response, not injected later by a script.
- Be discoverable. Use HTTPS with a valid certificate, a self-referencing canonical, an XML sitemap, and clean URLs. An llms.txt file at the site root is an emerging convention, not an adopted web standard, and support across assistants is not guaranteed, but it is cheap to add and XEOscan checks for it.
- Structure for extraction. One H1, headings that descend without skipping levels, lists and tables where they aid reading, and JSON-LD that matches what is on the page.
- Earn citation trust. A named author, a clear publisher, a visible last-updated date, and outbound links to authoritative sources.
Let AI crawlers in
Blocking a crawler is the one mistake that makes every other optimisation pointless. Two patterns matter. The welcoming pattern allows the assistants you want to reach. The restrictive pattern blocks specific agents you do not want training on or fetching your content. Be deliberate, because operators split their agents by purpose: OpenAI, for example, documents GPTBot as the crawler that gathers content for model training and OAI-SearchBot as the one behind ChatGPT search. Blocking GPTBot is therefore a training opt-out, while blocking OAI-SearchBot keeps your pages out of ChatGPT search answers, which is rarely what a site that wants citations intends. Google-Extended and Applebot-Extended are training opt-out tokens, not crawl blocks, so they behave differently from a normal user-agent disallow. The full per-agent reference, with how to allow or block each one, is on the AI crawlers list. The rules themselves follow the Robots Exclusion Protocol (RFC 9309), and each operator documents its own agents, for example OpenAI's crawler documentation and Google's list of crawlers.
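To see how a given robots.txt treats each agent, Python's standard urllib.robotparser can evaluate it offline. The sketch below checks the 13 agents and tokens listed in this guide against a hypothetical file that opts GPTBot and Google-Extended out while welcoming everyone else through the catch-all group. Two caveats: robotparser applies the first matching group and rule in file order and matches agent names as case-insensitive substrings, whereas RFC 9309 specifies the most specific (longest) matching rule, so results can differ on complex files; and for tokens such as Google-Extended, "blocked" means opted out of AI training, not prevented from crawling.

```python
from urllib.robotparser import RobotFileParser

# The 13 agents and opt-out tokens listed in this guide.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "CCBot", "Amazonbot", "Meta-ExternalAgent",
    "Google-Extended", "Applebot-Extended",
]

# Hypothetical robots.txt for illustration: two training opt-outs,
# then a catch-all group that welcomes every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def ai_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each agent or token to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


for agent, allowed in ai_access(ROBOTS_TXT, "https://example.com/guide").items():
    print(f"{agent:<20} {'allowed' if allowed else 'blocked'}")
```

Replace ROBOTS_TXT with the contents of your own file (or use the parser's set_url and read methods to fetch it live) before relying on the result.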
Be discoverable
Discoverability is unglamorous and decisive. Ship a valid HTTPS certificate, because an insecure or mixed-content page is a trust and crawl problem. Set a self-referencing canonical so duplicates do not split signals. Maintain an XML sitemap and reference it from robots.txt. Keep URLs short, lowercase and meaningful. Add an llms.txt file pointing assistants at your most important pages: it is an emerging convention with uneven adoption, so treat it as a low-cost signal, not a guarantee.
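Two of those items are easy to generate or check in code. A minimal Python sketch, assuming you already know each page's absolute URL and last-modified date: build_sitemap emits a basic sitemaps.org urlset, and is_self_canonical confirms that a page's rel="canonical" link points at the page itself. The function names are hypothetical, and the canonical comparison only normalises a trailing slash, so treat a False as a prompt to look, not a verdict.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build a minimal XML sitemap from (absolute URL, YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # ElementTree escapes & and <
        ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attributes.get("href")


def is_self_canonical(html: str, page_url: str) -> bool:
    """True if the page's canonical URL points at the page itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return False
    # Simplification: only a trailing slash is normalised before comparing.
    return finder.canonical.rstrip("/") == page_url.rstrip("/")
```

Serve the sitemap output at a stable address such as https://example.com/sitemap.xml (a placeholder) and reference it from robots.txt with a line like "Sitemap: https://example.com/sitemap.xml".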
Structure your content for extraction
Structure is how a machine turns your prose into a quotable answer. Use exactly one H1 that states the page's subject. Let headings descend H1 to H2 to H3 with no skipped levels, because a jump from H1 to H3 reads as a broken hierarchy to parsers and is penalised by XEOscan's scoring rubric. Put genuinely list-shaped information in lists and genuinely tabular information in tables. Add JSON-LD that mirrors the visible page exactly: Article or TechArticle for guides, FAQPage where real questions and answers exist, BreadcrumbList for position, DefinedTerm or DefinedTermSet for definitions. Do not add schema that describes content not on the page; mismatched markup is a quality problem, not a shortcut. The full open scoring rubric lists every signal and its weight.
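The two heading rules in this paragraph, exactly one H1 and no skipped levels on the way down, can be approximated in a few lines of Python. This is a sketch of those rules as stated here, not XEOscan's rubric code; it reads headings from the raw HTML in document order and ignores any headings a script would add later.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1 to 6) of every <h1>..<h6> start tag, in order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    """List violations of 'one H1' and 'no skipped levels when descending'."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Going deeper by more than one level (h1 -> h3) is a skip;
    # climbing back up (h3 -> h2) is allowed.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"skipped level: h{previous} -> h{current}")
    return problems
```

An empty list means both rules hold for the headings present in the server HTML.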
Earn citation trust
Assistants quote sources they can defend. Make defending you easy. Put a real author name on substantive pages, with a link to a credible profile. Name the publisher clearly. Show a visible last-updated date and keep it honest. Link out to primary, authoritative sources where you make a factual claim. State facts plainly in self-contained sentences, so a single sentence can be lifted and still be true on its own. None of this guarantees a citation; all of it makes you the safest correct source to quote.
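The same trust signals belong in the page's JSON-LD, and they should match what a reader sees. A minimal Python sketch, assuming you already know the visible headline, author, publisher and last-updated date: article_jsonld builds schema.org Article data, missing_from_page lists values that do not appear in the visible text, and jsonld_script wraps the result for the page. The function names are hypothetical, and TechArticle would fit a guide just as well as Article.

```python
import json


def article_jsonld(headline: str, author_name: str, author_url: str,
                   publisher_name: str, date_modified: str) -> dict:
    """Schema.org Article data built from the same values the page displays."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {"@type": "Organization", "name": publisher_name},
        "dateModified": date_modified,  # ISO 8601, e.g. "2026-05-18"
    }


def missing_from_page(data: dict, visible_text: str) -> list[str]:
    """List JSON-LD fields whose values do not appear in the visible text.

    The date is skipped: the page may show "18 May 2026" while JSON-LD uses ISO 8601.
    """
    checks = {
        "headline": data["headline"],
        "author.name": data["author"]["name"],
        "publisher.name": data["publisher"]["name"],
    }
    return [field for field, value in checks.items() if value not in visible_text]


def jsonld_script(data: dict) -> str:
    """Serialise as a script tag; escaping "</" stops a value closing the tag early."""
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

For example, article_jsonld("The AI SEO guide for 2026", "Jane Doe", "https://example.com/authors/jane-doe", "Example Ltd", "2026-05-18") passes the check only if that headline, author and publisher all appear on the page; the author and publisher here are placeholders.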
Optimise per engine
The foundations above earn citations across every assistant. Beyond them, each surface has its own emphasis: ChatGPT and ChatGPT search reward clear, answer-shaped pages that OpenAI's crawlers can fetch; Perplexity surfaces and links the sources behind its answers, so visible sourcing and dates matter; Google AI Overviews sit on top of a strong classic-search foundation. Read the per-engine playbooks for the detail: get cited by ChatGPT, optimise for Perplexity, and Google AI Overviews.
Quick checklist (summary)
A short version to skim. For the complete, copy-friendly version, see the full AI SEO checklist. Unsure of a term? The glossary defines it.
- robots.txt does not block the AI agents you want.
- Main content is in raw HTML, not script-injected.
- HTTPS valid, canonical self-referencing, sitemap present.
- One H1, no skipped heading levels.
- Lists and tables used where they aid extraction.
- JSON-LD matches the visible page.
- Named author, clear publisher, visible date.
- Self-contained, quotable sentences.
- llms.txt present (emerging convention, low cost).
- Re-scan after any template or content change.
How to verify your site
Do not guess whether AI can read you: measure it. Run a free scan and XEOscan fetches your site the way an AI crawler does, with no JavaScript executed, then scores about 40 signals across eight areas and prioritises the findings by severity so you fix what actually blocks citation first. Re-scan after any significant content or template change, or monthly. Results expire after 7 days.
Frequently asked questions
Is AI SEO different from classic SEO?
It overlaps heavily and shares the same foundations of crawlable, structured, authoritative content. The difference is the goal: classic SEO aims to rank a link, while AI SEO also aims to be the source an AI assistant quotes inside a generated answer.
Do AI crawlers run JavaScript?
Most major AI crawlers fetch raw HTML and do not reliably execute JavaScript, so content that only appears after client-side rendering is often invisible to them. Put the substantive content in the server HTML.
How long until an AI assistant cites a new page?
There is no fixed time. It varies by assistant, by how often each crawler revisits, and by whether the content is clear and citable. Treat it as ongoing work, not a one-time switch.
Does AI SEO replace classic SEO?
No. It complements it. The signals that make a page rank in classic search also make it easier for AI to fetch, understand and cite, so the work compounds rather than competes.
Published by XEOscan, a free tool operated by Constantin Ungureanu.