The AI SEO checklist

This is a prioritised AI SEO checklist. Do the blocking items first (let AI crawlers in, put the content in the raw HTML), then work on structure and citability, then verify with a scan.

Last updated: 18 May 2026

The items map to the scored areas of the open rubric (areas A to G; the rubric's "Free extras" are tools, not checklist items). Work top to bottom: a later item is wasted if an earlier one fails. The narrative behind each item is in the full AI SEO guide.

Blocking essentials (do these first)

robots.txt exists, is reachable and parseable.
robots.txt does not block the AI agents you want to be cited by. A User-agent: * group with Allow: / covers every agent you have not named explicitly. See the AI crawlers list.
No noai / noimageai / noindex signal is unintentionally blocking AI use or indexing.
The substantive content is in the raw HTML, not injected by client-side JavaScript.
The site is served over valid HTTPS.
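
The robots.txt items above can be spot-checked offline. The following is a minimal sketch using Python's standard urllib.robotparser. The agent names in AI_AGENTS and the example.com URL are illustrative assumptions, not a list taken from the rubric, so confirm the current tokens against the AI crawlers list before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Assumed, illustrative crawler tokens; replace with the agents you care about.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_agents(robots_txt: str, url: str = "https://example.com/",
                   agents=AI_AGENTS) -> list[str]:
    """Return the agents that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))
```

An empty list means none of the named agents is blocked for that URL. The check says nothing about noai or noindex signals, which live in meta tags and response headers rather than in robots.txt.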

Discoverability

An XML sitemap is reachable and referenced from robots.txt.
A self-referencing canonical tag is present on each page.
The <html> tag declares a lang attribute.
URLs are clean, lowercase and free of session parameters.
No page needs more than one redirect hop to resolve.
An llms.txt file is present at the root and non-empty. See what llms.txt is.
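
Several of the discoverability items can be read straight from a page's URL and raw HTML. The sketch below assumes you already have the HTML as a string. The SESSION_PARAMS set is a hypothetical list of parameter names, and the canonical comparison is a strict string match that ignores normalisation such as trailing slashes.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit

# Assumed examples of session-style query parameters; extend for your stack.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


class HeadSignals(HTMLParser):
    """Collects the <html lang> value and the canonical link target."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def check_page(url: str, html: str) -> dict:
    signals = HeadSignals()
    signals.feed(html)
    parts = urlsplit(url)
    query_keys = {k.lower() for k, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return {
        "has_lang": bool(signals.lang),
        "self_canonical": signals.canonical == url,  # strict match, no normalisation
        "lowercase_path": parts.path == parts.path.lower(),
        "no_session_params": not (query_keys & SESSION_PARAMS),
    }


PAGE = ('<html lang="en"><head>'
        '<link rel="canonical" href="https://example.com/guide">'
        '</head><body></body></html>')

print(check_page("https://example.com/guide", PAGE))
```

Redirect hops, sitemap reachability and llms.txt presence need live HTTP requests, so they are left out of this offline sketch.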

Structure and extraction

Exactly one H1 per page, and headings descend without skipping levels.
A semantic <main> or <article> landmark wraps the content.
Each page has well over 100 visible words in raw HTML.
List-shaped and tabular content uses real lists and tables.
At least one valid JSON-LD block is present and matches the visible page (for example Article, Organization, or FAQPage where genuine).
Open Graph and Twitter Card tags are present.
Images that carry meaning have alt text.
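
The structural items above lend themselves to a quick raw-HTML audit. This is a rough sketch built on the standard html.parser module, not the scanner's own implementation. It treats a JSON-LD block as "valid" if it parses as JSON, which is weaker than validating it against schema.org. It also counts only images with no alt attribute at all, on the assumption that alt="" marks a decorative image.

```python
import json
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.heading_levels = []      # e.g. [1, 2, 2, 3]
        self.has_landmark = False     # saw <main> or <article>
        self.images_without_alt = 0   # <img> with no alt attribute
        self.jsonld_blocks = []       # raw text of each JSON-LD script
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag in ("main", "article"):
            self.has_landmark = True
        elif tag == "img" and "alt" not in attrs:
            self.images_without_alt += 1
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data


def audit(html: str) -> dict:
    parser = StructureAudit()
    parser.feed(html)
    levels = parser.heading_levels
    # A skip is any step down the hierarchy by more than one level (h2 -> h4).
    skipped = any(b - a > 1 for a, b in zip(levels, levels[1:]))
    parseable = 0
    for block in parser.jsonld_blocks:
        try:
            json.loads(block)
            parseable += 1
        except json.JSONDecodeError:
            pass
    return {
        "one_h1": levels.count(1) == 1,
        "no_skipped_levels": not skipped,
        "has_landmark": parser.has_landmark,
        "images_without_alt": parser.images_without_alt,
        "parseable_jsonld_blocks": parseable,
    }


PAGE = """<html><body><main>
<h1>Guide</h1><h2>Setup</h2><h4>Detail</h4>
<img src="chart.png">
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article"}</script>
</main></body></html>"""

print(audit(PAGE))
```

In the sample page the jump from h2 to h4 fails the heading check and the image lacks alt text, while the landmark and JSON-LD checks pass.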

Citability

A named author is present (byline, meta, or JSON-LD author).
A visible publish or last-updated date is shown, and it changes only when the content actually changes.
Claim-making pages link out to primary, authoritative sources.
Key passages are self-contained sentences that can be quoted standalone.
The home page has a reasonable internal link density.
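
The author and date items can be checked against a page's JSON-LD. The sketch below assumes a single top-level JSON-LD object using the schema.org author, datePublished and dateModified properties. The sample author name is hypothetical, and "plausible" here only means the dates are not in the future and the modified date does not precede the published date.

```python
import json
from datetime import date


def _parse_day(value):
    """Read the date part of 'YYYY-MM-DD' or a full ISO timestamp, else None."""
    try:
        return date.fromisoformat(str(value)[:10])
    except ValueError:
        return None


def citability_signals(jsonld_text: str, today=None) -> dict:
    today = today or date.today()
    data = json.loads(jsonld_text)  # assumes one top-level object, not a list

    author = data.get("author")
    if isinstance(author, list):
        author = author[0] if author else None
    name = author.get("name") if isinstance(author, dict) else author

    published = _parse_day(data.get("datePublished"))
    modified = _parse_day(data.get("dateModified"))
    dates = [d for d in (published, modified) if d]

    in_order = not (published and modified) or published <= modified
    plausible = bool(dates) and all(d <= today for d in dates) and in_order

    return {
        "has_named_author": bool(name),
        "has_date": bool(dates),
        "dates_plausible": plausible,
    }


# Hypothetical Article markup for illustration.
SAMPLE = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "author": {"@type": "Person", "name": "Jane Example"},
    "datePublished": "2026-01-10",
    "dateModified": "2026-05-18",
})

print(citability_signals(SAMPLE, today=date(2026, 5, 18)))
```

A check like this cannot tell whether a date is honest, only whether it is internally consistent, so it complements a manual review rather than replacing it.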

Answer-shape

Headings include real questions where natural (What / How / Why, or ending in a question mark).
A genuine FAQ section uses FAQPage schema. Google restricted FAQ rich results to a small set of authoritative government and health sites in 2023, so the markup now mainly helps machine extraction and AI parsing.
Answers sit directly under their question heading, concise and complete.
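
As an illustration of the answer-shape items, the sketch below detects question-style headings and builds FAQPage JSON-LD from question and answer pairs. The heading rule mirrors the wording above (What / How / Why, or a trailing question mark). The function names and the sample pair are hypothetical, and the markup should only be emitted for questions and answers that are visible on the page.

```python
import json


def is_question_heading(text: str) -> bool:
    """True for headings that start with What/How/Why or end in '?'."""
    cleaned = text.strip()
    return cleaned.endswith("?") or cleaned.lower().startswith(("what ", "how ", "why "))


def faq_jsonld(pairs) -> str:
    """Build schema.org FAQPage markup from (question, answer) tuples."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }, indent=2)


# Hypothetical FAQ entry for illustration.
PAIRS = [("What is llms.txt?", "A plain-text file at the site root that describes the site for language models.")]

print(is_question_heading("How do AI crawlers find my pages"))
print(faq_jsonld(PAIRS))
```

The output string goes inside a script element of type application/ld+json in the page's raw HTML.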

Classic SEO basics

Title tag in a sensible length range and unique per page.
Meta description present and within a sensible length range.
Favicon and apple-touch-icon present.
Mobile viewport declared.
Core Web Vitals (LCP, INP, CLS) in good shape on mobile and desktop.
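
The title, description and viewport items can be gathered per page and compared across pages. The sketch below reports lengths rather than judging them, because the checklist does not state the exact ranges the rubric uses. The PAGES mapping is hypothetical sample data. Core Web Vitals need field or lab measurement and are out of scope here.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadBasics(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.has_viewport = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name == "description":
                self.description = attrs.get("content") or ""
            elif name == "viewport":
                self.has_viewport = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_report(html: str) -> dict:
    parser = HeadBasics()
    parser.feed(html)
    title = parser.title.strip()
    description = parser.description
    return {
        "title": title,
        "title_length": len(title),
        "description_length": None if description is None else len(description),
        "has_viewport": parser.has_viewport,
    }


def duplicate_titles(pages: dict) -> list:
    """pages maps URL path -> HTML; returns titles shared by more than one page."""
    counts = Counter(head_report(html)["title"] for html in pages.values())
    return sorted(title for title, n in counts.items() if title and n > 1)


# Hypothetical two-page site for illustration.
PAGES = {
    "/": ('<head><title>Acme</title>'
          '<meta name="viewport" content="width=device-width, initial-scale=1">'
          '<meta name="description" content="Acme builds widgets."></head>'),
    "/about": "<head><title>Acme</title></head>",
}

print(head_report(PAGES["/"]))
print(duplicate_titles(PAGES))
```

Here both pages share the title "Acme", so the uniqueness item fails even though each title is individually present.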

Per-engine spot checks

ChatGPT: crawler access and answer-shape. See the ChatGPT playbook.
Perplexity: clear sourcing and fresh dates. See the Perplexity playbook.
Google AI Overviews: strong classic foundation. See the AI Overviews playbook.

Verify

Run the whole list against your site in one pass instead of checking by hand. Run a free scan, no signup, and XEOscan grades every item above, prioritised by severity. The checklist mirrors the open rubric, so the result is reproducible. Unsure what a term means? See the glossary.

Published by XEOscan, a free tool operated by Constantin Ungureanu.
