The AI SEO checklist
This is a prioritised AI SEO checklist: do the blocking items first (let AI crawlers in, serve raw HTML), then structure and citability, then verify with a scan.
Last updated: 18 May 2026
The items map to the scored areas of the open rubric (areas A to G; the rubric's "Free extras" are tools, not checklist items). Work top to bottom: a later item is wasted if an earlier one fails. The narrative behind each item is in the full AI SEO guide.
Blocking essentials (do these first)
- robots.txt exists, is reachable and parseable.
- robots.txt does not block the AI agents you want to be cited by; rely on User-agent: * with Allow: / for the rest. See the AI crawlers list.
- No noai / noimageai / noindex signal is unintentionally blocking AI use or indexing.
- The substantive content is in the raw HTML, not injected by client-side JavaScript.
- The site is served over valid HTTPS.
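The two robots.txt items above can be spot-checked with Python's standard library. This is a minimal sketch, not how any scanner necessarily does it: the agent tokens in AI_AGENTS, the sample file and the example.com URL are illustrative assumptions, so confirm the current crawler names against each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed, illustrative agent tokens; check each vendor's docs for current names.
AI_AGENTS = ("GPTBot", "PerplexityBot", "ClaudeBot")


def blocked_agents(robots_txt: str, agents=AI_AGENTS,
                   url: str = "https://example.com/") -> list[str]:
    """Return the agents that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI agent and allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # ['GPTBot']
```

An empty result means none of the listed agents is blocked for that URL; it says nothing about noai or noindex signals, which live in meta tags and HTTP headers rather than robots.txt.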
Discoverability
- An XML sitemap is reachable and referenced from robots.txt.
- A self-referencing canonical tag is present on each page.
- The <html> tag declares a lang attribute.
- URLs are clean, lowercase and free of session parameters.
- No page needs more than one redirect hop to resolve.
- An llms.txt file is present at the root and non-empty. See what llms.txt is.
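The canonical and lang items above lend themselves to a quick script. The following is a rough sketch that assumes the raw HTML has already been fetched; HeadAudit and audit_head are hypothetical names, and the canonical comparison is a plain string match that ignores details such as trailing slashes.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collect the <html lang> value and every <link rel="canonical"> href."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "link":
            # rel may hold several space-separated tokens.
            rel_tokens = (attrs.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.canonicals.append(attrs.get("href"))


def audit_head(html: str, page_url: str) -> dict:
    """Report whether the page declares a language and canonicalises to itself."""
    parser = HeadAudit()
    parser.feed(html)
    return {
        "has_lang": bool(parser.lang),
        # Exactly one canonical tag, pointing at the page's own URL.
        "self_canonical": parser.canonicals == [page_url],
    }


page = ('<html lang="en"><head>'
        '<link rel="canonical" href="https://example.com/guide">'
        '</head><body></body></html>')
print(audit_head(page, "https://example.com/guide"))
```

Because the parser reads the raw markup only, a canonical tag injected by client-side JavaScript would be reported as missing.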
Structure and extraction
- Exactly one H1 per page, and headings descend without skipping levels.
- A semantic <main> or <article> landmark wraps the content.
- Each page has well over 100 visible words in raw HTML.
- List-shaped and tabular content uses real lists and tables.
- At least one valid JSON-LD block, matching the visible page (Article, Organization, FAQPage where genuine).
- Open Graph and Twitter Card tags are present.
- Images that carry meaning have alt text.
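The heading and landmark items above can be checked mechanically. The following is a minimal sketch under stated assumptions: StructureAudit and audit_structure are hypothetical names, and "skipping a level" is taken to mean a heading more than one level deeper than the heading before it.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Record heading levels in document order and note any main/article tag."""

    def __init__(self):
        super().__init__()
        self.levels = []
        self.has_landmark = False

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; tags such as <hr> or <html> fall through.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))
        elif tag in ("main", "article"):
            self.has_landmark = True


def audit_structure(html: str) -> dict:
    parser = StructureAudit()
    parser.feed(html)
    levels = parser.levels
    # A jump such as h1 -> h3 skips a level; moving back up (h3 -> h2) is fine.
    skipped = any(later - earlier > 1 for earlier, later in zip(levels, levels[1:]))
    return {
        "one_h1": levels.count(1) == 1,
        "no_skipped_levels": not skipped,
        "has_landmark": parser.has_landmark,
    }


good = "<main><h1>Title</h1><h2>Part</h2><h3>Detail</h3><h2>Next</h2></main>"
print(audit_structure(good))
```

The check only confirms that a landmark tag exists somewhere in the markup, not that it wraps the main content, so a manual look is still worthwhile.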
Citability
- A named author is present (byline, meta, or JSON-LD author).
- A visible publish or last-updated date, kept honest.
- Claim-making pages link out to primary, authoritative sources.
- Key passages are self-contained sentences that can be quoted standalone.
- The home page has a reasonable internal link density.
Answer-shape
- Headings include real questions where natural (What / How / Why, or ending in a question mark).
- A genuine FAQ section uses FAQPage schema (FAQ rich results were restricted to a small set of authoritative government and health sites in 2023, so this now mainly helps machine extraction and AI parsing).
- Answers sit directly under their question heading, concise and complete.
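For the FAQPage item above, the markup follows the schema.org pattern of Question entries, each with an acceptedAnswer. This sketch builds that JSON-LD from question and answer pairs; faq_jsonld is a hypothetical helper and the sample pair is placeholder text, so substitute the questions and answers that are actually visible on the page.

```python
import json


def faq_jsonld(pairs) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder content; the markup should mirror the visible FAQ exactly.
sample_pairs = [
    ("What should I fix first?",
     "Do the blocking items first: let AI crawlers in and serve raw HTML."),
]
print(faq_jsonld(sample_pairs))
```

The resulting string belongs inside a script tag of type application/ld+json in the page's raw HTML, so that it is available without running JavaScript.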
Classic SEO basics
- Title tag in a sensible length range and unique per page.
- Meta description present and within a sensible length range.
- Favicon and apple-touch-icon present.
- Mobile viewport declared.
- Core Web Vitals (LCP, INP, CLS) in good shape on mobile and desktop.
Per-engine spot checks
- ChatGPT: crawler access and answer-shape. See the ChatGPT playbook.
- Perplexity: clear sourcing and fresh dates. See the Perplexity playbook.
- Google AI Overviews: strong classic foundation. See the AI Overviews playbook.
Verify
Run the whole list against your site in one pass instead of checking by hand. Run a free scan (no signup) and XEOscan grades every item above, prioritised by severity. The checklist mirrors the open rubric, so the result is reproducible. Unsure what a term means? See the glossary.
Published by XEOscan, a free tool operated by Constantin Ungureanu.