The AI Readiness Maturity Model is a 5-level framework for evaluating whether a website is prepared to be discovered, parsed, and cited by AI engines such as ChatGPT, Perplexity, Gemini, and Claude. It is based on 7 technical signals that AI engines use to decide which sources to trust and quote.

We developed this model after scanning the top 500 websites in the world and finding a counterintuitive result: no site scored Level 4. The internet's most visible brands are not technically ready for AI search. Read the full report.

The 5 levels

Level  Name          Score range    Meaning
0      Invisible     0–19 / 100     Blocks AI crawlers or no signals at all.
1      Discoverable  20–39 / 100    Crawlable, but little structured data.
2      Indexable     40–59 / 100    Organization and Article schemas present.
3      Retrievable   60–89 / 100    llms.txt, FAQPage, author Person schema.
4      Cited         90–100 / 100   All signals present; observed citations in AI answers.

Level 0: Invisible

The site is either blocking AI crawlers in robots.txt (GPTBot, ClaudeBot, PerplexityBot, or others) or has no structured data at all. An AI engine cannot parse the site as a distinct entity. No matter what signals are present elsewhere, a blocked crawler forces Level 0 because citation is impossible without crawl access.
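For illustration, a robots.txt that forces Level 0 might look like this (domain and policy are placeholders; blocking even one major AI crawler is enough):

```
# Explicitly blocks OpenAI's crawler -- forces Level 0
User-agent: GPTBot
Disallow: /

# Everything else may still be allowed; the block above is decisive
User-agent: *
Allow: /
```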

Real examples from our Top 500 scan: sites that explicitly disallow GPTBot while still being major brands. Blocking happens for different reasons (copyright protection, bandwidth savings, platform policy) but the net effect is the same: invisibility in AI answers.

Level 1: Discoverable

AI crawlers can access the site, but there is little or no structured metadata. The site exists in the crawl corpus but AI engines have limited ability to classify its content, extract facts, or identify the author. Most of the Top 500 sits here.

To move up from Level 1, add Organization schema on the homepage and BlogPosting or Article schema on content pages.
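As a minimal sketch, Organization schema is a JSON-LD block placed inside a `<script type="application/ld+json">` tag in the homepage `<head>` (all values here are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
```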

Level 2: Indexable

Structured data is present (Organization schema on homepage, Article or BlogPosting schema on content pages) and robots.txt references a Sitemap. AI engines can classify the site, understand its content type, and identify key entities.
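The content-page half of this level can be sketched as a BlogPosting JSON-LD block (values are placeholders), alongside a `Sitemap: https://example.com/sitemap.xml` line in robots.txt:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Example post title",
  "datePublished": "2025-01-15",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
```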

To move up from Level 2, add an llms.txt file at the root, FAQPage schema on high-value pages, and Person schema for authors with sameAs links to their LinkedIn profile or another verifiable identity.
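An llms.txt file is plain Markdown at the domain root. A minimal sketch following the common convention (H1 title, blockquote summary, link sections; all names and URLs are placeholders):

```
# Example Co

> Example Co builds widgets for the enterprise market.

## Docs
- [Getting started](https://example.com/docs/start): setup guide
- [Pricing](https://example.com/pricing): plans and limits
```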

Level 3: Retrievable

Rich signals are in place: llms.txt, FAQPage schema, and author Person schema with sameAs. AI engines have everything they need to quote the site confidently. FAQPage in particular is cited at disproportionately high rates because its structured Q&A format matches the shape of AI engine answers.
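A minimal FAQPage JSON-LD sketch (question and answer text are placeholders) shows why the format maps so directly onto AI answers:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is an AI readiness score?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A 0-100 score based on 7 technical signals."
      }
    }
  ]
}
```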

To move up from Level 3, the only remaining work is earning actual citations: AI engines need to re-crawl the site, index the signals, and start quoting it in responses. This typically takes 1 to 4 weeks.

Level 4: Cited

Every technical signal is present and the site is observed being cited by AI engines for its target queries. Reaching Level 4 requires both technical readiness and active measurement: tools like Appearly's Monitor query AI engines directly with your target keywords and record which sources the engines cite.

In the Top 500 scan, no site reached Level 4 based on technical signals alone. The maximum score was 65 out of 100, which places the best-performing sites squarely in Level 3 territory.

The 7 signals behind the score

Each signal is checked at the level where it realistically appears: site-level signals live in robots.txt or on the homepage; content-level signals require a detectable content page (a blog post or article); and FAQPage is checked across multiple locations.

  • AI crawler access (20 pts, site-level): robots.txt allows GPTBot, ChatGPT-User, anthropic-ai, ClaudeBot, PerplexityBot, and Google-Extended. All 6 must be allowed.
  • llms.txt present (10 pts, site-level): A valid /llms.txt file at the domain root, containing plain-text or Markdown content.
  • Organization schema (15 pts, homepage): JSON-LD with @type Organization, LocalBusiness, or Corporation on the homepage.
  • Article schema (15 pts, content page): JSON-LD with @type Article, BlogPosting, NewsArticle, or TechArticle on a content page we detect by crawling the homepage links. N/A if no content page is detectable.
  • FAQPage schema (15 pts, multi-location): JSON-LD with @type FAQPage on the homepage, content page, or /faq endpoints. Pass if found in any location.
  • Author Person schema (10 pts, content page): JSON-LD with @type Person and a sameAs field linking to LinkedIn or another verifiable identity, on the sampled content page. N/A if no content page is detectable.
  • Sitemap in robots.txt (15 pts, site-level): A Sitemap: directive in robots.txt pointing to a valid sitemap.xml.
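The author Person signal above can be sketched as JSON-LD on a content page (name, title, and profile URL are placeholders; the sameAs link is what makes the identity verifiable):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Content",
  "sameAs": ["https://www.linkedin.com/in/janedoe"]
}
```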

Why split by level? Checking all 7 signals on the homepage (as some early tools do) would be methodologically weak. Author Person schema belongs on blog posts, not homepages. Article schema belongs on articles. The model reflects where each signal actually lives.
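Putting the weights and level cutoffs together, the scoring logic can be sketched in Python. The signal identifiers are assumptions (the article doesn't define them), and N/A signals are simply treated as not passed here, which the article leaves unspecified:

```python
# Points per signal, from the list above.
SIGNALS = {
    "ai_crawler_access": 20,
    "llms_txt": 10,
    "organization_schema": 15,
    "article_schema": 15,
    "faqpage_schema": 15,
    "author_person_schema": 10,
    "sitemap_in_robots": 15,
}

# (minimum score, level) cutoffs from the table above.
LEVELS = [(90, 4), (60, 3), (40, 2), (20, 1), (0, 0)]

def score(passed: set) -> int:
    """Sum the points of the signals that passed (0-100)."""
    return sum(pts for name, pts in SIGNALS.items() if name in passed)

def level(passed: set, cited: bool = False) -> int:
    """Map passed signals to a maturity level.

    A blocked AI crawler forces Level 0 regardless of other signals,
    and Level 4 additionally requires observed citations.
    """
    if "ai_crawler_access" not in passed:
        return 0
    s = score(passed)
    lvl = next(l for cutoff, l in LEVELS if s >= cutoff)
    if lvl == 4 and not cited:
        return 3
    return lvl
```

Note how the Top 500 result falls out of the weights: passing crawler access, Organization, Article, and Sitemap gives exactly 65 points, the maximum score observed in the scan, and lands in Level 3.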

How to use this model

Treat the model as a diagnostic, not a race. Most sites benefit most from moving out of Level 0 or 1, where the gap in citation probability is largest. The jump from Level 2 to Level 3 is the second-most valuable move: adding llms.txt, FAQPage, and author schema requires a single deploy and meaningfully changes how AI engines parse the site.

Run a scan on your own domain to see where you stand: AI Readiness Checker.