The AI Readiness Maturity Model is a 5-level framework for evaluating whether a website is prepared to be discovered, parsed, and cited by AI engines such as ChatGPT, Perplexity, Gemini, and Claude. It is based on 7 technical signals that AI engines use to decide which sources to trust and quote.

We developed this model after scanning the top 500 websites in the world and finding a counterintuitive result: no site scored Level 4. The internet's most visible brands are not technically ready for AI search. Read the full report.

The 5 levels

Level  Name          Score range    Meaning
0      Invisible     0–19 / 100     Blocks AI crawlers or no signals at all.
1      Discoverable  20–39 / 100    Crawlable, but little structured data.
2      Indexable     40–59 / 100    Organization and Article schemas present.
3      Retrievable   60–89 / 100    llms.txt, FAQPage, author Person schema.
4      Cited         90–100 / 100   All signals present; observed citations in AI answers.

Level 0: Invisible

The site is either blocking AI crawlers in robots.txt (GPTBot, ClaudeBot, PerplexityBot, or others) or has no structured data at all. An AI engine cannot parse the site as a distinct entity. No matter what signals are present elsewhere, a blocked crawler forces Level 0 because citation is impossible without crawl access.
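For illustration, a robots.txt that forces Level 0 might look like this (domain and policy are placeholders; blocking even one major AI crawler is enough):

```
# Explicitly blocks OpenAI's crawler -- forces Level 0
User-agent: GPTBot
Disallow: /

# Everything else may still be allowed; the block above is decisive
User-agent: *
Allow: /
```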

Real examples from our Top 500 scan: sites that explicitly disallow GPTBot while still being major brands. Blocking happens for different reasons (copyright protection, bandwidth savings, platform policy) but the net effect is the same: invisibility in AI answers.

Level 1: Discoverable

AI crawlers can access the site, but there is little or no structured metadata. The site exists in the crawl corpus but AI engines have limited ability to classify its content, extract facts, or identify the author. Most of the Top 500 sits here.

To move up from Level 1, add Organization schema on the homepage and BlogPosting or Article schema on content pages.
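As a minimal sketch, Organization schema is a JSON-LD block placed inside a `<script type="application/ld+json">` tag in the homepage `<head>` (all values here are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
```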

Level 2: Indexable

Structured data is present (Organization schema on homepage, Article or BlogPosting schema on content pages) and robots.txt references a Sitemap. AI engines can classify the site, understand its content type, and identify key entities.
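The content-page half of this level can be sketched as a BlogPosting JSON-LD block (values are placeholders), alongside a `Sitemap: https://example.com/sitemap.xml` line in robots.txt:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Example post title",
  "datePublished": "2025-01-15",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
```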

To move up from Level 2, add an llms.txt file at the root, FAQPage schema on high-value pages, and Person schema for authors with sameAs links to their LinkedIn profile or another verifiable identity.
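An llms.txt file is plain Markdown at the domain root. A minimal sketch following the common convention (H1 title, blockquote summary, link sections; all names and URLs are placeholders):

```
# Example Co

> Example Co builds widgets for the enterprise market.

## Docs
- [Getting started](https://example.com/docs/start): setup guide
- [Pricing](https://example.com/pricing): plans and limits
```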

Level 3: Retrievable

Rich signals are in place: llms.txt, FAQPage schema, and author Person schema with sameAs. AI engines have everything they need to quote the site confidently. FAQPage in particular is cited at disproportionately high rates because its structured Q&A format matches the shape of AI engine answers.
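A minimal FAQPage JSON-LD sketch (question and answer text are placeholders) shows why the format maps so directly onto AI answers:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is an AI readiness score?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A 0-100 score based on 7 technical signals."
      }
    }
  ]
}
```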

To move up from Level 3, the only remaining work is earning actual citations: AI engines need to re-crawl the site, index the signals, and start quoting it in responses. This typically takes 1 to 4 weeks.

Level 4: Cited

Every technical signal is present and the site is observed being cited by AI engines for its target queries. Reaching Level 4 requires both technical readiness and active measurement: tools like Appearly's Monitor query AI engines directly with your target keywords and record which sources the engines cite.

In the Top 500 scan, no site reached Level 4 based on technical signals alone. The maximum score was 65 out of 100, which places the best-performing sites squarely in Level 3 territory.

The 7 signals behind the score

Each signal is checked at the level where it realistically appears: site-level signals live in robots.txt or on the homepage; content-level signals require a detectable content page (a blog post or article); and FAQPage is checked across multiple locations.

  • AI crawler access (20 pts, site-level): robots.txt allows GPTBot, ChatGPT-User, anthropic-ai, ClaudeBot, PerplexityBot, and Google-Extended. All 6 must be allowed.
  • llms.txt present (10 pts, site-level): A valid /llms.txt file at the domain root, containing plain-text or Markdown content.
  • Organization schema (15 pts, homepage): JSON-LD with @type Organization, LocalBusiness, or Corporation on the homepage.
  • Article schema (15 pts, content page): JSON-LD with @type Article, BlogPosting, NewsArticle, or TechArticle on a content page we detect by crawling the homepage links. N/A if no content page is detectable.
  • FAQPage schema (15 pts, multi-location): JSON-LD with @type FAQPage on the homepage, content page, or /faq endpoints. Pass if found in any location.
  • Author Person schema (10 pts, content page): JSON-LD with @type Person and a sameAs field linking to LinkedIn or another verifiable identity, on the sampled content page. N/A if no content page is detectable.
  • Sitemap in robots.txt (15 pts, site-level): A Sitemap: directive in robots.txt pointing to a valid sitemap.xml.
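The author Person signal above can be sketched as JSON-LD on a content page (name, title, and profile URL are placeholders; the sameAs link is what makes the identity verifiable):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Content",
  "sameAs": ["https://www.linkedin.com/in/janedoe"]
}
```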

Why split by level? Checking all 7 signals on the homepage (as some early tools do) would be methodologically weak. Author Person schema belongs on blog posts, not homepages. Article schema belongs on articles. The model reflects where each signal actually lives.
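Putting the weights and level cutoffs together, the scoring logic can be sketched in Python. The signal identifiers are assumptions (the article doesn't define them), and N/A signals are simply treated as not passed here, which the article leaves unspecified:

```python
# Points per signal, from the list above.
SIGNALS = {
    "ai_crawler_access": 20,
    "llms_txt": 10,
    "organization_schema": 15,
    "article_schema": 15,
    "faqpage_schema": 15,
    "author_person_schema": 10,
    "sitemap_in_robots": 15,
}

# (minimum score, level) cutoffs from the table above.
LEVELS = [(90, 4), (60, 3), (40, 2), (20, 1), (0, 0)]

def score(passed: set) -> int:
    """Sum the points of the signals that passed (0-100)."""
    return sum(pts for name, pts in SIGNALS.items() if name in passed)

def level(passed: set, cited: bool = False) -> int:
    """Map passed signals to a maturity level.

    A blocked AI crawler forces Level 0 regardless of other signals,
    and Level 4 additionally requires observed citations.
    """
    if "ai_crawler_access" not in passed:
        return 0
    s = score(passed)
    lvl = next(l for cutoff, l in LEVELS if s >= cutoff)
    if lvl == 4 and not cited:
        return 3
    return lvl
```

Note how the Top 500 result falls out of the weights: passing crawler access, Organization, Article, and Sitemap gives exactly 65 points, the maximum score observed in the scan, and lands in Level 3.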

How to use this model

Treat the model as a diagnostic, not a race. Most sites benefit most from moving out of Level 0 or 1, where the gap in citation probability is largest. The jump from Level 2 to Level 3 is the second-most valuable move: adding llms.txt, FAQPage, and author schema requires a single deploy and meaningfully changes how AI engines parse the site.

Run a scan on your own domain to see where you stand: AI Readiness Checker.