
llms.txt for SaaS in 2026: Implementation Guide

By Nadia Mohamed · 19 May 2026 · 12 min read · Updated 3 June 2026

What is llms.txt and does it actually do anything for SaaS SEO?

llms.txt is a proposed text-file standard, placed at a site’s root directory, that lists Markdown-formatted pointers to a website’s most authoritative content — intended to help large language models find and cite the right resources without parsing the entire site. As of June 2026 it is unofficial, no major LLM provider has committed to honouring it, and the available data shows AI crawlers visit llms.txt files at a tiny fraction of the rate at which they visit regular content. For SaaS sites the practical advice is: implement it as a 30-minute future-proofing exercise if you have well-structured documentation, but do not let it displace higher-leverage AI citation work (server-rendered schema, expert-attributed content, primary-source citations) where the measurable returns actually live in 2026.

TL;DR — Key takeaways

llms.txt was proposed in 2024 by Jeremy Howard (Answer.AI) as a way for sites to advertise their most LLM-relevant content to AI crawlers. It is a plain Markdown file at /llms.txt listing URLs grouped by H2 section headings, each with a brief description of what the URL covers.
No major LLM vendor (OpenAI, Anthropic, Google, Perplexity) has officially confirmed honouring llms.txt in production crawlers. At Google’s Search Central Deep Dive event in July 2025, Gary Illyes stated that Google does not use llms.txt as a ranking or crawling signal, despite community speculation that they might be experimenting with it.
OtterlyAI’s 90-day experiment on its own site found that AI crawlers hit /llms.txt in roughly 0.1% of AI-bot visits — meaningful as a signal that adoption is currently negligible, with the caveat that one site over one window is a thin evidence base for confident projection.
The implementation cost is low (a static text file, no dependencies, no build pipeline integration) and the maintenance cost is low (quarterly review of listed URLs). The opportunity cost is mainly attention: every hour spent perfecting llms.txt is an hour not spent on schema implementation, primary-source citations, or content quality — all of which have measurable impact on AI citation eligibility today.
The pragmatic recommendation for SaaS sites in 2026: ship an llms.txt file if your documentation is clean and stable, treat it as low-priority maintenance, and do not over-invest in optimising it. Measure success through your AI referral traffic baseline (covered in the AI referral traffic tracking piece), not through llms.txt visit counts.
llms.txt sits inside the broader 12-phase SEO & GEO audit framework — specifically Phase 8 (AI citation readiness), where it ranks below schema, entity signals, and content structure as a citation-eligibility driver.

How does llms.txt differ from robots.txt for AI crawling?

The two files solve different problems despite the surface similarity. robots.txt is a deny/allow file: it tells crawlers which URLs they are or are not permitted to fetch, and the rules are enforceable in the sense that mainstream crawlers respect them (or risk reputational and legal consequences when they don’t). llms.txt is a recommendation file: it tells AI systems “if you’re going to crawl this site for content to cite, start with these URLs because they are the most useful.” There is no enforcement mechanism, no industry consensus on whether to honour it, and no Search Console equivalent to confirm whether any particular crawler used it.

The other functional difference is format. robots.txt uses a strict directive syntax (User-agent: GPTBot / Disallow: /); llms.txt uses Markdown, with H2 headings to group resources and bullet lists to point at URLs with brief descriptions. The Markdown choice was deliberate, since LLMs are unusually good at parsing Markdown, but it also means there is no formal validator, no error spec, and no canonical example beyond the proposal document.

Where Google specifically stands: Gary Illyes (Google Search Relations) stated in July 2025 that no current AI service has said they use llms.txt, which is among the most authoritative on-the-record statements from any major vendor. Community claims that Google “endorsed” llms.txt through the Agent2Agent (A2A) protocol launch in April 2025 are unsupported by the official A2A announcement, which does not mention llms.txt. Treat the protocol as a community-driven proposal with no current vendor endorsement.

For SaaS sites, robots.txt remains the file that controls AI crawler access. The standard 2026 robots.txt pattern allows or blocks GPTBot, ClaudeBot, PerplexityBot, and Google-Extended at the user-agent level, with separate rules for each. llms.txt does not replace this — it sits alongside it, offering a curated pointer list to crawlers that have already been granted access. Implementing llms.txt without first deciding the robots.txt policy is the wrong sequence; the access decision precedes the discoverability decision.
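As a minimal sketch of making the access decision first, the snippet below uses Python’s standard urllib.robotparser to check what a draft policy would permit before any llms.txt is written. The robots.txt content and the example.com URLs are hypothetical; the user-agent tokens are the ones named above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: GPTBot limited to /docs/, ClaudeBot fully allowed,
# PerplexityBot and Google-Extended blocked, every other crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def access_report(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user-agent token, whether the policy permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    for url in ("https://example.com/docs/api", "https://example.com/llms.txt"):
        print(url, access_report(ROBOTS_TXT, url))
```

Under this particular draft, GPTBot may fetch /docs/api but not /llms.txt itself, which is the kind of sequencing mistake the check is meant to surface.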

What does the available evidence actually say about llms.txt impact?

The most-cited piece of empirical work is OtterlyAI’s 2025 experiment: they hosted an llms.txt file on their own site for 90 days and logged AI-crawler visits. The headline result was that /llms.txt received roughly 0.1% of AI-bot traffic — and average content pages on the same site received roughly 3× more AI-crawler visits than the llms.txt endpoint did. This is the best available point-in-time data on adoption. It is not, however, a randomised experiment, it is a single-site observation, and it does not tell us how AI crawlers behaved on sites with different content structures, different domain authority, or different llms.txt formats.

0.1% of AI-bot visits went to /llms.txt over a 90-day experiment (source: OtterlyAI). Content pages on the same site drew roughly 3× more AI-crawler visits than the llms.txt endpoint.

The interpretation gap matters because llms.txt advocacy and llms.txt scepticism both lean on this data. The advocacy framing is “early adoption gives a competitive head-start for when adoption catches up.” The sceptical framing is “if a 90-day experiment shows 0.1% utilisation, the protocol is not on a clear path to becoming load-bearing.” Both readings are defensible from the same data; both readings should be held loosely until more independent studies emerge.

What is harder to argue with is the relative-leverage picture. ConvertMate’s analysis of 80M+ AI citations across ChatGPT, Perplexity, Claude, and Google AI Overviews found that pages with valid schema markup had 67% higher citation eligibility, pages with expert-attributed quotes had 41% higher citation rates, pages with original data had a 4.1× citation multiplier, and pages with content updated within 30 days had a 3.2× freshness multiplier. None of those signals depend on llms.txt. All of them are validated against a far larger dataset than any llms.txt experiment to date.

The cluster-consistent reading: llms.txt is plausibly useful at the margin, but the citation-eligibility drivers that move the needle today are upstream of it. The right priority order in 2026 is schema → expert attribution → primary-source citations → content freshness → entity signals → (then) llms.txt.

If you do implement it, what should the file actually look like?

The format proposed by Jeremy Howard’s original specification is a Markdown file at the site root with a few structural conventions:

The H1 at the top names the site or company.
An optional blockquote provides a one-paragraph description of what the site does.
H2 sections group resources by category — Documentation, API Reference, Pricing, Tutorials.
Each section uses a Markdown list of links: [link text](url): one-sentence description.
An optional “Optional” H2 section at the bottom lists resources that are less essential but still useful for deeper context.

A minimum viable llms.txt for a SaaS site might look like this:

# Acme Analytics

> Acme Analytics is a product-analytics platform for SaaS companies, helping product teams track user behaviour, retention, and funnel performance.

## Documentation
- [Getting Started](https://acmeanalytics.com/docs/getting-started): Installation and first-event walkthrough
- [SDKs](https://acmeanalytics.com/docs/sdks): Official SDKs for JavaScript, iOS, Android, Python, Go
- [API Reference](https://acmeanalytics.com/docs/api): Complete REST and GraphQL endpoint reference

## Product
- [Pricing](https://acmeanalytics.com/pricing): Per-event pricing tiers and feature comparison
- [Features](https://acmeanalytics.com/features): Funnel analysis, retention, A/B testing, cohort analysis

## Optional
- [Blog](https://acmeanalytics.com/blog): Product updates and analytics best practices
- [Changelog](https://acmeanalytics.com/changelog): Release notes for the past 12 months

Three implementation decisions matter more than the rest:

What to include

The right material for llms.txt is stable, authoritative, reference-quality content — documentation, API reference, pricing, feature pages. Avoid blog posts, case studies, and marketing landing pages, both because they change frequently (the URL list goes stale) and because AI crawlers are already good at finding them via standard discovery. The point of llms.txt is to surface the content that’s structurally important but might not be obviously discoverable from the homepage navigation.

How long to make it

The llms.txt specification does not formally cap file size, but community implementations converge on keeping the main file small and curated: often under about 10 KB of Markdown, typically 20-50 resource entries. (Bulk content belongs in an optional llms-full.txt, not in llms.txt itself.) SaaS sites with extensive documentation should resist the urge to list every URL: the value is curation, not enumeration. If the file becomes a sitemap clone, AI crawlers gain nothing from parsing it versus parsing the actual sitemap.
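A rough way to keep an eye on that budget is to count bytes and link entries. The thresholds below simply encode the informal community figures mentioned above (about 10 KB, 20-50 entries); they are conventions rather than limits from the specification, and the function name is hypothetical.

```python
import re

MAX_BYTES = 10 * 1024      # informal "about 10 KB" convention, not a spec limit
ENTRY_RANGE = (20, 50)     # typical entry count cited above, advisory only
ENTRY = re.compile(r"^- \[[^\]]+\]\([^)]+\)", re.MULTILINE)


def size_report(text: str) -> dict:
    """Summarise file size and link-entry count against the informal budgets."""
    size = len(text.encode("utf-8"))
    entries = len(ENTRY.findall(text))
    return {
        "bytes": size,
        "entries": entries,
        "within_size_budget": size <= MAX_BYTES,
        "within_entry_range": ENTRY_RANGE[0] <= entries <= ENTRY_RANGE[1],
    }
```

A small site can reasonably fall below the entry range; the more useful warning is the upper bound, which is where a curated list starts turning into a sitemap clone.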

Where to host it

The file lives at /llms.txt on the root domain. Serve it as text/plain or text/markdown, with no authentication, no redirects, and no canonical conflicts. CDN caching is fine and recommended; the file should be one of the cheapest endpoints on the site.
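Those hosting rules are easy to verify after deployment. The sketch below separates the rule check (pure, testable offline) from the network fetch; both function names are hypothetical, and the fetch uses only the standard library.

```python
from urllib.request import urlopen

ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def hosting_problems(requested_url: str, final_url: str, status: int, content_type: str) -> list[str]:
    """Compare one observed response against the hosting rules described above."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if final_url != requested_url:
        problems.append(f"redirected to {final_url}")
    if content_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {content_type}")
    return problems


def check_live(url: str) -> list[str]:
    """Fetch url (network access required) and report hosting problems.

    urlopen follows redirects, so geturl() reveals whether one happened.
    It raises HTTPError for 4xx/5xx responses, including a 401 from an
    authentication wall, so those cases surface as exceptions here.
    """
    with urlopen(url, timeout=10) as response:
        header = response.headers.get("Content-Type", "")
        return hosting_problems(
            requested_url=url,
            final_url=response.geturl(),
            status=response.status,
            content_type=header.split(";")[0].strip().lower(),
        )
```

Calling check_live("https://example.com/llms.txt") with your own domain substituted returns an empty list when the endpoint is served cleanly.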

When does it not make sense to implement llms.txt at all?

Three scenarios where the file is unlikely to help and may distract:

The site doesn’t have well-structured documentation yet

If the SaaS product’s docs are still being built out, the URLs in the llms.txt file will go stale quickly, the maintenance overhead exceeds the upside, and the LLMs that do parse the file will encounter broken or outdated references. Fix the docs structure first; ship llms.txt after.

The schema implementation isn’t done

Schema markup affects AI citation eligibility with measurable, published evidence. llms.txt’s effect is at best speculative. If both are on the to-do list, schema ships first. The structured data for AI search piece covers the deployment pattern; the order matters because schema is what AI crawlers use to evaluate citation eligibility once they find the content, regardless of how they found it.

The site uses heavy client-side rendering

Most AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do not execute JavaScript. Google is the exception: Google-Extended is a robots.txt control token rather than a separate crawler, so Google’s fetching is done by Googlebot, which does render JavaScript. If primary content is client-rendered and invisible in the initial HTML, no llms.txt file changes that for the crawlers that don’t render. The fix is server-side rendering (covered in the headless technical SEO piece), and llms.txt is downstream of that fix in priority.
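One quick way to see what a non-rendering crawler sees is to look for a key phrase in the raw HTML without executing any JavaScript. The sketch below does that with the standard-library HTML parser; the class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class InitialHtmlText(HTMLParser):
    """Collect text from raw HTML, ignoring the bodies of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_without_js(raw_html: str, phrase: str) -> bool:
    """True if phrase appears in the page text before any JavaScript runs."""
    parser = InitialHtmlText()
    parser.feed(raw_html)
    parser.close()
    return phrase.lower() in " ".join(parser.chunks).lower()
```

Feed it the response body from a plain HTTP fetch of a docs page: if a sentence you can read in the browser comes back False here, that page is effectively empty for crawlers that do not render.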

How do you actually measure whether your llms.txt is doing anything?

The honest answer is: with current tooling, the signal is weak and indirect. Three measurement layers, in order of usefulness:

Server log analysis

Filter access logs for requests to /llms.txt, grouped by user-agent. The user agents to watch are GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and CCBot (Common Crawl, which feeds many training pipelines). Google-Extended will not show up in logs: it is a robots.txt token with no user-agent string of its own, so Google’s requests appear under its regular Googlebot user agents. A flat-line monthly count of 1-5 hits is consistent with the OtterlyAI baseline; a sharp inflection upward would indicate adoption is changing, which, in 2026, would be newsworthy in its own right.
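A minimal sketch of that log filter, assuming access logs in the common "combined" format. The bot tokens are matched as substrings of the User-Agent header; the sample log lines and their user-agent strings are made up for illustration.

```python
import re
from collections import Counter

# Google-Extended is left out on purpose: it is a robots.txt token,
# not a User-Agent string, so it never appears in access logs.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

# Combined log format: host ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_hits(log_lines):
    """Return (all AI-bot requests, AI-bot requests to /llms.txt), keyed by bot token."""
    total = Counter()
    llms = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match["agent"].lower()
        bot = next((t for t in AI_BOT_TOKENS if t.lower() in agent), None)
        if bot is None:
            continue
        total[bot] += 1
        if match["path"].split("?")[0] == "/llms.txt":
            llms[bot] += 1
    return total, llms


# Illustrative sample lines only; not real traffic.
SAMPLE = [
    '203.0.113.5 - - [02/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [02/Jun/2026:10:00:04 +0000] "GET /docs/api HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [02/Jun/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [02/Jun/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/125.0"',
]

if __name__ == "__main__":
    total, llms = llms_txt_hits(SAMPLE)
    for bot in total:
        print(bot, "llms.txt hits:", llms[bot], "of", total[bot], "requests")
```

Dividing the /llms.txt count by the total per bot gives the same kind of share figure the OtterlyAI experiment reported, which makes month-over-month comparison straightforward.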

AI referral traffic monitoring

Track the absolute volume of AI-referred sessions to your site over time and the URL distribution within those sessions. If llms.txt is working, you’d expect AI-referred sessions to skew toward the URLs you listed in the file, particularly the documentation and API reference URLs that benefit most from an explicit pointer. The measurement methodology is covered in the AI referral traffic tracking piece, which lays out the GA4 channel grouping pattern needed to separate AI referrals from generic direct traffic.
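The skew can be expressed as a single number: the share of AI-referred sessions that land on URLs listed in the file. The sketch below assumes sessions have already been exported as (referrer URL, landing path) pairs; the referrer hostnames are assumptions about what your analytics records, not an authoritative list.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames for AI assistants; adjust to match your own data.
AI_REFERRERS = {"chatgpt.com", "perplexity.ai", "claude.ai"}


def listed_url_share(sessions, listed_paths):
    """Return (AI-referred session count, share of them landing on listed paths).

    sessions: iterable of (referrer_url, landing_path) pairs.
    The share is None when there are no AI-referred sessions.
    """
    ai_landings = []
    for referrer, landing_path in sessions:
        host = (urlparse(referrer).hostname or "").removeprefix("www.")
        if host in AI_REFERRERS:
            ai_landings.append(landing_path)
    if not ai_landings:
        return 0, None
    on_listed = sum(path in listed_paths for path in ai_landings)
    return len(ai_landings), on_listed / len(ai_landings)
```

Tracked monthly, a rising share is suggestive rather than conclusive, since the same pages may simply be the ones AI engines would have found anyway.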

Manual citation tracking

Periodically query the major AI assistants (ChatGPT Search, Perplexity, Claude, Google AI Overviews) with prompts about your product category and see which URLs they cite. If they cite docs URLs you listed in llms.txt, the file may be helping; if they cite the same URLs they’d have cited from sitemap discovery, the file is decorative. This is qualitative and labour-intensive, but it’s the only signal that maps to the question you actually care about (does the file change citation patterns?).

Where does llms.txt sit in the broader GEO toolkit?

The cluster-consistent framing is that llms.txt is a small lever in a larger system. Generative Engine Optimization (GEO) is the discipline of making content discoverable, parsable, and cite-worthy for AI engines. Within that discipline, the levers stack roughly as:

Content quality and primary-source attribution — the largest lever; AI engines preferentially cite content with named experts, original data, and verifiable references.
Schema markup — second-largest; ConvertMate’s data shows 67% higher citation eligibility for schema-rich pages.
Server-side rendering of primary content — gate condition; if AI crawlers can’t see the content in the initial HTML, none of the other levers apply.
Entity signals — Person schema with sameAs to LinkedIn / academic profiles / verifiable affiliations; covered in the E-E-A-T and author entity piece.
Content freshness — pages updated within the last 30 days have a documented citation multiplier.
llms.txt and analogous discoverability hints — marginal lever as of 2026; cheap to implement, low ceiling, potential upside if adoption broadens.

The SaaS-specific consideration: if your documentation is the primary product surface (developer tools, API-first platforms, integration-heavy products), the upside of an llms.txt file is materially higher than for marketing-led SaaS. Developers and API integrators query AI assistants directly for “how do I integrate X with Y” questions, and having AI engines pull those answers from your docs is exactly the citation pattern llms.txt is trying to influence. For marketing-led SaaS where the buyer journey runs through landing pages and case studies, the lever is much smaller.

FAQ

Do any AI platforms officially honour llms.txt today?

No official support exists. Gary Illyes (Google Search Relations) stated in July 2025 that “no current AI service has said they use llms.txt.” Crawler-log data shows utilisation at roughly 0.1% of AI-bot traffic.

How is llms.txt different from robots.txt?

robots.txt is an access-control file with industry-wide enforcement; llms.txt is a discoverability-hint file with no enforcement mechanism. robots.txt decides crawler access; llms.txt suggests prioritised URLs once admitted.

Should I prioritise llms.txt over schema markup for AI citations?

No. Schema markup shows 67% citation eligibility uplift in ConvertMate’s 80M+ AI citation analysis. Schema implementation takes priority over llms.txt.

What content should I list in my llms.txt file?

Include stable, reference-quality content (documentation, API reference, pricing, features). Avoid blog posts and marketing pages that change frequently.

Can I measure whether llms.txt is actually doing anything for my site?

Use server log analysis (track /llms.txt hits by user-agent), AI referral traffic monitoring in GA4, and manual citation testing via ChatGPT, Perplexity, Claude, and Google AI Overviews.
