Short answer: llms.txt is a small markdown file at the root of your site (yourdomain.com/llms.txt) that tells AI models which pages are your canonical, citation-worthy sources. It is not a formal standard yet, but Anthropic, Perplexity, and a growing list of crawlers respect it. Publishing one takes about 30 minutes and is one of the highest-leverage early-mover plays for AI search visibility in 2026. This guide gives you the spec, a copy-paste template, and a launch checklist.
What llms.txt Actually Is
llms.txt was proposed by Jeremy Howard in September 2024 as a way for websites to give large language models a curated, machine-readable map of their most important content. Think of it as a sitemap with editorial judgement: instead of listing every URL on the site, you list the pages you actually want AI engines to quote, with a short description of what each page is.
The file is plain markdown. It lives at the root: https://yourdomain.com/llms.txt. It is publicly readable by anyone (including humans). AI tools that support the convention fetch it the way a search crawler fetches robots.txt — early, automatically, and without prompting.
llms.txt vs robots.txt vs sitemap.xml
These three files all live at your site root and they are easy to confuse. They do not overlap:
robots.txt is permissions. It tells crawlers — including AI bots like GPTBot, ClaudeBot, and PerplexityBot — which paths they may or may not fetch. If you block a bot here, nothing else matters: it will never see your llms.txt either. (We cover the access layer in the 9-signal citation checklist.)
sitemap.xml is discovery. It lists every public URL so search engines can find them. It is exhaustive and machine-only.
llms.txt is curation. It is the short, opinionated list of pages you want AI models to lean on when summarising or quoting your business. It includes human-readable context — a one-line description per URL — so a language model knows what the page is for, not just that it exists.
You need all three. They do different jobs.
The Official Format
The spec (per llmstxt.org) is intentionally simple. A valid llms.txt file has:
- An `H1` at the top, usually your business or product name. Required.
- A blockquote (`>`) immediately after: a one or two sentence summary. Recommended.
- Optional paragraphs of additional context.
- One or more `H2` sections, each containing a markdown list of links in the form `- [Page name](URL): one-line description`.
- An optional `## Optional` section at the end for lower-priority links (AI models may skip these when the context window is tight).
That is the entire spec. No XML, no schema, no validator required.
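Although no validator is required, the rules above are simple enough to check mechanically. The following is a minimal sketch, not an official tool: the function name `check_llms_txt`, the regular expression, and the wording of the messages are all assumptions made for illustration, and it only covers the basic shape described in the list above.

```python
import re

# Assumed pattern for one link item: "- [Page name](URL)" with an optional ": description".
LINK_ITEM = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic shape is fine."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line.strip()]

    # Rule 1: the first non-blank line must be an H1.
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("missing H1 on the first line")

    # Rule 2 (recommended): a blockquote summary right after the H1.
    if len(non_blank) < 2 or not non_blank[1].startswith(">"):
        problems.append("no blockquote summary after the H1 (recommended)")

    # Rule 3: one or more H2 sections, each holding link list items.
    current_section = None
    links_in_section = {}
    for line in lines:
        if line.startswith("## "):
            current_section = line[3:].strip()
            links_in_section[current_section] = 0
        elif current_section is not None and line.startswith("- "):
            if LINK_ITEM.match(line):
                links_in_section[current_section] += 1
            else:
                problems.append(f"malformed link item in '{current_section}': {line}")

    if not links_in_section:
        problems.append("no H2 sections found")
    for section, count in links_in_section.items():
        if count == 0:
            problems.append(f"section '{section}' has no links")

    return problems
```

Calling `check_llms_txt(open("llms.txt", encoding="utf-8").read())` on a local copy returns an empty list when the file has an H1, a summary blockquote, and one or more H2 sections with well-formed links.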
A Copy-Paste Template for a Services Business
Here is a working template you can adapt. Replace the placeholders, save it as llms.txt, and upload it to your site root.
# Your Business Name
> One or two sentences explaining what your business does, who it serves,
> and where. Example: ThinkProfits is a Vancouver-based digital marketing
> agency operating since 1996, serving 3,500+ clients across Canada and
> the U.S. with SEO, AEO, GEO, PPC, and web design.
Additional context paragraph if useful — founding year, locations served,
core differentiators. Keep it factual; AI models will quote from here.
## Services
- [SEO Services](https://yourdomain.com/seo-services/): Organic search programs for businesses targeting Google and traditional search.
- [AEO Services](https://yourdomain.com/geo-services/aeo/): Answer Engine Optimization — get cited as the direct answer in AI engines.
- [GEO Services](https://yourdomain.com/geo-services/): Generative Engine Optimization — build entity authority for ChatGPT, Perplexity, and Google AI Overviews.
- [PPC Services](https://yourdomain.com/ppc-services/): Paid search and paid social campaigns.
- [Web Design](https://yourdomain.com/web-design-services/): Conversion-focused websites for service businesses.
## Locations
- [Vancouver](https://yourdomain.com/seo-company-vancouver/): Local SEO for Vancouver, BC.
- [Toronto](https://yourdomain.com/seo-company-toronto/): Local SEO for Toronto, ON.
- [All Locations](https://yourdomain.com/locations/): Full list of cities served across Canada.
## About
- [About the Company](https://yourdomain.com/about/): Founding story, leadership, and verified business history.
- [Reviews](https://yourdomain.com/reviews/): Verified client reviews and case study results.
- [Contact](https://yourdomain.com/contact/): Hours, phone, address, and free consultation booking.
## Resources
- [FAQ](https://yourdomain.com/faq/): Plain-English answers to the questions prospects most often ask.
- [Free SEO Audit Tool](https://yourdomain.com/free-seo-audit-tool/): Free site scan covering bot access, schema, llms.txt, and entity signals.
- [Blog](https://yourdomain.com/digital-news/): Long-form guides on SEO, AEO, GEO, PPC, and digital marketing.
## Optional
- [Privacy Policy](https://yourdomain.com/privacy-policy/)
- [AI Policy](https://yourdomain.com/ai-policy/)
- [Sitemap](https://yourdomain.com/sitemap/)
That is roughly the structure we use on our own site. View our live llms.txt for a real example.
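If you keep your priority pages in a spreadsheet or CMS export, you may prefer to generate the file rather than edit it by hand. Here is a rough sketch of that idea; the `Link` class and `render_llms_txt` function are hypothetical names, and the sample entries reuse the placeholder URLs from the template.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    url: str
    description: str = ""  # links in the Optional section may omit this


def render_llms_txt(title: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Build llms.txt text: H1, blockquote summary, then one H2 per section."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for link in links:
            item = f"- [{link.name}]({link.url})"
            if link.description:
                item += f": {link.description}"
            out.append(item)
        out.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(out).rstrip() + "\n"


text = render_llms_txt(
    "Your Business Name",
    "One-sentence summary.",
    {
        "Services": [
            Link("SEO Services", "https://yourdomain.com/seo-services/", "Organic search programs."),
        ],
        "Optional": [
            Link("Privacy Policy", "https://yourdomain.com/privacy-policy/"),
        ],
    },
)
print(text)
```

Sections are written in the order they appear in the dictionary, so put `Optional` last.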
What to Include — and What to Leave Out
The single most common mistake is treating llms.txt as a second sitemap and dumping every URL into it. Don't. The whole point of the format is curation. A 30-entry file with sharp descriptions is far more useful to an AI model than a 300-entry kitchen-sink list.
Include: your homepage, every service page you want cited by AI engines, locations, pricing, top FAQs, your highest-performing blog posts, and any pages that establish your entity (about, leadership, reviews, contact). Also include any page where the answer to a likely prospect question lives. If you offer Answer Engine Optimization or Generative Engine Optimization as services, those pages belong near the top of your file.
Leave out: tag and category archive pages, paginated lists, thin internal pages, anything behind a paywall or login, ephemeral landing pages, and anything you would not want quoted verbatim in an AI answer. Also leave out duplicates — pick the canonical version of each page.
Aim for somewhere between 20 and 50 entries for a typical small or mid-sized services business.
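When you start from a full sitemap export, a first pass over the URLs can apply the "leave out" rules before you make the editorial choices by hand. The sketch below is illustrative only: the path patterns (`/tag/`, `/category/`, `/page/N/`, `/login`, and so on) are assumptions based on common WordPress-style URL layouts, so adjust them to your own site.

```python
import re
from urllib.parse import urlsplit

# Assumed path patterns for pages that do not belong in llms.txt.
EXCLUDE_PATTERNS = [
    re.compile(r"^/(tag|category)/"),            # tag and category archives
    re.compile(r"/page/\d+/?$"),                 # paginated lists
    re.compile(r"^/(login|account|cart)(/|$)"),  # gated or transactional pages
]


def curate(urls: list[str]) -> list[str]:
    """Drop excluded paths and duplicate variants, keeping the first URL seen."""
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if any(pattern.search(path) for pattern in EXCLUDE_PATTERNS):
            continue
        # Treat trailing-slash and query-string variants as one page.
        key = (parts.netloc.lower(), path.rstrip("/") or "/")
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


candidates = [
    "https://yourdomain.com/seo-services/",
    "https://yourdomain.com/seo-services?utm_source=x",
    "https://yourdomain.com/tag/seo/",
    "https://yourdomain.com/digital-news/page/2/",
    "https://yourdomain.com/",
]
print(curate(candidates))
```

This only removes the obvious noise. Deciding which of the remaining pages deserve a slot is still a judgement call.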
Writing the Descriptions
The one-line description after each link is the part that actually changes how a model uses the page. A good description tells the model what question this page answers, not what it is called.
Weak: "- [Services](/services/): Our services page."
Strong: "- [AEO Services](/geo-services/aeo/): Answer Engine Optimization — get cited as the direct answer in ChatGPT, Perplexity, and Google AI Overviews. Pricing from $895/mo."
Lead with the noun. Include a price, location, or hard fact when it is relevant — that is exactly the kind of detail AI models will lift when they cite you.
Where the File Goes
Upload llms.txt to the root of your site so it resolves at https://yourdomain.com/llms.txt. On most platforms this means:
- WordPress: drop it in the site root via SFTP, or use an SEO plugin that exposes root files. Some plugins (Yoast, Rank Math) are starting to ship llms.txt support directly.
- Shopify: use a theme template (`llms.liquid` rendered with a `text/plain` content type) or a redirect from `/llms.txt` to a page that serves the markdown.
- Static / Jamstack (Next, Vite, Astro, Hugo): place `llms.txt` in `public/` (or your framework's static directory). It ships as-is.
- Webflow / Squarespace: use the platform's custom file or redirect feature; if neither exists, host the file on a subdomain and 301 `/llms.txt` to it (less ideal but workable).
Verify by visiting https://yourdomain.com/llms.txt in a browser. You should see plain text. If you see HTML, your server is wrapping the file in a template — fix the content type to text/plain; charset=utf-8.
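The same check can be scripted so it runs after every deploy. This is a small sketch using only the Python standard library; `is_plain_text` and `check_live_file` are hypothetical helper names, and the HTML detection is a simple heuristic rather than a full parser.

```python
import urllib.request


def is_plain_text(content_type: str, body: str) -> bool:
    """True when the header says text/plain and the body is not an HTML page."""
    media_type = content_type.split(";")[0].strip().lower()
    looks_like_html = body.lstrip().lower().startswith(("<!doctype", "<html"))
    return media_type == "text/plain" and not looks_like_html


def check_live_file(url: str) -> bool:
    """Fetch the live file and apply the check above (makes a network request)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read().decode("utf-8", errors="replace")
    return is_plain_text(content_type, body)
```

For example, `check_live_file("https://yourdomain.com/llms.txt")` returns `False` when the server wraps the file in an HTML template or sends a `text/html` header.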
Optional: llms-full.txt
llms-full.txt is a companion file that contains the actual content of your priority pages inline as markdown, so an AI model can read your full copy without crawling each URL. It is useful when:
- You have technical documentation that benefits from being available in one file.
- Some of your important pages are JavaScript-rendered and crawlers may not get the full content.
- You want a single canonical text representation of your business for AI consumption.
For most service businesses, llms.txt alone is enough. Add llms-full.txt only when you have a clear reason — documentation, knowledge base, large reference content.
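If you do decide to publish the companion file, it can be assembled from markdown copies of your priority pages. The layout below (one H2 per page with a `Source:` line) is an assumption for illustration, not something this guide or the spec prescribes, and `render_llms_full` is a hypothetical name.

```python
def render_llms_full(title: str, pages: list[tuple[str, str, str]]) -> str:
    """Join pages into one markdown document.

    Each page is a (page name, URL, markdown body) tuple.
    """
    chunks = [f"# {title}"]
    for name, url, body in pages:
        chunks.append(f"## {name}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(chunks) + "\n"


full_text = render_llms_full(
    "Your Business Name",
    [("FAQ", "https://yourdomain.com/faq/", "Q: What is llms.txt?\nA: A curated map of key pages.\n")],
)
print(full_text)
```

Page bodies that contain their own H1 or H2 headings will blend into this structure, so you may want to demote their headings before joining them.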
Which AI Crawlers Actually Read It
Current adoption (June 2026) — realistic, not aspirational:
- Anthropic / Claude: reads `llms.txt`, weights it as a curated source signal.
- Perplexity: reads it; uses it to prioritise crawling and citation.
- OpenAI / ChatGPT: no formal commitment, but GPTBot follows the link graph and benefits indirectly from a well-structured `llms.txt`.
- Google / Gemini / AI Overviews: not endorsed. Google still relies on `sitemap.xml` and structured data. Publishing `llms.txt` does not hurt and gets you ready if and when Google moves.
- AI coding tools (Cursor, Aider, etc.) and research agents: increasingly read `llms.txt` when generating answers about a business or product.
The realistic 2026 upside is measurably better Anthropic and Perplexity citations and a cleaner signal to any future AI crawler that adopts the convention. The downside of publishing one is zero.
10-Step Launch Checklist
- Confirm AI bot access in `robots.txt`: `GPTBot`, `ClaudeBot`, `PerplexityBot`, and `Google-Extended` must not be disallowed.
- List your 20–40 most important URLs (services, locations, pricing, top FAQs, key blog posts).
- Pick the canonical version of each page (no duplicates, no paginated variants).
- Write a one-line description per URL: lead with the noun, include hard facts.
- Open the template above and replace the placeholders.
- Save as `llms.txt` (UTF-8, plain text, LF line endings).
- Upload to your site root.
- Visit `https://yourdomain.com/llms.txt` and confirm it serves as plain text.
- Run our free SEO audit tool; it checks for `llms.txt` presence and basic structure.
- Re-audit every 3–6 months. Add new service pages and remove anything you have deprecated.
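The first step of the checklist, confirming bot access, can be checked offline against a copy of your `robots.txt`. This sketch uses Python's standard `urllib.robotparser`; the `blocked_bots` helper and the default URL are assumptions for illustration, and the bot names are the four listed in the checklist.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, url: str = "https://yourdomain.com/llms.txt") -> list[str]:
    """Return the AI bots that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""
print(blocked_bots(robots))
```

An empty list means none of the four bots is blocked from the URL you passed in. Repeat the call with the URLs of your priority pages, since a bot that can read `llms.txt` but not the pages it lists gains little.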
How llms.txt Fits the Bigger AI-Search Picture
llms.txt is one of nine signals AI engines weigh when deciding who to cite. The full set — bot access, llms.txt, schema, Wikidata, answer-first writing, comparison tables, E-E-A-T, citation-friendly anchors, and freshness — is covered in the AI citation checklist. If you are deciding which of the nine to attack first, llms.txt is consistently the best 30-minute investment because it is one of the few signals with virtually zero downside and a measurable Anthropic / Perplexity upside today.
Beyond the file itself, llms.txt is most powerful as part of an Answer Engine Optimization program (which restructures the pages you list into answer-first form) paired with Generative Engine Optimization services (which make your entity recognisable enough that AI models trust the file in the first place). The file points the crawler; AEO and GEO earn the citation.
What to Read Next
- How to Get Cited by ChatGPT, Perplexity & Google AI Overviews → The full 9-signal checklist; `llms.txt` is signal #2.
- AEO vs GEO: Which Does Your Business Need? → Plain-English explainer of the two disciplines that work alongside `llms.txt`.
- AEO Services → Restructure your priority pages so the URLs in your `llms.txt` are actually quotable.
- GEO Services → Entity authority work: Knowledge Panel, Wikidata, citation acquisition.
- Free SEO Audit Tool → Checks `llms.txt`, bot access, schema, and entity signals in about 20 seconds.
30 minutes. Permanent upside.
Want us to write your llms.txt for you?
Book a free 30-minute consultation. We'll review your site, pick the right 20–40 URLs, write the descriptions, and ship a launch-ready llms.txt tailored to your business.
30 minutes · No obligation · Vancouver-based, serving North America

