
    The Complete llms.txt Guide: How to Make AI Models Cite Your Business in 2026

    thinkprofits.com

    Short answer: llms.txt is a small markdown file at the root of your site (yourdomain.com/llms.txt) that tells AI models which pages are your canonical, citation-worthy sources. It is not a formal standard yet, but Anthropic, Perplexity, and a growing list of crawlers respect it. Publishing one takes about 30 minutes and is one of the highest-leverage early-mover plays for AI search visibility in 2026. This guide gives you the spec, a copy-paste template, and a launch checklist.

    What llms.txt Actually Is

    llms.txt was proposed by Jeremy Howard in late 2024 as a way for websites to give large language models a curated, machine-readable map of their most important content. Think of it as a sitemap with editorial judgement — instead of listing every URL on the site, you list the pages you actually want AI engines to quote, with a short description of what each page is.

    The file is plain markdown. It lives at the root: https://yourdomain.com/llms.txt. It is publicly readable by anyone (including humans). AI tools that support the convention fetch it the way a search crawler fetches robots.txt — early, automatically, and without prompting.

    llms.txt vs robots.txt vs sitemap.xml

    These three files all live at your site root and they are easy to confuse. They do not overlap:

    robots.txt is permissions. It tells crawlers — including AI bots like GPTBot, ClaudeBot, and PerplexityBot — which paths they may or may not fetch. If you block a bot here, nothing else matters: it will never see your llms.txt either. (We cover the access layer in the 9-signal citation checklist.)

    sitemap.xml is discovery. It lists every public URL so search engines can find them. It is exhaustive and machine-only.

    llms.txt is curation. It is the short, opinionated list of pages you want AI models to lean on when summarising or quoting your business. It includes human-readable context — a one-line description per URL — so a language model knows what the page is for, not just that it exists.

    You need all three. They do different jobs.
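Because a robots.txt block hides everything else, it can help to test the permission layer before writing llms.txt. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `blocked_bots` helper, the bot list, and the sample robots.txt are illustrative assumptions, not part of any specification.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from this guide; adjust the list to the crawlers you care about.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, url: str) -> list[str]:
    """Return the bots from AI_BOTS that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks GPTBot everywhere.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

print(blocked_bots(sample, "https://yourdomain.com/llms.txt"))  # ['GPTBot']
```

In this sample, GPTBot would never reach /llms.txt, while the other three bots fall through to the wildcard rule and are allowed.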

    The Official Format

    The spec (per llmstxt.org) is intentionally simple. A valid llms.txt file has:

    • An H1 at the top — usually your business or product name. Required.
    • A blockquote (>) immediately after — a one or two sentence summary. Recommended.
    • Optional paragraphs of additional context.
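    • Zero or more H2 sections, each containing a markdown list of links in the form - [Page name](URL): one-line description. The link is required; the colon and description are optional, though a file with no link sections gives a model little to work with.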
    • An optional ## Optional section at the end for lower-priority links (AI models may skip these when the context window is tight).

    That is the entire spec. No XML, no schema, no validator required.
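Although no validator is required, the rules above are simple enough to check mechanically. Here is a rough sketch of such a check in Python; `check_llms_txt` and its regular expression are hypothetical helpers written for this guide, and they cover only the structure described above (H1 first, blockquote next, link-style list items under H2 headings).

```python
import re

# "- [name](url)" with an optional ": description" suffix.
LINK_ITEM = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line.strip()]

    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line must be an H1 ('# Name')")
    if len(non_blank) < 2 or not non_blank[1].startswith(">"):
        problems.append("no blockquote summary directly after the H1 (recommended)")
    if sum(1 for line in lines if line.startswith("# ")) > 1:
        problems.append("more than one H1")

    in_section = False
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            in_section = True
        elif in_section and line.startswith("- ") and not LINK_ITEM.match(line):
            problems.append(f"line {number}: list item is not '- [name](url): description'")
    return problems


good = """\
# Example Co

> Example Co repairs bicycles in Springfield.

## Services

- [Repairs](https://example.com/repairs/): Same-day bicycle repairs.

## Optional

- [Privacy Policy](https://example.com/privacy/)
"""

print(check_llms_txt(good))  # []
```

A missing blockquote is reported even though it is only recommended, so treat that message as a warning rather than an error.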

    A Copy-Paste Template for a Services Business

    Here is a working template you can adapt. Replace the placeholders, save it as llms.txt, and upload it to your site root.

    # Your Business Name
    
    > One or two sentences explaining what your business does, who it serves,
    > and where. Example: ThinkProfits is a Vancouver-based digital marketing
    > agency operating since 1996, serving 3,500+ clients across Canada and
    > the U.S. with SEO, AEO, GEO, PPC, and web design.
    
    Additional context paragraph if useful — founding year, locations served,
    core differentiators. Keep it factual; AI models will quote from here.
    
    ## Services
    
    - [SEO Services](https://yourdomain.com/seo-services/): Organic search programs for businesses targeting Google and traditional search.
    - [AEO Services](https://yourdomain.com/geo-services/aeo/): Answer Engine Optimization — get cited as the direct answer in AI engines.
    - [GEO Services](https://yourdomain.com/geo-services/): Generative Engine Optimization — build entity authority for ChatGPT, Perplexity, and Google AI Overviews.
    - [PPC Services](https://yourdomain.com/ppc-services/): Paid search and paid social campaigns.
    - [Web Design](https://yourdomain.com/web-design-services/): Conversion-focused websites for service businesses.
    
    ## Locations
    
    - [Vancouver](https://yourdomain.com/seo-company-vancouver/): Local SEO for Vancouver, BC.
    - [Toronto](https://yourdomain.com/seo-company-toronto/): Local SEO for Toronto, ON.
    - [All Locations](https://yourdomain.com/locations/): Full list of cities served across Canada.
    
    ## About
    
    - [About the Company](https://yourdomain.com/about/): Founding story, leadership, and verified business history.
    - [Reviews](https://yourdomain.com/reviews/): Verified client reviews and case study results.
    - [Contact](https://yourdomain.com/contact/): Hours, phone, address, and free consultation booking.
    
    ## Resources
    
    - [FAQ](https://yourdomain.com/faq/): Plain-English answers to the questions prospects most often ask.
    - [Free SEO Audit Tool](https://yourdomain.com/free-seo-audit-tool/): Free site scan covering bot access, schema, llms.txt, and entity signals.
    - [Blog](https://yourdomain.com/digital-news/): Long-form guides on SEO, AEO, GEO, PPC, and digital marketing.
    
    ## Optional
    
    - [Privacy Policy](https://yourdomain.com/privacy-policy/)
    - [AI Policy](https://yourdomain.com/ai-policy/)
    - [Sitemap](https://yourdomain.com/sitemap/)
    

    That is roughly the structure we use on our own site. View our live llms.txt for a real example.
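If your page list already lives in a spreadsheet or CMS export, you may prefer to generate the file rather than edit the template by hand. The sketch below assumes a simple in-memory structure; the `Link` and `Section` classes and the `render_llms_txt` function are illustrative names, not part of the llms.txt proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    url: str
    description: str = ""  # optional, but strongly recommended


@dataclass
class Section:
    title: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render an H1, a blockquote summary, and one H2 link list per section."""
    out = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        out.append(f"## {section.title}")
        out.append("")
        for link in section.links:
            item = f"- [{link.name}]({link.url})"
            if link.description:
                item += f": {link.description}"
            out.append(item)
        out.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(out).rstrip("\n") + "\n"


text = render_llms_txt(
    "Example Co",
    "Example Co repairs bicycles in Springfield.",
    [
        Section("Services", [Link("Repairs", "https://example.com/repairs/", "Same-day bicycle repairs.")]),
        Section("Optional", [Link("Privacy Policy", "https://example.com/privacy/")]),
    ],
)
print(text)
```

Writing `text` to a file named llms.txt with UTF-8 encoding gives you the same shape as the template above, with the section order under your control.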

    What to Include — and What to Leave Out

    The single most common mistake is treating llms.txt as a second sitemap and dumping every URL into it. Don't. The whole point of the format is curation. A 30-entry file with sharp descriptions is far more useful to an AI model than a 300-entry kitchen-sink list.

    Include: your homepage, every service page you want cited by AI engines, locations, pricing, top FAQs, your highest-performing blog posts, any pages that establish your entity (about, leadership, reviews, contact), and any page that answers a question a prospect is likely to ask. If you offer Answer Engine Optimization or Generative Engine Optimization as services, those pages belong near the top of your file.

    Leave out: tag and category archive pages, paginated lists, thin internal pages, anything behind a paywall or login, ephemeral landing pages, and anything you would not want quoted verbatim in an AI answer. Also leave out duplicates — pick the canonical version of each page.

    Aim for somewhere between 20 and 50 entries for a typical small or mid-sized services business.
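One way to get from a full sitemap to a shortlist is to filter out the obvious exclusions first and curate by hand afterwards. The following sketch assumes WordPress-style URL patterns (/tag/, /category/, /page/2/) and a few gated paths; the patterns and the `curate` helper are assumptions you would adapt to your own site.

```python
import re
from urllib.parse import urlsplit

# Hypothetical exclusion rules: archives, paginated lists, and gated pages.
EXCLUDE_PATTERNS = [
    re.compile(r"/(tag|category)/"),
    re.compile(r"/page/\d+/?$"),
    re.compile(r"/(login|account|cart)(/|$)"),
]


def curate(urls: list[str]) -> list[str]:
    """Drop archive, paginated, gated, and duplicate URLs, preserving order."""
    kept, seen = [], set()
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            continue  # query-string variants are treated as non-canonical
        if any(pattern.search(path) for pattern in EXCLUDE_PATTERNS):
            continue
        # Treat /page and /page/ as the same page; keep the first one seen.
        key = (parts.netloc.lower(), path.rstrip("/") or "/")
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept


candidates = [
    "https://example.com/",
    "https://example.com/seo-services/",
    "https://example.com/seo-services",
    "https://example.com/tag/seo/",
    "https://example.com/blog/page/2/",
    "https://example.com/blog/?page=3",
    "https://example.com/login",
]
print(curate(candidates))
```

Only the homepage and the first /seo-services/ URL survive here. The filter removes noise; deciding which of the remaining pages deserve a place in the file is still an editorial call.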

    Writing the Descriptions

    The one-line description after each link is the part that actually changes how a model uses the page. A good description tells the model what question this page answers, not what it is called.

    Weak: "- [Services](/services/): Our services page."
    Strong: "- [AEO Services](/geo-services/aeo/): Answer Engine Optimization — get cited as the direct answer in ChatGPT, Perplexity, and Google AI Overviews. Pricing from $895/mo."

    Lead with the noun. Include a price, location, or hard fact when it is relevant — that is exactly the kind of detail AI models will lift when they cite you.
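The difference between the weak and strong examples can be approximated with a couple of crude checks. This is a sketch only: the six-word threshold and the "our ... page" pattern are arbitrary heuristics chosen for illustration, and `description_warnings` is a hypothetical helper.

```python
import re

# "- [name](url)" with an optional ": description" suffix.
LINK_ITEM = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def description_warnings(item: str) -> list[str]:
    """Flag link descriptions that are missing, very short, or self-referential."""
    match = LINK_ITEM.match(item.strip())
    if not match:
        return ["not a '- [name](url): description' list item"]
    description = (match.group(3) or "").strip()
    if not description:
        return ["missing description"]
    warnings = []
    if len(description.split()) < 6:
        warnings.append("description is under six words")
    if re.search(r"\bour\b.*\bpage\b", description, re.IGNORECASE):
        warnings.append("describes the page itself, not the question it answers")
    return warnings


# The weak example from above, and a shortened variant of the strong one.
weak = "- [Services](/services/): Our services page."
strong = (
    "- [AEO Services](/geo-services/aeo/): Answer Engine Optimization for "
    "ChatGPT, Perplexity, and Google AI Overviews. Pricing from $895/mo."
)

print(description_warnings(weak))
print(description_warnings(strong))  # []
```

The weak line triggers both warnings; the strong one passes. A clean result does not prove a description is good, only that it avoids the two most obvious failures.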

    Where the File Goes

    Upload llms.txt to the root of your site so it resolves at https://yourdomain.com/llms.txt. On most platforms this means:

    • WordPress: drop it in the site root via SFTP, or use an SEO plugin that exposes root files. Some plugins (Yoast, Rank Math) are starting to ship llms.txt support directly.
    • Shopify: use a theme template (llms.liquid rendered with a text/plain content type) or a redirect from /llms.txt to a page that serves the markdown.
    • Static / Jamstack (Next, Vite, Astro, Hugo): place llms.txt in public/ (or your framework's static directory). It ships as-is.
    • Webflow / Squarespace: use the platform's custom file or redirect feature; if neither exists, host the file on a subdomain and 301 /llms.txt to it (less ideal but workable).

    Verify by visiting https://yourdomain.com/llms.txt in a browser. You should see plain text. If you see HTML, your server is wrapping the file in a template — fix the content type to text/plain; charset=utf-8.
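The same check can be scripted so it runs after every deploy. Below is a minimal sketch using only the Python standard library; `serving_problems` and `fetch_and_check` are hypothetical helper names, and the rule that only text/plain is accepted simply mirrors the advice above.

```python
from urllib.request import urlopen


def serving_problems(content_type: str, body: str) -> list[str]:
    """Return problems with how llms.txt is served, given its header and body."""
    problems = []
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"Content-Type is {media_type or 'missing'}, expected text/plain")
    if "charset=utf-8" not in content_type.lower().replace(" ", ""):
        problems.append("charset=utf-8 is not declared")
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        problems.append("body looks like an HTML page, not raw markdown")
    return problems


def fetch_and_check(url: str) -> list[str]:
    """Fetch url over HTTP and run serving_problems on the response."""
    with urlopen(url, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read().decode("utf-8", errors="replace")
    return serving_problems(content_type, body)


print(serving_problems("text/plain; charset=utf-8", "# Example Co\n"))  # []
print(serving_problems("text/html; charset=utf-8", "<!DOCTYPE html><html>"))
```

Calling `fetch_and_check("https://yourdomain.com/llms.txt")` against your own domain runs the same checks on the live file; the header-and-body function is kept separate so it can be tested without a network connection.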

    Optional: llms-full.txt

    llms-full.txt is a companion file that contains the actual content of your priority pages inline as markdown, so an AI model can read your full copy without crawling each URL. It is useful when:

    • You have technical documentation that benefits from being available in one file.
    • Some of your important pages are JavaScript-rendered and crawlers may not get the full content.
    • You want a single canonical text representation of your business for AI consumption.

    For most service businesses, llms.txt alone is enough. Add llms-full.txt only when you have a clear reason — documentation, knowledge base, large reference content.

    Which AI Crawlers Actually Read It

    Current adoption (June 2026) — realistic, not aspirational:

    • Anthropic / Claude: reads llms.txt, weights it as a curated source signal.
    • Perplexity: reads it; uses it to prioritise crawling and citation.
    • OpenAI / ChatGPT: no formal commitment, but GPTBot follows the link graph and benefits indirectly from a well-structured llms.txt.
    • Google / Gemini / AI Overviews: not endorsed. Google still relies on sitemap.xml and structured data. Publishing llms.txt does not hurt and gets you ready if and when Google moves.
    • AI coding tools (Cursor, Aider, etc.) and research agents: increasingly read llms.txt when generating answers about a business or product.

    The realistic 2026 upside is measurably better Anthropic and Perplexity citations and a cleaner signal to any future AI crawler that adopts the convention. The downside of publishing one is zero.

    10-Step Launch Checklist

    1. Confirm AI bot access in robots.txt: GPTBot, ClaudeBot, PerplexityBot, and Google-Extended must not be disallowed.
    2. List your 20–40 most important URLs (services, locations, pricing, top FAQs, key blog posts).
    3. Pick the canonical version of each page (no duplicates, no paginated variants).
    4. Write a one-line description per URL — lead with the noun, include hard facts.
    5. Open the template above and replace the placeholders.
    6. Save as llms.txt (UTF-8, plain text, LF line endings).
    7. Upload to your site root.
    8. Visit https://yourdomain.com/llms.txt and confirm it serves as plain text.
    9. Run our free SEO audit tool — it checks for llms.txt presence and basic structure.
    10. Re-audit every 3–6 months. Add new service pages and remove anything you have deprecated.

    How llms.txt Fits the Bigger AI-Search Picture

    llms.txt is one of nine signals AI engines weigh when deciding who to cite. The full set — bot access, llms.txt, schema, Wikidata, answer-first writing, comparison tables, E-E-A-T, citation-friendly anchors, and freshness — is covered in the AI citation checklist. If you are deciding which of the nine to attack first, llms.txt is consistently the best 30-minute investment because it is one of the few signals with virtually zero downside and a measurable Anthropic / Perplexity upside today.

    Beyond the file itself, llms.txt is most powerful as part of an Answer Engine Optimization program (which restructures the pages you list into answer-first form) paired with Generative Engine Optimization services (which make your entity recognisable enough that AI models trust the file in the first place). The file points the crawler; AEO and GEO earn the citation.

