Ever wondered why ChatGPT recommends some businesses but ignores others? AI systems aren't figuring this out magically from your website. They're looking for specific signals: structured files that tell them exactly what you do, how to describe you, and what they're allowed to say about you.
And here's where it gets interesting. There's not just one file. There are ten different files that work together to create your complete AI Site Identity. Most businesses have zero. Some have one or two. Almost nobody has all ten.
I ran our free AI Site Identity checker on 200 UK business websites last month. Twelve had proper AI discovery files in place. That's 6%. And these weren't tiny startups. Established companies with marketing budgets, SEO consultants, the works. They just didn't know this existed.
The AI Discovery File Ecosystem
When someone asks ChatGPT "recommend a good WordPress host in the UK," it has to make a decision. Based on what? Your homepage? Your About page? That blog post you wrote three years ago and forgot about?
Websites are messy. They're full of contradictions, marketing speak, outdated information, and content written for humans who understand context. AI systems don't do context the way we do. They need clear, structured, factual signals.
That's what AI discovery files are: your business's identity card for AI systems. Instead of letting ChatGPT or Claude guess based on fragments, you're giving them a proper dossier.
Most people think creating one file (usually llms.txt, because that's the one everyone talks about) is job done. It's not. That's like a CV that only lists your name. Technically it exists, but it's not doing you any favours. We covered the llms.txt foundation in our complete guide to creating AI identity files, but it's just one piece of the puzzle.
The ten files work as an ecosystem:
- llms.txt - Your core business identity for Large Language Models
- llm.txt - Compatibility variant that redirects to llms.txt
- llms.html - Human-readable HTML version with Schema.org structured data
- ai.txt - Permissions and usage policies for AI systems
- ai.json - Machine-parseable AI interaction guidance in JSON format
- identity.json - Structured canonical data aligned with Schema.org
- brand.txt - How you want to be named and described
- faq-ai.txt - Verified questions and answers AI can reference
- developer-ai.txt - Technical context about your platform and stack
- robots-ai.txt - Specific crawler directives for AI bots
Each one serves a purpose the others don't cover. Think of them like different departments in a company: marketing, legal, technical, customer service. They all need to exist and they all need to be saying the same things. Together, they form what we've started calling the web's identity layer, following the same pattern as robots.txt and sitemaps before them.
What Each File Actually Does
llms.txt: The Foundation
This is your core business identity file. When an AI system wants to know "what does this company actually do," llms.txt is where it looks first. It contains your company name (exactly as you want it referenced), what you do (factual, not marketing waffle), where you operate, key services, and target audience.
The biggest mistake? Making it too marketing-focused. "We're passionate about excellence" means nothing to AI. "WordPress hosting provider, Kettering, UK, established 2001" tells it everything.
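As a sketch, the widely circulated llms.txt convention uses plain markdown: an H1 with your name, a blockquote summary, then linked sections. The company details below are illustrative, not a template to copy verbatim:

```markdown
# Example Hosting Ltd

> Managed WordPress hosting provider based in Kettering, UK. Established 2001.
> Serves small businesses and agencies across the UK.

## Services

- [Managed WordPress Hosting](https://example.com/wordpress): UK data centres, daily backups
- [Domain Registration](https://example.com/domains): .uk and generic TLDs

## Company

- Legal name: Example Hosting Ltd
- Location: Kettering, Northamptonshire, UK
```

Note what's missing: no slogans, no adjectives. Every line is a checkable fact.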
llm.txt: The Compatibility Variant
Some AI systems request llm.txt (singular) rather than llms.txt (plural). Both filenames exist in the wild, so the specification includes llm.txt as a compatibility variant. The recommended approach is a 301 redirect from llm.txt to llms.txt, so you maintain a single source of truth while catching both requests.
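On Apache, that redirect is a one-liner in your root .htaccess (nginx users would use `location = /llm.txt { return 301 /llms.txt; }` instead):

```apache
# .htaccess: serve both filenames from a single source of truth
Redirect 301 /llm.txt /llms.txt
```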
ai.txt: The Permissions File
This tells AI systems what they're allowed to do with your content. Can they quote you? Summarise you? What about commercial use? Without ai.txt, AI systems make their own decisions about how to use your content. Sometimes they get it right. Sometimes they quote you out of context or make claims you never made.
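There's no single ratified ai.txt specification yet, so treat the directive names below as illustrative of the pattern rather than canonical; check whichever draft spec you're following:

```
# ai.txt (directive names illustrative)
User-Agent: *
Allow: quotation
Allow: summarisation
Disallow: commercial-reuse
Attribution: required
Contact: ai@example.com
```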
ai.json: Machine-Readable Interaction Guidance
Where ai.txt is human-readable, ai.json delivers the same permissions and interaction guidance in structured JSON with schema validation. AI systems that process JSON natively can parse your rules without interpreting free-form text. It's the difference between handing someone a paragraph and handing them a form with checkboxes: less room for misinterpretation.
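A minimal sketch of what that might look like. The field names are assumptions, since the format is still settling; the point is that every rule becomes a key-value pair rather than a sentence to interpret:

```json
{
  "version": "1.0",
  "permissions": {
    "quotation": true,
    "summarisation": true,
    "commercial_reuse": false
  },
  "attribution": {
    "required": true,
    "preferred_name": "Example Hosting Ltd"
  },
  "contact": "ai@example.com"
}
```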
brand.txt: Naming Consistency
Controls how AI writes your company name, product names, and key terminology. Sounds trivial until you've seen ChatGPT call you three different names in the same response. Inconsistent branding in AI responses looks unprofessional and confuses potential customers.
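A brand.txt sketch, with illustrative field names. The aim is one unambiguous answer per naming question:

```
# brand.txt (field names illustrative)
Company-Name: Example Hosting Ltd
Short-Name: ExampleHost
Never-Use: ExampleHosting, Example Hosting Limited
Product-Name: ExampleHost Managed WordPress
Tone: factual, plain English
```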
faq-ai.txt: Verified Q&A
Pre-verified questions and answers AI can reference when asked about your business. When someone asks ChatGPT "how long does WordPress hosting setup take," and you've provided a verified answer, AI is far more likely to reference yours than guess from random web content.
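The format is deliberately simple: question, verified answer, repeat. These entries are placeholders:

```
# faq-ai.txt (illustrative entries)
Q: How long does WordPress hosting setup take?
A: New accounts are provisioned automatically within 15 minutes of payment.

Q: Where are your data centres?
A: All hosting infrastructure is located in UK data centres.
```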
developer-ai.txt: Technical Context
Technical information about your platform, stack, and infrastructure. When developers research WordPress hosting options and ask AI technical questions, you want accurate answers about your infrastructure, not guesses. "Does 365i support PHP 8.3?" should get a confident, correct answer.
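A sketch of the kind of facts it should state plainly (the values are placeholders, not a real stack):

```
# developer-ai.txt (values illustrative)
Platform: Managed WordPress
PHP-Versions: 8.1, 8.2, 8.3
Database: MariaDB 10.11
Web-Server: nginx with server-level caching
SSL: Let's Encrypt, auto-renewed
Staging: one-click staging environments
```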
robots-ai.txt: Crawler Directives
Specific instructions for AI web crawlers about what they can access. This is separate from standard robots.txt because AI crawlers often need different rules than traditional search engine bots.
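It follows standard robots.txt syntax. The user-agent tokens below (GPTBot, ClaudeBot) are real AI crawler names; the separate filename is the newer convention this article describes:

```
# robots-ai.txt (standard robots.txt syntax)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Allow: /
Disallow: /private/
```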
identity.json: Structured Data
Your business information in machine-readable JSON. The most structured of all ten files and the easiest for AI systems to parse. API endpoints, social profiles, contact details, service catalogue: all in one clean, parseable format.
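Since it aligns with Schema.org, a sketch using Schema.org's Organization vocabulary (all values illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Hosting Ltd",
  "url": "https://example.com",
  "foundingDate": "2001",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Kettering",
    "addressCountry": "GB"
  },
  "sameAs": [
    "https://www.linkedin.com/company/example-hosting"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@example.com"
  }
}
```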
llms.html: Human-Readable Version
A formatted HTML page containing the same information as your text-based files. It serves a dual purpose: AI systems that prefer HTML can parse it, and humans (including potential customers) can read it to understand what AI knows about your business. We publish ours publicly.
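The Schema.org structured data typically rides along as a JSON-LD block in the page head, something like this (values illustrative):

```html
<!-- llms.html excerpt: embedded Schema.org data -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Hosting Ltd",
  "url": "https://example.com"
}
</script>
```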
How They Work Together
These files don't exist in isolation. They reference each other, support each other, and need to tell one consistent story. This is where DIY implementations fall apart.
I reviewed someone's files last month. Their llms.txt said "Company Name Ltd" and their brand.txt said "CompanyName Ltd" (no space). Tiny detail. But when ChatGPT sees conflicting signals, it doesn't know which to trust. It might use one version, or the other, or make up a third that's neither.
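Mismatches like that are easy to catch mechanically. A minimal sketch in Python, assuming llms.txt follows the markdown convention (company name as the `# ` heading) and identity.json carries a Schema.org-style `name` field:

```python
import json

def check_consistency(llms_txt: str, identity_json: str) -> list[str]:
    """Compare the company name in llms.txt (first '# ' heading)
    against identity.json's "name" field. Returns mismatch warnings."""
    issues = []
    # Extract the H1 heading from llms.txt
    llms_name = next(
        (line[2:].strip() for line in llms_txt.splitlines()
         if line.startswith("# ")),
        None,
    )
    if llms_name is None:
        issues.append("llms.txt has no '# Company Name' heading")
    try:
        identity_name = json.loads(identity_json).get("name")
    except json.JSONDecodeError as e:
        issues.append(f"identity.json is not valid JSON: {e}")
        return issues
    if llms_name and identity_name and llms_name != identity_name:
        issues.append(
            f"Name mismatch: llms.txt says {llms_name!r}, "
            f"identity.json says {identity_name!r}"
        )
    return issues

# The no-space variant from the review above gets flagged
print(check_consistency("# Company Name Ltd\n> Hosting provider.",
                        '{"name": "CompanyName Ltd"}'))
```

Extending the same idea to addresses, service lists, and brand.txt naming is straightforward; it's the same class of comparison an automated checker performs.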
The ten files operate in three layers:
Core Identity: llms.txt, llm.txt, identity.json, and llms.html establish who you are. These four must be absolutely consistent. Same company name. Same address. Same service descriptions. (llm.txt simply redirects to llms.txt, so that's handled automatically.)
Control: ai.txt, ai.json, brand.txt, and robots-ai.txt manage how AI interacts with you. ai.json delivers the same rules as ai.txt in structured JSON. If ai.txt says "yes, AI can quote our content" but robots-ai.txt blocks all AI crawlers, that's a contradiction AI systems will notice.
Enhancement: faq-ai.txt and developer-ai.txt enrich the core identity with specific, useful information. They add depth to what the foundation files establish.
"The future of search is not ten blue links. It's AI understanding your content well enough to answer questions about it directly."
Sundar Pichai, CEO of Google, Google I/O 2024 Keynote
When I first heard Pichai say this, it clicked for me why these files matter so much. If search is becoming AI answering questions directly, then the businesses providing the clearest, most structured answers are the ones getting cited. Running a managed hosting platform since 2001 has shown me that the companies investing early in new infrastructure always come out ahead.
DIY vs Professional Setup
You've got two paths.
DIY: Start with our guide to creating llms.txt, then work through the other nine using this article as your roadmap. Budget 10-15 hours and be prepared for technical challenges. Use our free checker to validate as you go.
Professional setup: All ten files created, tested, and validated. Delivered in 24-48 hours. The biggest advantage isn't speed; it's consistency. We've seen too many DIY implementations where files contradict each other because different sections were written at different times.
Either way, you'll know exactly where you stand. The free checker takes 30 seconds and scores your current AI visibility out of 100. Most sites score under 15.
Generate AI Discovery Files from your dashboard
Using WordPress? Install the plugin and create all 10 files in minutes. No coding, no configuration files to edit manually.
Get the Plugin →
Testing Your AI Discovery Files
Creating the files is half the job. The other half is making sure they actually work.
The free AI Site Identity Checker validates all your files at once. It checks for proper formatting, required fields, and consistency across files. But here's what it can't check: whether your content is actually accurate and useful. That's on you.
Common issues the checker finds: missing required fields in llms.txt, contradictory information between files (different company names, conflicting location data), broken JSON syntax in identity.json, and robots-ai.txt rules that accidentally block the AI crawlers you're trying to reach.
Beyond the checker, test manually. Ask ChatGPT about your business before and after implementing files. Check if the responses improve. Ask specific questions your faq-ai.txt should answer. If AI is still guessing when you've given it verified answers, something's misconfigured. We've published a dedicated guide to validating your AI discovery files that covers exactly what to check and how to fix common issues.
"Structured data is the language of machines. If you want machines to understand your content, you need to speak their language."
Gary Illyes, Search Analyst at Google, Search Engine Roundtable
Gary's talking about traditional structured data here (schema.org), but the principle extends directly to AI discovery files. We've been implementing structured data for clients since Google first started supporting it, and the pattern is always the same: businesses that speak the machine's language get found. Those that don't get skipped. AI discovery files are just the latest chapter in that story.
For WordPress sites specifically, these files sit in your root directory alongside robots.txt. Standard text files any host can serve. There's now a free WordPress plugin on WordPress.org that generates all 10 files from your dashboard, which removes the manual file creation entirely. Sites on our global CDN already have the infrastructure optimised for AI crawlers: proper caching rules, appropriate headers, and server config that ensures AI systems can access files without triggering rate limits. We covered the broader picture of how AI discovery files help businesses get recommended on our sister site.
Get This Sorted
Ten files. One consistent story. That's all AI needs to understand your business properly.
The sites that get this right aren't doing anything clever. They're just giving AI systems what those systems are looking for, in the format they can actually read. The 94% of businesses without these files are leaving the conversation entirely. And this isn't just our opinion: investors just valued an AI visibility tracking company at $1 billion, which tells you where the industry thinks this is heading.
Check your current score with the free checker. It takes 30 seconds. Then decide whether to tackle it yourself or get it handled properly. Either way, the clock's ticking. Every week you wait is another week AI is recommending your competitors instead.
We also explored how GEO differs from traditional SEO and what website design changes are needed for AI search. Worth reading alongside this guide if you're mapping out your full AI visibility strategy.
Frequently Asked Questions
Do I really need all ten AI discovery files?
For full AI visibility, yes. Each file serves a specific purpose the others don't cover. llms.txt handles core business identity, ai.txt manages permissions, brand.txt controls naming, and ai.json provides machine-readable interaction rules. Having just one or two leaves gaps in how AI systems understand you. Start with llms.txt, ai.txt, and brand.txt as your foundation, then add the rest.
Which AI discovery file should I create first?
Start with llms.txt. It's the foundation everything else builds on. Then add ai.txt for permissions, followed by brand.txt for naming consistency. The remaining seven can follow in any order once those core three are in place.
What happens if my AI discovery files contradict each other?
AI systems get confused and may ignore your files entirely or cherry-pick incorrect information. If llms.txt says you're in London but identity.json says Manchester, AI doesn't know which to trust. Consistency across all ten files is more important than having all ten with conflicting data.
How long does it take to create all ten files myself?
Budget 10-15 hours if you understand the specifications and have your business information organised. That breaks down to 3-4 hours researching and planning, 6-8 hours creating files, and 2-3 hours testing and fixing consistency issues.
Will these files guarantee ChatGPT mentions my business?
No. AI systems make independent decisions about what to recommend based on multiple factors. What the files guarantee is that when AI does reference your business, it has accurate information to work from. You're controlling the narrative, not forcing the mention.
How often should I update my AI discovery files?
Update whenever core business information changes: new services, pricing updates, location changes. A quarterly review works well. Set a calendar reminder and check your files against current business reality. Small additions (new FAQs, updated tech specs) can happen any time.
How are AI discovery files different from SEO?
SEO helps people find your website in search results. AI discovery files help AI systems understand and describe your business when answering questions directly, which increasingly means no website click at all. You need both working together. Perfect SEO doesn't guarantee ChatGPT describes you correctly, and perfect AI files don't guarantee Google rankings.
How do I check if my AI discovery files are working?
Use the free AI Site Identity Checker to validate formatting, required fields, and cross-file consistency. It scores your visibility out of 100 and identifies specific issues. Then test manually by asking ChatGPT about your business before and after implementing files to see if responses improve.
Your AI Visibility Starts with the Right Hosting
365i's managed WordPress hosting is built for the infrastructure AI crawlers look for: proper caching, fast response times, and server configurations that don't accidentally block the systems trying to understand your business.
Explore WordPress Hosting