Ever asked ChatGPT to recommend a local business and wondered why it suggested companies you'd never heard of? Or noticed a competitor getting mentioned in AI answers while your perfectly good website gets ignored?
Here's what's happening: AI systems are making decisions about your business based on whatever fragments they can piece together from your site. Most of the time, they're getting it wrong. Not because AI is broken, but because your site is sending conflicting signals.
AI identity files exist to fix that. They're not a ranking hack that guarantees ChatGPT recommends you. They're something more fundamental: a way to tell AI systems exactly what your business does, how you want to be described, and what contexts you're suitable for.
But here's the bit that surprises most people: there's no formal industry agreement forcing AI companies to read these files. No mandate. No standard like robots.txt that everyone's signed up to. And that's actually normal.
What's Changing with AI and Websites
Millions of searches that used to go to Google now get answered directly by AI systems. ChatGPT, Google AI, Claude, Perplexity: they're all fielding questions like "recommend a good WordPress host in the UK" and delivering answers without a single click-through.
That's a completely different game from traditional SEO. With search engines, you're optimising for rankings. With AI, you're optimising for accurate representation. The AI decides on your behalf what to tell the user, and it does that based on whatever signals it can find.
At 365i, we're a WordPress hosting company. Simple enough. But without proper identity signals, ChatGPT might describe us as "a web design agency" or "an IT consultancy." It's not lying. It's guessing from fragments, and guessing badly.
Why AI Systems Struggle with Identity
AI systems are more likely to avoid mentioning your website when your identity is ambiguous than when information is simply missing. Think about that. It's not just that AI doesn't know enough about you. The signals from your site conflict with each other, and the AI decides silence is safer than getting it wrong.
Common ambiguity issues we see constantly:
- Your homepage says "digital solutions provider" but your About page says "WordPress hosting company"
- Your pricing page lists three different company names (legal entity vs trading name vs brand)
- Your contact page says London but your schema markup says Manchester
- Your meta descriptions use different terminology from the actual page content
Every one of these contradictions makes AI nervous. These systems are trained to avoid hallucination and misrepresentation, which means conflicting signals often trigger silence over guessing. We explored this problem in detail in our piece on why ChatGPT can't find your website.
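One way to surface contradictions like these is to line up what each page says about the same fact. The sketch below is a minimal illustration, not a crawler: the `find_conflicts` helper, the field names, and the sample values are all hypothetical, and it assumes you have already copied the relevant values out of each page by hand or with your own tooling.

```python
from collections import defaultdict


def find_conflicts(signals):
    """Group values by field and report fields where sources disagree.

    `signals` is a list of (source, field, value) tuples, for example
    values copied from the homepage, About page, and schema markup.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for source, field, value in signals:
        # Normalise case and whitespace so "London " and "london" match.
        key = " ".join(value.lower().split())
        seen[field][key].append(source)

    conflicts = {}
    for field, values in seen.items():
        # More than one distinct value for a field means a contradiction.
        if len(values) > 1:
            conflicts[field] = {v: sorted(srcs) for v, srcs in values.items()}
    return conflicts


# Hypothetical values mirroring the examples in the list above.
signals = [
    ("homepage", "description", "digital solutions provider"),
    ("about", "description", "WordPress hosting company"),
    ("contact", "city", "London"),
    ("schema", "city", "Manchester"),
    ("footer", "brand", "365i"),
    ("schema", "brand", "365i"),
]

for field, values in find_conflicts(signals).items():
    print(f"{field}: {values}")
```

In this sample, `description` and `city` are reported as conflicts while `brand` is not, because both sources agree on it. A real audit would need fuzzier matching than exact string comparison.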
How AI Actually Decides What to Mention
AI recommendations work across three layers, and understanding them matters.
Training data: The historical dataset AI models learn from. Massive, slow to change, and AI identity files have zero direct impact on it. GPT-4 doesn't know about the business you launched last month, no matter how good your llms.txt is.
Live retrieval: When search-enabled AI assistants browse the web to find current information. This is where identity files can influence how AI interprets what it finds. Clear signals make it easier for AI to understand context and extract accurate information.
Trust heuristics: The most relevant layer for identity files. Modern AI uses internal heuristics to assess whether business information is trustworthy, consistent, and suitable for recommendation. When AI encounters ambiguous signals, caution flags go up. It might skip your business entirely rather than risk saying something inaccurate.
AI identity files target that third layer. They provide authoritative, publisher-controlled signals that collapse ambiguity and give AI systems confidence.
"Structured data is the language of machines. If you want machines to understand your content, you need to speak their language."
Gary Illyes, Search Analyst at Google, Search Engine Roundtable
Gary's talking about schema.org here, but the principle maps directly to AI discovery files. I've watched this pattern repeat since we started implementing structured data for clients over a decade ago: businesses that speak the machine's language get found. Those that don't get skipped. AI identity files are the same story, different chapter.
Why There's No Formal Agreement (And Why That's Normal)
There's no W3C standard. No IETF specification. No ratified protocol AI companies are contractually bound to follow.
And that's completely normal for emerging web conventions. This is exactly how robots.txt started: an informal proposal in 1994 that gradually became accepted practice because it solved a real problem both sides cared about. Nobody mandated robots.txt. It just worked, so everyone adopted it. Sitemaps followed the same path in 2005. We've mapped this three-stage pattern from robots.txt to sitemaps to AI discovery files in detail.
What we have with AI identity files is a pre-standard phase: community proposals and documentation (like llmstxt.org), early platform adoption (Webflow, Framer, and others adding native support), and emerging conventions that make logical sense. Google's own experiments with llms.txt in Discover and AI Mode suggest they're taking it seriously.
So why bother if it's not mandated? Because the absence of a formal agreement doesn't mean these files are useless. It means their value comes from reducing ambiguity and providing clear signals. AI systems are increasingly cautious about legal exposure and misrepresentation. When you provide clear publisher-provided identity data, you're giving them exactly what their risk management needs.
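For a sense of what such a file looks like, the llmstxt.org proposal describes a Markdown layout: an H1 title, a blockquote summary, then H2 sections containing annotated links. The sketch below renders that layout from a small dictionary. The `render_llms_txt` helper is hypothetical, the example.com URL is a placeholder, and nothing here is an official generator or a guarantee of how any AI system parses the result.

```python
def render_llms_txt(name, summary, sections):
    """Render a minimal llms.txt body: H1 title, blockquote summary,
    then one H2 section per key with a bullet list of annotated links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content; the URL is a placeholder, not a real page.
text = render_llms_txt(
    name="365i",
    summary="WordPress hosting company.",
    sections={
        "Services": [
            ("WordPress hosting", "https://example.com/hosting", "Managed hosting plans"),
        ],
    },
)
print(text)
```

Generating the file from one source of truth, rather than writing it by hand, makes it easier to keep the name and description identical to what the rest of the site says.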
What These Files Don't Do
Let's be direct about limitations.
AI identity files don't guarantee visibility. Creating them won't force ChatGPT to mention you. They don't influence rankings directly. They won't override AI system policies or force your content into training data.
What they actually do: prevent misrepresentation when AI does reference you, reduce hallucination, improve attribution accuracy, and clarify what contexts you're suitable for. Think of them as defensive infrastructure. They don't make AI recommend you, but they cut the chances of AI getting you wrong when it does consider you.
For our WordPress Turbo Hosting specifically, that accuracy matters. When AI references our PHP 8.3 support or our CDN integration, we need those facts right. Wrong technical details lose developer trust instantly.
What to Realistically Expect
Short term (3-6 months): Don't expect dramatic changes. ChatGPT won't suddenly recommend you constantly. You will get consistency when AI does reference you, protection from misrepresentation, and internal clarity from the process of defining your identity.
Medium term (6-18 months): More accurate descriptions in AI content. Fewer instances of AI confusing you with competitors. Better context awareness from AI about what you're suitable for.
Longer term: If these conventions follow the trajectory of robots.txt (gradual, unofficial, then essential), early adopters will be well-positioned. If they don't, the downside is minimal: a few days of work and some well-organised business documentation.
"The future of search is not ten blue links. It's AI understanding your content well enough to answer questions about it directly."
Sundar Pichai, CEO of Google, Google I/O 2024 Keynote
Pichai said this in 2024. Fast forward to now, and it's not a prediction anymore. It's what's happening. Every week at 365i we see more organic traffic coming from AI-generated citations rather than traditional search clicks. The businesses with clear identity signals are the ones getting cited accurately. Everyone else is either invisible or misrepresented.
Why This Matters Even If AI Ignores the Files Today
The process of creating AI identity files forces you to resolve contradictions across your website. Even if AI never reads the files themselves, the clarity you gain improves how AI interprets your entire web presence.
Think about businesses that implemented schema.org markup in 2011, when Google, Bing, and Yahoo launched it. Most people thought it was pointless. Fast forward to today, and those early adopters have years of structured data working for them, appearing in rich results, voice search answers, and AI summaries. We covered all ten files in detail in our complete guide to AI discovery files.
Early infrastructure pays compound returns. The effort is modest (a few days of focused work), the downside if conventions don't materialise is minimal, and the upside if they become standard could be massive. We'd rather be positioned correctly a year early than scrambling a day late.
Generate AI Discovery Files from your dashboard
Using WordPress? Install the plugin and create all 10 files in minutes. No coding, no configuration files to edit manually.
Get the Plugin →
For more on how AI discovery files work in practice, our sister site covered what Google Gemini learns from AI discovery files and how that differs from what it picks up from website content alone.
Frequently Asked Questions
Are AI identity files the same as SEO?
No. SEO optimises for search engine rankings and website traffic. AI identity files ensure AI systems understand and describe your business accurately when answering questions directly. You need both. SEO gets you found in search results; AI identity files get you represented correctly in AI conversations.
Will creating AI identity files guarantee ChatGPT mentions my business?
No. AI systems make independent decisions about what to recommend. What identity files do is ensure that when AI does reference you, it has accurate, consistent information to work from. You're controlling how you're described, not forcing the mention.
Is there a formal agreement requiring AI companies to use these files?
No. There's no W3C standard or signed agreement. These are emerging conventions in a pre-standard phase, similar to how robots.txt started as an informal proposal in 1994 before becoming an accepted web convention. Platform adoption (Webflow, Framer) and Google's own experiments suggest growing recognition.
Which AI identity file should I create first?
Start with llms.txt. It's the closest thing to a recognised convention and provides the foundation other files build on. Then add ai.txt for usage permissions and brand.txt for naming consistency. The remaining seven can follow once those three are solid.
Do small businesses really need AI identity files?
Potentially even more than larger businesses. When someone asks ChatGPT to "recommend a local plumber in Kettering," clear identity signals help AI distinguish you from dozens of similar businesses. Larger companies often have Wikipedia entries and extensive media coverage that provide identity context. Small businesses don't, making self-declared identity more important.
How long does it take to create all the AI identity files?
Creating all ten files properly takes 3-5 days of focused work. This includes auditing existing content for contradictions, drafting each file, cross-checking consistency, and validating the results. The time isn't in writing the files; it's in making sure they all tell the same story.
Are AI identity files future-proof?
The specific file formats might evolve, but the core problem they solve (helping AI understand business identity) isn't going away. Even if formats change, the structured business information you've created transfers easily. The underlying need for machine-readable identity data will only grow as AI becomes more integrated into how people find businesses.
How can I check if my AI identity files are working?
Use the free AI Site Identity Checker to validate formatting, required fields, and consistency across files. Then test manually: ask ChatGPT about your business before and after implementing files and compare the accuracy of responses. Look for fewer errors, better descriptions, and more consistent naming.
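The manual before-and-after comparison can be made a little more repeatable with a small script. This is a rough sketch under stated assumptions: the `score_response` helper is hypothetical, the two answers are invented examples you would replace with text pasted from real AI responses, and simple substring matching will miss paraphrases.

```python
def score_response(response, canonical_name, expected_terms, wrong_terms):
    """Check one AI answer against the identity you expect to see."""
    text = response.lower()
    return {
        # Did the answer use the business name you want used?
        "name_used": canonical_name.lower() in text,
        # Descriptions you want to see that actually appeared.
        "expected_found": [t for t in expected_terms if t.lower() in text],
        # Known misdescriptions that appeared.
        "errors_found": [t for t in wrong_terms if t.lower() in text],
    }


# Hypothetical answers, pasted in by hand before and after publishing the files.
before = "365i is a web design agency."
after = "365i is a WordPress hosting company."

expected = ["WordPress hosting"]
wrong = ["web design agency", "IT consultancy"]

for label, answer in (("before", before), ("after", after)):
    print(label, score_response(answer, "365i", expected, wrong))
```

Running the same prompts on a schedule and logging these scores gives a trend to look at, though AI answers vary between runs, so a single comparison proves little.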
Build Your AI Identity on Solid Hosting
365i's WordPress hosting gives you the infrastructure AI crawlers look for: fast response times, proper caching, and server configurations that serve identity files reliably to every AI system that requests them.
Explore WordPress Hosting
Written by: Mark McNeece, Founder & Managing Director, 365i
Editorially reviewed by: Mark McNeece