Web Standards and AI Discovery

These terms cover the protocols, standards, and infrastructure that determine whether AI systems can find, access, and use web content. This is the territory where the llms.txt research lives, and it's messier than you'd expect.

The short version: there are multiple half-overlapping standards, competing proposals, and a Web Application Firewall industry that doesn't particularly care about any of them. Welcome.

| Term | What it is |
| --- | --- |
| llms.txt | A proposed web standard providing AI systems with a curated Markdown summary of a site |
| llms-full.txt | Optional companion to llms.txt containing the full Markdown content of every linked page |
| robots.txt | The original "instructions for machines": a plain-text crawler-exclusion convention dating to 1994 |
| Web Application Firewall (WAF) | The security layer that blocks malicious traffic (and AI crawlers as collateral damage) |
| Generative Engine Optimization (GEO) | Structuring content to be discovered and cited by AI systems (SEO's awkward cousin) |
| Content Signals | Google's proposed standard for expressing AI usage rights and permissions |
| IETF aipref | An IETF proposal for declaring AI access and usage preferences, formally specified but slow to ratify |
| CC Signals | Creative Commons' proposal for expressing AI licensing and copyright preferences |
| User Agent | The identity string in every HTTP request, and the AI-web relationship's ongoing identity crisis |
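To make the first rows of the table concrete, here is a minimal sketch of an llms.txt file as the proposal describes it: an H1 site name, a blockquote summary, then H2 sections listing Markdown links with short descriptions. The site name, URLs, and descriptions below are hypothetical.

```markdown
# Example Project

> Example Project is a hypothetical documentation site; this blockquote
> is where the proposal puts the one-line site summary.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and first run
- [API reference](https://example.com/docs/api.md): Full endpoint listing

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```

For contrast, robots.txt operates on user agents rather than content. A site opting out of a specific AI crawler lists that crawler's published user-agent token with a `Disallow` rule; GPTBot, for example, is OpenAI's published crawler token:

```
User-agent: GPTBot
Disallow: /
```

Note the asymmetry the rest of this glossary circles around: robots.txt says who may fetch, while llms.txt says what is worth reading, and neither is enforced by anything stronger than the crawler's good manners (or the WAF in front of it).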