5 posts tagged with "GEO"

Generative Engine Optimization — how content is discovered, structured, and consumed by AI systems, and the infrastructure that enables or blocks it.

Google Said No to llms.txt. Five Google Teams Didn't Get the Memo.

Timeline showing Google executives dismissing llms.txt in April, July, and December 2025, while five Google developer documentation properties quietly implement llms.txt files in 2026.
~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

The timeline is where the joke lives.

April 2025. Google's John Mueller compares llms.txt to the keywords meta tag. For the uninitiated, the keywords meta tag is so discredited that invoking it in SEO circles is equivalent to recommending bloodletting at a medical conference. Mueller's message was clear: llms.txt is unnecessary, self-reported data that Google has no intention of using.

July 2025. Gary Illyes, also from Google's Search team, confirms the position at Search Central Live. No support. Won't be used. Normal SEO works fine for AI Overviews. The standard is, officially, not something Google is interested in.

December 3, 2025. An SEO professional named Lidia Infante discovers an llms.txt file on Google's own Search Central documentation. Mueller's response, posted to Bluesky: "hmmn :-/". The file was removed within hours.

So far, a clean narrative. Google said no, someone at Google accidentally deployed one, it was caught and deleted, and the official position holds. Embarrassing, but coherent.

Then I started pulling at threads.

78.8% of My Validator Is Made Up (And That's the Point)

Terminal running a self-audit of DocStratum's 52 validation items: bar charts show 6 spec-compliant (11.5%), 5 spec-implied (9.6%), and 41 DocStratum extensions (78.8%). Verdict: 78.8% invented — that's the product.
~16 min read
Ryan Goodrich

I recently did something that most software developers would consider either admirably honest or clinically inadvisable: I audited my own tool against the specification it claims to implement, wrote down the results in excruciating detail, and published them.

The tool is DocStratum, a documentation quality platform for llms.txt files. The project started with a thesis that most people in the AI tooling space either haven't considered or don't want to hear: a Technical Writer with strong Information Architecture skills can outperform a sophisticated RAG pipeline by simply writing better source material. Structure is a feature. DocStratum exists to prove it.

At its core, DocStratum is a validation framework — think ESLint, but for a Markdown standard defined by a blog post instead of a formal grammar. It checks your llms.txt file across five validation levels: basic parseability (L0), structural compliance (L1), content quality (L2), best practices (L3), and a full extended-quality tier (L4). It categorizes findings across 38 diagnostic codes using three severity levels (Error, Warning, Info). It detects anti-patterns — 22 of them, with names like "The Ghost File," "The Monolith Monster," and "The Preference Trap." It has opinions.

Those opinions, it turns out, are almost entirely our own invention. (Good.)
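The breakdown quoted in the audit summary (6 spec-compliant, 5 spec-implied, and 41 DocStratum extensions out of 52 items) is simple arithmetic. As a minimal sketch, the percentages can be reproduced like this; the function and variable names are illustrative assumptions, not part of DocStratum:

```python
# Counts taken from the audit summary; the category labels are the post's own.
# The names AUDIT_COUNTS and audit_shares are hypothetical, chosen for this sketch.
AUDIT_COUNTS = {
    "spec-compliant": 6,
    "spec-implied": 5,
    "DocStratum extension": 41,
}


def audit_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total as a percentage, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in audit_shares(AUDIT_COUNTS).items():
        print(f"{name}: {share}%")
```

Because each share is rounded independently, the three figures add up to 99.9% rather than exactly 100%.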

The 844,000 Sites That Weren't: How an AI Adoption Stat Fell Apart Under Scrutiny

~10 min read
Ryan Goodrich

I need to tell you about a number. It's a number that shows up in blog posts and LinkedIn threads and conference talks and those AI trend reports that get passed around Slack channels like contraband. The number is 844,000, and it supposedly counts the websites that have adopted the llms.txt standard.

I encountered this number while building the evidence inventory for an analytical paper about llms.txt (the Markdown-based content discovery format proposed by Jeremy Howard in September 2024). Because I am the kind of person who builds evidence inventories before writing papers, the kind of person who catalogs every factual claim and traces it back to a primary source before committing a single sentence to a draft, I decided to verify it.

I should not have done this on a weeknight. The verification process involved what I can only describe as the five stages of grief, but for statistics.

The llms.txt Access Paradox: The Data Nobody Wants to Hear

Terminal diagnostic dashboard showing three systemic findings: thin llms.txt adoption (105 out of 1 million sites), zero AI providers confirming inference-time usage with Google rejecting the standard, and Cloudflare's three overlapping WAF control layers where custom rules override AI crawl settings across 20% of all websites.
~14 min read
Ryan Goodrich

In Part 1, I told the story of discovering that my own hosting infrastructure was blocking AI crawlers from reading the llms.txt file I'd specifically published for them. A Web Application Firewall (WAF), the security layer that inspects every inbound HTTP request, can't tell the difference between "AI system reading curated content as intended" and "malicious bot probing endpoints for vulnerabilities," and the result is a paradox that would be hilarious if it weren't also my actual production environment.

That was the personal version, the "I discovered this at 11 PM and said words I can't publish on a professional blog" version. This is the systemic version. The one where I pull at the thread and the whole sweater starts to unravel.

Because once I started asking "how widespread is this?", the answers didn't just confirm the WAF problem. They complicated the entire premise of what llms.txt is supposed to do. And I mean the entire premise.

I Tried to Help AI Read My Website. My Own Firewall Said No.

Terminal split-screen: the left pane shows a well-formatted llms.txt file with green checkmarks; the right pane shows a curl request using a GPTBot user-agent returning a 403 Forbidden response, with the WAF evaluation listing every failed check. Tagline: the file is perfect, nobody can read it.
~11 min read
Ryan Goodrich

I did everything right. I wrote the file. I followed the spec. I deployed it to production. I even tested it in my browser: clean Markdown rendering, proper H2 sections, curated links with useful descriptions. My llms.txt file was, and I say this without hyperbole, the best piece of structured content I had ever placed at a root URL. I was proud of that file, in the way that only a documentation-first developer can be proud of a Markdown file that nobody has read.

Then an AI system tried to read it, and my own infrastructure said no.

Not a polite "no, sorry, you don't have permission." Not even a helpful "no, that file doesn't exist." The kind of no where Cloudflare intercepts the request before it touches my server, decides the visitor looks suspicious on the basis of (and I love this) being exactly the kind of visitor the file was created for, and serves a JavaScript challenge page instead. To the AI crawler, my lovingly curated Markdown might as well not exist. In its place: a blob of obfuscated HTML designed to prove the visitor is human. Which, by definition, the AI crawler is not. Nor does it aspire to be. That's the entire point.

Welcome to what I've started calling the llms.txt Access Paradox: the structural conflict between publishing content for AI systems and running the security infrastructure that blocks them. It's the kind of problem that makes you close your laptop, open it again, and start writing a research paper instead of just a blog post.
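One rough way to check a site for this failure mode is to request the file with a crawler-style User-Agent header and see whether Markdown or a challenge page comes back. The sketch below is a heuristic under stated assumptions, not the method used in the post: the user-agent string, the function names, and the four labels are all hypothetical, and real WAF decisions can depend on signals this probe does not reproduce (source IP, for example), so a "served" result from your own machine does not guarantee that an actual crawler gets through.

```python
import urllib.error
import urllib.request

# Illustrative user-agent string (an assumption, not the crawler's exact header).
AI_CRAWLER_UA = "GPTBot/1.0 (illustrative)"


def classify_response(status: int, content_type: str, body: str) -> str:
    """Heuristically label a response: 'served', 'challenged', 'blocked', or 'unavailable'."""
    lowered = body.lstrip().lower()
    looks_like_html = "html" in content_type.lower() or lowered.startswith(
        ("<!doctype", "<html")
    )
    if status in (401, 403, 429):
        # An HTML page carrying a script is treated as a challenge; anything else as a flat refusal.
        return "challenged" if looks_like_html and "<script" in lowered else "blocked"
    if status == 200:
        # llms.txt is Markdown, so an HTML body here suggests an interstitial page.
        return "challenged" if looks_like_html else "served"
    return "unavailable"


def fetch_as(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Request `url` with the given User-Agent and classify what comes back."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
            body = response.read(4096).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the error object still carries the response details.
        status = error.code
        content_type = error.headers.get("Content-Type", "")
        body = error.read(4096).decode("utf-8", errors="replace")
    return classify_response(status, content_type, body)


if __name__ == "__main__":
    # example.com is a placeholder; substitute the site you want to probe.
    print(fetch_as("https://example.com/llms.txt", AI_CRAWLER_UA))
```

Keeping the classification separate from the network call means the heuristic can be tuned and tested against saved responses without sending any requests.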