
Embedding Models Don't Read Your Metadata (But They Should)

Split comparison showing YAML metadata as noise versus the same metadata as a natural language sentence producing a sharper embedding vector
· ~9 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Here's a sentence your embedding model understands perfectly well:

"This is a canonical faction document from the post-Glitch era describing cultural practices and political structure."

And here's functionally identical information that your embedding model treats as random noise:

```yaml
canon: true
domain: faction
era: post-glitch
topics: [culture, politics]
```

Same facts. Same document. Different embedding behavior. The YAML blob gets processed as four disconnected key-value pairs with no semantic weight. The natural language sentence gets encoded as a rich set of contextual signals that tell the model what this document is, what it's about, and how it relates to the kind of questions someone might ask.
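The transformation is a few lines of code. A minimal sketch in Python; the function and variable names here are mine, not from any particular library:

```python
def metadata_to_sentence(meta: dict) -> str:
    """Render a metadata dict as the natural-language prefix it should have been."""
    parts = ["This is a canonical" if meta.get("canon") else "This is a"]
    parts.append(f"{meta['domain']} document")
    if era := meta.get("era"):
        parts.append(f"from the {era} era")
    if topics := meta.get("topics"):
        parts.append("covering " + " and ".join(topics))
    return " ".join(parts) + "."

meta = {"canon": True, "domain": "faction",
        "era": "post-glitch", "topics": ["culture", "politics"]}
chunk_text = "The enclave's political structure rotates leadership seasonally."

# Prepend the sentence to the chunk before embedding, not after.
embed_input = metadata_to_sentence(meta) + " " + chunk_text
# -> "This is a canonical faction document from the post-glitch era
#     covering culture and politics. The enclave's political structure..."
```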

The gap between the metadata your system knows and the context your embeddings encode is the single biggest free improvement sitting in most RAG pipelines. Almost nobody exploits it.

Your RAG Pipeline Has a Check Engine Light. You're Ignoring It.

Dashboard showing a GO/NO-GO decision framework with seven evaluation criteria for RAG pipeline quality assessment
· ~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

I ran a retrieval experiment that returned perfect zeros across all 36 queries, and every automated check I'd built said "statistically significant." The decision engine evaluated seven criteria, saw only two pass, and issued a NO-GO. The pipeline caught the problem. Not me: the pipeline.

Here's what scares me: most production RAG systems don't have a pipeline like that. They don't have decision criteria. They don't have rollback thresholds. They don't have a concept of "this retrieval result is wrong and we should know about it automatically." They ship a model, run some spot checks, and move on to the next sprint.
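Installing the light is not exotic engineering. Here's the shape of a decision gate, sketched in Python; the criteria names are placeholders, not my pipeline's actual seven:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    passed: bool

def decide(criteria: list[Criterion]) -> str:
    """Every criterion must pass for a GO; anything less is a NO-GO."""
    failed = [c.name for c in criteria if not c.passed]
    if failed:
        return f"NO-GO ({len(failed)}/{len(criteria)} failed: {', '.join(failed)})"
    return "GO"

verdict = decide([
    Criterion("recall_above_baseline", False),
    Criterion("statistical_significance", True),
    Criterion("no_silent_index_drops", False),
    # ...the remaining criteria for your pipeline
])
```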

Your RAG pipeline has a check engine light. You just never installed it.

Five Projects, One Realization: The Document Is the Database

Five project icons forming a document-centric pipeline: publish, validate, embed, compress, manage — connected by structural metadata flows
· ~8 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

I didn't plan a portfolio. I planned a Markdown file. Then another one. Then five projects materialized around them like ice crystals on a cold window, each shaped by the same principle I didn't recognize until project number four. Apparently I need to build the same insight multiple times before I notice I keep building it.

The insight: documents are not content delivery vehicles. They are structured knowledge systems. Almost every AI tool in production today throws away the structure and keeps only the content. That's like buying a filing cabinet, dumping all the folders on the floor, and asking someone to find last quarter's tax return by feeling the texture of the paper.

I know this because I've now built five projects that all, in their own way, try to fix that mistake.

I Added Context to My Embeddings and 43% of My Data Disappeared

Terminal showing embedding pipeline results: 218 chunks input, 124 surviving, 94 silently dropped, with retrieval metrics improving despite data loss
· ~7 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

In Part 1, I mentioned the D-22 experiment almost as an aside. Twenty-four tokens of metadata prefix, 16.5% improvement in ranking quality, 27.3% recall jump. Good numbers. Clean story.

I left out the part where 43% of my data vanished.

Not "performed poorly." Not "returned lower-quality results." Vanished. Ninety-four of 218 chunks silently dropped from the index because I added one sentence of context and didn't do the arithmetic on what that sentence would cost. The embedding pipeline didn't warn me. ChromaDB didn't complain. I only noticed because I'm the kind of person who checks row counts after every insert. (This is not a personality trait. It's scar tissue.)

The results improved anyway. That's the part I need to explain.

Google Said No to llms.txt. Five Google Teams Didn't Get the Memo.

Timeline showing Google executives dismissing llms.txt in April, July, and December 2025, while five Google developer documentation properties quietly implement llms.txt files in 2026.
· ~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

The timeline is where the joke lives.

April 2025. Google's John Mueller compares llms.txt to the keywords meta tag. For the uninitiated, the keywords meta tag is so discredited that invoking it in SEO circles is equivalent to recommending bloodletting at a medical conference. Mueller's message was clear: llms.txt is unnecessary, self-reported data that Google has no intention of using.

July 2025. Gary Illyes, also from Google's Search team, confirms the position at Search Central Live. No support. Won't be used. Normal SEO works fine for AI Overviews. The standard is, officially, not something Google is interested in.

December 3, 2025. An SEO professional named Lidia Infante discovers an llms.txt file on Google's own Search Central documentation. Mueller's response, posted to Bluesky: "hmmn :-/". The file was removed within hours.

So far, a clean narrative. Google said no, someone at Google accidentally deployed one, it was caught and deleted, and the official position holds. Embarrassing, but coherent.

Then I started pulling at threads.

Context Windows Are a Lie (And Haiku Protocol Is My Coping Mechanism)

Terminal showing a 128K context window shrinking to an effective 8K zone, with lost-in-the-middle degradation visualized as fading text
· ~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

LLM vendors would like you to know that their latest model supports a 128,000-token context window. Some of them say 200,000. One of them, and I won't name names but their logo is a little sunset, says a million. A million tokens. That's approximately four copies of War and Peace, which is appropriate because trying to get useful work done at the far end of a million-token window is its own kind of Russian tragedy.

Here's what the marketing materials don't mention: the effective context window, the portion where the model actually pays reliable attention to what you put there, is dramatically smaller. Research from Stanford, Berkeley, and others has converged on a finding that would be funny if it weren't costing people real money: models struggle with information placed in the middle of long contexts. They're great at the beginning. They're decent at the end. The middle? The middle is where facts go to die quietly, unnoticed, like a footnote in a terms of service agreement.

This is the "Lost in the Middle" problem, and if you're building anything that retrieves information and feeds it to a language model (which, in 2026, is approximately everyone), it means the number on the tin is a fantasy. Your 128K window is functionally an 8K window with 120K tokens of expensive padding.
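There's a well-known mitigation: reorder your retrieved chunks so the strongest land at the edges of the prompt, where attention is reliable, and the weakest get buried in the middle. A sketch of the general technique (not a claim about any specific pipeline), assuming chunks arrive sorted best-first:

```python
def reorder_for_middle_loss(ranked: list[str]) -> list[str]:
    """Rank 1 goes first, rank 2 last, rank 3 second, and so on inward."""
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

print(reorder_for_middle_loss(["r1", "r2", "r3", "r4", "r5"]))
# ['r1', 'r3', 'r5', 'r4', 'r2'] -> the weakest chunk lands in the middle
```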

I know this because I ran the experiment. Accidentally. Three times.

78.8% of My Validator Is Made Up (And That's the Point)

Terminal running a self-audit of DocStratum's 52 validation items: bar charts show 6 spec-compliant (11.5%), 5 spec-implied (9.6%), and 41 DocStratum extensions (78.8%). Verdict: 78.8% invented — that's the product.
· ~16 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

I recently did something that most software developers would consider either admirably honest or clinically inadvisable: I audited my own tool against the specification it claims to implement, wrote down the results in excruciating detail, and published them.

The tool is DocStratum, a documentation quality platform for llms.txt files. The project started with a thesis that most people in the AI tooling space either haven't considered or don't want to hear: a Technical Writer with strong Information Architecture skills can outperform a sophisticated RAG pipeline by simply writing better source material. Structure is a feature. DocStratum exists to prove it.

At its core, DocStratum is a validation framework — think ESLint, but for a Markdown standard defined by a blog post instead of a formal grammar. It checks your llms.txt file across five validation levels: basic parseability (L0), structural compliance (L1), content quality (L2), best practices (L3), and a full extended-quality tier (L4). It categorizes findings across 38 diagnostic codes using three severity levels (Error, Warning, Info). It detects anti-patterns — 22 of them, with names like "The Ghost File," "The Monolith Monster," and "The Preference Trap." It has opinions.
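For a sense of the shape (not DocStratum's actual internals, which may differ), a finding looks roughly like this:

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    L0 = "basic parseability"
    L1 = "structural compliance"
    L2 = "content quality"
    L3 = "best practices"
    L4 = "extended quality"

class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"

@dataclass
class Finding:
    code: str          # one of the 38 diagnostic codes
    level: Level
    severity: Severity
    message: str
    line: int | None = None

# Hypothetical code identifier; the real tool defines its own scheme.
ghost = Finding("ghost-file", Level.L3, Severity.WARNING,
                "The Ghost File: linked document does not exist")
```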

Those opinions, it turns out, are almost entirely my own invention. (Good.)

The Three Voices of Technical Research: Why My Blog Sounds Nothing Like My Paper

Three terminal panes side by side showing the same WAF-blocking finding in three voices: the blog (opinionated, orange tab), the guide (neutral, green tab), and the paper (impartial, blue tab). Tagline: same research, three rooms.
· ~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Someone recently asked me a question that I've been thinking about ever since: "Doesn't writing your blog posts with humor and sarcasm undermine your credibility as a researcher?"

It's a fair question. The blog posts on this site are... aggressively me. I compare WAF blocking to "hiring a security guard who prevents anyone matching the physical description of 'reads books' from entering the bookstore." I describe AI crawlers as looking like "a DDoS attack with a liberal arts degree." I write sentences like "I am a documentation-first developer with a research compulsion and a growing collection of Markdown files about Markdown files," and then I publish those sentences on the internet where potential collaborators can see them.

Meanwhile, the analytical paper I'm writing about the same research uses phrases like "the structural misalignment between content publication intent and infrastructure-level access enforcement." Which is the same observation as the bookstore metaphor, expressed in the register of someone who wants to be taken seriously at a conference.

Same research. Same data. Same conclusions. Radically different voices. And I'd argue that if I used only one of those voices everywhere, the whole project would be worse.

I Fact-Checked My Own Research Paper Before Writing It (You Should Too)

Terminal running an evidence inventory audit of 49 claims: 33 verified, 13 author analysis, 1 partial, and 2 incorrect — including the 844,000 adoption stat that collapsed to 784 directory entries and 105 in the top million.
· ~11 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Here's a workflow tip that's either going to save your credibility or confirm that I have an unhealthy relationship with spreadsheets: before you write anything that makes factual claims, build an evidence inventory first.

Not a bibliography. Not a "sources" section at the bottom of a Google Doc. An actual structured inventory where every single factual claim in your paper, blog post, report, or conference talk is cataloged, mapped to a primary source, independently verified, and assigned a status. Verified. Partially verified. Unverified. Or the one that makes your stomach drop: incorrect.
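Concretely, one row of that inventory can be as simple as this sketch; the field names are mine, and the statuses are the four from above:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    VERIFIED = "verified"
    PARTIAL = "partially verified"
    UNVERIFIED = "unverified"
    INCORRECT = "incorrect"

@dataclass
class Claim:
    claim_id: str
    text: str              # the factual claim, quoted exactly
    primary_source: str    # a primary source, not a secondhand mention
    status: Status
    notes: str = ""

row = Claim(
    "C-031",
    "844,000 websites have adopted llms.txt",
    "https://example.com/llms-txt-directory",  # placeholder URL
    Status.INCORRECT,
    "Directory lists 784 entries; 105 in the top million sites.",
)
```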

I know this sounds like the kind of advice that belongs on a poster in a university writing center, sandwiched between "cite your sources" and "plagiarism is bad." But I'm not talking about academic hygiene. I'm talking about self-defense.

The 844,000 Sites That Weren't: How an AI Adoption Stat Fell Apart Under Scrutiny

· ~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

I need to tell you about a number. It's a number that shows up in blog posts and LinkedIn threads and conference talks and those AI trend reports that get passed around Slack channels like contraband. The number is 844,000, and it refers to the number of websites that have supposedly adopted the llms.txt standard.

I encountered this number while building the evidence inventory for an analytical paper about llms.txt (the Markdown-based content discovery format proposed by Jeremy Howard in September 2024). Because I am the kind of person who builds evidence inventories before writing papers, the kind of person who catalogs every factual claim and traces it back to a primary source before committing a single sentence to a draft, I decided to verify it.

I should not have done this on a weeknight. The verification process involved what I can only describe as the five stages of grief, but for statistics.