3 posts tagged with "FractalRecall"

Posts about FractalRecall's metadata-as-DNA approach to vector embeddings and RAG quality.

Embedding Models Don't Read Your Metadata (But They Should)

[Cover image: split comparison showing YAML metadata as noise versus the same metadata as a natural-language sentence producing a sharper embedding vector]
~9 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Here's a sentence your embedding model understands perfectly well:

"This is a canonical faction document from the post-Glitch era describing cultural practices and political structure."

And here's functionally identical information that your embedding model treats as random noise:

```yaml
canon: true
domain: faction
era: post-glitch
topics: [culture, politics]
```

Same facts. Same document. Different embedding behavior. The YAML blob gets processed as four disconnected key-value pairs with almost no semantic weight. The natural language sentence gets encoded as a rich set of contextual signals that tell the model what this document is, what it's about, and how it relates to the kinds of questions someone might ask.

The gap between the metadata your system knows and the context your embeddings encode is the single biggest free improvement sitting in most RAG pipelines. Almost nobody exploits it.
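
To make that concrete, here's a minimal sketch of the technique: render the metadata as one natural-language sentence and prepend it to the chunk before embedding. The metadata_to_prefix helper, the sentence template, and the field defaults are illustrative assumptions, not FractalRecall's actual schema.

```python
def metadata_to_prefix(meta: dict) -> str:
    """Render a metadata dict as one contextual sentence the embedder can use."""
    canon = "canonical" if meta.get("canon") else "non-canonical"
    domain = meta.get("domain", "general")
    era = meta.get("era", "unspecified")
    topics = " and ".join(meta.get("topics", [])) or "unspecified topics"
    return (
        f"This is a {canon} {domain} document from the {era} era "
        f"describing {topics}."
    )

meta = {
    "canon": True,
    "domain": "faction",
    "era": "post-glitch",
    "topics": ["culture", "politics"],
}
chunk_text = "..."  # whatever your splitter produced for this chunk

# Embed the sentence and the chunk together, so the contextual signals
# land in the same vector as the content they describe.
text_to_embed = f"{metadata_to_prefix(meta)}\n\n{chunk_text}"
# prefix -> "This is a canonical faction document from the post-glitch
#            era describing culture and politics."
```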

I Added Context to My Embeddings and 43% of My Data Disappeared

[Cover image: terminal showing embedding pipeline results: 218 chunks input, 124 surviving, 94 silently dropped, with retrieval metrics improving despite the data loss]
~7 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

In Part 1, I mentioned the D-22 experiment almost as an aside. Twenty-four tokens of metadata prefix, 16.5% improvement in ranking quality, 27.3% recall jump. Good numbers. Clean story.

I left out the part where 43% of my data vanished.

Not "performed poorly." Not "returned lower-quality results." Vanished. Ninety-four of 218 chunks silently dropped from the index because I added one sentence of context and didn't do the arithmetic on what that sentence would cost. The embedding pipeline didn't warn me. ChromaDB didn't complain. I only noticed because I'm the kind of person who checks row counts after every insert. (This is not a personality trait. It's scar tissue.)

The results improved anyway. That's the part I need to explain.

Context Windows Are a Lie (And Haiku Protocol Is My Coping Mechanism)

[Cover image: a 128K context window shrinking to an effective 8K zone, with lost-in-the-middle degradation visualized as fading text]
~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

LLM vendors would like you to know that their latest model supports a 128,000-token context window. Some of them say 200,000. One of them, and I won't name names but their logo is a little sunset, says a million. A million tokens. That's roughly one and a third copies of War and Peace, which is appropriate because trying to get useful work done at the far end of a million-token window is its own kind of Russian tragedy.

Here's what the marketing materials don't mention: the effective context window, the portion where the model actually pays reliable attention to what you put there, is dramatically smaller. Research from Stanford, Berkeley, and others has converged on a finding that would be funny if it weren't costing people real money: models struggle with information placed in the middle of long contexts. They're great at the beginning. They're decent at the end. The middle? The middle is where facts go to die quietly, unnoticed, like a footnote in a terms of service agreement.

This is the "Lost in the Middle" problem, and if you're building anything that retrieves information and feeds it to a language model (which, in 2026, is approximately everyone), it means the number on the tin is a fantasy. Your 128K window is functionally an 8K window with 120K tokens of expensive padding.

I know this because I ran the experiment. Accidentally. Three times.
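
The post gets into the Haiku Protocol itself. As background, the most common generic mitigation is to reorder retrieved chunks so the strongest ones sit at the edges of the prompt, where attention is most reliable, and the weakest sink into the middle. A minimal sketch, assuming the input list arrives sorted best-first:

```python
def reorder_for_middle_loss(chunks_best_first: list[str]) -> list[str]:
    """Alternate chunks front/back so the top ranks land at both edges.

    Ranks 1, 3, 5... fill the front; ranks 2, 4, 6... fill the back
    (reversed), leaving the lowest-ranked chunks in the middle, where
    the model was going to ignore them anyway.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

print(reorder_for_middle_loss(["r1", "r2", "r3", "r4", "r5"]))
# -> ['r1', 'r3', 'r5', 'r4', 'r2']
```

LangChain ships the same idea as LongContextReorder, if you'd rather not hand-roll it.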