3 posts tagged with "FractalRecall"

Posts about FractalRecall's metadata-as-DNA approach to vector embeddings and RAG quality.

Embedding Models Don't Read Your Metadata (But They Should)

[Cover image: split comparison showing YAML metadata as noise versus the same metadata as a natural-language sentence producing a sharper embedding vector]
~9 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Here's a sentence your embedding model understands perfectly well:

"This is a canonical faction document from the post-Glitch era describing cultural practices and political structure."

And here's functionally identical information that your embedding model treats as random noise:

```yaml
canon: true
domain: faction
era: post-glitch
topics: [culture, politics]
```

Same facts. Same document. Different embedding behavior. The YAML blob gets processed as four disconnected key-value pairs with almost no semantic weight. The natural language sentence gets encoded as a rich set of contextual signals that tell the model what this document is, what it's about, and how it relates to the kinds of questions someone might ask.

The gap between the metadata your system knows and the context your embeddings encode is the single biggest free improvement sitting in most RAG pipelines. Almost nobody exploits it.
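
To make that concrete, here's a minimal sketch of the technique: render the metadata as one natural-language sentence and prepend it to the chunk before embedding. The metadata_to_prefix helper, the sentence template, and the field defaults are illustrative assumptions, not FractalRecall's actual schema.

```python
def metadata_to_prefix(meta: dict) -> str:
    """Render a metadata dict as one contextual sentence the embedder can use."""
    canon = "canonical" if meta.get("canon") else "non-canonical"
    domain = meta.get("domain", "general")
    era = meta.get("era", "unspecified")
    topics = " and ".join(meta.get("topics", [])) or "unspecified topics"
    return (
        f"This is a {canon} {domain} document from the {era} era "
        f"describing {topics}."
    )

meta = {
    "canon": True,
    "domain": "faction",
    "era": "post-glitch",
    "topics": ["culture", "politics"],
}
chunk_text = "..."  # whatever your splitter produced for this chunk

# Embed the sentence and the chunk together, so the contextual signals
# land in the same vector as the content they describe.
text_to_embed = f"{metadata_to_prefix(meta)}\n\n{chunk_text}"
# prefix -> "This is a canonical faction document from the post-glitch
#            era describing culture and politics."
```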

I Added Context to My Embeddings and 43% of My Data Disappeared

[Cover image: terminal showing embedding pipeline results: 218 chunks input, 124 surviving, 94 silently dropped, with retrieval metrics improving despite the data loss]
~7 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

In Part 1, I mentioned the D-22 experiment almost as an aside. Twenty-four tokens of metadata prefix, 16.5% improvement in ranking quality, 27.3% recall jump. Good numbers. Clean story.

I left out the part where 43% of my data vanished.

Not "performed poorly." Not "returned lower-quality results." Vanished. Ninety-four of 218 chunks silently dropped from the index because I added one sentence of context and didn't do the arithmetic on what that sentence would cost. The embedding pipeline didn't warn me. ChromaDB didn't complain. I only noticed because I'm the kind of person who checks row counts after every insert. (This is not a personality trait. It's scar tissue.)

The results improved anyway. That's the part I need to explain.

Context Windows Are a Lie (And Haiku Protocol Is My Coping Mechanism)

[Cover image: a 128K context window shrinking to an effective 8K zone, with lost-in-the-middle degradation visualized as fading text]
~10 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

LLM vendors would like you to know that their latest model supports a 128,000-token context window. Some of them say 200,000. One of them, and I won't name names but their logo is a little sunset, says a million. A million tokens. That's roughly one and a third copies of War and Peace, which is appropriate because trying to get useful work done at the far end of a million-token window is its own kind of Russian tragedy.

Here's what the marketing materials don't mention: the effective context window, the portion where the model actually pays reliable attention to what you put there, is dramatically smaller. Research from Stanford, Berkeley, and others has converged on a finding that would be funny if it weren't costing people real money: models struggle with information placed in the middle of long contexts. They're great at the beginning. They're decent at the end. The middle? The middle is where facts go to die quietly, unnoticed, like a footnote in a terms of service agreement.

This is the "Lost in the Middle" problem, and if you're building anything that retrieves information and feeds it to a language model (which, in 2026, is approximately everyone), it means the number on the tin is a fantasy. Your 128K window is functionally an 8K window with 120K tokens of expensive padding.

I know this because I ran the experiment. Accidentally. Three times.
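
The post gets into the Haiku Protocol itself. As background, the most common generic mitigation is to reorder retrieved chunks so the strongest ones sit at the edges of the prompt, where attention is most reliable, and the weakest sink into the middle. A minimal sketch, assuming the input list arrives sorted best-first:

```python
def reorder_for_middle_loss(chunks_best_first: list[str]) -> list[str]:
    """Alternate chunks front/back so the top ranks land at both edges.

    Ranks 1, 3, 5... fill the front; ranks 2, 4, 6... fill the back
    (reversed), leaving the lowest-ranked chunks in the middle, where
    the model was going to ignore them anyway.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

print(reorder_for_middle_loss(["r1", "r2", "r3", "r4", "r5"]))
# -> ['r1', 'r3', 'r5', 'r4', 'r2']
```

LangChain ships the same idea as LongContextReorder, if you'd rather not hand-roll it.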