2 posts tagged with "Embeddings"

Vector embeddings, embedding models, and the information theory behind how meaning gets compressed into numbers.

Embedding Models Don't Read Your Metadata (But They Should)

[Image: Split comparison showing YAML metadata as noise versus the same metadata as a natural language sentence producing a sharper embedding vector]

~9 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

Here's a sentence your embedding model understands perfectly well:

"This is a canonical faction document from the post-Glitch era describing cultural practices and political structure."

And here's functionally identical information that your embedding model treats as random noise:

canon: true
domain: faction
era: post-glitch
topics: [culture, politics]

Same facts. Same document. Different embedding behavior. The YAML blob gets processed as four disconnected key-value fragments with no semantic connective tissue. The natural language sentence gets encoded as a rich set of contextual signals that tell the model what this document is, what it's about, and how it relates to the kinds of questions someone might ask.

The gap between the metadata your system knows and the context your embeddings encode is the single biggest free improvement sitting in most RAG pipelines. Almost nobody exploits it.
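One way to exploit that gap is to render the metadata your system already has into a sentence before embedding. A minimal sketch of the idea; the field names and phrasing below are illustrative, not the post's actual pipeline:

```python
def metadata_to_sentence(meta: dict) -> str:
    """Render a metadata dict as a natural-language prefix an embedding
    model can actually use. Field names and phrasing are illustrative."""
    parts = ["a canonical" if meta.get("canon") else "a non-canonical"]
    parts.append(f"{meta.get('domain', 'general')} document")
    if "era" in meta:
        parts.append(f"from the {meta['era']} era")
    if meta.get("topics"):
        parts.append("covering " + " and ".join(meta["topics"]))
    return "This is " + " ".join(parts) + "."

meta = {
    "canon": True,
    "domain": "faction",
    "era": "post-glitch",
    "topics": ["culture", "politics"],
}
print(metadata_to_sentence(meta))
# → This is a canonical faction document from the post-glitch era covering culture and politics.
```

The prefix then gets concatenated onto the chunk text before it is sent to the embedding model, so the same facts arrive as context the model was trained to read rather than as a YAML blob.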

I Added Context to My Embeddings and 43% of My Data Disappeared

[Image: Terminal showing embedding pipeline results: 218 chunks input, 124 surviving, 94 silently dropped, with retrieval metrics improving despite data loss]

~7 min read
Ryan Goodrich
Technical Writer, AI Enthusiast, and Developer Advocate

In Part 1, I mentioned the D-22 experiment almost as an aside. Twenty-four tokens of metadata prefix, 16.5% improvement in ranking quality, 27.3% recall jump. Good numbers. Clean story.

I left out the part where 43% of my data vanished.

Not "performed poorly." Not "returned lower-quality results." Vanished. Ninety-four of 218 chunks silently dropped from the index because I added one sentence of context and didn't do the arithmetic on what that sentence would cost. The embedding pipeline didn't warn me. ChromaDB didn't complain. I only noticed because I'm the kind of person who checks row counts after every insert. (This is not a personality trait. It's scar tissue.)
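The arithmetic the post alludes to is simple to automate: before embedding, check which chunks a fixed prefix would push past the model's sequence limit. A hedged sketch; the crude whitespace token estimate and the function name are mine, and in practice you would use your embedding model's own tokenizer:

```python
def audit_chunks(chunks, prefix_tokens, model_limit):
    """Flag chunks that a fixed metadata prefix would push past the
    embedding model's token limit. Token counts here are crude
    whitespace estimates; swap in your model's real tokenizer."""
    survivors, dropped = [], []
    for i, chunk in enumerate(chunks):
        if prefix_tokens + len(chunk.split()) > model_limit:
            dropped.append(i)   # would be silently truncated or rejected
        else:
            survivors.append(i)
    return survivors, dropped

chunks = ["a short chunk", " ".join(["word"] * 500)]
survivors, dropped = audit_chunks(chunks, prefix_tokens=24, model_limit=256)
print(survivors, dropped)  # → [0] [1]
```

Pairing this pre-flight check with a row-count assertion after every insert (input count versus what the index actually reports) is what surfaces the silent drops the pipeline itself never warns about.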

The results improved anyway. That's the part I need to explain.