FractalRecall
FractalRecall explores a fundamentally different approach to how data gets vectorized for AI retrieval systems. The core thesis: metadata isn't something you attach to embeddings after creation. It's structural information that should shape how data gets embedded in the first place.
The Problem It Solves
Most RAG pipelines follow a standard pattern:
- Split documents into chunks
- Generate embeddings for each chunk
- Store embeddings in a vector database
- At query time, embed the query and find similar chunks
- Optionally filter results by metadata (date, category, document type)
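The pipeline above can be sketched end to end. Everything here is illustrative: `embed` is a toy bag-of-words hash standing in for a real embedding model, and the in-memory `store` stands in for a vector database.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bag-of-words hashing embedding; a real pipeline would call
    # an embedding model here instead.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# "Vector database": chunk embeddings stored alongside their metadata.
store: list[dict] = []

def add_chunk(text: str, **metadata) -> None:
    store.append({"text": text, "vec": embed(text), "meta": metadata})

add_chunk("How do I reset my password?", doc_type="faq", audience="customer")
add_chunk("Password hashes rotate via the auth service cron.",
          doc_type="spec", audience="engineer")

def retrieve(query: str, top_k: int = 5, **filters) -> list[dict]:
    # Similarity search first; metadata applied only as a post-hoc
    # filter -- exactly the pattern described above.
    qvec = embed(query)
    hits = sorted(store, key=lambda c: cosine(qvec, c["vec"]), reverse=True)
    hits = [c for c in hits
            if all(c["meta"].get(k) == v for k, v in filters.items())]
    return hits[:top_k]

results = retrieve("reset password", doc_type="faq")
```

Note where the metadata lives in this sketch: it never touches `embed`. It only prunes the candidate list after the similarity ranking has already been computed.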
That metadata step happens after retrieval. It's a filter, not a signal. The embedding itself carries no information about the document's structure, type, audience, or purpose. A paragraph from a product FAQ and a paragraph from an internal engineering spec might produce nearly identical embeddings when their words overlap, even though they serve completely different audiences and should be retrieved in completely different contexts.
FractalRecall argues that this is backwards.
The Metadata-as-DNA Approach
The name comes from two ideas:
Fractal. Metadata isn't a flat label; it has structure at every level. A document has metadata (type, audience, product). A section has metadata (topic, heading level). A paragraph has metadata (position, purpose, relationship to adjacent content). These layers are structurally self-similar at different scales. Like a fractal.
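One way to picture that self-similarity is as an inheritance tree, where each level's effective metadata is its own attributes merged over everything above it. The `Node` structure and field names below are a hypothetical illustration, not the project's schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A document, section, or paragraph: same shape at every scale."""
    meta: dict
    parent: Optional["Node"] = None

    def full_meta(self) -> dict:
        # Walk up the tree so a paragraph's effective metadata includes
        # its section's and its document's, with local keys winning.
        inherited = self.parent.full_meta() if self.parent else {}
        return {**inherited, **self.meta}

doc = Node({"type": "faq", "audience": "customer", "product": "widget"})
section = Node({"topic": "billing", "heading_level": 2}, parent=doc)
para = Node({"position": 1, "purpose": "answer"}, parent=section)

effective = para.full_meta()
# The paragraph carries document- and section-level metadata along
# with its own: type, audience, product, topic, position, purpose.
```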
Recall. The goal is better retrieval. Not just finding documents that contain similar words, but finding documents that are relevant in context: the right type of content, for the right audience, from the right source.
In practice, this means metadata participates in the embedding process itself, not just in post-retrieval filtering. Exactly how is the active research question, and the details will be published as the work stabilizes.
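Since the mechanism is unpublished, the following is only a naive baseline for the general direction, not FractalRecall's method: serialize the metadata into the chunk text before embedding it, so that structurally different chunks separate in vector space even when their wording matches. The `toy_embed` function and the bracketed prefix format are both hypothetical.

```python
import math

def toy_embed(text: str, dim: int = 256) -> list[float]:
    # Toy bag-of-words hashing embedding standing in for a real model.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_with_metadata(text: str, meta: dict) -> list[float]:
    # Hypothetical baseline: render metadata as tokens and prepend them,
    # so the metadata shapes the vector instead of filtering after it.
    prefix = " ".join(f"[{k}={v}]" for k, v in sorted(meta.items()))
    return toy_embed(f"{prefix} {text}")

same_words = "Reset your password in Settings."
faq_vec = embed_with_metadata(same_words, {"type": "faq", "audience": "customer"})
spec_vec = embed_with_metadata(same_words, {"type": "spec", "audience": "engineer"})
# Identical wording, but the metadata tokens pull the two vectors apart.
```

This baseline is crude (metadata tokens compete with content tokens for the same vector budget), which is part of why the actual embedding design is the research question rather than a solved detail.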
How It Connects
FractalRecall addresses the quality side of a problem that LlmsTxtKit addresses from the access side:
- LlmsTxtKit asks: "Can AI systems access curated content at all?" (The answer is "sometimes, and it's harder than you'd think.")
- FractalRecall asks: "Once AI systems have the content, can they retrieve the right parts reliably?"
Both projects are motivated by the same observation: the gap between what AI systems could do with well-structured content and what they actually do with poorly-structured or poorly-retrieved content is enormous.
Current Status
FractalRecall is in the design and experimentation phase. The thesis has been documented (docs-first, naturally), and the initial architecture is being explored. The blog will cover findings as they emerge. The content plan includes four FractalRecall-focused posts covering the metadata-as-DNA concept, RAG context loss, chunking problems, and embeddings as lossy compression.
Where to Find It
- GitHub: southpawriter02/fractalrecall (link will update when repo is public)
- Related glossary terms: Embedding, Chunking, Metadata Filtering, RAG