FractalRecall
FractalRecall explores a fundamentally different approach to how data gets vectorized for AI retrieval systems. The core thesis: metadata isn't something you attach to embeddings after creation. It's structural information that should shape how data gets embedded in the first place.
The Problem It Solves
Most RAG pipelines follow a standard pattern:
- Split documents into chunks
- Generate embeddings for each chunk
- Store embeddings in a vector database
- At query time, embed the query and find similar chunks
- Optionally filter results by metadata (date, category, document type)
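The pipeline above can be sketched end to end. Everything here is illustrative: `embed` is a toy bag-of-words hash standing in for a real embedding model, and the in-memory `store` stands in for a vector database.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bag-of-words hashing embedding; a real pipeline would call
    # an embedding model here instead.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# "Vector database": chunk embeddings stored alongside their metadata.
store: list[dict] = []

def add_chunk(text: str, **metadata) -> None:
    store.append({"text": text, "vec": embed(text), "meta": metadata})

add_chunk("How do I reset my password?", doc_type="faq", audience="customer")
add_chunk("Password hashes rotate via the auth service cron.",
          doc_type="spec", audience="engineer")

def retrieve(query: str, top_k: int = 5, **filters) -> list[dict]:
    # Similarity search first; metadata applied only as a post-hoc
    # filter -- exactly the pattern described above.
    qvec = embed(query)
    hits = sorted(store, key=lambda c: cosine(qvec, c["vec"]), reverse=True)
    hits = [c for c in hits
            if all(c["meta"].get(k) == v for k, v in filters.items())]
    return hits[:top_k]

results = retrieve("reset password", doc_type="faq")
```

Note where the metadata lives in this sketch: it never touches `embed`. It only prunes the candidate list after the similarity ranking has already been computed.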
That metadata step happens after retrieval. It's a filter, not a signal. The embedding itself carries no information about the document's structure, type, audience, or purpose. A paragraph from a product FAQ and a paragraph from an internal engineering spec might produce nearly identical embeddings when their words overlap, even though they serve completely different audiences and should be retrieved in completely different contexts.
FractalRecall argues that this is backwards.
The Metadata-as-DNA Approach
The name comes from two ideas:
Fractal. Metadata isn't a flat label; it has structure at every level. A document has metadata (type, audience, product). A section has metadata (topic, heading level). A paragraph has metadata (position, purpose, relationship to adjacent content). These layers are structurally self-similar at different scales. Like a fractal.
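One way to picture that self-similarity is as an inheritance tree, where each level's effective metadata is its own attributes merged over everything above it. The `Node` structure and field names below are a hypothetical illustration, not the project's schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A document, section, or paragraph: same shape at every scale."""
    meta: dict
    parent: Optional["Node"] = None

    def full_meta(self) -> dict:
        # Walk up the tree so a paragraph's effective metadata includes
        # its section's and its document's, with local keys winning.
        inherited = self.parent.full_meta() if self.parent else {}
        return {**inherited, **self.meta}

doc = Node({"type": "faq", "audience": "customer", "product": "widget"})
section = Node({"topic": "billing", "heading_level": 2}, parent=doc)
para = Node({"position": 1, "purpose": "answer"}, parent=section)

effective = para.full_meta()
# The paragraph carries document- and section-level metadata along
# with its own: type, audience, product, topic, position, purpose.
```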
Recall. The goal is better retrieval. Not just finding documents that contain similar words, but finding documents that are relevant in context: the right type of content, for the right audience, from the right source.
In practice, this means metadata participates in the embedding process itself, not just in post-retrieval filtering. Exactly how is the active research question, and the details will be published as the work stabilizes.
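Since the mechanism is unpublished, the following is only a naive baseline for the general direction, not FractalRecall's method: serialize the metadata into the chunk text before embedding it, so that structurally different chunks separate in vector space even when their wording matches. The `toy_embed` function and the bracketed prefix format are both hypothetical.

```python
import math

def toy_embed(text: str, dim: int = 256) -> list[float]:
    # Toy bag-of-words hashing embedding standing in for a real model.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def embed_with_metadata(text: str, meta: dict) -> list[float]:
    # Hypothetical baseline: render metadata as tokens and prepend them,
    # so the metadata shapes the vector instead of filtering after it.
    prefix = " ".join(f"[{k}={v}]" for k, v in sorted(meta.items()))
    return toy_embed(f"{prefix} {text}")

same_words = "Reset your password in Settings."
faq_vec = embed_with_metadata(same_words, {"type": "faq", "audience": "customer"})
spec_vec = embed_with_metadata(same_words, {"type": "spec", "audience": "engineer"})
# Identical wording, but the metadata tokens pull the two vectors apart.
```

This baseline is crude (metadata tokens compete with content tokens for the same vector budget), which is part of why the actual embedding design is the research question rather than a solved detail.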
How It Connects
FractalRecall addresses the quality side of a problem that LlmsTxtKit addresses from the access side:
- LlmsTxtKit asks: "Can AI systems access curated content at all?" (The answer is "sometimes, and it's harder than you'd think.")
- FractalRecall asks: "Once AI systems have the content, can they retrieve the right parts reliably?"
Both projects are motivated by the same observation: the gap between what AI systems could do with well-structured content and what they actually do with poorly-structured or poorly-retrieved content is enormous.
Current Status
FractalRecall is in the design and experimentation phase. The thesis has been documented (docs-first, naturally), and the initial architecture is being explored. The blog will cover findings as they emerge. The content plan includes four FractalRecall-focused posts covering the metadata-as-DNA concept, RAG context loss, chunking problems, and embeddings as lossy compression.
Where to Find It
- GitHub: southpawriter02/fractalrecall (link will update when repo is public)
- Related glossary terms: Embedding, Chunking, Metadata Filtering, RAG