Similarity Search

The process of finding stored embeddings that are mathematically "close" to a query embedding. The most common measure is cosine similarity, which is the cosine of the angle between two vectors; vectors pointing in roughly the same direction score high regardless of their magnitude. (Other measures include Euclidean distance and dot product, for those keeping score.)
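
For readers who want to see the arithmetic, here is a minimal sketch of the three measures named above. The vectors are made-up three-number examples chosen for illustration (real embeddings typically have hundreds or thousands of dimensions), and the function names are just labels for this sketch, not any particular library's API.

```python
import math

def dot(a, b):
    # Dot product: multiply matching components and add them up.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Length (magnitude) of a vector.
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b: the dot product with
    # both lengths divided out, so only direction matters.
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative vectors, not real embeddings.
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # the query scaled by 2
different = [3.0, -1.0, 0.5]

# Same direction, different length: cosine similarity is 1.0
# (up to floating-point rounding), while the dot product and
# Euclidean distance both change with the vector's magnitude.
print(cosine_similarity(query, same_direction))
print(dot(query, same_direction))
print(euclidean_distance(query, same_direction))

# A vector pointing elsewhere scores much lower.
print(cosine_similarity(query, different))
```

The point to take away is the first comparison: doubling a vector's length leaves its cosine similarity to the query unchanged, which is what "regardless of their magnitude" means in practice.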

Why it matters for writers: Similarity search finds content that's conceptually related to a query, not just content that contains the same words. This is powerful: a search for "how to reset a password" can find a document titled "Account Recovery Steps." But it's also imprecise: the same search might return vaguely related content about your company's security policy, your GDPR compliance page, and a blog post someone wrote about authentication in 2019. Understanding this tradeoff helps you evaluate RAG system behavior and write content that's semantically distinct enough to be retrieved accurately.
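
The tradeoff described above can be sketched as a toy search. Everything here is an assumption for illustration: the document titles echo the examples in this entry, the three-number vectors are hand-picked rather than produced by an embedding model, and `search` and its `min_score` cutoff are hypothetical names, not a real vector store's interface.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, store, k=3, min_score=0.0):
    # Score every stored vector against the query, drop anything
    # below the cutoff, and return the k highest-scoring titles.
    scored = [
        (cosine_similarity(query_vec, vec), title)
        for title, vec in store.items()
    ]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]

# Hand-picked toy vectors standing in for real embeddings.
store = {
    "Account Recovery Steps": [0.9, 0.1, 0.0],
    "Security Policy": [0.6, 0.6, 0.2],
    "Authentication blog post (2019)": [0.5, 0.3, 0.7],
    "Holiday Schedule": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of "how to reset a password".
query = [1.0, 0.0, 0.0]

# Top 3 with no cutoff: the right document comes first, but the
# loosely related ones ride along behind it.
for score, title in search(query, store, k=3):
    print(f"{score:.2f}  {title}")

# A stricter cutoff keeps only the close match.
for score, title in search(query, store, k=3, min_score=0.9):
    print(f"{score:.2f}  {title}")
```

In this sketch, asking for the top three always returns three results, however weak the second and third are. That is the imprecision at work: a similarity search ranks what it has, and it is up to a cutoff, a reranker, or more distinct content to keep the stragglers out.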

Related terms: Embedding · Vector Store · Reranking