RAG and Retrieval

Retrieval-Augmented Generation is one of the most important patterns in applied AI, and one of the most overhyped. These terms cover the pipeline from raw documents to grounded, source-backed LLM responses. When it works, it's genuinely useful. When it doesn't, it's hallucination with extra steps.
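The pipeline can be sketched end to end in a few lines. This is a toy illustration, not a production implementation: `embed` here is a bag-of-words counter standing in for a real embedding model, and all names (`chunk`, `retrieve`, `build_prompt`) are hypothetical.

```python
# Minimal RAG sketch: chunk documents, embed them, run similarity
# search against the query, and assemble a grounded prompt.
import math
import re
from collections import Counter

def chunk(text, size=50):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: lowercase bag-of-words counts.
    A real system would call a learned embedding model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Similarity search: rank stored chunks by closeness to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved):
    """Ground the LLM by putting retrieved evidence in the context."""
    context = "\n---\n".join(retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Paris is the capital of France. It hosts the Louvre.",
        "Tokyo is the capital of Japan and its largest city."]
chunks = [c for d in docs for c in chunk(d)]
top = retrieve("What is the capital of France?", chunks, k=1)
print(build_prompt("What is the capital of France?", top))
```

The failure mode the intro warns about lives in `retrieve`: if the top-k chunks are off-topic, the model answers from a context that doesn't support the question, and grounding degrades to fabrication.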

| Term | What it is |
| --- | --- |
| Retrieval-Augmented Generation (RAG) | Improving LLM responses by retrieving relevant documents and including them in context |
| Vector Store | A database optimized for storing and searching embeddings for semantic retrieval |
| Chunking | Splitting documents into smaller pieces before creating embeddings |
| Similarity Search | Finding stored embeddings mathematically close to a query embedding |
| Reranking | A second-pass ranking step to improve retrieval relevance |
| Context Window Stuffing | Filling as much context as possible with retrieved documents (often counterproductive) |
| Grounding | How well an LLM's response is based on retrieved evidence vs. fabrication |
| Metadata Filtering | Using structured metadata to narrow retrieval results |
| Semantic Search | Searching by meaning rather than exact keyword matches |
| Precision | The fraction of retrieved results that are actually relevant |
| Recall | The fraction of all relevant documents that the system actually finds |
| NDCG | Normalized Discounted Cumulative Gain; measures whether the best results appear near the top of a ranked list |
| MRR | Mean Reciprocal Rank; averages, across queries, one over the rank of the first relevant result |
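The four metrics at the bottom of the table are each a few lines of arithmetic over a ranked result list. A minimal sketch (hypothetical document IDs, single-query MRR for simplicity):

```python
# Retrieval metrics over one ranked result list.
import math

def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved that are relevant.
    Recall: fraction of all relevant docs that were retrieved."""
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / len(retrieved), hits / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant result (MRR averages this over queries)."""
    for i, d in enumerate(ranked, 1):
        if d in relevant:
            return 1 / i
    return 0.0

def dcg(gains):
    """Discounted cumulative gain: later positions count for less."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, 1))

def ndcg(ranked, relevant):
    """DCG normalized by the DCG of the ideal (best-first) ordering."""
    gains = [1 if d in relevant else 0 for d in ranked]
    ideal = sorted(gains, reverse=True)
    return dcg(gains) / dcg(ideal) if any(gains) else 0.0

ranked = ["d3", "d1", "d7", "d2", "d9"]   # system's ranked output
relevant = {"d1", "d2", "d5"}             # ground-truth relevant set

p, r = precision_recall(ranked, relevant)
print(p, r)                               # 0.4, 2/3: two of five retrieved
                                          # are relevant; one relevant doc missed
print(reciprocal_rank(ranked, relevant))  # 0.5: first relevant doc at rank 2
print(ndcg(ranked, relevant))             # < 1.0: relevant docs not at the top
```

The split matters in practice: precision and recall ignore ordering entirely, while NDCG and MRR penalize a system that finds the right documents but buries them below the cutoff the LLM's context window can hold.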