RAG and Retrieval
Retrieval-Augmented Generation is one of the most important patterns in applied AI, and one of the most overhyped. The terms below cover the pipeline from raw documents to grounded, source-backed LLM responses. When it works, it's genuinely useful. When it doesn't, it's hallucination with extra steps.
| Term | What it is |
|---|---|
| Retrieval-Augmented Generation (RAG) | Improving LLM responses by retrieving relevant documents and including them in context |
| Vector Store | A database optimized for storing and searching embeddings for semantic retrieval |
| Chunking | Splitting documents into smaller pieces before creating embeddings |
| Similarity Search | Finding stored embeddings mathematically close to a query embedding |
| Reranking | A second-pass ranking step to improve retrieval relevance |
| Context Window Stuffing | Packing as many retrieved documents as possible into the context window (often counterproductive) |
| Grounding | The degree to which an LLM's response is supported by retrieved evidence rather than fabricated |
| Metadata Filtering | Using structured metadata to narrow retrieval results |
| Semantic Search | Searching by meaning rather than exact keyword matches |
| Precision | The fraction of retrieved results that are actually relevant |
| Recall | The fraction of all relevant documents that the system actually finds |
| NDCG | Normalized Discounted Cumulative Gain: whether the most relevant results appear near the top of a ranked list |
| MRR | Mean Reciprocal Rank: the average, across queries, of 1/rank of the first relevant result |
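To make chunking, embedding, and similarity search concrete, here is a minimal, self-contained sketch. It uses a toy bag-of-words "embedding" and cosine similarity over an in-memory list; a real pipeline would use a trained embedding model and a proper vector store, and the documents and query here are purely illustrative.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows (a naive chunker;
    production systems usually split on sentence or token boundaries)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real pipelines use a trained model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "The capital of France is Paris.",
    "Python is a popular programming language.",
    "Paris hosts the Louvre museum.",
]
# Chunk each document, then store (chunk, embedding) pairs —
# an in-memory stand-in for a vector store.
store = [(c, embed(c)) for d in docs for c in chunk(d)]

query = embed("What is the capital of France?")
ranked = sorted(store, key=lambda pair: cosine(query, pair[1]), reverse=True)
print(ranked[0][0])  # the chunk most similar to the query
```

The top-ranked chunk would then be placed in the LLM's context alongside the question; semantic search means the match is driven by shared meaning (here crudely approximated by shared vocabulary), not exact keyword lookup.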
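The retrieval-quality metrics at the bottom of the table (precision, recall, MRR, NDCG) are simple enough to compute directly. This sketch assumes binary relevance judgments against a made-up ground-truth set; graded relevance would generalize the DCG gain values.

```python
import math

def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of all relevant items that were retrieved."""
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / len(retrieved), hits / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg(retrieved: list[str], relevant: set[str]) -> float:
    """Binary-relevance NDCG: discounted gain of the actual ranking,
    normalized by the best possible ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(retrieved, start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), len(retrieved)) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked results (hypothetical IDs)
relevant = {"d1", "d2", "d9"}          # ground-truth relevant set

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} "
      f"mrr={mrr(retrieved, relevant):.2f} ndcg={ndcg(retrieved, relevant):.2f}")
# → precision=0.50 recall=0.67 mrr=0.50 ndcg=0.50
```

Note the trade-off the example makes visible: the system found two of the three relevant documents (decent recall) but ranked neither first, which is exactly what MRR and NDCG penalize and what a reranking pass is meant to fix.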