RAG and Retrieval
Retrieval-Augmented Generation is one of the most important patterns in applied AI, and one of the most overhyped. The terms below cover the pipeline from raw documents to grounded, source-backed LLM responses. When it works, it's genuinely useful. When it doesn't, it's hallucination with extra steps.
| Term | What it is |
|---|---|
| Retrieval-Augmented Generation (RAG) | Improving LLM responses by retrieving relevant documents and including them in context |
| Vector Store | A database optimized for storing and searching embeddings for semantic retrieval |
| Chunking | Splitting documents into smaller pieces before creating embeddings |
| Similarity Search | Finding stored embeddings mathematically close to a query embedding |
| Reranking | A second-pass ranking step to improve retrieval relevance |
| Context Window Stuffing | Packing as many retrieved documents as possible into the context window (often counterproductive) |
| Grounding | The degree to which an LLM's response is supported by retrieved evidence rather than fabricated |
| Metadata Filtering | Using structured metadata to narrow retrieval results |
| Semantic Search | Searching by meaning rather than exact keyword matches |
| Precision | The fraction of retrieved results that are actually relevant |
| Recall | The fraction of all relevant documents that the system actually finds |
| NDCG | Normalized Discounted Cumulative Gain: whether the most relevant results appear near the top of a ranked list |
| MRR | Mean Reciprocal Rank: the average, across queries, of 1/rank of the first relevant result |
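To make chunking, embedding, and similarity search concrete, here is a minimal, self-contained sketch. It uses a toy bag-of-words "embedding" and cosine similarity over an in-memory list; a real pipeline would use a trained embedding model and a proper vector store, and the documents and query here are purely illustrative.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows (a naive chunker;
    production systems usually split on sentence or token boundaries)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real pipelines use a trained model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "The capital of France is Paris.",
    "Python is a popular programming language.",
    "Paris hosts the Louvre museum.",
]
# Chunk each document, then store (chunk, embedding) pairs —
# an in-memory stand-in for a vector store.
store = [(c, embed(c)) for d in docs for c in chunk(d)]

query = embed("What is the capital of France?")
ranked = sorted(store, key=lambda pair: cosine(query, pair[1]), reverse=True)
print(ranked[0][0])  # the chunk most similar to the query
```

The top-ranked chunk would then be placed in the LLM's context alongside the question; semantic search means the match is driven by shared meaning (here crudely approximated by shared vocabulary), not exact keyword lookup.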
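The retrieval-quality metrics at the bottom of the table (precision, recall, MRR, NDCG) are simple enough to compute directly. This sketch assumes binary relevance judgments against a made-up ground-truth set; graded relevance would generalize the DCG gain values.

```python
import math

def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of all relevant items that were retrieved."""
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / len(retrieved), hits / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg(retrieved: list[str], relevant: set[str]) -> float:
    """Binary-relevance NDCG: discounted gain of the actual ranking,
    normalized by the best possible ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(retrieved, start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), len(retrieved)) + 1))
    return dcg / ideal if ideal else 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked results (hypothetical IDs)
relevant = {"d1", "d2", "d9"}          # ground-truth relevant set

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f} "
      f"mrr={mrr(retrieved, relevant):.2f} ndcg={ndcg(retrieved, relevant):.2f}")
# → precision=0.50 recall=0.67 mrr=0.50 ndcg=0.50
```

Note the trade-off the example makes visible: the system found two of the three relevant documents (decent recall) but ranked neither first, which is exactly what MRR and NDCG penalize and what a reranking pass is meant to fix.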