Retrieval-Augmented Generation (RAG)
A pattern that improves LLM responses by retrieving relevant documents from an external knowledge base and including them in the model's context at inference time. Instead of relying solely on what the model learned during training (which may be outdated, incomplete, or wrong), RAG gives it access to specific, up-to-date information on demand.
The basic flow: User asks a question → the system searches a knowledge base for relevant documents → those documents get inserted into the prompt alongside the question → the LLM generates a response grounded in the retrieved content.
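The flow above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the documents are invented, and a simple word-overlap score stands in for the embedding similarity a real vector store would use.

```python
# Minimal sketch of the RAG flow: retrieve relevant documents,
# then insert them into the prompt alongside the question.
# All documents and the prompt template are illustrative.

KNOWLEDGE_BASE = [
    "Refunds are processed within 5 business days of approval.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
    "Premium plans include priority support and a dedicated manager.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question -- a toy
    stand-in for embedding similarity search."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Insert the retrieved documents into the prompt so the LLM
    can ground its answer in them."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
```

The final `prompt` string is what actually gets sent to the LLM: retrieved content first, user question last, so the model generates a response grounded in the supplied context.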
It's like taking an open-book exam instead of relying on memory. The exam is still hard, but at least you've got the textbook in front of you. Whether you flip to the right page is another matter.
Why it matters for writers: RAG is how organizations get LLMs to answer questions about their own content: internal docs, product knowledge bases, support articles. If you're writing content that will be consumed by a RAG pipeline, how you structure that content (headings, metadata, self-contained sections) directly affects retrieval quality. Your writing style is, in a very real sense, part of the system architecture.
Related terms: Vector Store · Chunking · Grounding · Embedding