Context Window Stuffing

The practice of filling as much of an LLM's context window as possible with retrieved documents before generating a response. More context gives the model more information to work with, and in the early days of RAG the prevailing wisdom was "more is better." The prevailing wisdom was wrong.

There are diminishing returns, and sometimes negative ones. Models can get distracted by marginally relevant content, and the "lost in the middle" problem (see Context Window) means information buried in the middle of a long context may be underweighted. Stuffing the context is like packing a suitcase by throwing in everything you own: technically it all fits, but good luck finding your passport.

Why it matters for writers: This is a system design concern that directly affects content strategy. If your documents are verbose and repetitive, they consume context window space without adding proportional value. Concise, information-dense writing is more efficient in a RAG pipeline. This is one argument for the llms.txt approach, providing a curated, condensed version of your content instead of forcing retrieval systems to process your full HTML pages.
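The alternative to stuffing is budget-aware selection: keep only the highest-relevance chunks that fit within a token allowance. A minimal sketch of that idea follows; the chunk texts, relevance scores, and the rough 4-characters-per-token estimate are all illustrative assumptions, not part of any particular retrieval library.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real pipelines would use the model's actual tokenizer.
    return len(text) // 4

def select_chunks(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Keep the highest-scoring retrieved chunks that fit the token budget,
    instead of stuffing every retrieved document into the context."""
    selected: list[str] = []
    used = 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected

# Hypothetical retrieval results: (relevance score, chunk text).
chunks = [
    (0.91, "Concise, information-dense passage about the query topic."),
    (0.40, "Marginally relevant boilerplate that would only distract."),
    (0.87, "Another dense, on-topic passage worth including."),
]
context = select_chunks(chunks, budget=30)
```

With a budget of 30 tokens, the two high-scoring chunks fit and the marginal one is dropped, which is exactly the trade-off that stuffing ignores. Note also how dense chunks cost fewer tokens per fact, which is why concise source writing stretches the same budget further.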

Related terms: Context Window · Retrieval-Augmented Generation · Token