NDCG
Normalized Discounted Cumulative Gain. The name is a mouthful, but the question it asks is simple: are the best results near the top?
Precision and recall treat all retrieved results equally--either they're relevant or they aren't. NDCG cares about order. A relevant document in position 1 is worth more than the same document in position 8, because users look at the top of the list first. NDCG assigns progressively less credit the further down a result appears, then normalizes the score against a theoretically perfect ranking. A score of 1.0 means every relevant document appeared in the ideal order. A score of 0.5 means the system earned only half the credit of an ideal ordering--it found the right documents but ranked them well below where they belonged.
NDCG is usually reported at a cutoff: NDCG@10 evaluates the ranking quality of the top 10 results. It's the metric most commonly used to measure overall retrieval quality in RAG systems because it captures something precision and recall miss: the system might find the right documents but present them in an unhelpful order.
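To make the discounting and normalization concrete, here is a minimal sketch of NDCG@k in Python. It assumes the standard formulation (graded relevance divided by a log2 position discount, normalized by the ideal ordering's score); the relevance values in the example are made up purely for illustration.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each result's relevance score is
    divided by log2(position + 1), so lower positions earn less credit."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the first k results, normalized by the DCG of an
    ideal ranking (the same relevance scores sorted best-first)."""
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0  # query has no relevant documents at all
    return dcg(relevances[:k]) / ideal_dcg

# Relevance of each retrieved result, in the order the system returned them
# (3 = highly relevant, 0 = not relevant). Illustrative numbers only.
retrieved = [0, 3, 2, 0, 1, 0, 0, 0, 0, 0]
print(round(ndcg_at_k(retrieved, k=10), 3))  # ~0.689: the best document sits at rank 2, not rank 1
```

In this sketch, pushing the highly relevant document from rank 2 up to rank 1 would raise the score toward 1.0 without retrieving anything new, which is exactly the ordering effect precision and recall cannot see.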
Why it matters for writers: NDCG rewards content that matches queries strongly and unambiguously. If your best document for a given topic consistently ranks below a less relevant one, the problem is usually structural: the more relevant document has weaker metadata, a less descriptive title, or content that overlaps with too many other documents. NDCG failures often point to content organization problems, not content quality problems.
Related terms: Precision · Recall · Retrieval-Augmented Generation