Inference

The process of running a trained model to generate output. When you send a prompt to Claude or GPT-4 and get a response, that's inference. It's distinct from training (where the model learns from vast datasets) and fine-tuning (where a trained model gets additional, targeted training).

Why it matters for writers: The distinction matters because information reaches LLMs at different stages. Some systems supply information at inference time, as RAG does (see Retrieval-Augmented Generation), while others bake information in during training. The llms.txt standard was designed to provide curated content at inference time, but whether any major AI provider actually uses it that way remains an open question. (See the blog for ongoing research on this topic, or just sit with the uncertainty. We all are.)
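The "inference-time information" idea above can be sketched in a few lines: retrieve relevant text at request time and prepend it to the prompt before the model runs. This is a toy illustration, not any provider's real API; the `retrieve` and `run_inference` functions are stand-ins invented for this example.

```python
import re

# Minimal sketch of inference-time context injection (the RAG pattern).
# The retriever and model below are hypothetical stand-ins.

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; keeps dots so 'llms.txt' survives intact."""
    return set(re.findall(r"[a-z0-9][a-z0-9.\-]*", text.lower()))

def retrieve(query: str, documents: list[str]) -> list[str]:
    """Toy retriever: keyword overlap instead of embeddings or vector search."""
    q = tokenize(query)
    return [d for d in documents if q & tokenize(d)]

def run_inference(prompt: str) -> str:
    """Stand-in for a model call; a real system would send this prompt to an LLM."""
    return f"[model output for prompt of {len(prompt)} chars]"

docs = [
    "llms.txt is a proposed standard for serving LLM-friendly site content.",
    "Fine-tuning gives a trained model additional, targeted training.",
]
query = "What is llms.txt?"
context = "\n".join(retrieve(query, docs))  # fetched at inference time, not training time
answer = run_inference(f"Context:\n{context}\n\nQuestion: {query}")
```

The point of the sketch: nothing about the model changes. The new information arrives only in the prompt, which is exactly why inference-time approaches can use fresher content than anything baked in during training.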

Related terms: Large Language Model · Fine-Tuning · Retrieval-Augmented Generation