
Transformer

The neural network architecture that powers virtually all modern LLMs. Introduced in a 2017 research paper with the delightfully understated title "Attention Is All You Need," the transformer's key innovation is the attention mechanism, which lets the model weigh the importance of each word relative to every other word, regardless of their positions in the text. Before transformers, models processed text sequentially (word by word, like reading a novel on a bus). Transformers process entire sequences in parallel, which makes them dramatically faster to train.
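For the curious, the core attention computation fits in a few lines. Here is a minimal NumPy sketch of single-head, unmasked self-attention; the four "words," their three-dimensional vectors, and the function name are toy assumptions for illustration, not anything a real model uses:

```python
import numpy as np

def attention(Q, K, V):
    # Each word's query is compared against every word's key; softmax
    # turns those similarity scores into weights that sum to 1.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each word's output is a weighted blend of every word's value
    # vector, so information can flow between any two positions at once.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))        # 4 toy "words", 3-dim embeddings
out, weights = attention(X, X, X)  # self-attention: Q, K, V from same input

print(out.shape)                   # (4, 3): one blended vector per word
print(weights.sum(axis=-1))        # each row of weights sums to 1.0
```

The key property to notice: the word in position 1 and the word in position 47 are compared directly, in one step, rather than through a long chain of sequential updates.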

Why it matters for writers: You don't need to understand the math. (I don't mean that condescendingly; I mean it literally. The math is not the useful part for most people.) What's useful is knowing that transformers "attend" to relationships between words across an entire passage. This is why LLMs are surprisingly good at keeping context across long documents, and why they still sometimes lose the thread if you bury something important on page 47.

Related terms: Large Language Model · Token · Context Window