When you hit enter, the model doesn't look up an answer. It reads everything in front of it and generates a reply, one piece at a time. That single step is called inference — and it holds a surprise about memory.
As you saw in Illustration 1, words turn into numbers. Here's that happening to your question, the instant you hit enter — and where it leads.
This whole act — your words in, a calculated next-word out — is called inference. And it raises a question: if answering is just a calculation over “the words in front of it,” what exactly counts as the words in front of it?
Send each message and watch what the model has to do first. The conversation isn't stored in its mind — every turn, the whole thing is handed back and read again from the top.
Notice the last turn. To answer “what were our goals?” the model re-read all four of your messages from the top. It didn't remember your goals — it read them again, exactly like every turn before.
This is what people mean when they say a model is stateless: it keeps nothing on its own. Everything it “knows” about your conversation is re-presented to it, in full, every single turn.
All that re-reading happens inside a space of a fixed size — the context window. It's large, but finite. Keep adding, and the oldest messages fall out the back.
{{ win.note }}
A fresh chat starts empty — the model has never met you. Memory isn't the model remembering. It's the app saving a note and pasting it into the top of every new window — and you decide what's worth saving.
Memory compounds. The more intentional you are about what you save, the sharper your collaborator gets over time.
My collaborator and I locked the Doctrine of Single Source of Truth into memory. It now echoes across dozens of other notes — so when I start to drift from it, the AI pushes back: “wait, we committed to this.” Memory isn't a filing cabinet you forget about. It's a growing set of shared commitments — and it's worth tending deliberately.
You've seen what really happens when you ask: the model re-reads the whole window and generates a reply — it stores nothing on its own. Next we'll see why that same process can produce an answer that's confidently, completely wrong.
Back to The Science of AI