Illustration 3 of 5

What Happens When You Ask

When you hit enter, the model doesn't look up an answer. It reads everything in front of it and generates a reply, one piece at a time. That single step is called inference — and it holds a surprise about memory.

Your question also becomes vectors.

As you saw in Illustration 1, words turn into numbers. Here's that happening to your question, the instant you hit enter — and where it leads.

You ask
When does the library open?
Split into tokens
{{ t.tok }}
Each token looked up as a vector
{{ t.tok }} {{ t.vec }}
(showing 4 of ~1,500 numbers per token)
One enormous calculation
vectors × billions of trained weights
= millions of multiply–add operations, all at once
Out comes a probability for the next word
{{ p.tok }}
{{ p.pctLabel }}
It picks one, adds it to the reply, and runs the whole calculation again for the next word. No search. No lookup. Just math.

This whole act — your words in, a calculated next-word out — is called inference. And it raises a question: if answering is just a calculation over “the words in front of it,” what exactly counts as the words in front of it?

The model remembers nothing. It re-reads everything.

Send each message and watch what the model has to do first. The conversation isn't stored in its mind — every turn, the whole thing is handed back and read again from the top.

Your conversation {{ reread.windowLabel }}
{{ m.text }}
An empty chat. Press Send and follow what happens on every single turn.
The model
holds nothing between turns
Re-read this turn
{{ reread.count }}
{{ reread.countSub }}
{{ reread.status }}

Notice the last turn. To answer “what were our goals?” the model re-read all four of your messages from the top. It didn't remember your goals — it read them again, exactly like every turn before.

This is what people mean when they say a model is stateless: it keeps nothing on its own. Everything it “knows” about your conversation is re-presented to it, in full, every single turn.

There's only so much it can hold: the context window.

All that re-reading happens inside a space of a fixed size — the context window. It's large, but finite. Keep adding, and the oldest messages fall out the back.

Context window {{ win.fillLabel }}
#{{ m.n }} {{ m.text }}
Fell out of the window
Nothing yet — there's still room.
#{{ m.n }} {{ m.text }}

{{ win.note }}

So how does it ever “remember” you?

A fresh chat starts empty — the model has never met you. Memory isn't the model remembering. It's the app saving a note and pasting it into the top of every new window — and you decide what's worth saving.

A note about how you like to work
“I keep one Single Source of Truth. Never duplicate the canonical record — point to it.”
{{ mem.saveHint }}
{{ mem.chatTitle }}
From memory: keeps a Single Source of Truth; never duplicate the canonical record.
{{ mem.greeting }}
A personal note — my experience

Memory compounds. The more intentional you are about what you save, the sharper your collaborator gets over time.

My collaborator and I locked the Doctrine of Single Source of Truth into memory. It now echoes across dozens of other notes — so when I start to drift from it, the AI pushes back: “wait, we committed to this.” Memory isn't a filing cabinet you forget about. It's a growing set of shared commitments — and it's worth tending deliberately.

One turn, start to finish.

You've seen what really happens when you ask: the model re-reads the whole window and generates a reply — it stores nothing on its own. Next we'll see why that same process can produce an answer that's confidently, completely wrong.

Back to The Science of AI