+
+
+
+
+
+ The model produces one token per inference, feeding each result back as new input.
+ This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Tokens, not words
+
+
+
+
+
+ 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
+ Each model generation uses a different tokenizer — same text, different token count, different cost.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Tokens in, tokens out
+
+
+
+
+
+ Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
+ “Anthropic” becomes “Anth” + “ropic” because that’s how it appears most often in training data.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
What the model actually sees
+
+
+
+
+
+ The model never reads text — it reads a sequence of integers, each one an index into a vocabulary of ~200,000 entries.
+ Notice the comma is always ID 11 — the same punctuation mark maps to the same integer, everywhere, every time.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
💬 Models are stateless
+
+
+
+
+
+
+ User
+ Model
+
+
+
+
+
+ “My name is Shayan.”
+ ➜ to model
+
+
+
+
+
+
+ “Okay, your name is Shayan.”
+ ➜ to user
+
+
+
+
+
+
+ “What is my name?”
+ ➜ to model
+
+
+
+
+
+
+ “I don’t know your name — I have no memory of what you just said.”
+ ➜ to user
+
+
+
+
+
+
+
Every turn is a fresh API call.
+
Memory only exists if the harness replays the transcript.
+
+
+
+
+
+
+
+
@@ -489,9 +648,9 @@
-
+
-
+
@@ -559,165 +718,6 @@
-
-
-
-
-
-
-
-
One token at a time
-
-
-
-
-
- The model produces one token per inference, feeding each result back as new input.
- This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Tokens, not words
-
-
-
-
-
- 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
- Each model generation uses a different tokenizer — same text, different token count, different cost.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
Tokens in, tokens out
-
-
-
-
-
- Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
- “Anthropic” becomes “Anth” + “ropic” because that’s how it appears most often in training data.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
What the model actually sees
-
-
-
-
-
- The model never reads text — it reads a sequence of integers, each one an index into a vocabulary of ~200,000 entries.
- Notice the comma is always ID 11 — the same punctuation mark maps to the same integer, everywhere, every time.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
💬 Models are stateless
-
-
-
-
-
-
- User
- Model
-
-
-
-
-
- “My name is Shayan.”
- ➜ to model
-
-
-
-
-
-
- “Okay, your name is Shayan.”
- ➜ to user
-
-
-
-
-
-
- “What is my name?”
- ➜ to model
-
-
-
-
-
-
- “I don’t know your name — I have no memory of what you just said.”
- ➜ to user
-
-
-
-
-
-
-
Every turn is a fresh API call.
-
Memory only exists if the harness replays the transcript.