diff --git a/presentation/claude-code-best-practice/index.html b/presentation/claude-code-best-practice/index.html
index 2e3a9c6..7c72eff 100644
--- a/presentation/claude-code-best-practice/index.html
+++ b/presentation/claude-code-best-practice/index.html
@@ -419,9 +419,168 @@
-
+
+
+ + +

One token at a time

+ + +
+ Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced. +
+ The model produces one token per inference, feeding each result back as new input.
+ This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend. +
+
+ +
+
+ + + + +
+
+ + +

Tokens, not words

+ + +
+ Screenshot of the OpenAI tokenizer showing the sentence about BPE split into 32 tokens across 105 characters, with tabs for GPT-5.x, GPT-4, and GPT-3 tokenizers. +
+ 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
+ Each model generation uses a different tokenizer — same text, different token count, different cost. +
+
+ +
+
+ + + + +
+
+ + +

Tokens in, tokens out

+ + +
+ Animated diagram combining tokenization and autoregressive generation: the BPE-tokenized prompt feeds into the LLM, which generates the answer token-by-token using the same shared vocabulary. +
+ Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
+ “Anthropic” becomes “Anth” + “ropic” because that’s how it appears most often in training data. +
+
+ +
+
+ + + + +
+
+ + +

What the model actually sees

+ + +
+ Animated diagram showing the 32 integer token IDs the model receives: e.g. 28133 for 'Does', 17554 for ' Chat', 162016 for 'GPT', 97481 for ' Claude'. Generated tokens are also shown as IDs. Vocab size V ≈ 200,000. +
+ The model never reads text — it reads a sequence of integers, each one an index into a vocabulary of ~200,000 entries.
+ Notice the comma is always ID 11 — the same punctuation mark maps to the same integer, everywhere, every time. +
+
+ +
+
+ + + + +
+
+ + +

💬 Models are stateless

+ + +
+ + +
+ User + Model +
+ + +
+
+ “My name is Shayan.” + ➜ to model +
+
+ + +
+
+ “Okay, your name is Shayan.” + ➜ to user +
+
+ + +
+
+ “What is my name?” + ➜ to model +
+
+ + +
+
+ “I don’t know your name — I have no memory of what you just said.” + ➜ to user +
+
+ +
+ + +

Every turn is a fresh API call.

+

Memory only exists if the harness replays the transcript.

+ +
+
+ + + + +
@@ -489,9 +648,9 @@
- + -
+
@@ -559,165 +718,6 @@
- - - -
-
- - -

One token at a time

- - -
- Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced. -
- The model produces one token per inference, feeding each result back as new input.
- This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend. -
-
- -
-
- - - - -
-
- - -

Tokens, not words

- - -
- Screenshot of the OpenAI tokenizer showing the sentence about BPE split into 32 tokens across 105 characters, with tabs for GPT-5.x, GPT-4, and GPT-3 tokenizers. -
- 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
- Each model generation uses a different tokenizer — same text, different token count, different cost. -
-
- -
-
- - - - -
-
- - -

Tokens in, tokens out

- - -
- Animated diagram combining tokenization and autoregressive generation: the BPE-tokenized prompt feeds into the LLM, which generates the answer token-by-token using the same shared vocabulary. -
- Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
- “Anthropic” becomes “Anth” + “ropic” because that’s how it appears most often in training data. -
-
- -
-
- - - - -
-
- - -

What the model actually sees

- - -
- Animated diagram showing the 32 integer token IDs the model receives: e.g. 28133 for 'Does', 17554 for ' Chat', 162016 for 'GPT', 97481 for ' Claude'. Generated tokens are also shown as IDs. Vocab size V ≈ 200,000. -
- The model never reads text — it reads a sequence of integers, each one an index into a vocabulary of ~200,000 entries.
- Notice the comma is always ID 11 — the same punctuation mark maps to the same integer, everywhere, every time. -
-
- -
-
- - - - -
-
- - -

💬 Models are stateless

- - -
- - -
- User - Model -
- - -
-
- “My name is Shayan.” - ➜ to model -
-
- - -
-
- “Okay, your name is Shayan.” - ➜ to user -
-
- - -
-
- “What is my name?” - ➜ to model -
-
- - -
-
- “I don’t know your name — I have no memory of what you just said.” - ➜ to user -
-
- -
- - -

Every turn is a fresh API call.

-

Memory only exists if the harness replays the transcript.

- -
-
-