diff --git a/presentation/claude-code-best-practice/index.html b/presentation/claude-code-best-practice/index.html
index c6ca716..2ba94ea 100644
--- a/presentation/claude-code-best-practice/index.html
+++ b/presentation/claude-code-best-practice/index.html
@@ -560,9 +560,84 @@
- +
+
+ + +

One token at a time

+ + +
+ Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced. +
+ The model produces one token per forward pass, feeding each result back as new input.
+ This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend. +
+
+ +
+
+ + + + +
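The generation loop described on the "One token at a time" slide can be sketched in a few lines. This is a toy illustration, not how any real model is implemented: `TOY_MODEL`, `next_token`, and `generate` are hypothetical names, and the lookup table stands in for a network that would score the whole vocabulary on every step.

```python
# Toy stand-in for a model: maps the last token to the next one.
# Assumption: a real LLM conditions on the whole sequence, not just the last token.
TOY_MODEL = {
    "<prompt>": "One",
    "One": "token",
    "token": "at",
    "at": "a",
    "a": "time",
    "time": "<eos>",
}


def next_token(sequence):
    """Hypothetical single step: predict one token from the sequence so far."""
    return TOY_MODEL.get(sequence[-1], "<eos>")


def generate(prompt_tokens, max_new_tokens=16):
    """Generate tokens one at a time, feeding each one back as input."""
    sequence = list(prompt_tokens)
    output = []
    steps = 0
    for _ in range(max_new_tokens):
        token = next_token(sequence)
        steps += 1  # every generated token costs one more model call
        if token == "<eos>":
            break
        output.append(token)
        sequence.append(token)  # the feedback step: output becomes input
    return output, steps


answer, steps = generate(["<prompt>"])
print(" ".join(answer), "| model calls:", steps)
```

The step counter grows with the length of the answer, which is the point the slide makes about latency and spend: a longer output means more sequential model calls.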
+
+ + +

Tokens, not words

+ + +
+ Screenshot of the OpenAI tokenizer showing the sentence about BPE split into 32 tokens across 105 characters, with tabs for GPT-5.x, GPT-4, and GPT-3 tokenizers. +
+ 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
+ Each model generation uses a different tokenizer — same text, different token count, different cost. +
+
+ +
+
+ + + + +
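The rule of thumb on the "Tokens, not words" slide can be turned into a quick estimate. The sketch below is only an approximation under stated assumptions: `estimate_tokens` and `estimate_cost` are hypothetical helpers, and the price used is a made-up placeholder, not any provider's actual rate.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the slide, English text only


def estimate_tokens(char_count, chars_per_token=CHARS_PER_TOKEN):
    """Rough token estimate from a character count."""
    return math.ceil(char_count / chars_per_token)


def estimate_cost(tokens, price_per_million_tokens):
    """Cost for a token count at an assumed price per million tokens."""
    return tokens * price_per_million_tokens / 1_000_000


# Figures from the slide's screenshot: 105 characters, 32 tokens.
estimated = estimate_tokens(105)
actual = 32
observed_ratio = 105 / actual

print("estimated tokens:", estimated)
print("actual tokens:", actual)
print("observed chars per token:", round(observed_ratio, 2))

# Hypothetical price, for illustration only.
print("cost at an assumed 3.00 per million tokens:", estimate_cost(actual, 3.0))
```

For the slide's own example the estimate undercounts (27 estimated against 32 actual), so a real tokenizer is the safer choice whenever the count matters for billing or context limits.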
+
+ + +

Tokens in, tokens out

+ + +
+ Animated diagram combining tokenization and autoregressive generation: the BPE-tokenized prompt feeds into the LLM, which generates the answer token-by-token using the same shared vocabulary. +
+ Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
+ “Anthropic” becomes “Anth” + “ropic” because those are the sub-word pieces that occurred often enough in the tokenizer’s training data to be merged into single tokens. +
+
+ +
+
+ + + + +
@@ -619,9 +694,9 @@
- + -
+

🧠 Models — e.g. Opus, GPT, Gemini

@@ -658,9 +733,9 @@
- + -
+

🧠 Limitations

The raw model has no real-time access — no internet, no files, no clock.

@@ -669,9 +744,9 @@
- + -
+
@@ -834,9 +909,9 @@
- + -
+

⚡ Tool Calling — how the harness reaches the world

- + -
+

💪 Harness — the body around the brain

@@ -902,9 +977,9 @@
- + -
+

💪 Harness — the body around the brain

@@ -939,9 +1014,9 @@
- + -
+

🎉 Yayyyyy! Problem solved with harness

The harness reaches out via WebSearch and fetches a real answer from live sources.

@@ -950,9 +1025,9 @@
- + -
+
?

Really?

@@ -960,9 +1035,9 @@
- + -
+

💪 Non-determinism — Doesn’t always use its tools

Similar prompt — but this time the model decided not to use the tool.

@@ -971,9 +1046,9 @@
- + -
+

💪 Non-determinism — Tools can fail

The model first tried one source — it failed (403) — so it fell back to another.
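A fallback like the one on this slide can be sketched as below. This is a simplified illustration under assumptions: `FetchError`, `fetch_primary`, `fetch_secondary`, and `fetch_with_fallback` are hypothetical names, the fetchers are simulated rather than making network calls, and the order of sources is hard-coded here, whereas on the slide the model itself chose where to go next.

```python
class FetchError(Exception):
    """Raised by a simulated source when a request is refused."""

    def __init__(self, source, status):
        super().__init__(f"{source} returned HTTP {status}")
        self.source = source
        self.status = status


def fetch_primary(query):
    # Simulates the first source rejecting the request, as on the slide.
    raise FetchError("primary", 403)


def fetch_secondary(query):
    # Simulates a second source that answers successfully.
    return f"result for {query!r} from secondary"


def fetch_with_fallback(query, sources):
    """Try each source in order; return the first result plus the errors seen."""
    errors = []
    for fetch in sources:
        try:
            return fetch(query), errors
        except FetchError as exc:
            errors.append(str(exc))
    raise RuntimeError("all sources failed: " + "; ".join(errors))


result, errors = fetch_with_fallback(
    "latest release", [fetch_primary, fetch_secondary]
)
print(result)
print("errors along the way:", errors)
```

Returning the collected errors alongside the result keeps the failed attempt visible, which helps when the same prompt behaves differently from one run to the next.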

@@ -982,9 +1057,9 @@
- + -
+

🚨 Problem Statement

  1.
@@ -999,9 +1074,9 @@
- + -
+

Vibe Coding

Andrej Karpathy's Feb 3 2025 tweet coining 'vibe coding' — 'fully give in to the vibes, embrace exponentials, and forget that the code even exists'
@@ -1011,9 +1086,9 @@
- + -
+

Vibe Coding vs Agentic Engineering

@@ -1082,7 +1157,7 @@ todoapp/ -
+

👤 Agents

@@ -1140,7 +1215,7 @@ todoapp/
-
+

Create your first agent — /agents

@@ -1194,7 +1269,7 @@ todoapp/
-
+

Demo