From 73da14473e43219d44755dc96f1fed7f9d5ebd43 Mon Sep 17 00:00:00 2001
From: Shayan Rais
Date: Thu, 7 May 2026 11:45:27 +0500
Subject: [PATCH] insert 3 LLM intro slides at positions 11–13 in claude-code-best-practice deck
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New slides introduce LLM fundamentals before the Claude Code content
for a PM and beginner-technical audience:

- Slide 11 "One token at a time" — autoregressive generation (llm-basic.svg)
- Slide 12 "Tokens, not words" — tokenization basics (tokens.jpg)
- Slide 13 "Tokens in, tokens out" — combined view (llm-advanced.svg)

Renumbering: former slides 11–50 shifted to 14–53 by incrementing
data-slide attributes and updating SLIDE-N banner comments. data-level
groupings (agents/claude-md/skills/context/workflow) preserved. Total
slide count 50 → 53. Asset path: ../assets/llm/.

Co-Authored-By: Claude
---
 .../claude-code-best-practice/index.html | 181 +++++++++++++-----
 1 file changed, 128 insertions(+), 53 deletions(-)

diff --git a/presentation/claude-code-best-practice/index.html b/presentation/claude-code-best-practice/index.html
index c6ca716..2ba94ea 100644
--- a/presentation/claude-code-best-practice/index.html
+++ b/presentation/claude-code-best-practice/index.html
@@ -560,9 +560,84 @@
- +
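The renumbering step in the commit message (every slide from 11 upward moves by 3, in both `data-slide` attributes and `SLIDE-N` banner comments) can be sketched in Python. This is a hypothetical helper, not the script used for this commit: it assumes the attributes look like `data-slide="14"` and that the banner comments contain the literal text `SLIDE-14`, since the actual markup is not shown here. It has to run before the three new slides are inserted, or they would be shifted as well.

```python
import re


def shift_slides(html: str, first: int = 11, offset: int = 3) -> str:
    """Renumber slides >= `first` by `offset` in data-slide attributes and SLIDE-N banners.

    Hypothetical helper; the markup patterns below are assumptions.
    """

    def renumber(prefix: str, suffix: str):
        # Build a re.sub callback that rewrites only slide numbers >= first.
        def repl(match: re.Match) -> str:
            n = int(match.group(1))
            new_n = n + offset if n >= first else n
            return f"{prefix}{new_n}{suffix}"

        return repl

    # Assumed markup: data-slide="N" attributes on each slide element ...
    html = re.sub(r'data-slide="(\d+)"', renumber('data-slide="', '"'), html)
    # ... and banner comments containing the literal text SLIDE-N.
    html = re.sub(r"SLIDE-(\d+)", renumber("SLIDE-", ""), html)
    return html
```

Slides 1 to 10 pass through unchanged, so the same call is safe to run over the whole file.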
+
+ + +

One token at a time

+ + +
+ Animated diagram showing autoregressive generation: the prompt feeds into the LLM, which predicts one token, feeds it back in, and repeats until the full answer is produced.
+ The model produces one token per forward pass, feeding each result back as new input.
+ This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend. +
+
+ +
+
+ + + + +
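Slide 11's loop can be sketched in a few lines of Python. The `next_token` callable, the `END` marker and the `TOY_MODEL` table are hypothetical stand-ins for a real model's forward pass; what the sketch shows is the control flow. Every new token costs one more call, and each call sees everything generated so far.

```python
from typing import Callable, Iterator, List

END = "<end>"  # hypothetical end-of-sequence marker


def stream(prompt: List[str],
           next_token: Callable[[List[str]], str],
           max_new_tokens: int = 50) -> Iterator[str]:
    """Autoregressive generation: predict one token, append it, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # one model call per generated token
        if tok == END:
            return
        tokens.append(tok)        # the output becomes part of the next input
        yield tok                 # streaming: the caller sees each token as soon as it exists


# Hypothetical toy "model": it looks only at the last token, whereas a real LLM
# conditions on the whole sequence.
TOY_MODEL = {" sky": " is", " is": " blue", " blue": ".", ".": END}


def toy_next_token(tokens: List[str]) -> str:
    return TOY_MODEL.get(tokens[-1], END)
```

For example, `list(stream(["The", " sky"], toy_next_token))` yields `[" is", " blue", "."]`. That is three calls for three tokens, which is why output length drives both latency and spend.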
+
+ + +

Tokens, not words

+ + +
+ Screenshot of the OpenAI tokenizer splitting a 105-character sentence about BPE into 32 tokens, with tabs for the GPT-5.x, GPT-4, and GPT-3 tokenizers.
+ 105 characters → 32 tokens. Rule of thumb: ~4 chars per token in English.
+ Tokenizers differ between model generations: same text, different token count, different cost.
+
+ +
+
+ + + + +
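The ~4-characters-per-token rule of thumb gives a quick budget estimate. In the Python sketch below, the price per million tokens is a caller-supplied placeholder, not a real rate for any model. On the slide's sentence the rule predicts ceil(105 / 4) = 27 tokens, while the tokenizer reports 32, so treat the result as a ballpark figure only.

```python
import math

# English-text rule of thumb from the slide; varies by tokenizer and language.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token count from character length (no real tokenizer involved)."""
    return math.ceil(len(text) / chars_per_token)


def cost_usd(n_tokens: int, usd_per_million_tokens: float) -> float:
    """Spend for n_tokens at a given (hypothetical) per-million-token price."""
    return n_tokens * usd_per_million_tokens / 1_000_000
```

For real billing, count with the target model's own tokenizer, because the same text produces different counts under different tokenizers.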
+
+ + +

Tokens in, tokens out

+ + +
+ Animated diagram combining tokenization and autoregressive generation: the BPE-tokenized prompt feeds into the LLM, which generates the answer token by token from the same vocabulary.
+ Input and output share the same vocabulary — tokenization shapes what the model even “sees”.
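+ “Anthropic” becomes “Anth” + “ropic” because BPE builds tokens from character sequences that were frequent in the tokenizer’s training data, and the whole word was not frequent enough to earn a token of its own.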
+
+ +
+
+ + + + +
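The “Anth” + “ropic” split on slide 13 comes from byte-pair encoding (BPE). The toy character-level trainer below shows the mechanism: repeatedly merge the most frequent adjacent pair, then tokenize new words by replaying those merges in the order they were learned. It is a teaching sketch, not any production tokenizer. Real ones typically work on bytes, treat whitespace specially, and learn tens of thousands of merges or more. The corpus in the usage note is made up.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def apply_merge(symbols: List[str], pair: Pair) -> List[str]:
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_merges(words: List[str], num_merges: int) -> List[Pair]:
    """Toy BPE training: repeatedly merge the most frequent adjacent pair."""
    corpus = [list(w) for w in words]  # start from single characters
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols in corpus:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        corpus = [apply_merge(s, best) for s in corpus]
    return merges


def tokenize(word: str, merges: List[Pair]) -> List[str]:
    """Split a word by replaying learned merges in training order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols
```

Trained on `["the"] * 5 + ["then"] * 2 + ["tent"]` with three merges, the frequent word "then" becomes the single token "then", while the rare "tent" stays as four characters. The unseen "theme" splits into "the" + "m" + "e", which is the same effect the slide shows for “Anthropic”.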
@@ -619,9 +694,9 @@
- + -
+

🧠 Models — e.g. Opus, GPT, Gemini

@@ -658,9 +733,9 @@
- + -
+

🧠 Limitations

The raw model has no real-time access — no internet, no files, no clock.

@@ -669,9 +744,9 @@
- + -
+
@@ -834,9 +909,9 @@
- + -
+

⚡ Tool Calling — how the harness reaches the world

- + -
+

💪 Harness — the body around the brain

@@ -902,9 +977,9 @@
- + -
+

💪 Harness — the body around the brain

@@ -939,9 +1014,9 @@
- + -
+

🎉 Yayyyyy! Problem solved with harness

The harness reaches out via WebSearch and fetches a real answer from live sources.

@@ -950,9 +1025,9 @@
- + -
+
?

Really?

@@ -960,9 +1035,9 @@
- + -
+

💪 Non-determinism — Doesn’t always use its tools

Similar prompt — but this time the model decided not to use the tool.

@@ -971,9 +1046,9 @@
- + -
+

💪 Non-determinism — Tools can fail

The model first tried one source — it failed (403) — so it fell back to another.

@@ -982,9 +1057,9 @@
- + -
+

🚨 Problem Statement

  1.
@@ -999,9 +1074,9 @@
- + -
+

Vibe Coding

Andrej Karpathy's Feb 3 2025 tweet coining 'vibe coding' — 'fully give in to the vibes, embrace exponentials, and forget that the code even exists'
@@ -1011,9 +1086,9 @@
- + -
+

Vibe Coding vs Agentic Engineering

@@ -1082,7 +1157,7 @@ todoapp/
-
+

👤 Agents

@@ -1140,7 +1215,7 @@ todoapp/
-
+

Create your first agent — /agents

@@ -1194,7 +1269,7 @@ todoapp/
-
+

Demo