From b667fc52342902f4b11c77bf4c2e902cc7bc94bc Mon Sep 17 00:00:00 2001
From: Shayan Rais
Date: Thu, 7 May 2026 12:48:04 +0500
Subject: [PATCH] restructure llm-animation-tokenids.svg for projector legibility
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Same pattern as 13f7ca9 (llm-basic.svg).

Title and subtitle removed from the SVG; both promoted to the
slide-level heading and caption on slide 13. The math-notation subtitle
(f: ℤᵏ → ℝⱽ ; next_id = argmax(f(ids))) was removed entirely — not
promoted anywhere, since it doesn't help PMs and there's no good place
for it after the bottom section was simplified to a single line.

Iteration counter relocated from y=83 (top) to y=550 (bottom, just
above the feedback caption), font size 13 → 20.

Entire diagram shifted upward by 80px to reclaim the space freed by the
removed title and subtitle. Feedback path arc shifted accordingly:
M 1090 510 Q 680 568 270 510 → M 1090 430 Q 680 488 270 430.

Feedback caption pushed slightly: y=565 → y=575. Footnote about
illustrative response IDs shifted: y=588 → y=613. ViewBox extended
600 → 630 at the bottom. Background rect height bumped to 630 to match.

The decorative f: ℤᵏ → ℝⱽ label *inside* the LLM black box was
preserved as visual texture (too small to read at projector distance,
just signals "math is happening here") — distinct from the removed top
subtitle.

All 7 blocks on iteration elements preserved verbatim.

Co-Authored-By: Claude
---
 .../assets/llm/llm-animation-tokenids.svg | 201 +++++++++---------
 1 file changed, 97 insertions(+), 104 deletions(-)

diff --git a/presentation/assets/llm/llm-animation-tokenids.svg b/presentation/assets/llm/llm-animation-tokenids.svg
index 2819bd8..556c737 100644
--- a/presentation/assets/llm/llm-animation-tokenids.svg
+++ b/presentation/assets/llm/llm-animation-tokenids.svg
@@ -1,5 +1,5 @@
- - + - - What the LLM actually sees: integer token IDs (advanced view) - - - BPE encodes text → integer IDs. 
The model is a function f: ℤᵏ → ℝⱽ ; next_id = argmax(f(ids)) - - - ITERATION 1 / 7 - ITERATION 2 / 7 - ITERATION 3 / 7 - ITERATION 4 / 7 - ITERATION 5 / 7 - ITERATION 6 / 7 - ITERATION 7 / 7 + ITERATION 1 / 7 + ITERATION 2 / 7 + ITERATION 3 / 7 + ITERATION 4 / 7 + ITERATION 5 / 7 + ITERATION 6 / 7 + ITERATION 7 / 7 - INPUT TOKEN IDs (k = 32, vocab V ≈ 200,000) - - Prompt encoded as 32 IDs (large) with token text below (small italic) - 28133Does - 17554 Chat - 162016GPT - 11, - 97481 Claude - 11, - 29683 Anth - 71571ropic - 11, - 451 Ll - 42804ama - 11, - 391 Mi - 2534str - 280al - 11, - 115613 Gemini - 11, - 326 and - 4651 Per - 12081plex - 536ity - 722 all - 1199 use - 20445 Byte - 10316- - 1517Pair - 70820 Encoding - 350 ( - 33B - 3111PE - 20707)? + 28133Does + 17554 Chat + 162016GPT + 11, + 97481 Claude + 11, + 29683 Anth + 71571ropic + 11, + 451 Ll + 42804ama + 11, + 391 Mi + 2534str + 280al + 11, + 115613 Gemini + 11, + 326 and + 4651 Per + 12081plex + 536ity + 722 all + 1199 use + 20445 Byte + 10316- + 1517Pair + 70820 Encoding + 350 ( + 33B + 3111PE + 20707)? - Generated token IDs (autoregressive feedback) - 12814*Yes - 11, - 722 all - 328* of - 1295* them - 656* do - 13. + 12814*Yes + 11, + 722 all + 328* of + 1295* them + 656* do + 13. - - + - - + LLM - f: ℤᵏ → ℝⱽ - - - - - - - - - - - - - - - - - - - - - - + f: ℤᴸ → ℝᴻ + + + + + + + + + + + + + + + + + + + + + + - no characters inside the box — only integers - - + - PREDICTED NEXT TOKEN ID - - argmax over V ≈ 200,000 logit dimensions - next_token_id =12814*↓ decodes to"Yes" - next_token_id =11↓ decodes to"," - next_token_id =722↓ decodes to" all" - next_token_id =328*↓ decodes to" of" - next_token_id =1295*↓ decodes to" them" - next_token_id =656*↓ decodes to" do" - next_token_id =13↓ decodes to"." 
- next_token_id =12814*↓ decodes to"Yes" + next_token_id =11↓ decodes to"," + next_token_id =722↓ decodes to" all" + next_token_id =328*↓ decodes to" of" + next_token_id =1295*↓ decodes to" them" + next_token_id =656*↓ decodes to" do" + next_token_id =13↓ decodes to"." + decoding text is post-processing — the model never produces strings - - next_token_id appended to input_ids → next forward pass - + * Response IDs are illustrative estimates; prompt IDs are from OpenAI's o200k_base tokenizer.