From a05c791c41becc9421350e2c84e24f26bcfaa93e Mon Sep 17 00:00:00 2001
From: Shayan Rais
Date: Thu, 7 May 2026 11:45:16 +0500
Subject: [PATCH] =?UTF-8?q?add=20llm-advanced.svg=20=E2=80=94=20combined?=
 =?UTF-8?q?=20BPE=20tokenization=20+=20autoregressive=20diagram?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Animated SVG showing the same BPE-tokenized prompt from tokens.jpg (32
colored subword tiles, e.g., "Anthropic" → "Anth"+"ropic", "Perplexity" →
"Per"+"plex"+"ity") feeding into the LLM and generating "Yes, they all
use BPE." token-by-token across 7 iterations. Combines tokenization and
autoregressive generation into one view.

Co-Authored-By: Claude
---
 presentation/assets/llm/llm-advanced.svg | 150 +++++++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 presentation/assets/llm/llm-advanced.svg

diff --git a/presentation/assets/llm/llm-advanced.svg b/presentation/assets/llm/llm-advanced.svg
new file mode 100644
index 0000000..4e6728c
--- /dev/null
+++ b/presentation/assets/llm/llm-advanced.svg
@@ -0,0 +1,150 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ How an LLM tokenizes input and generates text autoregressively
+
+
+ BPE chops words into subword tokens — same color = same word, gray = punctuation
+
+
+ ITERATION 1 / 7
+ ITERATION 2 / 7
+ ITERATION 3 / 7
+ ITERATION 4 / 7
+ ITERATION 5 / 7
+ ITERATION 6 / 7
+ ITERATION 7 / 7
+
+
+ INPUT (CONTEXT)
+
+
+ Original prompt → 32 BPE tokens (105 chars)
+ Does
+ Chat
+ GPT
+ ,
+ Claude
+ ,
+ Anth
+ ropic
+ ,
+ Ll
+ ama
+ ,
+ Mist
+ ral
+ ,
+ Gem
+ ini
+ ,
+ and
+ Per
+ plex
+ ity
+ all
+ use
+ Byte
+ -
+ Pair
+ Encoding
+ (
+ BPE
+ )
+ ?
+
+ Generated tokens (autoregressive)
+ Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
+
+
+
+
+
+
+
+ LLM
+ (black box)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ PREDICTED NEXT TOKEN
+
+ argmax P(next token | input)
+ Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
+ single token, drawn from the same vocab as the input
+
+
+
+
+
+
+ predicted token appended to input → next iteration
+
+