diff --git a/presentation/assets/llm/llm-advanced.svg b/presentation/assets/llm/llm-advanced.svg
index 4e6728c..918df4f 100644
--- a/presentation/assets/llm/llm-advanced.svg
+++ b/presentation/assets/llm/llm-advanced.svg
@@ -1,5 +1,5 @@
-
-
+
-
- How an LLM tokenizes input and generates text autoregressively
-
-
- BPE chops words into subword tokens — same color = same word, gray = punctuation
-
-
- ITERATION 1 / 7
- ITERATION 2 / 7
- ITERATION 3 / 7
- ITERATION 4 / 7
- ITERATION 5 / 7
- ITERATION 6 / 7
- ITERATION 7 / 7
+ ITERATION 1 / 7
+ ITERATION 2 / 7
+ ITERATION 3 / 7
+ ITERATION 4 / 7
+ ITERATION 5 / 7
+ ITERATION 6 / 7
+ ITERATION 7 / 7
- INPUT (CONTEXT)
-
- Original prompt → 32 BPE tokens (105 chars)
- Does
- Chat
- GPT
- ,
- Claude
- ,
- Anth
- ropic
- ,
- Ll
- ama
- ,
- Mist
- ral
- ,
- Gem
- ini
- ,
- and
- Per
- plex
- ity
- all
- use
- Byte
- -
- Pair
- Encoding
- (
- BPE
- )
- ?
+ Does
+ Chat
+ GPT
+ ,
+ Claude
+ ,
+ Anth
+ ropic
+ ,
+ Ll
+ ama
+ ,
+ Mist
+ ral
+ ,
+ Gem
+ ini
+ ,
+ and
+ Per
+ plex
+ ity
+ all
+ use
+ Byte
+ -
+ Pair
+ Encoding
+ (
+ BPE
+ )
+ ?
- Generated tokens (autoregressive)
- Yes
- ,
- they
- all
- use
- BPE
- .
+ Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
-
-
+
-
-
+ LLM
- (black box)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
+
- PREDICTED NEXT TOKEN
-
- argmax P(next token | input)
- Yes
- ,
- they
- all
- use
- BPE
- .
- Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
+ single token, drawn from the same vocab as the input
-