diff --git a/presentation/assets/llm/llm-advanced.svg b/presentation/assets/llm/llm-advanced.svg
new file mode 100644
index 0000000..4e6728c
--- /dev/null
+++ b/presentation/assets/llm/llm-advanced.svg
@@ -0,0 +1,150 @@
+ How an LLM tokenizes input and generates text autoregressively
+ BPE chops words into subword tokens — same color = same word, gray = punctuation
+ ITERATION 1 / 7
+ ITERATION 2 / 7
+ ITERATION 3 / 7
+ ITERATION 4 / 7
+ ITERATION 5 / 7
+ ITERATION 6 / 7
+ ITERATION 7 / 7
+ INPUT (CONTEXT)
+ Original prompt → 32 BPE tokens (105 chars)
+ Does
+ Chat
+ GPT
+ ,
+ Claude
+ ,
+ Anth
+ ropic
+ ,
+ Ll
+ ama
+ ,
+ Mist
+ ral
+ ,
+ Gem
+ ini
+ ,
+ and
+ Per
+ plex
+ ity
+ all
+ use
+ Byte
+ -
+ Pair
+ Encoding
+ (
+ BPE
+ )
+ ?
+ Generated tokens (autoregressive)
+ Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
+ LLM
+ (black box)
+ PREDICTED NEXT TOKEN
+ argmax P(next token | input)
+ Yes
+ ,
+ they
+ all
+ use
+ BPE
+ .
+ single token, drawn from the same vocab as the input
+ predicted token appended to input → next iteration
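
The loop the figure animates (predict one token with argmax, append it to the input, repeat) can be sketched in a few lines of Python. This is a minimal illustration, not code from this change: `make_toy_model`, `generate`, and the `<eos>` stop marker are hypothetical names, and the "model" is a scripted lookup that stands in for the black-box LLM so the example runs without any dependencies. The prompt and reply tokens are copied from the figure's labels.

```python
from typing import Callable, Dict, List

# Prompt tokens exactly as labelled in the figure (32 BPE tokens).
PROMPT_TOKENS: List[str] = [
    "Does", "Chat", "GPT", ",", "Claude", ",", "Anth", "ropic", ",",
    "Ll", "ama", ",", "Mist", "ral", ",", "Gem", "ini", ",", "and",
    "Per", "plex", "ity", "all", "use", "Byte", "-", "Pair", "Encoding",
    "(", "BPE", ")", "?",
]

# Reply tokens shown in the figure, one per iteration (7 in total).
SCRIPTED_REPLY: List[str] = ["Yes", ",", "they", "all", "use", "BPE", "."]

# Hypothetical end-of-sequence marker; the figure does not show one.
EOS = "<eos>"

NextTokenProbs = Callable[[List[str]], Dict[str, float]]


def make_toy_model(prompt_len: int, reply: List[str]) -> NextTokenProbs:
    """Build a stand-in for the black-box LLM (assumption: scripted output).

    The returned function maps a context to a probability distribution
    over a small vocabulary, with most of the mass on the scripted token.
    """
    vocab = sorted(set(PROMPT_TOKENS) | set(reply) | {EOS})

    def next_token_probs(context: List[str]) -> Dict[str, float]:
        step = len(context) - prompt_len  # number of tokens generated so far
        target = reply[step] if 0 <= step < len(reply) else EOS
        rest = 0.1 / (len(vocab) - 1)  # spread the remaining mass evenly
        return {tok: (0.9 if tok == target else rest) for tok in vocab}

    return next_token_probs


def generate(prompt: List[str], model: NextTokenProbs,
             max_new_tokens: int = 16) -> List[str]:
    """Greedy autoregressive decoding: argmax, append, repeat."""
    context = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        probs = model(context)
        token = max(probs, key=probs.get)  # argmax P(next token | input)
        if token == EOS:
            break
        generated.append(token)
        context.append(token)  # predicted token becomes part of the input
    return generated


toy_model = make_toy_model(len(PROMPT_TOKENS), SCRIPTED_REPLY)
reply = generate(PROMPT_TOKENS, toy_model)
print(" ".join(reply))  # prints: Yes , they all use BPE .
```

A real model would replace `toy_model` with a forward pass over token IDs, and many deployments sample from the distribution instead of taking the argmax; the greedy choice here simply mirrors the figure's `argmax P(next token | input)` label.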