diff --git a/presentation/claude-code-best-practice/index.html b/presentation/claude-code-best-practice/index.html
index 50d087b..521c0b2 100644
--- a/presentation/claude-code-best-practice/index.html
+++ b/presentation/claude-code-best-practice/index.html
@@ -480,25 +480,17 @@
-
-
-
-

One token at a time

-
-
-
- Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced.
-
- The model produces one token per inference, feeding each result back as new input.
- This is why streaming feels gradual — and why longer outputs cost more in both latency and API spend.
-
-
-
-
+

How an LLM generates text (autoregressive)

+
+ Animated diagram showing autoregressive generation: prompt feeds into LLM, which predicts one token, feeds it back, and repeats until the full answer is produced.
+
+ Each predicted token is appended to the input, then fed back into the LLM.
+
+
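
Reviewer note: the loop described by the new caption ("Each predicted token is appended to the input, then fed back into the LLM") can be sketched roughly as below. This is only an illustration and not part of the change. `generate`, `toy_predict_next`, the `<eos>` stop token, and the lookup table are hypothetical names invented here; a real LLM would replace `toy_predict_next` with a model forward pass plus sampling.

```python
from typing import Callable, List


def generate(
    prompt: List[str],
    predict_next: Callable[[List[str]], str],
    stop_token: str = "<eos>",
    max_new_tokens: int = 16,
) -> List[str]:
    """Autoregressive loop: predict one token, append it, feed it back."""
    tokens = list(prompt)  # copy so the caller's prompt is not mutated
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # one inference per new token
        if next_token == stop_token:
            break  # the model signalled that the answer is complete
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens


def toy_predict_next(tokens: List[str]) -> str:
    """Hypothetical stand-in for an LLM: looks only at the last token."""
    table = {"The": "sky", "sky": "is", "is": "blue"}
    return table.get(tokens[-1], "<eos>")


result = generate(["The"], toy_predict_next)
print(" ".join(result))
```

Each pass through the loop performs one call to the predictor, so the number of inferences grows with the number of generated tokens, which is the point the animated diagram is making.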