-
-
-
-
-
- The model produces one token per forward pass, feeding each result back as new input.
- This is why streaming feels gradual, and why longer outputs cost more in both latency and API spend.
-
-
-
-
+
How an LLM generates text (autoregressive)
+
+
+
+ Each predicted token is appended to the input, and the extended sequence is fed back into the LLM to predict the next token.
+
+
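The loop described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library implements generation: `generate`, `toy_model`, the `BIGRAMS` table, and the `<eos>` stop marker are all hypothetical names introduced here, and the "model" is a lookup table standing in for a real LLM forward pass.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
    stop: str = "<eos>",
) -> List[str]:
    """Autoregressive loop: predict one token, append it, repeat."""
    tokens = list(prompt)  # copy so the caller's prompt is not mutated
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # one model call yields exactly one token
        if tok == stop:
            break  # the stop marker ends generation and is not appended
        tokens.append(tok)  # the prediction becomes part of the next input
    return tokens


# Hypothetical stand-in for an LLM: it only looks at the last token.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "<eos>"}


def toy_model(tokens: List[str]) -> str:
    last = tokens[-1] if tokens else ""
    return BIGRAMS.get(last, "<eos>")


result = generate(toy_model, ["the"], max_new_tokens=10)
print(result)
```

Because each new token requires another call to the model, the number of calls grows with the length of the output, which is the reason longer completions take longer to produce.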