diff --git a/presentation/claude-code-best-practice/index.html b/presentation/claude-code-best-practice/index.html
index bed2a97..c6ca716 100644
--- a/presentation/claude-code-best-practice/index.html
+++ b/presentation/claude-code-best-practice/index.html
@@ -125,6 +125,7 @@
.pillar-mini-card .pmc-body { font-size: 0.72rem; line-height: 1.35; color: #333; margin-top: 5px; display: -webkit-box; -webkit-line-clamp: 5; -webkit-box-orient: vertical; overflow: hidden; }
.pillar-mini-card .pmc-badge { display: inline-block; font-size: 0.65rem; font-weight: 600; padding: 2px 7px; border-radius: 999px; margin-top: 6px; white-space: nowrap; align-self: flex-start; }
.pillar-mini-card.inactive { opacity: 0.55; }
+
@@ -402,6 +403,7 @@
Boris Cherny (creator of Claude Code) — different teams use Claude Code completely differently.
There is no single “correct” way. But there are patterns worth understanding.
+ Source: Boris Cherny on X — tweet 1 · tweet 2 · tweet 3
@@ -471,9 +473,6 @@
Each run the model samples — temperature controls how widely it samples.
-
- Bender, Gebru, McMillan-Major, Mitchell — On the Dangers of Stochastic Parrots (2021)
-
stochastic
@@ -484,12 +483,86 @@
confident
pattern-matching
+
+
+ Source: Bender, Gebru, McMillan-Major, Mitchell — On the Dangers of Stochastic Parrots (2021)
-
+
+
+
+
+
Even temperature = 0 isn’t deterministic.
+
You set it to zero. You expect the same answer every time. You’re wrong.
+
+
+
+
+
+
+
+
The data point
+
+
+
+
+
+
+
Without fix
+
+
80
+
unique completions
+
out of 1,000 calls
+
+
+
+
+
→
+
+
+
+
With fix
+
+
1
+
unique completion
+
out of 1,000 calls
+
+
+
+
+
+
Qwen3-235B at temperature = 0 — first divergence at token 103 (“Queens, New York” vs “New York City”)
+
+
+
+
+
Why it happens
+
Server load varies → batch size varies → kernel reductions reorder → numerics shift. Not GPU randomness — arithmetic order.
+
+
+
+
+
The fix
+
Batch-invariant kernels → consistent reduction order → identical numerics every run.
+
+
+
+
+
+
Determinism is engineered in — at every layer.
+
+
+
Source: Thinking Machines — Defeating Nondeterminism in LLM Inference (2025)
+
+
+
+
+
+
+
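Review note on the "Why it happens" slide added in the hunk above: the claim that reduction order, rather than GPU randomness, shifts the numerics can be illustrated with plain floating-point addition. The sketch below is only an illustration, not the kernels discussed in the cited source. The function name `chunked_sum` and the use of a chunk size as a stand-in for batch size are assumptions made for this example.

```python
# Illustration only: floating-point addition is not associative, so the same
# inputs reduced in a different grouping can produce a different result.
# `chunked_sum` is a hypothetical helper; `chunk` stands in for batch size.

def chunked_sum(values, chunk):
    """Sum each chunk left to right, then sum the partial results."""
    partials = []
    for start in range(0, len(values), chunk):
        partial = 0.0
        for v in values[start:start + chunk]:
            partial += v  # explicit loop: plain IEEE 754 double addition
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [1e16, 1.0, -1e16, 1.0]  # exact mathematical sum is 2.0

result_chunk_2 = chunked_sum(values, 2)  # groups: (1e16 + 1.0), (-1e16 + 1.0)
result_chunk_4 = chunked_sum(values, 4)  # one group, strict left to right

print(result_chunk_2, result_chunk_4)
```

Both calls see identical inputs and no random source, yet the grouping alone changes the answer. Fixing the grouping regardless of load is the idea behind the "consistent reduction order" on the "The fix" slide.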
@@ -546,9 +619,9 @@
-
+
-
+
🧠 Models — e.g. Opus, GPT, Gemini
@@ -585,9 +658,9 @@
-
+
-
+
🧠 Limitations
The raw model has no real-time access — no internet, no files, no clock.
@@ -596,9 +669,9 @@
-
+
-
+
@@ -761,9 +834,9 @@
-
+
-
+
⚡ Tool Calling — how the harness reaches the world
Turn — one round from the user’s view: you ask, the assistant answers.
The entire flow above — your request, the assistant’s tool calls, and the final reply — is one turn.
Inference — one call to the language model. The model wakes up, reads the input it was given, writes a reply, then forgets everything. Every arrow touching the “Language Model” column above is a separate inference. One turn can contain many inferences.
+
+
+
Source: Anthropic — Claude Code in Action: What is a coding assistant?
-
+
-
+
💪 Harness — the body around the brain
@@ -826,9 +902,9 @@
-
+
-
+
💪 Harness — the body around the brain
@@ -863,9 +939,9 @@
-
+
-
+
🎉 Yayyyyy! Problem solved with harness
The harness reaches out via WebSearch and fetches a real answer from live sources.
@@ -874,9 +950,9 @@
-
+
-
+
?
Really?
@@ -884,9 +960,9 @@
-
+
-
+
💪 Non-determinism — Doesn’t always use its tools
Similar prompt — but this time the model decided not to use the tool.
@@ -895,9 +971,9 @@
-
+
-
+
💪 Non-determinism — Tools can fail
The model first tried one source — it failed (403) — so it fell back to another.
@@ -906,9 +982,9 @@
-
+
-
+
🚨 Problem Statement
-
@@ -923,20 +999,21 @@
-
+
-
+
-
+
-
+
Vibe Coding vs Agentic Engineering
@@ -1005,7 +1082,7 @@ todoapp/
-
+
👤 Agents
@@ -1063,7 +1140,7 @@ todoapp/
-
+
Create your first agent — /agents
@@ -1117,7 +1194,7 @@ todoapp/
-
+
Demo