post: complete kitchen brigade analogy + real flow verdicts + knife-rack model strategy

2026-06-04 21:39:14 +01:00
parent a82b1ffc72
commit 48af4c57f5
1 changed files with 94 additions and 52 deletions
@@ -448,14 +448,34 @@

 <hr>

-<h2>A Real Flow in Action: Fixing 19 Tests After a Crash</h2>
-<p>This morning, a machine crash cut off a flow mid-stream. Nineteen tests were left red — contracts written, implementation half-done. The task: finish the interrupted jump & locomotion animation work and make them all green.</p>
+<h2>Three Kitchens, One Morning</h2>
+<p>This evening, I ran three flows. Each is a different kitchen, a different brigade, a different dish. Here's what actually happened: real flow logs, real verdicts, nothing staged.</p>

-<p>I typed one slash command into Pi:</p>
+<div class="callout callout--amber">
+  <span class="callout__kicker">Flow 1 · 4 June, 18:32</span>
+  <p><strong>/deep-implement</strong> — "Build the tinqs-gitea-read extension: list_org_repos, read_repo_file, list_repo_dir, search_repos." Nine steps, 14 minutes. Verdict: <span class="gate gate--test">PASS</span>. 31/31 vitest tests green, zero new TypeScript errors, session-level caching, path traversal protection. Every <code>execute()</code> body fully wired — no stubs, no placeholders. Like a saucier who doesn't just list ingredients but actually makes the sauce.</p>
+</div>

-<pre><code>/game-feature Finish the leftover jump & locomotion animation work — make the 19 FAILING tests GREEN. They are existing RED contracts written by an earlier animation flow that a machine crash cut off mid-stream; the contracts are already written, so IMPLEMENT to satisfy them (do not rewrite the contracts).</code></pre>
+<div class="callout callout--purple">
+  <span class="callout__kicker">Flow 2 · 4 June, 19:04</span>
+  <p><strong>/game-feature</strong> — "Make the player jump." Build: <span class="gate gate--build">PASS</span>. Tests: <span class="gate gate--test">PASS</span>. Behaviour/Feel/Visual: <span style="color:#f59e0b;">NOT RUN</span> — no live game instance was reachable. The flow didn't silently skip the visual gate. It <strong>hard-stopped</strong> and reported honestly: "FAIL — the feature has not been verified in-game." This is the kitchen saying: "The dish is cooked, but nobody tasted it. I'm not sending it out."</p>
+</div>

-<p>What happened next was fully autonomous. Here's the flow, verbatim — this is the exact YAML that runs in production:</p>
+<div class="callout callout--amber">
+  <span class="callout__kicker">Flow 3 · 4 June, 19:49</span>
+  <p><strong>/cto-infra</strong> — "Synthesize cost, stability, and VCS research into an AWS architecture decision." Four research streams fed into one CTO agent. Output: 14 requirements mapped to specific decisions, cost-vs-stability tradeoffs resolved with dollar figures, EC2+EBS over Fargate+EFS, RDS Multi-AZ mandatory, S3+CloudFront for LFS. Like an executive chef reading four menu proposals, reconciling them into one service, and pricing every plate.</p>
+</div>
+
+<hr class="accent">
+
+<h2>Dinner Rush Recovery: The Crash That Interrupted Service</h2>
+<p>Earlier today, a machine crash cut off a flow mid-stream: the kitchen lost power during the dinner rush. Nineteen tests were left red. Contracts written, implementation half-done. Half-cooked dishes on every station.</p>
+
+<p>I typed one slash command — the expediter reassembled the brigade:</p>
+
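+<pre><code>/game-feature Finish the leftover jump & locomotion animation work — make the 19 FAILING tests GREEN. They are existing RED contracts written by an earlier animation flow that a machine crash cut off mid-stream; the contracts are already written, so IMPLEMENT to satisfy them (do not rewrite the contracts).</code></pre>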
+
+<p>What happened next: the team picked up exactly where the crash left off. Here's the recipe — the exact YAML that runs in production:</p>

 <pre><code>name: game-feature
 description: Build a PLAYABLE game feature and prove it in the LIVE game.
@@ -505,45 +525,58 @@ steps:
  - id: report
    agent: game-judge</code></pre>

-<p>Eighteen steps, seven custom agents, five oracle gates, and one judge. The whole thing runs as a slash command.</p>
+<p>Eighteen steps, seven cooks, five inspection points, one head chef. Triggered by a single order ticket.</p>

-<p>Here's what actually happened. The <strong>vision-preflight</strong> agent fired first — checked that <code>GEMINI_API_KEY</code> was set and that <code>game_frames</code> could reach the live game instance. Both passed in under a second. Without this gate, the rest of the flow would be meaningless — we'd do all the build work only to discover the vision judge can't run. So we check first.</p>
+<p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>GEMINI_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p>
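<p>Here is a minimal TypeScript sketch of a preflight gate like this one. Everything named in it is hypothetical: <code>runPreflight</code>, the result shape, and the injected <code>canReachGame</code> probe. The probe stands in for whatever reachability check <code>game_frames</code> actually performs.</p>

```typescript
// Hypothetical preflight gate: names and shapes are illustrative,
// not the real vision-preflight agent.
type PreflightResult = { ok: true } | { ok: false; reasons: string[] };

async function runPreflight(
  env: Record<string, string | undefined>,
  // Stand-in for a game_frames reachability probe (assumption).
  canReachGame: () => Promise<boolean>,
): Promise<PreflightResult> {
  const reasons: string[] = [];

  // The vision judge cannot run without an API key.
  const key = env["GEMINI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    reasons.push("GEMINI_API_KEY is not set");
  }

  // The live game must answer before any build work starts;
  // a probe that throws counts as unreachable.
  let reachable = false;
  try {
    reachable = await canReachGame();
  } catch {
    reachable = false;
  }
  if (!reachable) {
    reasons.push("live game instance is unreachable");
  }

  // Any failure is a hard stop for the whole flow.
  return reasons.length === 0 ? { ok: true } : { ok: false, reasons };
}
```

<p>Injecting the probe keeps the gate testable without a running game. In a real flow it would be wired to the actual frame-capture tool.</p>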

-<p>The <strong>project-context-reader</strong> ingested <code>PlayerController.cs</code>, <code>PlayerAnimController.cs</code>, <code>PlayerAnimationLogic.cs</code>, the test files, and the manifest. The <strong>feature-planner</strong> decomposed the 19 failures into four fix groups: (1) vegetation manifest — 146 items with broken <code>prefabPath</code>, (2) animation controller — crouch parameter not plumbed through, (3) jump physics — coyote time, variable height, air control all unimplemented, (4) animation tree — state machine missing entirely.</p>
+<p>The <strong>project-context-reader</strong> — the commis who reads the entire recipe book — ingested <code>PlayerController.cs</code>, <code>PlayerAnimController.cs</code>, <code>PlayerAnimationLogic.cs</code>, the test files, the manifest. The <strong>feature-planner</strong> — the sous-chef who breaks down the order into station tasks — decomposed 19 failures into four fix groups: vegetation manifest (146 broken <code>prefabPath</code> items), animation controller (crouch parameter not plumbed), jump physics (coyote time, variable height, air control — all missing), and animation tree (entire state machine absent).</p>

-<p>Then the <strong>game-builder</strong> agent went to work. It read the test failure messages, traced each one to the source, and started implementing. Coyote time: a 100ms grace period after <code>IsOnFloor()</code> becomes false. Variable jump height: scale velocity by key hold duration, 3.5 at tap, 6.5 at 300ms hold. Air control: reduce horizontal velocity by 40% when airborne. Jump phases: minimum 0.15s duration on jump_start before transitioning to airborne. Landing timer: wait full <code>jump_land</code> length + one frame, not <code>length - blend</code>. Animation tree: state machine with <code>jump_start → jump → jump_land</code> states, 0.1s blend transitions.</p>
+<p>Then the <strong>game-builder</strong>, the line cook at the hot station, read each test failure like a dish ticket, traced it to the source, and started cooking. Coyote time: a 100ms grace period after the feet leave the ground (<code>IsOnFloor()</code> goes false). Variable jump height: velocity scaled by hold duration, 3.5 on a tap, 6.5 at a full 300ms hold. Air control: horizontal speed cut 40% while airborne. Jump phases: a minimum 0.15s on <code>jump_start</code> before transitioning to airborne. Landing timer: wait the full <code>jump_land</code> length plus one frame, not length minus blend. Animation tree: <code>jump_start → jump → jump_land</code> states with 0.1s blends.</p>
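<p>The jump rules above reduce to a handful of small functions. The project's real fix is C# in the Godot scripts. This TypeScript sketch only illustrates the arithmetic, and every name in it is hypothetical. The constants come from the flow log. Two choices are my own assumptions: a linear blend between the tap and full-hold velocities, and treating a tap as zero hold time.</p>

```typescript
// Constants from the flow log; names are hypothetical.
const COYOTE_TIME_S = 0.1; // 100ms grace after leaving the floor
const TAP_VELOCITY = 3.5; // jump velocity for a tap
const FULL_HOLD_VELOCITY = 6.5; // jump velocity at a full hold
const FULL_HOLD_S = 0.3; // hold time that counts as a full hold
const AIR_CONTROL_FACTOR = 0.6; // horizontal speed cut by 40% in the air
const MIN_JUMP_START_S = 0.15; // minimum time spent in jump_start

// Coyote time: jumping is still allowed briefly after the feet leave the floor.
function canJump(onFloor: boolean, secondsSinceLeftFloor: number): boolean {
  return onFloor || secondsSinceLeftFloor <= COYOTE_TIME_S;
}

// Variable height: linear blend from tap to full hold (assumed curve), clamped.
function jumpVelocity(holdSeconds: number): number {
  const t = Math.min(Math.max(holdSeconds / FULL_HOLD_S, 0), 1);
  return TAP_VELOCITY + (FULL_HOLD_VELOCITY - TAP_VELOCITY) * t;
}

// Air control: horizontal input speed is scaled down while airborne.
function horizontalSpeed(inputSpeed: number, airborne: boolean): number {
  return airborne ? inputSpeed * AIR_CONTROL_FACTOR : inputSpeed;
}

// Jump phases: jump_start may only hand over after its minimum duration.
function canLeaveJumpStart(secondsInJumpStart: number): boolean {
  return secondsInJumpStart >= MIN_JUMP_START_S;
}

// Landing timer: the full jump_land length plus one frame, not length minus blend.
function landingWaitSeconds(clipLengthS: number, frameS: number): number {
  return clipLengthS + frameS;
}
```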

-<p>The <strong>build-verifier</strong> compiled it. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump": true}</code> to the live game and sampled the player body 30 times. <strong>Feel-judge</strong> measured apex height, airtime, and liftoff latency against thresholds. <strong>Animation-vision-judge</strong> grabbed 8 frames at 100ms intervals, composed them into a grid, and had <code>gemini-2.5-flash</code> check for T-poses, foot-slide, frozen frames, and missing transitions.</p>
+<p>Then the inspection line: <strong>build-verifier</strong> compiled. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump":true}</code> to the live game and sampled the player body 30 times. <strong>Feel-judge</strong> measured apex height, airtime, and liftoff latency against thresholds. <strong>Animation-vision-judge</strong> captured 8 frames at 100ms intervals, composed them into a grid, and had <code>gemini-2.5-flash</code> scan for T-poses, foot-slide, frozen frames, and missing transitions.</p>
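<p>The feel gate's job is to turn the probe's samples into a few numbers and compare them to limits. Here is a minimal sketch. It assumes each sample carries the time since the jump input and the body's height. The <code>Sample</code> shape, the 0.01 liftoff tolerance, and the threshold fields are hypothetical, not the feel-judge's real schema or values.</p>

```typescript
// Hypothetical sample: seconds since the jump input, and body height.
interface Sample {
  t: number;
  y: number;
}

interface FeelMetrics {
  apexHeight: number; // max height above the starting height
  liftoffLatency: number; // input-to-first-rise time
  airtime: number; // liftoff-to-landing time
}

interface FeelThresholds {
  minApex: number;
  maxLiftoffLatency: number;
  minAirtime: number;
}

function measureJump(samples: Sample[], epsilon = 0.01): FeelMetrics | null {
  if (samples.length === 0) return null;
  const ground = samples[0].y;

  // Liftoff: the first sample measurably above the starting height.
  const liftoffIdx = samples.findIndex((s) => s.y - ground > epsilon);
  if (liftoffIdx === -1) return null; // the body never left the ground

  // Landing: the first later sample back at the starting height.
  let landIdx = -1;
  for (let i = liftoffIdx + 1; i < samples.length; i++) {
    if (samples[i].y - ground <= epsilon) {
      landIdx = i;
      break;
    }
  }

  const apex = Math.max(...samples.map((s) => s.y)) - ground;
  const liftoffT = samples[liftoffIdx].t;
  // With no sampled landing, airtime runs to the last sample (a lower bound).
  const endT = landIdx === -1 ? samples[samples.length - 1].t : samples[landIdx].t;
  return { apexHeight: apex, liftoffLatency: liftoffT, airtime: endT - liftoffT };
}

// The gate passes only if every metric is inside its limit.
function feelPasses(m: FeelMetrics, th: FeelThresholds): boolean {
  return (
    m.apexHeight >= th.minApex &&
    m.liftoffLatency <= th.maxLiftoffLatency &&
    m.airtime >= th.minAirtime
  );
}
```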

-<p>Any red gate → evidence fed back to the game-builder → fix → re-enter the gate ladder. Bounded to 3 retries per the <code>max_iterations</code> in the loop decision. Any green gate → falls through to the next. All green → the <strong>game-judge</strong> produces the final honest verdict.</p>
+<p>Anything red → ticket back to the cook with the specific failure → fix → re-enter the line. Bounded to three returns by the loop decision's <code>max_iterations</code>. Anything green → falls through to the next station. All green → <strong>game-judge</strong> gives the final verdict.</p>
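<p>The return-to-the-cook loop can be sketched as a small driver. Only the bound of three returns comes from the flow. Everything else is assumed for illustration and is not the flow engine's API: the <code>Gate</code> and <code>Builder</code> types, <code>runGateLadder</code>, and the choice to restart from the first gate after every fix.</p>

```typescript
// Hypothetical types: a gate inspects the dish and returns evidence on failure.
type GateResult = { pass: true } | { pass: false; evidence: string };

interface Gate {
  name: string;
  check: () => GateResult;
}

// The builder receives the failing gate and its evidence, then attempts a fix.
type Builder = (gateName: string, evidence: string) => void;

type LadderOutcome =
  | { verdict: "pass"; returns: number }
  | { verdict: "fail"; gate: string; evidence: string; returns: number };

function runGateLadder(gates: Gate[], fix: Builder, maxReturns = 3): LadderOutcome {
  let returns = 0;
  while (true) {
    let failure: { gate: string; evidence: string } | null = null;

    // Walk the ladder in order; a green gate falls through to the next one.
    for (const gate of gates) {
      const result = gate.check();
      if (!result.pass) {
        failure = { gate: gate.name, evidence: result.evidence };
        break;
      }
    }

    // Every gate green: the dish goes to the judge.
    if (failure === null) return { verdict: "pass", returns };

    // Out of returns: report the failing gate instead of looping forever.
    if (returns >= maxReturns) return { verdict: "fail", ...failure, returns };

    // Red gate: hand the evidence back to the builder, then re-enter the ladder.
    fix(failure.gate, failure.evidence);
    returns++;
  }
}
```

<p>Restarting from the bottom of the ladder means a fix for the feel gate cannot silently break the build. When the returns run out, the driver does not pass the dish. It reports which gate failed and with what evidence.</p>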

-<p>This isn't a demo. It's running right now, as I write this, in a Pi session on my machine. The flow is a file at <code>.pi/flows/flows/game-feature.yaml</code>. I trigger it with a slash command. It dispatches sub-agents, runs them through oracle gates, loops on failures, and reports a verdict. That's it.</p>
+<div class="callout">
+  <span class="callout__kicker">Not a Demo</span>
+  <p>This flow is a file at <code>.pi/flows/flows/game-feature.yaml</code>. I trigger it by typing <code>/game-feature</code> in Pi. It dispatches agents, runs gates, loops on failures, reports a verdict. There is no dashboard with drag-and-drop. There is a YAML file and a slash command. That's the whole product.</p>
+</div>

-<h2>The Flow-as-Command Pattern</h2>
-<p>Every flow registers as a slash command. <code>.pi/flows/flows/game-feature.yaml</code> becomes <code>/game-feature</code>. Type it in Pi, describe what you want, hit enter. The flow architect dispatches the DAG, the dashboard shows agent cards with live status, and you watch it happen — or walk away and check the result later.</p>
+<hr class="accent">

-<p>This is the pattern that makes flows different from scripts. Flows are not hardcoded pipelines you invoke from the terminal. They're slash commands you type in conversation. You describe what you want in natural language, the flow wires it through the agents, and the agents route through the gates. The YAML is the skeleton; the conversation is the context.</p>
+<h2>The Menu: Flows Are Slash Commands</h2>
+<p>Every flow becomes a slash command — the menu you read to the expediter. <code>.pi/flows/flows/game-feature.yaml</code> → <code>/game-feature</code>. You don't invoke a pipeline from a terminal. You order a dish in conversation.</p>

-<p>A few flows I use daily:</p>
+<p>"Add wall-running" is not a CLI flag. It's natural language. The flow reads it, wires it through the agents, routes it through the gates. The YAML is the recipe. The conversation is the context.</p>
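<p>The mapping from file to command, and the split between the command and the natural-language request that follows it, fit in a few lines. This sketch assumes commands are derived from the flow's file name alone. <code>commandForFlow</code>, <code>buildRegistry</code>, and <code>parseOrder</code> are hypothetical helpers, not Pi's actual command-registration API, and namespaced commands like <code>/flows:new</code> are not modeled.</p>

```typescript
// Hypothetical helpers; not Pi's real command-registration API.
function commandForFlow(path: string): string | null {
  // Take the file name and strip a .yaml or .yml extension.
  const file = path.split("/").pop() ?? "";
  const match = /^([A-Za-z0-9][A-Za-z0-9_-]*)\.ya?ml$/.exec(file);
  return match ? `/${match[1]}` : null;
}

// Map each slash command to the flow file that implements it.
function buildRegistry(flowPaths: string[]): Map<string, string> {
  const registry = new Map<string, string>();
  for (const p of flowPaths) {
    const cmd = commandForFlow(p);
    if (cmd !== null) registry.set(cmd, p);
  }
  return registry;
}

// Split "/game-feature add a double-jump" into the flow and its request.
function parseOrder(
  input: string,
  registry: Map<string, string>,
): { flowPath: string; prompt: string } | null {
  const trimmed = input.trim();
  const space = trimmed.search(/\s/);
  const cmd = space === -1 ? trimmed : trimmed.slice(0, space);
  const flowPath = registry.get(cmd);
  if (flowPath === undefined) return null;
  const prompt = space === -1 ? "" : trimmed.slice(space).trim();
  return { flowPath, prompt };
}
```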
+
+<p>The menu I order from daily:</p>

 <ul>
-  <li><strong>/game-feature</strong> — "add wall-running" or "fix the 19 red tests from the crash" → research, plan, implement, five gates, judge</li>
-  <li><strong>/review</strong> — "review the last PR" → research → review with code-quality agent</li>
-  <li><strong>/flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects agents, designs a DAG, and writes the YAML</li>
+  <li><strong>/game-feature</strong> — "add a double-jump" or "fix the 19 red tests" → brigade assembles, cooks, inspects, plates</li>
+  <li><strong>/deep-implement</strong> — "build the gitea-read extension" → research → plan → implement → test → review → judge</li>
+  <li><strong>/cto-infra</strong> — "reconcile cost, stability, and VCS research into architecture decisions" → 4 research streams → 1 synthesis agent → 14 requirements mapped to decisions</li>
+  <li><strong>/flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects cooks, designs the recipe, writes the YAML</li>
 </ul>

-<p>The slash command is the interface. The flow is the implementation. The oracle gates are the safety net.</p>
+<h2>The Pass: How Agents Hand Off Work</h2>
+<p>In a real kitchen, cooks don't shout instructions across the room. They place finished plates on the pass. The expediter reads the ticket, checks the plate, routes it to the next station or to the dining room. Nobody yells. Nobody grabs someone else's pan.</p>

-<h2>How Agents Communicate (It's Not Chat)</h2>
-<p>A common question: are the agents constantly talking to each other? The answer is no — and that's deliberate. Agents don't chat. They pass structured results through the flow engine bus.</p>
+<p>Flows work the same way. Agents never talk to each other directly. When the game-builder finishes, it doesn't ping the test-runner. It calls <code>finish({ summary: "...", artifacts: "...", files: "..." })</code> — placing its work on the pass. The flow engine — the expediter — records it and routes it. The next agent receives exactly the inputs wired in the YAML: <code>${{result.game-builder.summary}}</code>, <code>${{result.game-builder.files}}</code>.</p>
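<p>The wiring is plain string templating over recorded results. Here is a sketch of resolving <code>${{result.&lt;step&gt;.&lt;field&gt;}}</code> references. It assumes the engine keeps each finished step's <code>finish()</code> fields as strings keyed by step id. <code>StepResults</code> and <code>resolveInputs</code> are hypothetical names. Other references such as <code>${{input.context}}</code> are deliberately left untouched.</p>

```typescript
// Hypothetical record of finished steps: step id -> fields passed to finish().
type StepResults = Record<string, Record<string, string>>;

// Matches ${{result.<step-id>.<field>}}; step ids may contain hyphens.
const RESULT_REF = /\$\{\{\s*result\.([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\s*\}\}/g;

function resolveInputs(template: string, results: StepResults): string {
  return template.replace(RESULT_REF, (_whole: string, stepId: string, field: string) => {
    const step = results[stepId];
    // Fail loudly: an agent must never receive a silently empty input.
    if (step === undefined) {
      throw new Error(`step "${stepId}" has not finished`);
    }
    const value = step[field];
    if (value === undefined) {
      throw new Error(`step "${stepId}" has no field "${field}"`);
    }
    return value;
  });
}
```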

-<p>Each agent runs in an isolated session with scoped tools and file access. When agent A finishes, it calls <code>finish({ summary: "...", artifacts: "...", files: "..." })</code>. The flow engine records the result. Agent B receives exactly what it needs via template variables — <code>${{result.A.summary}}</code>, <code>${{result.A.artifacts}}</code>, <code>${{result.A.files}}</code> — wired through the <code>inputs:</code> block in the flow YAML.</p>
+<div class="kitchen-grid">
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">What People Expect</span>
+    <p>Agents chatting freely, PM-slack style: "Hey test-runner, I just pushed some code, can you check it? Also the jump feels off, maybe tune the velocity?"</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">What Actually Happens</span>
+    <p>Agent A → <code>finish({verdict: "pass", findings: ["coyote_time=100ms"]})</code> → engine records → Agent B receives <code>${{result.A.findings}}</code> via <code>inputs:</code> block. No chatter. Structured handoff.</p>
+  </div>
+</div>

-<p>This is not agent-to-agent chatter. It's a publish/subscribe bus where the flow engine is the broker. Agents never directly invoke each other. They never read each other's raw output unless the flow explicitly wires it. The DAG's <code>blockedBy</code> edges define who waits for whom; the <code>inputs:</code> block defines what data flows across the edge.</p>
+<p>Why? Because unstructured chatter is how hallucination cascades start. Agent A confidently states something wrong. Agent B builds on it. Agent C compounds it. Three agents later, they're collectively wrong about a file that doesn't exist, and nobody can trace where the error came from. The pass — structured result-passing with typed outputs — makes every handoff auditable, verifiable, and debuggable.</p>
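<p>What makes the pass auditable is that a malformed handoff gets rejected instead of passed along. Here is a minimal validation sketch, assuming a result carries the <code>verdict</code> and <code>findings</code> fields from the <code>finish()</code> example above. <code>parseStepResult</code> and <code>isVerdict</code> are hypothetical, and a real engine might use a schema library instead.</p>

```typescript
// Hypothetical typed result, following the finish() example in the post.
interface StepResult {
  verdict: "pass" | "fail";
  findings: string[];
}

function isVerdict(v: unknown): v is "pass" | "fail" {
  return v === "pass" || v === "fail";
}

// Validate a raw finish() payload before the engine records it.
function parseStepResult(raw: unknown): StepResult {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("result must be an object");
  }
  const obj = raw as Record<string, unknown>;

  const verdict = obj["verdict"];
  if (!isVerdict(verdict)) {
    throw new Error(`invalid verdict: ${String(verdict)}`);
  }

  const findings = obj["findings"];
  if (!Array.isArray(findings) || !findings.every((f) => typeof f === "string")) {
    throw new Error("findings must be an array of strings");
  }

  return { verdict, findings };
}
```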

-<p>Why not let agents talk freely? Because unstructured chatter is the fastest path to hallucination cascades. Agent A confidently states something wrong, agent B builds on it, agent C compounds it. By the time a human notices, you have three agents collectively wrong about a file that doesn't exist. Structured result-passing with typed outputs (<code>verdict: pass</code>, <code>findings: ["missing import", "type mismatch"]</code>) keeps each agent's output machine-readable and verifiable by the gates.</p>
-
-<p>Pi itself is designed for solo interactive work — you ask, it does, you review. The orchestration layer I wrote on top inverts that pattern. Pi becomes the agent harness; the flow engine becomes the conductor. Agents don't talk to each other. They talk to the engine. The engine talks to the gates. The gates talk to the live game. That's the architecture.</p>
+<p>Pi itself is built for solo interactive work: you ask, it does, you review. The orchestration layer I wrote on top inverts that. Pi becomes the kitchen. The flow engine becomes the expediter. Agents become line cooks who place plates on the pass, never shouting across the room.</p>

 <h2>The Setup: Extensions, Agents, and 15–20 Flows</h2>
 <p>"How did you set this up?" is the question I get most often. Here's the honest answer: there's no dashboard with drag-and-drop. You write three kinds of files.</p>
@@ -594,49 +627,58 @@ Context: ${{input.context}}</code></pre>

 <p>The setup is not a product you install. It's a stack: Pi as the agent harness, custom extensions as the tool layer, markdown agents as the role layer, YAML flows as the orchestration layer. The whole thing lives in <code>.pi/flows/</code>. Version-controlled. CI-tested. Slash-command invoked.</p>

-<h2>Structure vs. Freestyle: The Skeleton and the Muscle</h2>
-<p>"Do you define the process with these trees, or do the agents freestyle a bit?" Both — and knowing which is which is the whole game.</p>
+<h2>The Recipe vs. The Technique</h2>
+<p>"Do you define the process with these trees, or do the agents freestyle?" Both. The recipe says what to make and in what order. The technique is how each cook executes their station.</p>

-<p>The <strong>skeleton is rigid</strong>. The flow YAML defines exactly which agents run, in what order, with what dependencies (<code>blockedBy</code>), what inputs they receive, and which gates they must pass. The DAG is not negotiable. An agent cannot decide to skip the build gate because it feels confident. The build gate runs. Period.</p>
+<div class="kitchen-grid">
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">The Recipe (Rigid)</span>
+    <p>The flow YAML is the recipe. It says: first the prep cook dices onions, then the saucier makes the base, then the grill cook sears the protein. After every station, the plate hits the pass for inspection. <strong>This order is not negotiable.</strong> A cook cannot skip the inspection because they feel confident. The inspection runs. Period.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">The Technique (Autonomous)</span>
+    <p>Inside their station, a cook has full agency. How they dice the onions — brunoise or rough chop — is their call. Which pan they use, how they adjust the heat, whether they taste midway. The game-builder decides which files to read, which approach to take. Nobody tells it "edit line 247." It figures that out with <code>grep</code>, <code>find</code>, and reading code.</p>
+  </div>
+</div>

-<p>The <strong>muscle is autonomous</strong>. Inside its step, an agent has full agency. The game-builder decides which files to read, which approach to take, which code to write. It discovers project structure with <code>grep</code> and <code>find</code>. It runs the test suite to understand failures. It writes the fix and verifies it compiles. No human tells it "edit line 247 of PlayerController.cs." The agent figures that out.</p>
+<p>This balance is everything. Too much recipe → agents can't handle surprises. Too much freestyle → agents hallucinate, skip checks, ship broken code. The recipe guarantees the right things happen in the right order — preflight before build, build before test, test before ship. The technique handles the messy, unpredictable reality of actual code.</p>
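<p>The ordering guarantee comes from the dependency edges, not from the cooks. Here is a sketch using Kahn's topological sort over each step's <code>blockedBy</code> list. The <code>Step</code> shape and <code>executionOrder</code> are hypothetical. A real engine would more likely dispatch every ready step in parallel than emit one flat sequence.</p>

```typescript
// Hypothetical step shape mirroring the blockedBy edges in the flow YAML.
interface Step {
  id: string;
  blockedBy: string[];
}

// Kahn's algorithm: a step becomes ready once everything blocking it has run.
function executionOrder(steps: Step[]): string[] {
  const ids = new Set(steps.map((s) => s.id));
  const remaining = new Map<string, number>(); // unmet dependencies per step
  const dependents = new Map<string, string[]>(); // step -> steps it unblocks

  for (const s of steps) {
    for (const dep of s.blockedBy) {
      if (!ids.has(dep)) throw new Error(`${s.id} is blocked by unknown step ${dep}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), s.id]);
    }
    remaining.set(s.id, s.blockedBy.length);
  }

  const ready = steps.filter((s) => s.blockedBy.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = remaining.get(next)! - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  // Steps that never became ready sit on a cycle.
  if (order.length !== steps.length) throw new Error("cycle detected in blockedBy edges");
  return order;
}
```

<p>Preflight, build, test, and report come out in that order however the YAML lists them. A cycle or a dangling <code>blockedBy</code> fails when the flow loads, not halfway through service.</p>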

-<p>Think of it like a company: the org chart (DAG) defines reporting lines and handoff points. The people (agents) do the actual work their own way. The compliance department (gates) checks everything before it ships. The CEO (judge) signs off.</p>
+<div class="callout callout--purple">
+  <span class="callout__kicker">The Meta-Kitchen</span>
+  <p>And when a recipe is wrong? Another flow improves it. A meta-flow reads performance data, spots bottlenecks — "the feel gate keeps failing because the cook doesn't know the jump velocity threshold" — edits the YAML to wire that threshold into the builder's inputs, and commits the change. <strong>Flows that edit flows.</strong> The kitchen that renovates itself between services.</p>
+</div>

-<p>This balance is why the system works at all. Too much structure → agents can't adapt to unexpected situations. Too much freestyle → agents hallucinate, skip checks, ship broken code. The skeleton guarantees the right things happen in the right order. The muscle handles the messy reality of actual code.</p>
+<hr class="accent">

-<p>And when a flow's skeleton is wrong? The meta-flow improves it. It reads flow performance data, identifies bottlenecks ("the feel gate keeps failing because the game-builder doesn't know the jump velocity threshold"), edits the YAML to wire that threshold into the builder's inputs, and commits the change. Flows that improve flows. That's the endgame.</p>
-
-<h2>Model Strategy: DeepSeek for Code, Gemini for Vision</h2>
-<p>"Which DeepSeek model?" The short answer: <strong>DeepSeek V4</strong> for coding-heavy agents, <strong>DeepSeek V4 Flash</strong> for fast routing decisions. The long answer: model selection is not one-size-fits-all.</p>
-
-<p>Flows use <strong>role-based model tiers</strong> — each agent declares a tier (<code>@coding</code>, <code>@planning</code>, <code>@research</code>, <code>@fast</code>, <code>@compact</code>, <code>@vision</code>), and the engine resolves it to a concrete model at dispatch time. This means you can swap models globally without touching any agent or flow file.</p>
+<h2>Picking the Right Knife: Model Strategy</h2>
+<p>You don't use a paring knife to butcher a cow. You don't use a cleaver to supreme an orange. Different work needs different blades. Flows use <strong>role-based model tiers</strong> — each agent declares the blade it needs, and the engine hands it the right one at dispatch time.</p>

 <table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.89rem;">
  <thead>
    <tr style="text-align:left;border-bottom:1px solid #2a3340;">
      <th style="padding:8px 12px;color:#c9935a;">Tier</th>
-      <th style="padding:8px 12px;color:#c9935a;">Model</th>
-      <th style="padding:8px 12px;color:#c9935a;">Used for</th>
+      <th style="padding:8px 12px;color:#c9935a;">The Knife (Model)</th>
+      <th style="padding:8px 12px;color:#c9935a;">What It Cuts</th>
    </tr>
  </thead>
  <tbody>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@coding</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Reading, writing, editing code — the game-builder, fixer, test-author</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@planning</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Flow architect, feature planner — decomposing tasks, designing DAGs</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@fast</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Routing decisions — gate pass/fail, fork choices, loop exit checks</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@research</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Codebase investigation, reading project docs, pattern analysis</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#38bdf8;">google/gemini-2.5-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Multimodal frame judging — T-pose detection, animation clip verification</td></tr>
-    <tr><td style="padding:7px 12px;color:#e6edf3;"><code>@compact</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Summarisation, report generation, lightweight post-processing</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@coding</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Chef's knife</strong> — your workhorse. Reads 800-line files, writes 200-line diffs. Game-builder, fixer, test-author. Free.</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@planning</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Boning knife</strong> — precision decomposition. Breaks tasks into steps, designs DAGs. Flow architect, feature planner.</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@fast</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Paring knife</strong> — quick, decisive cuts. Gate pass/fail, fork choices, loop exits. No overthinking.</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@research</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Fillet knife</strong> — flexible, follows contours. Reads codebase, traces patterns, finds what matters.</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#a855f7;">Gemini 2.5 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>The inspector's eyes</strong> — the only knife that sees. Multimodal frame judging: T-poses, foot-slide, frozen anims.</td></tr>
+    <tr><td style="padding:7px 12px;color:#e6edf3;"><code>@compact</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Kitchen shears</strong> — lightweight, versatile. Summaries, verdicts, post-processing. Fast and cheap.</td></tr>
  </tbody>
 </table>

-<p>Why DeepSeek? Two reasons. First, <strong>it's free</strong> — the coding tier runs on DeepSeek's API with no usage limits, which matters when your game-builder agent is reading 800-line files and writing 200-line diffs ten times a session. Second, <strong>it's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before.</p>
+<div class="callout callout--amber">
+  <span class="callout__kicker">Why DeepSeek?</span>
+  <p>Two reasons. <strong>It's free</strong> — no usage limits, which matters when your game-builder reads 800-line files and writes 200-line diffs ten times a session. <strong>It's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before. DeepSeek can't do multimodal, so vision goes to Gemini — but for everything else, it's the chef's knife you reach for 90% of the time.</p>
+</div>

-<p>Vision is the exception. DeepSeek can't do multimodal, so the visual gate uses <strong>Gemini 2.5 Flash</strong>. It's fast (under 2 seconds per frame grid), cheap, and catches the things that matter: T-poses, foot-slide, frozen animations, missing transitions. The vision preflight gate checks the Gemini API key is set before any build work starts — if it's missing, the entire flow hard-stops. Vision is never silently skipped.</p>
+<p>The point of the knife rack: you configure this <strong>once</strong>. Every agent that declares <code>model: @coding</code> gets DeepSeek V4 automatically. Swap models globally without touching any flow or agent file. The right blade, every time, no thinking required.</p>
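<p>That one-time configuration could be a single tier table consulted at dispatch. The provider/model identifiers below are the ones listed in the earlier version of this table. <code>MODEL_TIERS</code> and <code>resolveModel</code> are hypothetical names, and the real engine's lookup may differ.</p>

```typescript
// Hypothetical tier table; identifiers taken from the earlier version of the table.
const MODEL_TIERS: Record<string, string> = {
  "@coding": "deepseek/deepseek-v4",
  "@planning": "deepseek/deepseek-v4",
  "@research": "deepseek/deepseek-v4",
  "@fast": "deepseek/deepseek-v4-flash",
  "@compact": "deepseek/deepseek-v4-flash",
  "@vision": "google/gemini-2.5-flash",
};

// Resolve an agent's declared tier to a concrete model at dispatch time.
function resolveModel(declared: string, tiers: Record<string, string> = MODEL_TIERS): string {
  const model = tiers[declared];
  if (model === undefined) {
    throw new Error(`unknown model tier ${declared}`);
  }
  return model;
}
```

<p>Swapping a model then means changing one line of the table. Agent and flow files only ever mention the tier.</p>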

-<p>The key insight: <strong>different work needs different brains</strong>. Code writing needs a model that understands language semantics and type systems. Vision judging needs a model that sees pixels and understands motion. Routing decisions need a model that's fast and decisive, not one that overthinks. The role-tier system means you configure this once, at the model level, and every agent that declares <code>model: @coding</code> gets the right brain automatically.</p>
-
-<hr>
+<hr class="accent">

 <p>The oracle tools — <code>verify_build</code>, <code>drive_game</code>, <code>game_frames</code> — are the durable assets. About 300 lines of TypeScript each, MIT licensed, reusable in any Pi project. The flow engine composes them; the agents route through them.</p>