diff --git a/_index_template.html b/_index_template.html
index 91e1ff7..0d805a8 100644
--- a/_index_template.html
+++ b/_index_template.html
@@ -122,9 +122,9 @@
   <div class="blog-list">
     <!-- hand-authored HTML posts (not from build.js) -->
     <a href="pi-flow-native-brain" class="blog-card">
-      <span class="blog-card__date">3 June 2026</span>
+      <span class="blog-card__date">4 June 2026</span>
       <h2 class="blog-card__title">How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows</h2>
-      <p class="blog-card__excerpt">When we ask Pi to build a game feature, it doesn't just write code. It compiles, runs tests, drives the live game, measures feel, fixes CI failures, and ships a green PR — all through composable oracle-backed flows.</p>
+      <p class="blog-card__excerpt">We type a slash command, agents fan out through five oracle gates, the game-builder fixes 19 red tests while vision judges check the live game — and it all runs as one autonomous flow.</p>
       <span class="blog-card__read">Read &rarr;</span>
     </a>
 {{CARDS}}
diff --git a/pi-flow-native-brain.html b/pi-flow-native-brain.html
index 7302c82..eb3e6cf 100644
--- a/pi-flow-native-brain.html
+++ b/pi-flow-native-brain.html
@@ -25,7 +25,7 @@
     "@context": "https://schema.org",
     "@type": "BlogPosting",
     "headline": "How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows",
-    "datePublished": "2026-06-03",
+    "datePublished": "2026-06-04",
     "author": {
       "@type": "Person",
       "name": "Ozan Bozkurt"
@@ -162,6 +162,91 @@
     .post__body li {
       margin: 4px 0;
     }
+
+    /* ── Analogy callout box ── */
+    .post__body .callout {
+      background: linear-gradient(135deg, rgba(56,189,248,0.06), rgba(168,85,247,0.06));
+      border: 1px solid rgba(56,189,248,0.2);
+      border-left: 4px solid #38bdf8;
+      border-radius: 0 12px 12px 0;
+      padding: 18px 20px;
+      margin: 22px 0;
+    }
+    .post__body .callout--amber {
+      background: linear-gradient(135deg, rgba(245,158,11,0.07), rgba(201,147,90,0.05));
+      border-color: rgba(245,158,11,0.25);
+      border-left-color: #f59e0b;
+    }
+    .post__body .callout--purple {
+      background: linear-gradient(135deg, rgba(168,85,247,0.07), rgba(56,189,248,0.04));
+      border-color: rgba(168,85,247,0.25);
+      border-left-color: #a855f7;
+    }
+    .post__body .callout__kicker {
+      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
+      font-size: 0.7rem;
+      letter-spacing: 0.18em;
+      text-transform: uppercase;
+      color: #38bdf8;
+      margin-bottom: 8px;
+      display: block;
+    }
+    .post__body .callout--amber .callout__kicker { color: #f59e0b; }
+    .post__body .callout--purple .callout__kicker { color: #a855f7; }
+    .post__body .callout p { margin: 6px 0 0; color: #cdd7e2; }
+    .post__body .callout p + p { margin-top: 10px; }
+
+    /* ── Gate badge pills ── */
+    .gate {
+      display: inline-block;
+      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
+      font-size: 0.75rem;
+      font-weight: 600;
+      padding: 3px 10px;
+      border-radius: 5px;
+      margin-right: 4px;
+    }
+    .gate--build  { background: rgba(56,189,248,0.12);  color: #38bdf8; border: 1px solid rgba(56,189,248,0.3); }
+    .gate--test   { background: rgba(52,211,153,0.12);  color: #34d399; border: 1px solid rgba(52,211,153,0.3); }
+    .gate--behave { background: rgba(168,85,247,0.12);  color: #a855f7; border: 1px solid rgba(168,85,247,0.3); }
+    .gate--feel   { background: rgba(245,158,11,0.12);  color: #f59e0b; border: 1px solid rgba(245,158,11,0.3); }
+    .gate--visual { background: rgba(201,147,90,0.12);  color: #c9935a; border: 1px solid rgba(201,147,90,0.3); }
+
+    /* ── Section divider accent ── */
+    .post__body hr { border-color: #2a3340; margin: 36px 0; }
+    .post__body hr.accent {
+      border: none;
+      height: 2px;
+      background: linear-gradient(90deg, transparent, #38bdf8 20%, #a855f7 50%, #f59e0b 80%, transparent);
+      margin: 40px 0;
+    }
+
+    /* ── Two-column kitchen comparison ── */
+    .kitchen-grid {
+      display: grid;
+      grid-template-columns: 1fr 1fr;
+      gap: 16px;
+      margin: 18px 0;
+    }
+    @media (max-width: 640px) { .kitchen-grid { grid-template-columns: 1fr; } }
+    .kitchen-col {
+      background: #0c1119;
+      border: 1px solid #2a3340;
+      border-radius: 10px;
+      padding: 16px 18px;
+    }
+    .kitchen-col__title {
+      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
+      font-size: 0.68rem;
+      letter-spacing: 0.16em;
+      text-transform: uppercase;
+      margin-bottom: 10px;
+      display: block;
+    }
+    .kitchen-col__title--kitchen { color: #f59e0b; }
+    .kitchen-col__title--reality  { color: #38bdf8; }
+    .kitchen-col p { font-size: 0.9rem; color: #9aa7b4; margin: 4px 0; }
+    .kitchen-col p strong { color: #e6edf3; }
   </style>
 </head>
 <body>
@@ -194,14 +279,19 @@
 
   <article class="post">
     <a href="/blog/" class="post__back">&larr; All Posts</a>
-    <span class="post__date">3 June 2026</span>
+    <span class="post__date">4 June 2026</span>
     <h1 class="post__title">How Pi Agents Build, Test, and Ship Code with Oracle-Backed Flows</h1>
-    <p class="post__lead">When we ask Pi to build a feature for Ariki — say, "add a double-jump with a cooldown indicator" — five things happen. The agent writes the code. A build gate compiles it. A test gate runs the test suite. A behaviour gate drives the live game and checks the character actually double-jumps. A feel gate measures apex height, airtime, and landing settle. And if CI disagrees with any of it, the agent reads the failure log and fixes it. None of this is magic. It's Pi flows.</p>
+    <p class="post__lead">Think of a restaurant kitchen during dinner rush. The head chef doesn't cook every dish. She runs the pass — each plate gets inspected before it leaves. One cook handles sauces, another pastry, another the grill. The expediter calls orders, coordinates timing, makes sure table 4's mains don't arrive before table 2's starters. A dish comes back? It goes straight to the station that messed up, with a ticket explaining exactly what's wrong. That kitchen runs on flows. So does our game engine.</p>
 
     <div class="post__body">
 
-<h2>What Happens When You Ask Pi to Build Something</h2>
-<p>The flow starts the same way every agent task does: context, then plan, then implement. That's the standard loop. What makes it interesting is what happens <em>after</em> implementation — a ladder of five gates, each run by a specialised sub-agent with its own tools and its own pass/fail authority.</p>
+<div class="callout">
+  <span class="callout__kicker">The Kitchen ↔ Flows Analogy</span>
+  <p><strong>The kitchen</strong> = Pi (the agent harness). <strong>The recipe</strong> = a flow YAML (the DAG). <strong>The line cooks</strong> = agents (each with a station and tools). <strong>The pass</strong> = the flow engine (routes finished work). <strong>The head chef's inspection</strong> = the five gates. <strong>The order ticket</strong> = a slash command. <strong>"Send it back!"</strong> = the fix loop.</p>
+</div>
+
+<h2>What Happens When You Type a Slash Command</h2>
+<p>You type <code>/game-feature add a double-jump with cooldown</code> and hit enter. The ticket hits the kitchen. What follows is not one agent doing everything — it's a brigade running their stations.</p>
 
 <figure style="margin:28px 0;">
   <svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
@@ -240,46 +330,88 @@
   <figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">A real failure loops back to <em>implement</em> with gate evidence (bounded to three tries); anything green falls through to the judge.</figcaption>
 </figure>
 
-<h2>The Five Gates</h2>
-<p>Each gate is a sub-agent with one job and one tool.</p>
+<h2>The Five Gates: What the Head Chef Checks</h2>
+<p>In a kitchen, the head chef doesn't trust — she verifies. Every plate hits the pass and gets inspected. Our flows have the same instinct. Each gate is a sub-agent with one job, one tool, and absolute veto power.</p>
 
-<p><strong style="color:#f59e0b;">G1 — Build.</strong> Runs <code>dotnet build</code> on the game and sim. Returns PASS/FAIL with file:line errors. If the code doesn't compile, nothing proceeds.</p>
+<div class="kitchen-grid">
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
+    <p><strong>Check the base.</strong> Is the protein cooked through? If the chicken is raw, the whole plate stops here. Nothing else matters.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
+    <p><span class="gate gate--build">G1 · Build</span> Runs <code>dotnet build</code>. PASS/FAIL with file:line errors. Won't compile? Nothing proceeds.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
+    <p><strong>Taste the sauce.</strong> Seasoning right? Acid balanced? The dish might look perfect but taste flat.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
+    <p><span class="gate gate--test">G2 · Tests</span> Runs <code>dotnet test</code> and parses which assertions broke. Code that compiles but gets the logic wrong is caught here.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
+    <p><strong>Does it work?</strong> Pick it up. Does the sauce hold? Does the plating survive the walk to table 6?</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
+    <p><span class="gate gate--behave">G3 · Behaviour</span> Sends <code>{"jump":true}</code> to the LIVE game and samples the player body 30 times at 50ms intervals. Did the character actually jump? Did the double-jump fire? This is the ground-truth oracle, and it is what makes game dev fundamentally different from web dev.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
+    <p><strong>How does it feel?</strong> The steak is cooked but chewy. The sauce is seasoned but gloopy. Edible ≠ good.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
+    <p><span class="gate gate--feel">G4 · Feel</span> Measures apex height, airtime, liftoff latency, rise/fall asymmetry, landing settle. Numeric thresholds. A jump that works but takes 400ms to lift off fails. Behaviour says it happened. Feel says it felt good.</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
+    <p><strong>How does it look?</strong> Is the garnish wilting? Sauce smeared? Does it match the menu photo?</p>
+  </div>
+  <div class="kitchen-col">
+    <span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
+    <p><span class="gate gate--visual">G5 · Visual</span> Captures 8 frames at 100ms intervals, tiles them into a grid, and feeds the grid to <code>gemini-2.5-flash</code>. Checks: T-pose? Foot-slide? Frozen animation? Wrong clip? Missing transitions?</p>
+  </div>
+</div>
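+<p>The feel gate's "numeric thresholds" idea can be sketched in a few lines. This is a minimal illustration, not the project's feel-judge: the metric names, the <code>judgeFeel</code> helper, and every threshold value below are assumptions made up for the example. Only the 400ms liftoff case comes from the text above.</p>

```typescript
// Hypothetical metric names and threshold values, for illustration only.
interface FeelMetrics {
  apexHeight: number; // world units
  airtimeMs: number;
  liftoffLatencyMs: number;
}

interface Range {
  min: number;
  max: number;
}

type FeelThresholds = { [K in keyof FeelMetrics]: Range };

// Returns one finding per metric that falls outside its allowed range.
// An empty array means the gate is green.
function judgeFeel(metrics: FeelMetrics, limits: FeelThresholds): string[] {
  const findings: string[] = [];
  const keys = Object.keys(limits) as Array<keyof FeelMetrics>;
  keys.forEach((key) => {
    const value = metrics[key];
    const range = limits[key];
    if (value < range.min || value > range.max) {
      findings.push(key + "=" + value + " outside [" + range.min + ", " + range.max + "]");
    }
  });
  return findings;
}

// Placeholder limits, NOT the game's real tuning values.
const limits: FeelThresholds = {
  apexHeight: { min: 1.5, max: 3.0 },
  airtimeMs: { min: 350, max: 700 },
  liftoffLatencyMs: { min: 0, max: 120 },
};

// The jump "works" (behaviour would be green) but lifts off too slowly.
const sluggish = judgeFeel(
  { apexHeight: 2.2, airtimeMs: 520, liftoffLatencyMs: 400 },
  limits
);
// sluggish is ["liftoffLatencyMs=400 outside [0, 120]"]
```

+<p>Returning findings as strings rather than a bare boolean matters for the loop: the builder needs to know which number was off, not just that something was.</p>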
 
-<p><strong style="color:#f59e0b;">G2 — Tests.</strong> Runs <code>dotnet test</code> and parses results. The agent reads which tests broke and fixes assertions, mocks, or test setup.</p>
+<div class="callout callout--amber">
+  <span class="callout__kicker">The Loop</span>
+  <p>Any red gate → evidence sent back to the cook → fix → re-enter the inspection line. Three chances max, then the head chef escalates to a human. This is the same instinct that makes a good kitchen work: catch it early, send it back with a clear note, give them a chance to fix it, but don't let the same dish circle the pass forever.</p>
+</div>
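+<p>The loop described above can be sketched as a plain function. This is a rough model under stated assumptions, not the flow engine's code: <code>GateResult</code>, <code>Gate</code>, and <code>runGateLadder</code> are hypothetical names, and real gates would call tools rather than return canned results.</p>

```typescript
// Hypothetical shapes: the post does not show the engine's real types.
type Verdict = "pass" | "fail";

interface GateResult {
  gate: string;
  verdict: Verdict;
  evidence: string; // e.g. file:line errors or a failed threshold
}

type Gate = (attempt: number) => GateResult;

interface LoopOutcome {
  status: "green" | "escalated";
  attempts: number;
  lastFailure: GateResult | null;
}

// Runs the gates in order. The first red gate stops the ladder and its
// evidence goes back to `fix`. After `maxIterations` red attempts the
// outcome is "escalated" (a human takes over).
function runGateLadder(
  gates: Gate[],
  fix: (failure: GateResult) => void,
  maxIterations: number
): LoopOutcome {
  let lastFailure: GateResult | null = null;
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    lastFailure = null;
    for (let i = 0; i < gates.length; i++) {
      const result = gates[i](attempt);
      if (result.verdict === "fail") {
        lastFailure = result;
        break; // later gates never inspect a plate that already failed
      }
    }
    if (lastFailure === null) {
      return { status: "green", attempts: attempt, lastFailure: null };
    }
    if (attempt < maxIterations) {
      fix(lastFailure);
    }
  }
  return { status: "escalated", attempts: maxIterations, lastFailure: lastFailure };
}

// Canned gates: build always passes, tests fail once and then pass.
const fixesRequested: string[] = [];
const outcome = runGateLadder(
  [
    () => ({ gate: "build", verdict: "pass", evidence: "" }),
    (attempt) =>
      attempt < 2
        ? { gate: "tests", verdict: "fail", evidence: "JumpTests: coyote time not honoured" }
        : { gate: "tests", verdict: "pass", evidence: "" },
  ],
  (failure) => { fixesRequested.push(failure.gate + ": " + failure.evidence); },
  3
);
// outcome.status is "green" after 2 attempts; one fix was requested.
```

+<p>The bound is the important part. Without <code>maxIterations</code>, a gate that can never pass would keep the same dish circling the pass indefinitely.</p>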
 
-<p><strong style="color:#f59e0b;">G3 — Behaviour (live game).</strong> This is the one that makes game dev different from web dev. The agent sends input to the running game — <code>{"jump": true}</code> — and samples the player body 30 times at 50ms intervals. It checks: did the character actually jump? Did the double-jump fire? Was there a cooldown? The <code>drive_game</code> tool is the ground-truth oracle for whether a movement feature works in-game, not just in tests.</p>
+<h2>Composability: Adding a New Station</h2>
+<p>A kitchen doesn't redesign the whole line when it adds a new dish. It adds a station. Flows work the same way. We started with three gates: build, test, vision. Behaviour and feel came later, each as a single-file extension. Gates aren't hardcoded. They're sub-agents declared in YAML. Want a linting gate? Add a sub-agent with a linter. Security scan? Same pattern. Asset bundle size check? Write the tool, declare the agent, wire it in.</p>
 
-<p><strong style="color:#f59e0b;">G4 — Feel (measured game-feel).</strong> Behaviour checks whether it worked. Feel checks whether it felt good. The agent measures apex height, airtime, liftoff latency, rise/fall asymmetry, and landing settle. Numeric metrics with thresholds. A jump that technically works but takes 400ms to lift off fails the feel gate.</p>
+<div class="callout callout--purple">
+  <span class="callout__kicker">Self-Improving Kitchen</span>
+  <p>Agents can extend the flow at runtime. If the behaviour gate keeps failing because the game window isn't focused, an agent notices the pattern and inserts a pre-condition gate that checks window focus. The flow engine handles routing; the agents handle decisions. This is what makes flows fundamentally different from a script — the pipeline isn't fixed at compile time. It's a graph that agents read, understand, and modify while they run.</p>
+</div>
 
-<p><strong style="color:#f59e0b;">G5 — Visual.</strong> Captures frame sequences from the live game and feeds them to a vision model. Checks: is the animation playing? Is the cooldown indicator visible? Are there visual artifacts?</p>
+<hr class="accent">
 
-<p>Anything green falls through to the judge. Anything red loops back to implement with the failure evidence — the agent reads what went wrong, fixes it, and re-enters the gate ladder. Three retries max, then escalation to a human.</p>
+<h2>The CI Loop: The Dish That Came Back After It Left</h2>
+<p>Gates inspect plates at the pass. But what about after the plate leaves the kitchen? What about the customer who finds a hair in their soup after it's been served?</p>
 
-<h2>Composability: Gates Are Cheap to Add</h2>
-<p>The flow started with three gates — build, test, vision. Behaviour and feel were added later, each as a one-file extension. Gates are not hardcoded. They're sub-agents declared in a flow config. Want a linting gate? Add a sub-agent with a linter tool. Want a security scan? Same pattern. Want a gate that checks asset bundle sizes haven't bloated? Write the tool, declare the sub-agent, wire it into the flow.</p>
+<p>Most coding agents don't care. They write code, push, walk away. A human discovers the broken CI build an hour later. That's the equivalent of a cook plating a dish, sending it out, and never checking if the diner is still alive.</p>
 
-<p>Agents themselves can extend the flow. If a sub-agent notices a pattern of failures — "the last three behaviour checks failed because the game window wasn't focused" — it can insert a pre-condition gate that checks window focus before proceeding. The flow engine handles routing; the agents handle decisions.</p>
-
-<p>This is what makes flows fundamentally different from a script: the pipeline is not fixed at compile time. It's a graph that agents read, understand, and modify at runtime.</p>
-
-<h2>The CI Loop: Agents That Fix Their Own Builds</h2>
-<p>Gates handle pre-push verification. But what about after push? What about CI?</p>
-
-<p>Most coding agents don't care if the code compiles on the CI runner. They write, they push, they walk away. A human discovers the broken build an hour later.</p>
-
-<p>We closed this loop with the <code>tinqs-ci</code> extension — three tools that give agents post-push autonomy:</p>
+<p>We closed this loop with three tools — the waiter who brings the plate back:</p>
 
 <ul>
-  <li><strong>ci_status</strong> — checks pipeline state for a branch</li>
-  <li><strong>ci_logs</strong> — fetches the full build log from the most recent failed run</li>
-  <li><strong>ci_wait</strong> — polls every 15 seconds until the pipeline finishes</li>
+  <li><strong>ci_wait</strong> — stands by the table, polls every 15 seconds until the diner finishes</li>
+  <li><strong>ci_status</strong> — checks: did they enjoy it or send it back?</li>
+  <li><strong>ci_logs</strong> — reads the complaint card: exactly what was wrong</li>
 </ul>
 
-<p>The agent pushes its branch, calls <code>ci_wait</code>, and if CI fails, reads <code>ci_logs</code>, fixes the issue, pushes again, and polls again. DeepSeek V4 parses compiler errors, identifies files and lines, and fixes them. A missing import, a type mismatch, a module not found — pattern-matched and corrected in seconds.</p>
+<p>The agent pushes, calls <code>ci_wait</code>. If CI fails, it reads <code>ci_logs</code>, fixes the exact error, pushes again. DeepSeek V4 parses compiler errors the way a cook reads a ticket: "missing import" = forgot the salt, "type mismatch" = wrong pan size, "module not found" = ingredient not in stock. Pattern-matched and fixed in seconds.</p>
 
-<p>A real example from last week: adding a health check endpoint to a Go service. Agent wrote the handler and test, pushed. CI failed — the test imported a package that didn't exist on the runner. Agent read <code>ci_logs</code>, saw <code>go: module not found</code>, added the missing <code>go.mod</code> replace directive, pushed again. CI passed. PR opened. <strong>4 minutes. $0.06.</strong></p>
+<div class="callout callout--amber">
+  <span class="callout__kicker">Real Example</span>
+  <p>Adding a health check endpoint to a Go service. Agent wrote the handler and test, pushed. CI failed — the test imported a package that didn't exist on the runner. Agent read <code>ci_logs</code>, saw <code>go: module not found</code>, added the missing <code>go.mod</code> replace directive, pushed again. CI passed. PR opened. <strong>4 minutes. $0.06.</strong></p>
+</div>
 
-<p>Three safeguards prevent runaway loops: <strong>retry limit</strong> (3, hard-coded in the orchestrator), <strong>diff budget</strong> (retries only touch files already in the changeset), and <strong>hallucination detection</strong> (if the agent claims CI passed without calling <code>ci_status</code>, it gets corrected).</p>
+<p>Three safeguards prevent the kitchen grinding to a halt: <strong>retry limit</strong> (3, same dish doesn't circle forever), <strong>diff budget</strong> (retries only touch files already on the ticket), and <strong>hallucination detection</strong> (if the cook claims the customer loved it without actually asking the waiter, they get corrected).</p>
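+<p>As a rough sketch, the three safeguards reduce to three checks on each retry. The <code>RetryAttempt</code> shape, the <code>checkRetry</code> helper, and the file names are assumptions for illustration. The orchestrator's real data model is not shown in this post.</p>

```typescript
// Hypothetical shapes; the real orchestrator's types are not shown in the post.
interface RetryAttempt {
  attempt: number; // 1-based retry counter
  touchedFiles: string[]; // files this retry wants to modify
  claimsCiPassed: boolean; // what the agent says happened
  toolCalls: string[]; // tools the agent actually called this turn
}

const MAX_RETRIES = 3;

// Returns a list of violated safeguards; an empty list means the retry may proceed.
function checkRetry(changeset: string[], retry: RetryAttempt): string[] {
  const violations: string[] = [];

  // 1. Retry limit.
  if (retry.attempt > MAX_RETRIES) {
    violations.push("retry limit exceeded");
  }

  // 2. Diff budget: a retry may only touch files already in the changeset.
  const outside = retry.touchedFiles.filter((f) => changeset.indexOf(f) === -1);
  if (outside.length > 0) {
    violations.push("diff budget: " + outside.join(", "));
  }

  // 3. Hallucination detection: a "CI passed" claim needs a ci_status call behind it.
  if (retry.claimsCiPassed && retry.toolCalls.indexOf("ci_status") === -1) {
    violations.push("unverified claim: CI pass reported without calling ci_status");
  }

  return violations;
}

// Illustrative file names only.
const changeset = ["handler.go", "handler_test.go", "go.mod"];

const honest = checkRetry(changeset, {
  attempt: 1,
  touchedFiles: ["go.mod"],
  claimsCiPassed: false,
  toolCalls: ["ci_logs"],
});

const suspicious = checkRetry(changeset, {
  attempt: 2,
  touchedFiles: ["main.go"],
  claimsCiPassed: true,
  toolCalls: ["ci_wait"],
});
// honest is []; suspicious has two entries (diff budget, unverified claim).
```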
 
 <h2>The Numbers</h2>
 <p>Over three weeks of running the orchestrator:</p>
@@ -316,6 +448,196 @@
 
 <hr>
 
+<h2>A Real Flow in Action: Fixing 19 Tests After a Crash</h2>
+<p>This morning, a machine crash cut off a flow mid-stream. Nineteen tests were left red: contracts written, implementation half-done. The task: finish the interrupted jump &amp; locomotion animation work and turn all nineteen tests green.</p>
+
+<p>I typed one slash command into Pi:</p>
+
+<pre><code>/game-feature Finish the leftover jump & locomotion animation work — make the 19 FAILING tests GREEN. They are existing RED contracts written by an earlier animation flow that a machine crash cut off mid-stream; the contracts are already written, so IMPLEMENT to satisfy them (do not rewrite the contracts).</code></pre>
+
+<p>What happened next was fully autonomous. Here's the flow that runs in production, condensed so that each of the five gate steps fits on one line:</p>
+
+<pre><code>name: game-feature
+description: Build a PLAYABLE game feature and prove it in the LIVE game.
+task_required: true
+
+steps:
+  # G0: Pre-flight — validate vision CAN run before any build work
+  - id: preflight
+    agent: vision-preflight
+    task: Check GEMINI_API_KEY is set AND game_frames reaches a live instance.
+          If EITHER fails, STOP — vision is not optional.
+
+  # Context + plan
+  - id: context
+    agent: project-context-reader
+    blockedBy: [preflight]
+
+  - id: plan
+    agent: feature-planner
+    blockedBy: [context]
+
+  # TDD: write tests FIRST (different agent than implementer)
+  - id: test-author
+    agent: test-author
+    blockedBy: [plan]
+
+  - id: implement
+    agent: game-builder
+    blockedBy: [test-author]
+
+  # G1–G5: Oracle gates (build, tests, behaviour, feel, visual)
+  - { id: build,    agent: build-verifier }
+  - { id: tests,    agent: test-runner }
+  - { id: behavior, agent: behavioral-prober }       # drives LIVE game via drive_game
+  - { id: feel,     agent: feel-judge }              # apex, airtime, latency, rise/fall
+  - { id: visual,   agent: animation-vision-judge }  # multimodal gemini-2.5-flash
+
+  # Self-recurring fix-loop: bounded loop back to implement with evidence
+  - id: fix-loop
+    type: agent-loop-decision
+    agent: flow-decision
+    loop_target: implement
+    exit_target: report
+    max_iterations: 3
+
+  # Final judge: one honest verdict
+  - id: report
+    agent: game-judge</code></pre>
+
+<p>Eighteen steps, seven custom agents, five oracle gates, and one judge. The whole thing runs as a slash command.</p>
+
+<p>Here's what actually happened. The <strong>vision-preflight</strong> agent fired first — checked that <code>GEMINI_API_KEY</code> was set and that <code>game_frames</code> could reach the live game instance. Both passed in under a second. Without this gate, the rest of the flow would be meaningless — we'd do all the build work only to discover the vision judge can't run. So we check first.</p>
+
+<p>The <strong>project-context-reader</strong> ingested <code>PlayerController.cs</code>, <code>PlayerAnimController.cs</code>, <code>PlayerAnimationLogic.cs</code>, the test files, and the manifest. The <strong>feature-planner</strong> decomposed the 19 failures into four fix groups: (1) vegetation manifest — 146 items with broken <code>prefabPath</code>, (2) animation controller — crouch parameter not plumbed through, (3) jump physics — coyote time, variable height, air control all unimplemented, (4) animation tree — state machine missing entirely.</p>
+
+<p>Then the <strong>game-builder</strong> agent went to work. It read the test failure messages, traced each one to the source, and started implementing:</p>
+
+<ul>
+  <li><strong>Coyote time</strong>: a 100ms grace period after <code>IsOnFloor()</code> becomes false</li>
+  <li><strong>Variable jump height</strong>: scale velocity by key hold duration, 3.5 at tap, 6.5 at 300ms hold</li>
+  <li><strong>Air control</strong>: reduce horizontal velocity by 40% when airborne</li>
+  <li><strong>Jump phases</strong>: minimum 0.15s duration on <code>jump_start</code> before transitioning to airborne</li>
+  <li><strong>Landing timer</strong>: wait full <code>jump_land</code> length + one frame, not <code>length - blend</code></li>
+  <li><strong>Animation tree</strong>: state machine with <code>jump_start → jump → jump_land</code> states, 0.1s blend transitions</li>
+</ul>
+
+<p>The <strong>build-verifier</strong> compiled it. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump": true}</code> to the live game and sampled the player body 30 times. <strong>Feel-judge</strong> measured apex height, airtime, and liftoff latency against thresholds. <strong>Animation-vision-judge</strong> grabbed 8 frames at 100ms intervals, composed them into a grid, and had <code>gemini-2.5-flash</code> check for T-poses, foot-slide, frozen frames, and missing transitions.</p>
+
+<p>Any red gate → evidence fed back to the game-builder → fix → re-enter the gate ladder. Bounded to 3 retries per the <code>max_iterations</code> in the loop decision. Any green gate → falls through to the next. All green → the <strong>game-judge</strong> produces the final honest verdict.</p>
+
+<p>This isn't a demo. It's running right now, as I write this, in a Pi session on my machine. The flow is a file at <code>.pi/flows/flows/game-feature.yaml</code>. I trigger it with a slash command. It dispatches sub-agents, runs them through oracle gates, loops on failures, and reports a verdict. That's it.</p>
+
+<h2>The Flow-as-Command Pattern</h2>
+<p>Every flow registers as a slash command. <code>.pi/flows/flows/game-feature.yaml</code> becomes <code>/game-feature</code>. Type it in Pi, describe what you want, hit enter. The flow architect dispatches the DAG, the dashboard shows agent cards with live status, and you watch it happen — or walk away and check the result later.</p>
+
+<p>This is the pattern that makes flows different from scripts. Flows are not hardcoded pipelines you invoke from the terminal. They're slash commands you type in conversation. You describe what you want in natural language, the flow wires it through the agents, and the agents route through the gates. The YAML is the skeleton; the conversation is the context.</p>
+
+<p>A few flows I use daily:</p>
+
+<ul>
+  <li><strong>/game-feature</strong> — "add wall-running" or "fix the 19 red tests from the crash" → research, plan, implement, five gates, judge</li>
+  <li><strong>/review</strong> — "review the last PR" → research → review with code-quality agent</li>
+  <li><strong>/flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects agents, designs a DAG, and writes the YAML</li>
+</ul>
+
+<p>The slash command is the interface. The flow is the implementation. The oracle gates are the safety net.</p>
+
+<h2>How Agents Communicate (It's Not Chat)</h2>
+<p>A common question: are the agents constantly talking to each other? The answer is no — and that's deliberate. Agents don't chat. They pass structured results through the flow engine bus.</p>
+
+<p>Each agent runs in an isolated session with scoped tools and file access. When agent A finishes, it calls <code>finish({ summary: "...", artifacts: "...", files: "..." })</code>. The flow engine records the result. Agent B receives exactly what it needs via template variables — <code>${{result.A.summary}}</code>, <code>${{result.A.artifacts}}</code>, <code>${{result.A.files}}</code> — wired through the <code>inputs:</code> block in the flow YAML.</p>
+
+<p>This is not agent-to-agent chatter. It's a publish/subscribe bus where the flow engine is the broker. Agents never directly invoke each other. They never read each other's raw output unless the flow explicitly wires it. The DAG's <code>blockedBy</code> edges define who waits for whom; the <code>inputs:</code> block defines what data flows across the edge.</p>
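+<p>To make the wiring concrete, here is a minimal sketch of how a broker could resolve those template variables. Only the <code>${{result.&lt;step&gt;.&lt;field&gt;}}</code> syntax comes from the flows themselves. The <code>ResultStore</code> shape and the <code>resolveTemplate</code> helper are assumptions for illustration, not the flow engine's implementation.</p>

```typescript
// Hypothetical data shapes; only the ${{result.<step>.<field>}} syntax is from the post.
type StepResult = { [field: string]: string };
type ResultStore = { [stepId: string]: StepResult };

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Replaces every ${{result.<step>.<field>}} with the recorded value.
// A reference to a step or field that was never recorded is an error,
// so an agent cannot silently receive an empty input.
function resolveTemplate(template: string, results: ResultStore): string {
  const pattern = /\$\{\{\s*result\.([\w-]+)\.(\w+)\s*\}\}/g;
  return template.replace(
    pattern,
    (whole: string, stepId: string, field: string): string => {
      if (!hasOwn(results, stepId) || !hasOwn(results[stepId], field)) {
        throw new Error("unwired input: " + whole);
      }
      return results[stepId][field];
    }
  );
}

// Results the engine recorded when earlier steps called finish().
const recorded: ResultStore = {
  plan: { summary: "four fix groups", files: "PlayerController.cs" },
};

const prompt = resolveTemplate(
  "Plan: ${{result.plan.summary}} (touching ${{result.plan.files}})",
  recorded
);
// prompt is "Plan: four fix groups (touching PlayerController.cs)"
```

+<p>Throwing on an unwired reference is a design choice for the sketch: a loud failure at dispatch time is easier to debug than an agent improvising around a blank input.</p>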
+
+<p>Why not let agents talk freely? Because unstructured chatter is the fastest path to hallucination cascades. Agent A confidently states something wrong, agent B builds on it, agent C compounds it. By the time a human notices, you have three agents collectively wrong about a file that doesn't exist. Structured result-passing with typed outputs (<code>verdict: pass</code>, <code>findings: ["missing import", "type mismatch"]</code>) keeps each agent's output machine-readable and verifiable by the gates.</p>
+
+<p>Pi itself is designed for solo interactive work — you ask, it does, you review. The orchestration layer I wrote on top inverts that pattern. Pi becomes the agent harness; the flow engine becomes the conductor. Agents don't talk to each other. They talk to the engine. The engine talks to the gates. The gates talk to the live game. That's the architecture.</p>
+
+<h2>The Setup: Extensions, Agents, and 15–20 Flows</h2>
+<p>"How did you set this up?" is the question I get most often. Here's the honest answer: there's no dashboard with drag-and-drop. You write three kinds of files.</p>
+
+<p><strong style="color:#f59e0b;">Extensions</strong> are TypeScript tools that agents call. Each is about 300 lines, MIT licensed:</p>
+
+<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.89rem;">
+  <thead>
+    <tr style="text-align:left;border-bottom:1px solid #2a3340;">
+      <th style="padding:8px 12px;color:#c9935a;">Extension</th>
+      <th style="padding:8px 12px;color:#c9935a;">What agents call it for</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>verify_build</code></td><td style="padding:7px 12px;color:#cdd7e2;">Compile the game + sim, return file:line errors</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>drive_game</code></td><td style="padding:7px 12px;color:#cdd7e2;">Send input to the live game, sample player body</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>game_frames</code></td><td style="padding:7px 12px;color:#cdd7e2;">Capture screenshot sequences for vision judging</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_status</code></td><td style="padding:7px 12px;color:#cdd7e2;">Check Gitea Actions pipeline state for a branch</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_logs</code></td><td style="padding:7px 12px;color:#cdd7e2;">Fetch full build log from the most recent failed run</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_wait</code></td><td style="padding:7px 12px;color:#cdd7e2;">Poll every 15 seconds until the pipeline finishes</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>gen_image</code></td><td style="padding:7px 12px;color:#cdd7e2;">Generate brand/marketing images via fal.ai flux-2-pro</td></tr>
+    <tr><td style="padding:7px 12px;color:#e6edf3;"><code>agent_catalog</code></td><td style="padding:7px 12px;color:#cdd7e2;">List available agents with their tools, inputs, outputs</td></tr>
+  </tbody>
+</table>
+
+<p><strong style="color:#f59e0b;">Agents</strong> are Markdown files with YAML frontmatter. Each declares its role, model tier, tools, inputs, and outputs:</p>
+
+<pre><code>---
+name: game-builder
+description: Implements game features in C# (Godot)
+model: "@coding"
+tools: read, write, edit, bash, verify_build, drive_game
+inputs: [context, plan, build_fail, behaviour_fail, feel_fail, visual_fail]
+outputs: [summary, files]
+---
+You are a game developer. Task: ${{task}}
+Context: ${{input.context}}</code></pre>
+
+<p><strong style="color:#f59e0b;">Flows</strong> are YAML DAGs that wire agents together. I have about <strong>15–20 flows</strong> running across different domains:</p>
+
+<ul>
+  <li><strong>Game dev:</strong> /game-feature, /review, /bug-hunt, /refactor</li>
+  <li><strong>Design:</strong> /concept-art, /sound-design (plans → ElevenLabs generation → judge evaluates with other models)</li>
+  <li><strong>Marketing:</strong> /brand-image, /trailer-clip (Sora 2 video generation → vision judge)</li>
+  <li><strong>Infra:</strong> /ci-fix, /deploy-check, /tstudio-jobs (action runners on AWS Lambda, workspace management)</li>
+  <li><strong>Meta:</strong> A flow that periodically reads and improves the other flows — yes, flows that edit flows</li>
+</ul>
+
+<p>The setup is not a product you install. It's a stack: Pi as the agent harness, custom extensions as the tool layer, markdown agents as the role layer, YAML flows as the orchestration layer. The whole thing lives in <code>.pi/flows/</code>. Version-controlled. CI-tested. Slash-command invoked.</p>
+
+<h2>Structure vs. Freestyle: The Skeleton and the Muscle</h2>
+<p>"Do you define the process with these trees, or do the agents freestyle a bit?" Both — and knowing which is which is the whole game.</p>
+
+<p>The <strong>skeleton is rigid</strong>. The flow YAML defines exactly which agents run, in what order, with what dependencies (<code>blockedBy</code>), what inputs they receive, and which gates they must pass. The DAG is not negotiable. An agent cannot decide to skip the build gate because it feels confident. The build gate runs. Period.</p>
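+
+<p>A minimal sketch of how a rigid skeleton could be enforced, assuming each step carries an <code>id</code> and a <code>blockedBy</code> list. Only <code>blockedBy</code> comes from the flow YAML described above; <code>FlowStep</code>, <code>executionOrder</code>, and the step names are hypothetical, and the real engine may schedule differently:</p>

```typescript
// Hypothetical step shape: only `blockedBy` is taken from the post.
interface FlowStep {
  id: string;
  blockedBy: string[];
}

// Return step ids in an order where every step runs after its blockers.
// Throws on unknown dependencies or cycles, so a malformed DAG fails loudly.
function executionOrder(steps: FlowStep[]): string[] {
  const byId = new Map<string, FlowStep>();
  for (const s of steps) byId.set(s.id, s);

  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (id: string): void => {
    const step = byId.get(id);
    if (!step) throw new Error(`unknown step: ${id}`);
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") throw new Error(`cycle at step: ${id}`);
    state.set(id, "visiting");
    for (const dep of step.blockedBy) visit(dep);
    state.set(id, "done");
    order.push(id);
  };

  for (const s of steps) visit(s.id);
  return order;
}

// Steps are listed out of order on purpose; the DAG decides, not the file.
const demoFlow: FlowStep[] = [
  { id: "ship", blockedBy: ["build-gate"] },
  { id: "build-gate", blockedBy: ["game-builder"] },
  { id: "game-builder", blockedBy: [] },
];
const demoOrder = executionOrder(demoFlow);
```

+<p>Because the order is derived from the dependency graph alone, nothing an agent says inside its step can move the build gate or remove it.</p>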
+
+<p>The <strong>muscle is autonomous</strong>. Inside its step, an agent has full agency. The game-builder decides which files to read, which approach to take, which code to write. It discovers project structure with <code>grep</code> and <code>find</code>. It runs the test suite to understand failures. It writes the fix and verifies it compiles. No human tells it "edit line 247 of PlayerController.cs." The agent figures that out.</p>
+
+<p>Think of it like a company: the org chart (DAG) defines reporting lines and handoff points. The people (agents) do the actual work their own way. The compliance department (gates) checks everything before it ships. The CEO (judge) signs off.</p>
+
+<p>This balance is why the system works at all. Too much structure → agents can't adapt to unexpected situations. Too much freestyle → agents hallucinate, skip checks, ship broken code. The skeleton guarantees the right things happen in the right order. The muscle handles the messy reality of actual code.</p>
+
+<p>And when a flow's skeleton is wrong? The meta-flow improves it. It reads flow performance data, identifies bottlenecks ("the feel gate keeps failing because the game-builder doesn't know the jump velocity threshold"), edits the YAML to wire that threshold into the builder's inputs, and commits the change. Flows that improve flows. That's the endgame.</p>
+
+<h2>Model Strategy: DeepSeek for Code, Gemini for Vision</h2>
+<p>"Which DeepSeek model?" The short answer: <strong>DeepSeek V4</strong> for coding-heavy agents, <strong>DeepSeek V4 Flash</strong> for fast routing decisions. The long answer: model selection is not one-size-fits-all.</p>
+
+<p>Flows use <strong>role-based model tiers</strong> — each agent declares a tier (<code>@coding</code>, <code>@planning</code>, <code>@research</code>, <code>@fast</code>, <code>@compact</code>, <code>@vision</code>), and the engine resolves it to a concrete model at dispatch time. This means you can swap models globally without touching any agent or flow file.</p>
+
+<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.89rem;">
+  <thead>
+    <tr style="text-align:left;border-bottom:1px solid #2a3340;">
+      <th style="padding:8px 12px;color:#c9935a;">Tier</th>
+      <th style="padding:8px 12px;color:#c9935a;">Model</th>
+      <th style="padding:8px 12px;color:#c9935a;">Used for</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@coding</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Reading, writing, editing code — the game-builder, fixer, test-author</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@planning</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Flow architect, feature planner — decomposing tasks, designing DAGs</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@fast</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Routing decisions — gate pass/fail, fork choices, loop exit checks</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@research</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4</td><td style="padding:7px 12px;color:#cdd7e2;">Codebase investigation, reading project docs, pattern analysis</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#38bdf8;">google/gemini-2.5-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Multimodal frame judging — T-pose detection, animation clip verification</td></tr>
+    <tr><td style="padding:7px 12px;color:#e6edf3;"><code>@compact</code></td><td style="padding:7px 12px;color:#38bdf8;">deepseek/deepseek-v4-flash</td><td style="padding:7px 12px;color:#cdd7e2;">Summarisation, report generation, lightweight post-processing</td></tr>
+  </tbody>
+</table>
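+
+<p>As a rough sketch, tier resolution can be as small as a lookup table. The tier names and model ids are copied from the table above; <code>TIER_MODELS</code>, <code>isTier</code>, and <code>resolveModel</code> are hypothetical names, and the engine's real config format is not shown in this post:</p>

```typescript
// Tier names and model ids mirror the table; the lookup shape is an assumption.
type Tier =
  | "@coding"
  | "@planning"
  | "@fast"
  | "@research"
  | "@vision"
  | "@compact";

const TIER_MODELS: Record<Tier, string> = {
  "@coding": "deepseek/deepseek-v4",
  "@planning": "deepseek/deepseek-v4",
  "@fast": "deepseek/deepseek-v4-flash",
  "@research": "deepseek/deepseek-v4",
  "@vision": "google/gemini-2.5-flash",
  "@compact": "deepseek/deepseek-v4-flash",
};

// Type guard so an arbitrary frontmatter string can be checked at runtime.
function isTier(value: string): value is Tier {
  return Object.prototype.hasOwnProperty.call(TIER_MODELS, value);
}

// Resolve an agent's declared tier to a concrete model id at dispatch time.
// Unknown tiers throw instead of falling back to a default model.
function resolveModel(declared: string): string {
  if (!isTier(declared)) throw new Error(`unknown model tier: ${declared}`);
  return TIER_MODELS[declared];
}

const builderModel = resolveModel("@coding");
const judgeModel = resolveModel("@vision");
```

+<p>Swapping the coding model then means editing one entry in the map; no agent or flow file changes.</p>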
+
+<p>Why DeepSeek? Two reasons. First, <strong>it's free</strong> — the coding tier runs on DeepSeek's API with no usage limits, which matters when your game-builder agent is reading 800-line files and writing 200-line diffs ten times a session. Second, <strong>it's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before.</p>
+
+<p>Vision is the exception. The DeepSeek tiers can't handle multimodal input, so the visual gate uses <strong>Gemini 2.5 Flash</strong>. It's fast (under 2 seconds per frame grid), cheap, and catches the things that matter: T-poses, foot-slide, frozen animations, missing transitions. The vision preflight gate checks that the Gemini API key is set before any build work starts; if it's missing, the entire flow hard-stops. Vision is never silently skipped.</p>
+
+<p>The key insight: <strong>different work needs different brains</strong>. Code writing needs a model that understands programming-language semantics and type systems. Vision judging needs a model that sees pixels and understands motion. Routing decisions need a model that's fast and decisive, not one that overthinks. The role-tier system means you configure this once, in the tier-to-model mapping, and every agent that declares <code>model: @coding</code> gets the right brain automatically.</p>
+
+<hr>
+
 <p>The oracle tools — <code>verify_build</code>, <code>drive_game</code>, <code>game_frames</code> — are the durable assets. About 300 lines of TypeScript each, MIT licensed, reusable in any Pi project. The flow engine composes them; the agents route through them.</p>
 
 <p>A year ago we had a supervisor written in 1,050 lines of hardcoded TypeScript that did one thing: verify agent output compiled and passed tests. We deleted it. The same verification now runs as a composable flow with five gates, live-game testing, and CI integration. Sometimes the best architecture decision is knowing what to delete.</p>
