Files
blog/pi-flow-native-brain.html
T

787 lines
47 KiB
HTML
Raw Normal View History

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows — Tinqs Blog</title>
<meta name="description" content="We use Pi flows with oracle-backed gates to make agents compile, test, drive the live game, measure feel, fix CI failures, and ship green PRs — all autonomously.">
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://www.tinqs.com/blog/pi-flow-native-brain">
<meta property="og:type" content="article">
<meta property="og:url" content="https://www.tinqs.com/blog/pi-flow-native-brain">
<meta property="og:title" content="How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows">
<meta property="og:description" content="Pi flows + oracle-backed gates: agents that compile, test, drive the game, measure feel, fix CI, and ship green PRs.">
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows">
<meta name="twitter:description" content="Pi flows + oracle-backed gates: agents that compile, test, drive the game, measure feel, fix CI, and ship green PRs.">
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BlogPosting",
"headline": "How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows",
"datePublished": "2026-06-04",
"author": {
"@type": "Person",
"name": "Ozan Bozkurt"
},
"publisher": {
"@type": "Organization",
"name": "Tinqs Limited",
"url": "https://www.tinqs.com"
},
"description": "We use Pi flows with oracle-backed gates to make agents compile, test, drive the live game, measure feel, fix CI failures, and ship green PRs — all autonomously."
}
</script>
<style>
/* ── Self-contained post styles (Studio provides site chrome) ── */
:root {
--c-accent: #c9935a;
--c-accent-l: #d4a87c;
--c-bg: #0d1117;
--c-text: #e6edf3;
--c-muted: #9aa7b4;
--c-border: #2a3340;
--c-blue: #38bdf8;
--c-purple: #a855f7;
--c-gold: #f59e0b;
--c-code-bg: #1c2230;
--c-pre-bg: #0a0e14;
}
*, *::before, *::after { box-sizing: border-box; }
body {
margin: 0;
padding: 0;
background: transparent;
color: var(--c-text);
font-family: system-ui, -apple-system, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;
line-height: 1.6;
-webkit-font-smoothing: antialiased;
}
/* ── Post container ── */
.post {
max-width: 720px;
margin: 0 auto;
padding: 40px 24px 48px;
}
/* ── Back link ── */
.post__back {
color: var(--c-blue);
text-decoration: none;
font-size: 0.9rem;
display: inline-block;
margin-bottom: 24px;
}
.post__back:hover { color: var(--c-purple); }
/* ── Gradient title ── */
.post__title {
background: linear-gradient(90deg, #c9935a, #f59e0b 40%, #38bdf8);
-webkit-background-clip: text;
background-clip: text;
color: transparent;
font-weight: 800;
font-size: 2.2rem;
line-height: 1.25;
margin: 0 0 16px;
}
/* ── Date pill ── */
.post__date {
display: inline-block;
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.72rem;
letter-spacing: 0.22em;
text-transform: uppercase;
color: var(--c-blue);
border: 1px solid rgba(147, 140, 129, 0.25);
border-radius: 999px;
padding: 4px 14px;
margin-bottom: 16px;
}
/* ── Lead ── */
.post__lead {
color: var(--c-muted);
font-size: 1.08rem;
line-height: 1.7;
}
/* ── Body ── */
.post__body { font-size: 1rem; line-height: 1.7; }
.post__body p { margin: 14px 0; }
.post__body h2 {
font-size: 1.7rem;
margin: 54px 0 6px;
padding-left: 16px;
border-left: 4px solid var(--c-accent);
line-height: 1.3;
}
.post__body h3 {
color: var(--c-purple);
font-size: 1.18rem;
margin: 30px 0 4px;
}
.post__body h4, .post__body h5, .post__body h6 {
margin: 20px 0 4px;
}
/* ── Inline code ── */
.post__body code {
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.86em;
background: var(--c-code-bg);
color: #9fe6c0;
padding: 2px 6px;
border-radius: 5px;
border: 1px solid var(--c-border);
}
/* ── Code blocks ── */
.post__body pre {
background: var(--c-pre-bg);
border: 1px solid var(--c-border);
border-radius: 10px;
padding: 16px 18px;
overflow-x: auto;
margin: 14px 0;
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.85rem;
line-height: 1.55;
color: var(--c-text);
}
.post__body pre code {
background: transparent;
padding: 0;
border: none;
font-size: inherit;
color: inherit;
border-radius: 0;
}
/* ── Blockquote ── */
.post__body blockquote {
background: rgba(245, 158, 11, 0.08);
border: 1px solid rgba(245, 158, 11, 0.25);
border-left: 4px solid var(--c-gold);
border-radius: 0 12px 12px 0;
padding: 16px 18px;
margin: 18px 0;
color: #f4e3c4;
font-size: 0.94rem;
}
/* ── Links ── */
.post__body a { color: var(--c-blue); }
.post__body a:hover { color: var(--c-purple); }
/* ── Strong ── */
.post__body strong { color: var(--c-gold); }
/* ── HR ── */
.post__body hr {
border: none;
border-top: 1px solid var(--c-border);
margin: 32px 0;
}
/* ── Figures ── */
.post__body figure { margin: 20px 0; }
.post__body figure img {
max-width: 100%;
border-radius: 12px;
border: 1px solid var(--c-border);
}
.post__body figcaption {
color: var(--c-muted);
font-size: 0.85rem;
margin-top: 6px;
}
/* ── Lists ── */
.post__body ul, .post__body ol { padding-left: 1.5em; margin: 10px 0; }
.post__body li { margin: 4px 0; }
/* ── Author ── */
.post__author {
display: flex;
align-items: center;
gap: 14px;
margin-top: 48px;
padding-top: 24px;
border-top: 1px solid var(--c-border);
}
.post__author-avatar {
width: 48px;
height: 48px;
border-radius: 50%;
background: var(--c-accent);
color: var(--c-bg);
display: flex;
align-items: center;
justify-content: center;
font-weight: 700;
font-size: 0.85rem;
flex-shrink: 0;
}
.post__author-info {
font-size: 0.85rem;
color: var(--c-muted);
line-height: 1.4;
}
.post__author-name {
color: var(--c-text);
font-weight: 600;
}
/* ── Analogy callout box ── */
.post__body .callout {
background: linear-gradient(135deg, rgba(56,189,248,0.06), rgba(168,85,247,0.06));
border: 1px solid rgba(56,189,248,0.2);
border-left: 4px solid #38bdf8;
border-radius: 0 12px 12px 0;
padding: 18px 20px;
margin: 22px 0;
}
.post__body .callout--amber {
background: linear-gradient(135deg, rgba(245,158,11,0.07), rgba(201,147,90,0.05));
border-color: rgba(245,158,11,0.25);
border-left-color: #f59e0b;
}
.post__body .callout--purple {
background: linear-gradient(135deg, rgba(168,85,247,0.07), rgba(56,189,248,0.04));
border-color: rgba(168,85,247,0.25);
border-left-color: #a855f7;
}
.post__body .callout__kicker {
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.7rem;
letter-spacing: 0.18em;
text-transform: uppercase;
color: #38bdf8;
margin-bottom: 8px;
display: block;
}
.post__body .callout--amber .callout__kicker { color: #f59e0b; }
.post__body .callout--purple .callout__kicker { color: #a855f7; }
.post__body .callout p { margin: 6px 0 0; color: #cdd7e2; }
.post__body .callout p + p { margin-top: 10px; }
/* ── Gate badge pills ── */
.gate {
display: inline-block;
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.75rem;
font-weight: 600;
padding: 3px 10px;
border-radius: 5px;
margin-right: 4px;
}
.gate--build { background: rgba(56,189,248,0.12); color: #38bdf8; border: 1px solid rgba(56,189,248,0.3); }
.gate--test { background: rgba(52,211,153,0.12); color: #34d399; border: 1px solid rgba(52,211,153,0.3); }
.gate--behave { background: rgba(168,85,247,0.12); color: #a855f7; border: 1px solid rgba(168,85,247,0.3); }
.gate--feel { background: rgba(245,158,11,0.12); color: #f59e0b; border: 1px solid rgba(245,158,11,0.3); }
.gate--visual { background: rgba(201,147,90,0.12); color: #c9935a; border: 1px solid rgba(201,147,90,0.3); }
/* ── Section divider accent ── */
.post__body hr { border-color: #2a3340; margin: 36px 0; }
.post__body hr.accent {
border: none;
height: 2px;
background: linear-gradient(90deg, transparent, #38bdf8 20%, #a855f7 50%, #f59e0b 80%, transparent);
margin: 40px 0;
}
/* ── Two-column kitchen comparison ── */
.kitchen-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 16px;
margin: 18px 0;
}
@media (max-width: 640px) { .kitchen-grid { grid-template-columns: 1fr; } }
.kitchen-col {
background: #0c1119;
border: 1px solid #2a3340;
border-radius: 10px;
padding: 16px 18px;
}
.kitchen-col__title {
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.68rem;
letter-spacing: 0.16em;
text-transform: uppercase;
margin-bottom: 10px;
display: block;
}
.kitchen-col__title--kitchen { color: #f59e0b; }
.kitchen-col__title--reality { color: #38bdf8; }
.kitchen-col p { font-size: 0.9rem; color: #9aa7b4; margin: 4px 0; }
.kitchen-col p strong { color: #e6edf3; }
/* ── Table styles ── */
.post__body table {
width: 100%;
border-collapse: collapse;
margin: 18px 0;
font-size: 0.92rem;
}
.post__body th {
text-align: left;
border-bottom: 1px solid var(--c-border);
padding: 10px 12px;
color: var(--c-accent);
font-weight: 600;
}
.post__body td {
padding: 9px 12px;
border-bottom: 1px solid #1c2230;
vertical-align: top;
}
</style>
</head>
<body>
<article class="post">
<a href="/blog/" class="post__back">&larr; All Posts</a>
<span class="post__date">4 June 2026</span>
<h1 class="post__title">How Pi Agents Build, Test, and Ship Code with Oracle-Backed Flows</h1>
<p class="post__lead">Think of a restaurant kitchen during dinner rush. The head chef doesn't cook every dish. She runs the pass — each plate gets inspected before it leaves. One cook handles sauces, another pastry, another the grill. The expediter calls orders, coordinates timing, makes sure table 4's mains don't arrive before table 2's starters. A dish comes back? It goes straight to the station that messed up, with a ticket explaining exactly what's wrong. That kitchen runs on flows. So does our game engine.</p>
<div class="post__body">
<div class="callout">
<span class="callout__kicker">The Kitchen ↔ Flows Analogy</span>
<p><strong>The kitchen</strong> = Pi (the agent harness). <strong>The recipe</strong> = a flow YAML (the DAG). <strong>The line cooks</strong> = agents (each with a station and tools). <strong>The pass</strong> = the flow engine (routes finished work). <strong>The head chef's inspection</strong> = the five gates. <strong>The order ticket</strong> = a slash command. <strong>"Send it back!"</strong> = the fix loop.</p>
</div>
<h2>What Happens When You Type a Slash Command</h2>
<p>You type <code>/game-feature add a double-jump with cooldown</code> and hit enter. The ticket hits the kitchen. What follows is not one agent doing everything — it's a brigade running their stations.</p>
<figure style="margin:28px 0;">
<svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
<defs>
<marker id="ah" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#5b6b7d"/></marker>
<marker id="ahA" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#f59e0b"/></marker>
</defs>
<rect x="40" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
<text x="110" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Context</text>
<rect x="210" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
<text x="280" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Plan</text>
<rect x="400" y="40" width="150" height="46" rx="9" fill="#15202e" stroke="#3a4656"/>
<text x="475" y="68" text-anchor="middle" fill="#e6edf3" font-size="15">Implement</text>
<line x1="180" y1="63" x2="206" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<line x1="350" y1="63" x2="396" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<rect x="40" y="150" width="840" height="82" rx="12" fill="#0c1119" stroke="#2a3340"/>
<text x="56" y="171" fill="#6b7a8d" font-size="11" letter-spacing="1.4">VERIFY-HEAVY GATES — most compute is spent checking, not writing</text>
<rect x="56" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#38bdf8" stroke-opacity="0.55"/>
<text x="130" y="206" text-anchor="middle" fill="#38bdf8" font-size="13.5">G1 · Build</text>
<rect x="222" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#34d399" stroke-opacity="0.55"/>
<text x="296" y="206" text-anchor="middle" fill="#9fe6c0" font-size="13.5">G2 · Tests</text>
<rect x="388" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#a855f7" stroke-opacity="0.55"/>
<text x="462" y="206" text-anchor="middle" fill="#c4a0f7" font-size="13.5">G3 · Behaviour</text>
<rect x="554" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#f59e0b" stroke-opacity="0.55"/>
<text x="628" y="206" text-anchor="middle" fill="#f5b44b" font-size="13.5">G4 · Feel</text>
<rect x="720" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#c9935a" stroke-opacity="0.55"/>
<text x="794" y="206" text-anchor="middle" fill="#d9ac7b" font-size="13.5">G5 · Visual</text>
<line x1="475" y1="86" x2="475" y2="148" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<line x1="460" y1="232" x2="460" y2="276" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<text x="472" y="258" fill="#6b7a8d" font-size="11">all green ⇒ done · any fail ⇒ report</text>
<rect x="380" y="278" width="160" height="46" rx="9" fill="#1b1505" stroke="#c9935a"/>
<text x="460" y="306" text-anchor="middle" fill="#f3d6a0" font-size="15">Judge — honest verdict</text>
<path d="M820,150 C 908,96 716,50 556,61" fill="none" stroke="#f59e0b" stroke-width="1.8" stroke-dasharray="6 5" marker-end="url(#ahA)"/>
<text x="694" y="96" fill="#f59e0b" font-size="12.5">Reflexion · fix & retry ≤ 3</text>
</svg>
<figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">A real failure loops back to <em>implement</em> with gate evidence (bounded to three tries); anything green falls through to the judge.</figcaption>
</figure>
<h2>The Five Gates: What the Head Chef Checks</h2>
<p>In a kitchen, the head chef doesn't trust — she verifies. Every plate hits the pass and gets inspected. Our flows have the same instinct. Each gate is a sub-agent with one job, one tool, and absolute veto power.</p>
<div class="kitchen-grid">
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
<p><strong>Check the base.</strong> Is the protein cooked through? If the chicken is raw, the whole plate stops here. Nothing else matters.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
<p><span class="gate gate--build">G1 · Build</span> Runs <code>dotnet build</code>. PASS/FAIL with file:line errors. Won't compile? Nothing proceeds.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
<p><strong>Taste the sauce.</strong> Seasoning right? Acid balanced? The dish might look perfect but taste flat.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
<p><span class="gate gate--test">G2 · Tests</span> Runs <code>dotnet test</code>. Parses which assertions broke. Fixed code that passes build but fails logic gets caught here.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
<p><strong>Does it work?</strong> Pick it up. Does the sauce hold? Does the plating survive the walk to table 6?</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
<p><span class="gate gate--behave">G3 · Behaviour</span> Sends <code>{"jump":true}</code> to the LIVE game. Samples the player body 30 times at 50ms. Did the character actually jump? Double-jump fire? This is the ground-truth oracle — what makes game dev fundamentally different from web dev.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
<p><strong>How does it feel?</strong> The steak is cooked but chewy. The sauce is seasoned but gloopy. Edible ≠ good.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
<p><span class="gate gate--feel">G4 · Feel</span> Measures apex height, airtime, liftoff latency, rise/fall asymmetry, landing settle. Numeric thresholds. A jump that works but takes 400ms to lift off fails. Behaviour says it happened. Feel says it felt good.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">In the Kitchen</span>
<p><strong>How does it look?</strong> Is the garnish wilting? Sauce smeared? Does it match the menu photo?</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
<p><span class="gate gate--visual">G5 · Visual</span> Captures 8 frames at 100ms intervals, grids them, feeds to <code>gemini-2.5-flash</code>. Checks: T-pose? Foot-slide? Frozen animation? Wrong clip? Missing transitions?</p>
</div>
</div>
<div class="callout callout--amber">
<span class="callout__kicker">The Loop</span>
<p>Any red gate → evidence sent back to the cook → fix → re-enter the inspection line. Three chances max, then the head chef escalates to a human. This is the same instinct that makes a good kitchen work: catch it early, send it back with a clear note, give them a chance to fix it, but don't let the same dish circle the pass forever.</p>
</div>
<h2>Composability: Adding a New Station</h2>
<p>A kitchen doesn't redesign the whole line when they add a new dish. They add a station. Same in flows. Started with three gates — build, test, vision. Behaviour and feel came later, each a single-file extension. Gates aren't hardcoded. They're sub-agents declared in YAML. Want a linting gate? Add a sub-agent with a linter. Security scan? Same pattern. Asset bundle size check? Write the tool, declare the agent, wire it in.</p>
<div class="callout callout--purple">
<span class="callout__kicker">Self-Improving Kitchen</span>
<p>Agents can extend the flow at runtime. If the behaviour gate keeps failing because the game window isn't focused, an agent notices the pattern and inserts a pre-condition gate that checks window focus. The flow engine handles routing; the agents handle decisions. This is what makes flows fundamentally different from a script — the pipeline isn't fixed at compile time. It's a graph that agents read, understand, and modify while they run.</p>
</div>
<hr class="accent">
<h2>The CI Loop: The Dish That Came Back After It Left</h2>
<p>Gates inspect plates at the pass. But what about after the plate leaves the kitchen? What about the customer who finds a hair in their soup after it's been served?</p>
<p>Most coding agents don't care. They write code, push, walk away. A human discovers the broken CI build an hour later. That's the equivalent of a cook plating a dish, sending it out, and never checking if the diner is still alive.</p>
<p>We closed this loop with three tools — the waiter who brings the plate back:</p>
<ul>
<li><strong>ci_wait</strong> — stands by the table, polls every 15 seconds until the diner finishes</li>
<li><strong>ci_status</strong> — checks: did they enjoy it or send it back?</li>
<li><strong>ci_logs</strong> — reads the complaint card: exactly what was wrong</li>
</ul>
<p>The agent pushes, calls <code>ci_wait</code>. If CI fails, it reads <code>ci_logs</code>, fixes the exact error, pushes again. DeepSeek V4 parses compiler errors the way a cook reads a ticket: "missing import" = forgot the salt, "type mismatch" = wrong pan size, "module not found" = ingredient not in stock. Pattern-matched and fixed in seconds.</p>
<div class="callout callout--amber">
<span class="callout__kicker">Real Example</span>
<p>Adding a health check endpoint to a Go service. Agent wrote the handler and test, pushed. CI failed — the test imported a package that didn't exist on the runner. Agent read <code>ci_logs</code>, saw <code>go: module not found</code>, added the missing <code>go.mod</code> replace directive, pushed again. CI passed. PR opened. <strong>4 minutes. $0.06.</strong></p>
</div>
<p>Three safeguards prevent the kitchen grinding to a halt: <strong>retry limit</strong> (3, same dish doesn't circle forever), <strong>diff budget</strong> (retries only touch files already on the ticket), and <strong>hallucination detection</strong> (if the cook claims the customer loved it without actually asking the waiter, they get corrected).</p>
<h2>The Numbers</h2>
<p>Over three weeks of running the orchestrator:</p>
<ul>
<li><strong>87 tasks</strong> completed end-to-end</li>
<li><strong>23 tasks</strong> needed at least one CI retry (26%)</li>
<li><strong>19 of those 23</strong> resolved on the first retry</li>
<li><strong>4 tasks</strong> hit the retry limit and escalated to a human</li>
<li><strong>0 tasks</strong> produced a merged PR that later broke something else</li>
</ul>
<p>The 26% retry rate matches what you'd see from a junior developer. The difference: the agent fixes it in 30 seconds.</p>
<h2>The Architecture</h2>
<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.92rem;">
<thead>
<tr style="text-align:left;border-bottom:1px solid #2a3340;">
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">Layer</th>
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">What</th>
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">How</th>
</tr>
</thead>
<tbody>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Flow engine</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">pi-flows orchestrator</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Composes agents, gates and decision points</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Oracle gates</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">verify_build, drive_game, game_frames</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Return structured PASS/FAIL with evidence</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Sub-agents</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">G1 build · G2 tests · G3 behaviour · G4 feel · G5 visual</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Role-split, each with its own toolset</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">CI loop</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">tinqs-ci extension</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">ci_status, ci_logs, ci_wait — polls Gitea Actions, reads logs, retries</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Decision</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">Agent-loop Reflexion</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Self-reflect on failures, retry (≤3) or escalate</td></tr>
<tr><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Visualization</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">FlowDashboard</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Real-time pipeline state</td></tr>
</tbody>
</table>
<hr>
<h2>Three Kitchens, One Morning</h2>
<p>This morning, I ran three flows. Each is a different kitchen, a different brigade, a different dish. Here's what actually happened — real flow logs, real verdicts, nothing staged.</p>
<div class="callout callout--amber">
<span class="callout__kicker">Flow 1 · 4 June, 18:32</span>
<p><strong>/deep-implement</strong> — "Build the tinqs-gitea-read extension: list_org_repos, read_repo_file, list_repo_dir, search_repos." Nine steps, 14 minutes. Verdict: <span class="gate gate--test">PASS</span>. 31/31 vitest tests green, zero new TypeScript errors, session-level caching, path traversal protection. Every <code>execute()</code> body fully wired — no stubs, no placeholders. Like a saucier who doesn't just list ingredients but actually makes the sauce.</p>
</div>
<div class="callout callout--purple">
<span class="callout__kicker">Flow 2 · 4 June, 19:04</span>
<p><strong>/game-feature</strong> — "Make the player jump." Build: <span class="gate gate--build">PASS</span>. Tests: <span class="gate gate--test">PASS</span>. Behaviour/Feel/Visual: <span style="color:#f59e0b;">NOT RUN</span> — no live game instance was reachable. The flow didn't silently skip the visual gate. It <strong>hard-stopped</strong> and reported honestly: "FAIL — the feature has not been verified in-game." This is the kitchen saying: "The dish is cooked, but nobody tasted it. I'm not sending it out."</p>
</div>
<div class="callout callout--amber">
<span class="callout__kicker">Flow 3 · 4 June, 19:49</span>
<p><strong>/cto-infra</strong> — "Synthesize cost, stability, and VCS research into an AWS architecture decision." Four research streams fed into one CTO agent. Output: 14 requirements mapped to specific decisions, cost-vs-stability tradeoffs resolved with dollar figures, EC2+EBS over Fargate+EFS, RDS Multi-AZ mandatory, S3+CloudFront for LFS. Like an executive chef reading four menu proposals, reconciling them into one service, and pricing every plate.</p>
</div>
<hr class="accent">
<h2>Dinner Rush Recovery: The Crash That Interrupted Service</h2>
<p>Earlier today, a machine crash cut off a flow mid-stream — the kitchen lost power during dinner rush. Nineteen tests were left red. Contracts written, implementation half-done. Half-cooked dishes on every station.</p>
<p>I typed one slash command — the expediter reassembled the brigade:</p>
<pre><code>/game-feature Finish the leftover jump & locomotion animation work — make the 19 FAILING tests GREEN.</code></pre>
<p>What happened next: the team picked up exactly where the crash left off. Here's the recipe — the exact YAML that runs in production:</p>
<pre><code>name: game-feature
description: Build a PLAYABLE game feature and prove it in the LIVE game.
task_required: true
steps:
# G0: Pre-flight — validate vision CAN run before any build work
- id: preflight
agent: vision-preflight
task: Check GEMINI_API_KEY is set AND game_frames reaches a live instance.
If EITHER fails, STOP — vision is not optional.
# Context + plan
- id: context
agent: project-context-reader
blockedBy: [preflight]
- id: plan
agent: feature-planner
blockedBy: [context]
# TDD: write tests FIRST (different agent than implementer)
- id: test-author
agent: test-author
blockedBy: [plan]
- id: implement
agent: game-builder
blockedBy: [test-author]
# G1G5: Oracle gates (build, tests, behaviour, feel, visual)
- id: build → agent: build-verifier
- id: tests → agent: test-runner
- id: behavior → agent: behavioral-prober (drives LIVE game via drive_game)
- id: feel → agent: feel-judge (apex, airtime, latency, rise/fall)
- id: visual → agent: animation-vision-judge (multimodal gemini-2.5-flash)
# Self-recurring fix-loop: bounded loop back to implement with evidence
- id: fix-loop
type: agent-loop-decision
agent: flow-decision
loop_target: implement
exit_target: report
max_iterations: 3
# Final judge: one honest verdict
- id: report
agent: game-judge</code></pre>
<p>Eighteen steps, seven cooks, five inspection points, one head chef. Triggered by a single order ticket.</p>
<p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>GEMINI_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p>
<p>The <strong>project-context-reader</strong> — the commis who reads the entire recipe book — ingested <code>PlayerController.cs</code>, <code>PlayerAnimController.cs</code>, <code>PlayerAnimationLogic.cs</code>, the test files, the manifest. The <strong>feature-planner</strong> — the sous-chef who breaks down the order into station tasks — decomposed 19 failures into four fix groups: vegetation manifest (146 broken <code>prefabPath</code> items), animation controller (crouch parameter not plumbed), jump physics (coyote time, variable height, air control — all missing), and animation tree (entire state machine absent).</p>
<p>Then the <strong>game-builder</strong> — the line cook at the hot station — read each test failure like a dish ticket, traced it to the source, and started cooking. Coyote time: 100ms grace period after feet leave the ground. Variable jump height: velocity scaled by hold duration, tap gives 3.5, full hold gives 6.5. Air control: horizontal speed cut 40% while airborne. Jump phases: minimum 0.15s on jump_start before transitioning up. Landing timer: wait the full animation length, not length-minus-blend. Animation tree: <code>jump_start → jump → jump_land</code> states with 0.1s blends.</p>
<p>Then the inspection line: <strong>build-verifier</strong> compiled. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump":true}</code> to the live game and sampled the player body. <strong>Feel-judge</strong> measured apex, airtime, liftoff latency. <strong>Animation-vision-judge</strong> captured 8 frames, gridded them, had <code>gemini-2.5-flash</code> scan for T-poses and foot-slide.</p>
<p>Anything red → ticket back to the cook with the specific failure → fix → re-enter the line. Bounded to 3 returns. Anything green → falls through. All green → <strong>game-judge</strong> gives the final verdict.</p>
<div class="callout">
<span class="callout__kicker">Not a Demo</span>
<p>This flow is a file at <code>.pi/flows/flows/game-feature.yaml</code>. I trigger it by typing <code>/game-feature</code> in Pi. It dispatches agents, runs gates, loops on failures, reports a verdict. There is no dashboard with drag-and-drop. There is a YAML file and a slash command. That's the whole product.</p>
</div>
<hr class="accent">
<h2>The Menu: Flows Are Slash Commands</h2>
<p>Every flow becomes a slash command — the menu you read to the expediter. <code>.pi/flows/flows/game-feature.yaml</code><code>/game-feature</code>. You don't invoke a pipeline from a terminal. You order a dish in conversation.</p>
<p>"Add wall-running" is not a CLI flag. It's natural language. The flow reads it, wires it through the agents, routes it through the gates. The YAML is the recipe. The conversation is the context.</p>
<p>The menu I call from daily:</p>
<ul>
<li><strong>/game-feature</strong> — "add a double-jump" or "fix the 19 red tests" → brigade assembles, cooks, inspects, plates</li>
<li><strong>/deep-implement</strong> — "build the gitea-read extension" → research → plan → implement → test → review → judge</li>
<li><strong>/cto-infra</strong> — "reconcile cost, stability, and VCS research into architecture decisions" → 4 research streams → 1 synthesis agent → 14 requirements mapped to decisions</li>
<li><strong>/flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects cooks, designs the recipe, writes the YAML</li>
</ul>
<h2>The Pass: How Agents Hand Off Work</h2>
<p>In a real kitchen, cooks don't shout instructions across the room. They place finished plates on the pass. The expediter reads the ticket, checks the plate, routes it to the next station or to the dining room. Nobody yells. Nobody grabs someone else's pan.</p>
<p>Flows work the same way. Agents never talk to each other directly. When the game-builder finishes, it doesn't ping the test-runner. It calls <code>finish({ summary: "...", artifacts: "...", files: "..." })</code> — placing its work on the pass. The flow engine — the expediter — records it and routes it. The next agent receives exactly the inputs wired in the YAML: <code>${{result.game-builder.summary}}</code>, <code>${{result.game-builder.files}}</code>.</p>
<div class="kitchen-grid">
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">What People Expect</span>
<p>Agents chatting freely, PM-slack style: "Hey test-runner, I just pushed some code, can you check it? Also the jump feels off, maybe tune the velocity?"</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">What Actually Happens</span>
<p>Agent A → <code>finish({verdict: "pass", findings: ["coyote_time=100ms"]})</code> → engine records → Agent B receives <code>${{result.A.findings}}</code> via <code>inputs:</code> block. No chatter. Structured handoff.</p>
</div>
</div>
<p>Why? Because unstructured chatter is how hallucination cascades start. Agent A confidently states something wrong. Agent B builds on it. Agent C compounds it. Three agents later, they're collectively wrong about a file that doesn't exist, and nobody can trace where the error came from. The pass — structured result-passing with typed outputs — makes every handoff auditable, verifiable, and debuggable.</p>
<p>Pi itself is built for solo interactive work: you ask, it does, you review. The orchestration layer I wrote on top inverts that. Pi becomes the kitchen. The flow engine becomes the expediter. Agents become line cooks who place plates on the pass, never shouting across the room.</p>
<h2>The Setup: Extensions, Agents, and 1520 Flows</h2>
<p>"How did you set this up?" is the question I get most often. Here's the honest answer: there's no dashboard with drag-and-drop. You write three kinds of files.</p>
<p><strong style="color:#f59e0b;">Extensions</strong> are TypeScript tools that agents call. Each is about 300 lines, MIT licensed:</p>
<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.89rem;">
<thead>
<tr style="text-align:left;border-bottom:1px solid #2a3340;">
<th style="padding:8px 12px;color:#c9935a;">Extension</th>
<th style="padding:8px 12px;color:#c9935a;">What agents call it for</th>
</tr>
</thead>
<tbody>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>verify_build</code></td><td style="padding:7px 12px;color:#cdd7e2;">Compile the game + sim, return file:line errors</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>drive_game</code></td><td style="padding:7px 12px;color:#cdd7e2;">Send input to the live game, sample player body</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>game_frames</code></td><td style="padding:7px 12px;color:#cdd7e2;">Capture screenshot sequences for vision judging</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_status</code></td><td style="padding:7px 12px;color:#cdd7e2;">Check Gitea Actions pipeline state for a branch</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_logs</code></td><td style="padding:7px 12px;color:#cdd7e2;">Fetch full build log from the most recent failed run</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>ci_wait</code></td><td style="padding:7px 12px;color:#cdd7e2;">Poll every 15 seconds until the pipeline finishes</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>gen_image</code></td><td style="padding:7px 12px;color:#cdd7e2;">Generate brand/marketing images via fal.ai flux-2-pro</td></tr>
<tr><td style="padding:7px 12px;color:#e6edf3;"><code>agent_catalog</code></td><td style="padding:7px 12px;color:#cdd7e2;">List available agents with their tools, inputs, outputs</td></tr>
</tbody>
</table>
<p><strong style="color:#f59e0b;">Agents</strong> are Markdown files with YAML frontmatter. Each declares its role, model tier, tools, inputs, and outputs:</p>
<pre><code>---
name: game-builder
description: Implements game features in C# (Godot)
model: @coding
tools: read, write, edit, bash, verify_build, drive_game
inputs: [context, plan, build_fail, behaviour_fail, feel_fail, visual_fail]
outputs: [summary, files]
---
You are a game developer. Task: ${{task}}
Context: ${{input.context}}</code></pre>
<p><strong style="color:#f59e0b;">Flows</strong> are YAML DAGs that wire agents together. I have about <strong>1520 flows</strong> running across different domains:</p>
<ul>
<li><strong>Game dev:</strong> /game-feature, /review, /bug-hunt, /refactor</li>
<li><strong>Design:</strong> /concept-art, /sound-design (plans → ElevenLabs generation → judge evaluates with other models)</li>
<li><strong>Marketing:</strong> /brand-image, /trailer-clip (Sora 2 video generation → vision judge)</li>
<li><strong>Infra:</strong> /ci-fix, /deploy-check, /tstudio-jobs (action runners on AWS Lambda, workspace management)</li>
<li><strong>Meta:</strong> A flow that periodically reads and improves the other flows — yes, flows that edit flows</li>
</ul>
<p>The setup is not a product you install. It's a stack: Pi as the agent harness, custom extensions as the tool layer, markdown agents as the role layer, YAML flows as the orchestration layer. The whole thing lives in <code>.pi/flows/</code>. Version-controlled. CI-tested. Slash-command invoked.</p>
<h2>The Recipe vs. The Technique</h2>
<p>"Do you define the process with these trees, or do the agents freestyle?" Both. The recipe says what to make and in what order. The technique is how each cook executes their station.</p>
<div class="kitchen-grid">
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">The Recipe (Rigid)</span>
<p>The flow YAML is the recipe. It says: first the prep cook dices onions, then the saucier makes the base, then the grill cook sears the protein. After every station, the plate hits the pass for inspection. <strong>This order is not negotiable.</strong> A cook cannot skip the inspection because they feel confident. The inspection runs. Period.</p>
</div>
<div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">The Technique (Autonomous)</span>
<p>Inside their station, a cook has full agency. How they dice the onions — brunoise or rough chop — is their call. Which pan they use, how they adjust the heat, whether they taste midway. The game-builder decides which files to read, which approach to take. Nobody tells it "edit line 247." It figures that out with <code>grep</code>, <code>find</code>, and reading code.</p>
</div>
</div>
<p>This balance is everything. Too much recipe → agents can't handle surprises. Too much freestyle → agents hallucinate, skip checks, ship broken code. The recipe guarantees the right things happen in the right order — preflight before build, build before test, test before ship. The technique handles the messy, unpredictable reality of actual code.</p>
<div class="callout callout--purple">
<span class="callout__kicker">The Meta-Kitchen</span>
<p>And when a recipe is wrong? Another flow improves it. A meta-flow reads performance data, spots bottlenecks — "the feel gate keeps failing because the cook doesn't know the jump velocity threshold" — edits the YAML to wire that threshold into the builder's inputs, and commits the change. <strong>Flows that edit flows.</strong> The kitchen that renovates itself between services.</p>
</div>
<hr class="accent">
<h2>Picking the Right Knife: Model Strategy</h2>
<p>You don't use a paring knife to butcher a cow. You don't use a cleaver to supreme an orange. Different work needs different blades. Flows use <strong>role-based model tiers</strong> — each agent declares the blade it needs, and the engine hands it the right one at dispatch time.</p>
<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.89rem;">
<thead>
<tr style="text-align:left;border-bottom:1px solid #2a3340;">
<th style="padding:8px 12px;color:#c9935a;">Tier</th>
<th style="padding:8px 12px;color:#c9935a;">The Knife</th>
<th style="padding:8px 12px;color:#c9935a;">What It Cuts</th>
</tr>
</thead>
<tbody>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@coding</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Chef's knife</strong> — your workhorse. Reads 800-line files, writes 200-line diffs. Game-builder, fixer, test-author. Free.</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@planning</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Boning knife</strong> — precision decomposition. Breaks tasks into steps, designs DAGs. Flow architect, feature planner.</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@fast</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Paring knife</strong> — quick, decisive cuts. Gate pass/fail, fork choices, loop exits. No overthinking.</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@research</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Fillet knife</strong> — flexible, follows contours. Reads codebase, traces patterns, finds what matters.</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#a855f7;">Gemini 2.5 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>The inspector's eyes</strong> — the only knife that sees. Multimodal frame judging: T-poses, foot-slide, frozen anims.</td></tr>
<tr><td style="padding:7px 12px;color:#e6edf3;"><code>@compact</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Kitchen shears</strong> — lightweight, versatile. Summaries, verdicts, post-processing. Fast and cheap.</td></tr>
</tbody>
</table>
<div class="callout callout--amber">
<span class="callout__kicker">Why DeepSeek?</span>
<p>Two reasons. <strong>It's free</strong> — no usage limits, which matters when your game-builder reads 800-line files and writes 200-line diffs ten times a session. <strong>It's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before. DeepSeek can't do multimodal, so vision goes to Gemini — but for everything else, it's the chef's knife you reach for 90% of the time.</p>
</div>
<p>The point of the knife rack: you configure this <strong>once</strong>. Every agent declares <code>model: @coding</code> and gets DeepSeek V4 automatically. Swap models globally without touching any flow or agent file. The right blade, every time, no thinking required.</p>
<hr class="accent">
<p>The oracle tools — <code>verify_build</code>, <code>drive_game</code>, <code>game_frames</code> — are the durable assets. About 300 lines of TypeScript each, MIT licensed, reusable in any Pi project. The flow engine composes them; the agents route through them.</p>
<p>A year ago we had a supervisor written in 1,050 lines of hardcoded TypeScript that did one thing: verify agent output compiled and passed tests. We deleted it. The same verification now runs as a composable flow with five gates, live-game testing, and CI integration. Sometimes the best architecture decision is knowing what to delete.</p>
<p><em>The flow-native brain runs on our <a href="https://tinqs.com/tinqs/pi">Pi fork</a> inside <a href="https://tinqs.com">Tinqs Studio</a>. The oracle extensions are MIT licensed and reusable in any Pi project.</em></p>
</div>
<div class="post__author">
<div class="post__author-avatar">OB</div>
<div class="post__author-info">
<span class="post__author-name">Ozan Bozkurt</span><br>
CTO & Developer, Tinqs
</div>
</div>
</article>
</body>
</html>