<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows — Tinqs Blog</title>
<meta name="description" content="We use Pi flows with oracle-backed gates to make agents compile, test, drive the live game, measure feel, fix CI failures, and ship green PRs — all autonomously.">
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://www.tinqs.com/blog/pi-flow-native-brain">
<meta property="og:type" content="article">
<meta property="og:url" content="https://www.tinqs.com/blog/pi-flow-native-brain">
<meta property="og:title" content="How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows">
<meta property="og:description" content="Pi flows + oracle-backed gates: agents that compile, test, drive the game, measure feel, fix CI, and ship green PRs.">
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows">
<meta name="twitter:description" content="Pi flows + oracle-backed gates: agents that compile, test, drive the game, measure feel, fix CI, and ship green PRs.">
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BlogPosting",
"headline": "How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows",
"datePublished": "2026-06-03",
"author": {
"@type": "Person",
"name": "Ozan Bozkurt"
},
"publisher": {
"@type": "Organization",
"name": "Tinqs Limited",
"url": "https://www.tinqs.com"
},
"description": "We use Pi flows with oracle-backed gates to make agents compile, test, drive the live game, measure feel, fix CI failures, and ship green PRs — all autonomously."
}
</script>
<link rel="icon" type="image/svg+xml" href="/img/favicon.svg">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:ital,wght@0,300;0,400;0,500;0,600;0,700;1,300;1,400;1,500;1,600;1,700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="../style.css">
<style>
/* ── Team guide aesthetic: self-contained overrides ── */
.post__title {
background: linear-gradient(90deg, #c9935a, #f59e0b 40%, #38bdf8);
-webkit-background-clip: text;
background-clip: text;
color: transparent;
font-weight: 800;
}
.post__date {
display: inline-block;
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.72rem;
letter-spacing: 0.22em;
text-transform: uppercase;
color: #38bdf8;
border: 1px solid rgba(147, 140, 129, 0.25);
border-radius: 999px;
padding: 4px 14px;
margin-bottom: 16px;
}
.post__lead {
color: #9aa7b4;
font-size: 1.08rem;
line-height: 1.7;
}
.post__body h2 {
font-size: 1.7rem;
margin: 54px 0 6px;
padding-left: 16px;
border-left: 4px solid #c9935a;
}
.post__body h3 {
color: #a855f7;
font-size: 1.18rem;
margin: 30px 0 4px;
}
.post__body code {
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.86em;
background: #1c2230;
color: #9fe6c0;
padding: 2px 6px;
border-radius: 5px;
border: 1px solid #2a3340;
}
.post__body pre {
background: #0a0e14;
border: 1px solid #2a3340;
border-radius: 10px;
padding: 16px 18px;
overflow-x: auto;
margin: 14px 0;
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
font-size: 0.85rem;
line-height: 1.55;
color: #e6edf3;
}
.post__body pre code {
background: transparent;
padding: 0;
border: none;
font-size: inherit;
color: inherit;
border-radius: 0;
}
.post__body blockquote {
background: rgba(245, 158, 11, 0.08);
border: 1px solid rgba(245, 158, 11, 0.25);
border-left: 4px solid #f59e0b;
border-radius: 0 12px 12px 0;
padding: 16px 18px;
margin: 18px 0;
color: #f4e3c4;
font-size: 0.94rem;
}
.post__body a {
color: #38bdf8;
}
.post__body a:hover {
color: #a855f7;
}
.post__body strong {
color: #f59e0b;
}
.post__body hr {
border: none;
border-top: 1px solid #2a3340;
margin: 32px 0;
}
.post__body figure img {
border-radius: 12px;
border: 1px solid #2a3340;
}
.post__body figcaption {
color: #9aa7b4;
font-size: 0.85rem;
margin-top: 6px;
}
.post__body li {
margin: 4px 0;
}
</style>
</head>
<body>
<nav class="nav nav--scrolled" id="nav">
<a href="/" class="nav__logo" aria-label="Tinqs home">
<span class="nav__wordmark">TINQS</span>
</a>
<div class="nav__links">
<a href="/#game" class="nav__link">Games</a>
<a href="/#tech" class="nav__link">Technology</a>
<a href="/#about" class="nav__link">About</a>
<a href="/blog/" class="nav__link" style="color: var(--c-accent-l);">Blog</a>
<a href="/#signup" class="nav__link">Contact</a>
<a href="/press" class="nav__link">Press</a>
</div>
<button class="nav__burger" aria-label="Open menu" id="navBurger">
<span></span><span></span><span></span>
</button>
</nav>
<div class="mobile-menu" id="mobileMenu">
<a href="/#game" class="mobile-menu__link">Games</a>
<a href="/#tech" class="mobile-menu__link">Technology</a>
<a href="/#about" class="mobile-menu__link">About</a>
<a href="/blog/" class="mobile-menu__link">Blog</a>
<a href="/#signup" class="mobile-menu__link">Contact</a>
<a href="/press" class="mobile-menu__link">Press</a>
</div>
<article class="post">
<a href="/blog/" class="post__back">&larr; All Posts</a>
<span class="post__date">3 June 2026</span>
<h1 class="post__title">How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows</h1>
<p class="post__lead">When we ask Pi to build a feature for Ariki, say "add a double-jump with a cooldown indicator", the agent writes the code and then five gates check it. A build gate compiles it. A test gate runs the test suite. A behaviour gate drives the live game and checks the character actually double-jumps. A feel gate measures apex height, airtime, and landing settle. A visual gate confirms the animation plays and the cooldown indicator is on screen. And if CI disagrees with any of it, the agent reads the failure log and fixes it. None of this is magic. It's Pi flows.</p>
<div class="post__body">
<h2>What Happens When You Ask Pi to Build Something</h2>
<p>The flow starts the same way every agent task does: context, then plan, then implement. That's the standard loop. What makes it interesting is what happens <em>after</em> implementation — a ladder of five gates, each run by a specialised sub-agent with its own tools and its own pass/fail authority.</p>
<figure style="margin:28px 0;">
<svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
<defs>
<marker id="ah" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#5b6b7d"/></marker>
<marker id="ahA" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#f59e0b"/></marker>
</defs>
<rect x="40" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
<text x="110" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Context</text>
<rect x="210" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
<text x="280" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Plan</text>
<rect x="400" y="40" width="150" height="46" rx="9" fill="#15202e" stroke="#3a4656"/>
<text x="475" y="68" text-anchor="middle" fill="#e6edf3" font-size="15">Implement</text>
<line x1="180" y1="63" x2="206" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<line x1="350" y1="63" x2="396" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<rect x="40" y="150" width="840" height="82" rx="12" fill="#0c1119" stroke="#2a3340"/>
<text x="56" y="171" fill="#6b7a8d" font-size="11" letter-spacing="1.4">VERIFY-HEAVY GATES — most compute is spent checking, not writing</text>
<rect x="56" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#38bdf8" stroke-opacity="0.55"/>
<text x="130" y="206" text-anchor="middle" fill="#38bdf8" font-size="13.5">G1 · Build</text>
<rect x="222" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#34d399" stroke-opacity="0.55"/>
<text x="296" y="206" text-anchor="middle" fill="#9fe6c0" font-size="13.5">G2 · Tests</text>
<rect x="388" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#a855f7" stroke-opacity="0.55"/>
<text x="462" y="206" text-anchor="middle" fill="#c4a0f7" font-size="13.5">G3 · Behaviour</text>
<rect x="554" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#f59e0b" stroke-opacity="0.55"/>
<text x="628" y="206" text-anchor="middle" fill="#f5b44b" font-size="13.5">G4 · Feel</text>
<rect x="720" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#c9935a" stroke-opacity="0.55"/>
<text x="794" y="206" text-anchor="middle" fill="#d9ac7b" font-size="13.5">G5 · Visual</text>
<line x1="475" y1="86" x2="475" y2="148" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<line x1="460" y1="232" x2="460" y2="276" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
<text x="472" y="258" fill="#6b7a8d" font-size="11">all green ⇒ done · any fail ⇒ report</text>
<rect x="380" y="278" width="160" height="46" rx="9" fill="#1b1505" stroke="#c9935a"/>
<text x="460" y="306" text-anchor="middle" fill="#f3d6a0" font-size="15">Judge — honest verdict</text>
<path d="M820,150 C 908,96 716,50 556,61" fill="none" stroke="#f59e0b" stroke-width="1.8" stroke-dasharray="6 5" marker-end="url(#ahA)"/>
<text x="694" y="96" fill="#f59e0b" font-size="12.5">Reflexion · fix & retry ≤ 3</text>
</svg>
<figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">A real failure loops back to <em>implement</em> with gate evidence (bounded to three tries); anything green falls through to the judge.</figcaption>
</figure>
<h2>The Five Gates</h2>
<p>Each gate is a sub-agent with one job and one tool.</p>
<p><strong style="color:#f59e0b;">G1 — Build.</strong> Runs <code>dotnet build</code> on the game and sim. Returns PASS/FAIL with file:line errors. If the code doesn't compile, nothing proceeds.</p>
<p><strong style="color:#f59e0b;">G2 — Tests.</strong> Runs <code>dotnet test</code> and parses results. The agent reads which tests broke and fixes assertions, mocks, or test setup.</p>
<p><strong style="color:#f59e0b;">G3 — Behaviour (live game).</strong> This is the one that makes game dev different from web dev. The agent sends input to the running game — <code>{"jump": true}</code> — and samples the player body 30 times at 50ms intervals. It checks: did the character actually jump? Did the double-jump fire? Was there a cooldown? The <code>drive_game</code> tool is the ground-truth oracle for whether a movement feature works in-game, not just in tests.</p>
<p><strong style="color:#f59e0b;">G4 — Feel (measured game-feel).</strong> Behaviour checks whether it worked. Feel checks whether it felt good. The agent measures apex height, airtime, liftoff latency, rise/fall asymmetry, and landing settle. Numeric metrics with thresholds. A jump that technically works but takes 400ms to lift off fails the feel gate.</p>
<p><strong style="color:#f59e0b;">G5 — Visual.</strong> Captures frame sequences from the live game and feeds them to a vision model. Checks: is the animation playing? Is the cooldown indicator visible? Are there visual artifacts?</p>
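<p>To make the behaviour and feel gates concrete: both can be treated as arithmetic over the sampled trace, which at 30 samples and 50 ms spacing covers a 1.5-second window. The sketch below is a minimal TypeScript illustration under assumptions, not the real <code>drive_game</code> interface. The <code>Sample</code> shape, <code>measureJump</code>, <code>feelGate</code>, the threshold values, and the synthetic trace are all invented for this example.</p>

```typescript
// Hypothetical sample shape: time since the input was sent (ms) and height above ground.
interface Sample { tMs: number; y: number; }

interface FeelMetrics {
  liftoffLatencyMs: number; // input to first airborne sample
  apexHeight: number;       // highest sampled point
  airtimeMs: number;        // first airborne sample to first grounded sample after it
}

// Derive feel metrics from a fixed-rate trace. Returns null if the body never left the ground.
function measureJump(samples: Sample[], groundEps = 0.01): FeelMetrics | null {
  const liftoff = samples.findIndex(s => s.y > groundEps);
  if (liftoff === -1) return null;
  let landing = samples.findIndex((s, i) => i > liftoff && s.y <= groundEps);
  if (landing === -1) landing = samples.length - 1; // still airborne when the window ended
  return {
    liftoffLatencyMs: samples[liftoff].tMs,
    apexHeight: Math.max(...samples.map(s => s.y)),
    airtimeMs: samples[landing].tMs - samples[liftoff].tMs,
  };
}

// Assumed thresholds, for illustration only.
function feelGate(m: FeelMetrics | null, maxLiftoffMs = 150, minApex = 1.0): "PASS" | "FAIL" {
  if (m === null) return "FAIL";
  return m.liftoffLatencyMs <= maxLiftoffMs && m.apexHeight >= minApex ? "PASS" : "FAIL";
}

// Synthetic trace: 30 samples at 50 ms, a parabolic hop that starts delayMs after the input.
function syntheticTrace(delayMs: number): Sample[] {
  return Array.from({ length: 30 }, (_, i) => {
    const tMs = i * 50;
    const t = (tMs - delayMs) / 1000; // seconds since the hop started
    const y = t > 0 ? Math.max(0, 8 * t - 10 * t * t) : 0;
    return { tMs, y };
  });
}
```

<p>On this synthetic trace, a hop whose first airborne sample arrives at 100 ms passes, while the same hop delayed by 400 ms fails on liftoff latency alone. Measured latency is only accurate to one sample interval, so a real gate would want thresholds chosen with that margin in mind.</p>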
<p>Anything green falls through to the judge. Anything red loops back to implement with the failure evidence — the agent reads what went wrong, fixes it, and re-enters the gate ladder. Three retries max, then escalation to a human.</p>
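<p>The ladder and its bounded retry can be sketched in a few lines. This is a simplified model, not the pi-flows implementation: the <code>Gate</code>, <code>GateResult</code>, and <code>Verdict</code> types and the <code>implement</code> callback are hypothetical, and gates run synchronously here to keep the example short.</p>

```typescript
// Hypothetical shapes; the real pi-flows types may differ.
interface GateResult { pass: boolean; evidence: string; }
interface Gate { name: string; run: (code: string) => GateResult; }

type Verdict =
  | { status: "green"; attempts: number }
  | { status: "escalate"; attempts: number; failedGate: string; evidence: string };

// Run gates in order and stop at the first red one.
function runLadder(gates: Gate[], code: string) {
  for (const gate of gates) {
    const result = gate.run(code);
    if (!result.pass) return { gate: gate.name, evidence: result.evidence };
  }
  return null; // every gate was green
}

// Reflexion loop: feed the failure evidence back to implement, bounded by maxRetries.
function verify(
  gates: Gate[],
  implement: (previous: string, evidence: string | null) => string,
  maxRetries = 3,
): Verdict {
  let code = implement("", null);
  for (let attempt = 1; ; attempt++) {
    const failure = runLadder(gates, code);
    if (failure === null) return { status: "green", attempts: attempt };
    if (attempt > maxRetries) {
      // Initial attempt plus maxRetries retries all failed: hand over to a human.
      return { status: "escalate", attempts: attempt, failedGate: failure.gate, evidence: failure.evidence };
    }
    code = implement(code, failure.evidence);
  }
}
```

<p>Stopping at the first red gate keeps the evidence focused: the implementer sees one failure at a time, and the expensive live-game gates never run against code that does not compile.</p>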
<h2>Composability: Gates Are Cheap to Add</h2>
<p>The flow started with three gates: build, test, and visual. Behaviour and feel were added later, each as a one-file extension. Gates are not hardcoded. They're sub-agents declared in a flow config. Want a linting gate? Add a sub-agent with a linter tool. Want a security scan? Same pattern. Want a gate that checks asset bundle sizes haven't bloated? Write the tool, declare the sub-agent, wire it into the flow.</p>
<p>Agents themselves can extend the flow. If a sub-agent notices a pattern of failures — "the last three behaviour checks failed because the game window wasn't focused" — it can insert a pre-condition gate that checks window focus before proceeding. The flow engine handles routing; the agents handle decisions.</p>
<p>This is what makes flows fundamentally different from a script: the pipeline is not fixed at compile time. It's a graph that agents read, understand, and modify at runtime.</p>
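<p>One way to picture gates as data rather than code is the sketch below. The config shape and helper functions are assumptions made for illustration, not the actual pi-flows format, and <code>lint_check</code> and <code>window_focus</code> are made-up tool names. The gate-to-tool pairing for build and visual is also inferred rather than stated.</p>

```typescript
// Hypothetical flow config shape; field names are illustrative.
interface GateSpec { id: string; tool: string; }
interface FlowConfig { gates: GateSpec[]; }

const baseFlow: FlowConfig = {
  gates: [
    { id: "build", tool: "verify_build" },
    { id: "behaviour", tool: "drive_game" },
    { id: "visual", tool: "game_frames" },
  ],
};

// Append a gate at the end of the ladder without mutating the original config.
function addGate(flow: FlowConfig, gate: GateSpec): FlowConfig {
  return { gates: [...flow.gates, gate] };
}

// Insert a pre-condition gate immediately before an existing gate.
function insertBefore(flow: FlowConfig, beforeId: string, gate: GateSpec): FlowConfig {
  const index = flow.gates.findIndex(g => g.id === beforeId);
  if (index === -1) throw new Error('no gate with id "' + beforeId + '"');
  return { gates: [...flow.gates.slice(0, index), gate, ...flow.gates.slice(index)] };
}

// A lint gate added by a human, then a focus check inserted by an agent at runtime.
const withLint = addGate(baseFlow, { id: "lint", tool: "lint_check" });
const withFocus = insertBefore(withLint, "behaviour", { id: "focus", tool: "window_focus" });
```

<p>Because each helper returns a new config, the flow a task started with stays available for comparison after an agent has modified it.</p>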
<h2>The CI Loop: Agents That Fix Their Own Builds</h2>
<p>Gates handle pre-push verification. But what about after push? What about CI?</p>
<p>Most coding agents don't care if the code compiles on the CI runner. They write, they push, they walk away. A human discovers the broken build an hour later.</p>
<p>We closed this loop with the <code>tinqs-ci</code> extension — three tools that give agents post-push autonomy:</p>
<ul>
<li><strong>ci_status</strong> — checks pipeline state for a branch</li>
<li><strong>ci_logs</strong> — fetches the full build log from the most recent failed run</li>
<li><strong>ci_wait</strong> — polls every 15 seconds until the pipeline finishes</li>
</ul>
<p>The agent pushes its branch, calls <code>ci_wait</code>, and if CI fails, reads <code>ci_logs</code>, fixes the issue, pushes again, and polls again. DeepSeek V4 parses compiler errors, identifies files and lines, and fixes them. A missing import, a type mismatch, a module not found — pattern-matched and corrected in seconds.</p>
<p>A real example from last week: adding a health check endpoint to a Go service. Agent wrote the handler and test, pushed. CI failed — the test imported a package that didn't exist on the runner. Agent read <code>ci_logs</code>, saw <code>go: module not found</code>, added the missing <code>go.mod</code> replace directive, pushed again. CI passed. PR opened. <strong>4 minutes. $0.06.</strong></p>
<p>Three safeguards prevent runaway loops: <strong>retry limit</strong> (3, hard-coded in the orchestrator), <strong>diff budget</strong> (retries only touch files already in the changeset), and <strong>hallucination detection</strong> (if the agent claims CI passed without calling <code>ci_status</code>, it gets corrected).</p>
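<p>A compact model of the loop and its three safeguards might look like this. Everything here is a stand-in: <code>CiTools</code> uses synchronous, hypothetical versions of the <code>tinqs-ci</code> tools, <code>proposeFix</code> represents the agent, and escalating when a fix strays outside the changeset is an assumption about how the diff budget is enforced.</p>

```typescript
type CiState = "success" | "failure";

// Hypothetical synchronous stand-ins for the tinqs-ci tools and the agent's actions.
interface CiTools {
  push: (files: string[]) => void;
  ciWait: () => CiState;                 // blocks until the pipeline finishes
  ciLogs: () => string;                  // log of the most recent failed run
  proposeFix: (log: string) => string[]; // files the agent wants to edit
}

type Outcome =
  | { result: "green"; retries: number }
  | { result: "escalate"; retries: number; reason: string };

function ciLoop(tools: CiTools, changeset: string[], maxRetries = 3): Outcome {
  const allowed = new Set(changeset);
  tools.push(changeset);
  for (let retries = 0; ; retries++) {
    if (tools.ciWait() === "success") return { result: "green", retries };
    // Retry limit: stop once maxRetries fixes have been pushed and CI is still red.
    if (retries >= maxRetries) return { result: "escalate", retries, reason: "retry limit reached" };
    const edits = tools.proposeFix(tools.ciLogs());
    // Diff budget: a retry may only touch files already in the changeset.
    const outside = edits.filter(f => !allowed.has(f));
    if (outside.length > 0) {
      return { result: "escalate", retries, reason: "fix outside changeset: " + outside.join(", ") };
    }
    tools.push(edits);
  }
}

// Claim check: a "CI passed" claim only counts if ci_status was called in that turn.
function claimIsGrounded(claimsPass: boolean, toolCalls: string[]): boolean {
  return !claimsPass || toolCalls.includes("ci_status");
}
```

<p>The claim check is deliberately mechanical: it compares what the agent said against the tool calls it actually made, so no second model is needed to catch an unverified "CI passed".</p>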
<h2>The Numbers</h2>
<p>Over three weeks of running the orchestrator:</p>
<ul>
<li><strong>87 tasks</strong> completed end-to-end</li>
<li><strong>23 tasks</strong> needed at least one CI retry (26%)</li>
<li><strong>19 of those 23</strong> resolved on the first retry</li>
<li><strong>4 tasks</strong> hit the retry limit and escalated to a human</li>
<li><strong>0 tasks</strong> produced a merged PR that later broke something else</li>
</ul>
<p>The 26% retry rate is roughly what we'd expect from a junior developer. The difference: the agent turns the fix around in about 30 seconds.</p>
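<p>For readers who want the rates behind those counts, the arithmetic is short. The variable names are ours; the four counts are the ones listed above. It works out to 26.4% of tasks needing a retry, 82.6% of those fixed on the first retry, and 4.6% of all tasks escalated.</p>

```typescript
const tasks = 87;
const neededRetry = 23;
const fixedFirstRetry = 19;
const escalated = 4;

// Percentage rounded to one decimal place.
function pct(part: number, whole: number): number {
  return Math.round((part / whole) * 1000) / 10;
}

const retryRate = pct(neededRetry, tasks);                   // share of tasks needing a CI retry
const firstRetryFixRate = pct(fixedFirstRetry, neededRetry); // of those, fixed on the first retry
const escalationRate = pct(escalated, tasks);                // share of all tasks handed to a human
const noRetryNeeded = tasks - neededRetry;                   // tasks that needed no CI retry
```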
<h2>The Architecture</h2>
<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.92rem;">
<thead>
<tr style="text-align:left;border-bottom:1px solid #2a3340;">
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">Layer</th>
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">What</th>
<th style="padding:10px 12px;color:#c9935a;font-weight:600;">How</th>
</tr>
</thead>
<tbody>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Flow engine</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">pi-flows orchestrator</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Composes agents, gates and decision points</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Oracle gates</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">verify_build, drive_game, game_frames</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Return structured PASS/FAIL with evidence</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Sub-agents</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">G1 build · G2 tests · G3 behaviour · G4 feel · G5 visual</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Role-split, each with its own toolset</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">CI loop</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">tinqs-ci extension</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">ci_status, ci_logs, ci_wait — polls Gitea Actions, reads logs, retries</td></tr>
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Decision</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">Agent-loop Reflexion</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Self-reflect on failures, retry (≤3) or escalate</td></tr>
<tr><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Visualization</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">FlowDashboard</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Real-time pipeline state</td></tr>
</tbody>
</table>
<hr>
<p>The oracle tools — <code>verify_build</code>, <code>drive_game</code>, <code>game_frames</code> — are the durable assets. About 300 lines of TypeScript each, MIT licensed, reusable in any Pi project. The flow engine composes them; the agents route through them.</p>
<p>A year ago we had a supervisor written in 1,050 lines of hardcoded TypeScript that did one thing: verify agent output compiled and passed tests. We deleted it. The same verification now runs as a composable flow with five gates, live-game testing, and CI integration. Sometimes the best architecture decision is knowing what to delete.</p>
<p><em>The flow-native brain runs on our <a href="https://tinqs.com/tinqs/pi">Pi fork</a> inside <a href="https://tinqs.com">Tinqs Studio</a>. The oracle extensions are MIT licensed and reusable in any Pi project.</em></p>
</div>
<div class="post__author">
<div class="post__author-avatar">OB</div>
<div class="post__author-info">
<span class="post__author-name">Ozan Bozkurt</span><br>
CTO & Developer, Tinqs
</div>
</div>
</article>
<footer class="footer">
<div class="footer__inner">
<span class="footer__wordmark">TINQS</span>
<div class="footer__links">
<a href="/#game">Games</a>
<a href="/#tech">Technology</a>
<a href="/#about">About</a>
<a href="/blog/">Blog</a>
<a href="mailto:hello@tinqs.com">hello@tinqs.com</a>
<a href="/press">Press Kit</a>
</div>
<p class="footer__copy">Tinqs Limited &mdash; London, est. 2020</p>
</div>
</footer>
<script>
const burger = document.getElementById('navBurger');
const mobileMenu = document.getElementById('mobileMenu');
burger.addEventListener('click', () => {
const open = mobileMenu.classList.toggle('mobile-menu--open');
burger.classList.toggle('nav__burger--open', open);
document.body.style.overflow = open ? 'hidden' : '';
});
mobileMenu.querySelectorAll('a').forEach(link => {
link.addEventListener('click', () => {
mobileMenu.classList.remove('mobile-menu--open');
burger.classList.remove('nav__burger--open');
document.body.style.overflow = '';
});
});
</script>
</body>
</html>