From d39e9b9534aef17b6150c2acd08c34120c24bc77 Mon Sep 17 00:00:00 2001
From: tinqs-limited <ozan@tinqs.com>
Date: Wed, 3 Jun 2026 02:37:00 +0100
Subject: [PATCH] merge: combine pi-flow-native-brain + pi-ci-integrator into
 one post
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Single public-facing post that tells the complete Pi autonomy story:
- Part 1: Retiring the hardcoded supervisor (1,050 lines deleted),
  replaced by composable oracle-backed pi-flows with 5 gates
  (build, test, behaviour, feel, visual)
- Part 2: The CI integrator — agents that watch CI, read failure
  logs, and fix their own broken builds (tinqs-ci extension)
- Combined stack: flow engine → gates → sub-agents → CI loop
- SVG gate pipeline + deletion bar diagrams preserved via
  build.js <!--raw--> passthrough
- Removed old separate pi-ci-integrator post; pi-flow-native-brain
  is now a proper .md source (replaces hand-authored HTML)
---
 README.md                     |   2 +
 index.html                    |  14 +-
 pi-ci-integrator.html         | 331 ----------------------------------
 pi-flow-native-brain.html     | 124 ++++++++++---
 posts/pi-ci-integrator.md     | 103 -----------
 posts/pi-flow-native-brain.md | 259 ++++++++++++++++++++++++++
 6 files changed, 363 insertions(+), 470 deletions(-)
 delete mode 100644 pi-ci-integrator.html
 delete mode 100644 posts/pi-ci-integrator.md
 create mode 100644 posts/pi-flow-native-brain.md
diff --git a/README.md b/README.md
index cbcb65d..309b999 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,8 @@ We're building Tinqs Studio while using it to make our own game --- a survival c
 - [Streaming a 12km Archipelago in Godot 4](posts/godot-optimisation.md) (2026-05-22)
 - [AI Art at Scale: Using fal.ai Flux for Game Asset Generation](posts/fal-image-generation.md) (2026-05-25)
 - [Tinqs Studio Is an Agent Harness for Game Dev](posts/agent-harness.md) (2026-05-25)
+- [Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds](posts/pi-flow-native-brain.md) (2026-06-03)
+- [Our Blog Just Got a Visual Upgrade — Here's How We Did It](posts/blog-visual-upgrade.md) (2026-06-03)
 
 ## Skills
 
diff --git a/index.html b/index.html
index 9c86bff..54e4393 100644
--- a/index.html
+++ b/index.html
@@ -128,6 +128,13 @@
       <span class="blog-card__read">Read &rarr;</span>
     </a>
 
+    <a href="pi-flow-native-brain" class="blog-card">
+      <span class="blog-card__date">3 June 2026</span>
+      <h2 class="blog-card__title">Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds</h2>
+      <p class="blog-card__excerpt">Two changes made Pi genuinely autonomous: we deleted the hardcoded supervisor and replaced it with composable oracle-backed flows, and we taught agents to watch CI, read failure logs, and fix their own broken builds.</p>
+      <span class="blog-card__read">Read &rarr;</span>
+    </a>
+
     <a href="cloud-harness" class="blog-card">
       <span class="blog-card__date">26 May 2026</span>
       <h2 class="blog-card__title">Building a Cloud Agent Harness with DeepSeek V4 and Pi</h2>
@@ -163,13 +170,6 @@
       <span class="blog-card__read">Read &rarr;</span>
     </a>
 
-    <a href="pi-ci-integrator" class="blog-card">
-      <span class="blog-card__date">25 May 2026</span>
-      <h2 class="blog-card__title">Pi as CI Integrator: Agents That Fix Their Own Builds</h2>
-      <p class="blog-card__excerpt">Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green.</p>
-      <span class="blog-card__read">Read &rarr;</span>
-    </a>
-
     <a href="pre-commit-agent" class="blog-card">
       <span class="blog-card__date">25 May 2026</span>
       <h2 class="blog-card__title">A Pre-Commit Agent That Guards Your Secrets for $0.001</h2>
diff --git a/pi-ci-integrator.html b/pi-ci-integrator.html
deleted file mode 100644
index 8096fc5..0000000
--- a/pi-ci-integrator.html
+++ /dev/null
@@ -1,331 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-  <meta charset="UTF-8">
-  <meta name="viewport" content="width=device-width, initial-scale=1.0">
-
-  <title>Pi as CI Integrator: Agents That Fix Their Own Builds — Tinqs Blog</title>
-  <meta name="description" content="Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green.">
-  <meta name="robots" content="index, follow">
-  <link rel="canonical" href="https://www.tinqs.com/blog/pi-ci-integrator">
-
-  <meta property="og:type" content="article">
-  <meta property="og:url" content="https://www.tinqs.com/blog/pi-ci-integrator">
-  <meta property="og:title" content="Pi as CI Integrator: Agents That Fix Their Own Builds">
-  <meta property="og:description" content="Coding agents that watch CI and fix their own builds.">
-  <meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
-
-  <meta name="twitter:card" content="summary_large_image">
-  <meta name="twitter:title" content="Pi as CI Integrator: Agents That Fix Their Own Builds">
-  <meta name="twitter:description" content="Coding agents that watch CI and fix their own builds.">
-  <meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
-
-  <script type="application/ld+json">
-  {
-    "@context": "https://schema.org",
-    "@type": "BlogPosting",
-    "headline": "Pi as CI Integrator: Agents That Fix Their Own Builds",
-    "datePublished": "2026-05-25",
-    "author": {
-      "@type": "Person",
-      "name": "Ozan Bozkurt"
-    },
-    "publisher": {
-      "@type": "Organization",
-      "name": "Tinqs Limited",
-      "url": "https://www.tinqs.com"
-    },
-    "description": "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
-  }
-  </script>
-
-  <!-- PostHog (EU) -->
-  <script>
-    !function(t,e){var o,n,p,r;e.__SV||(window.posthog=e,e._i=[],e.init=function(i,s,a){function g(t,e){var o=e.split(".");2==o.length&&(t=t[o[0]],e=o[1]),t[e]=function(){t.push([e].concat(Array.prototype.slice.call(arguments,0)))}}(p=t.createElement("script")).type="text/javascript",p.crossOrigin="anonymous",p.async=!0,p.src=s.api_host.replace(".i.posthog.com","-assets.i.posthog.com")+"/static/array.js",(r=t.getElementsByTagName("script")[0]).parentNode.insertBefore(p,r);var u=e;for(void 0!==a?u=e[a]=[]:a="posthog",u.people=u.people||[],u.toString=function(t){var e="posthog";return"posthog"!==a&&(e+="."+a),t||(e+=" (stub)"),e},u.people.toString=function(){return u.toString(1)+".people (stub)"},o="init capture register register_once register_for_session unregister unregister_for_session getFeatureFlag getFeatureFlagPayload isFeatureEnabled reloadFeatureFlags updateEarlyAccessFeatureEnrollment getEarlyAccessFeatures on onFeatureFlags onSessionId getSurveys getActiveMatchingSurveys renderSurvey canRenderSurvey getNextSurveyStep identify setPersonProperties group resetGroups setPersonPropertiesForFlags resetPersonPropertiesForFlags setGroupPropertiesForFlags resetGroupPropertiesForFlags reset get_distinct_id getGroups get_session_id get_session_replay_url alias set_config startSessionRecording stopSessionRecording sessionRecordingStarted captureException loadToolbar get_property getSessionProperty createPersonProfile opt_in_capturing opt_out_capturing has_opted_in_capturing has_opted_out_capturing clear_opt_in_out_capturing debug".split(" "),n=0;n<o.length;n++)g(u,o[n]);e._i.push([i,s,a])},e.__SV=1)}(document,window.posthog||[]);
-    posthog.init('phc_teG6p5oxf6poQHPThq5AGKzWQNhw4bHW9arLwWAVXm3f',{api_host:'https://eu.i.posthog.com',ui_host:'https://eu.posthog.com',person_profiles:'identified_only',defaults:'2026-01-30'})
-  </script>
-
-  <link rel="icon" type="image/svg+xml" href="/img/favicon.svg">
-  <link rel="preconnect" href="https://fonts.googleapis.com">
-  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
-  <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:ital,wght@0,300;0,400;0,500;0,600;0,700;1,300;1,400;1,500;1,600;1,700&display=swap" rel="stylesheet">
-  <link rel="stylesheet" href="../style.css">
-  <style>
-    /* ── Team guide aesthetic: self-contained overrides ── */
-
-    /* ── Gradient title (amber → warm gold, hint of blue) ── */
-    .post__title {
-      background: linear-gradient(90deg, #c9935a, #f59e0b 40%, #38bdf8);
-      -webkit-background-clip: text;
-      background-clip: text;
-      color: transparent;
-      font-weight: 800;
-    }
-
-    /* ── Date pill ── */
-    .post__date {
-      display: inline-block;
-      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
-      font-size: 0.72rem;
-      letter-spacing: 0.22em;
-      text-transform: uppercase;
-      color: #38bdf8;
-      border: 1px solid rgba(147, 140, 129, 0.25);
-      border-radius: 999px;
-      padding: 4px 14px;
-      margin-bottom: 16px;
-    }
-
-    /* ── Lead ── */
-    .post__lead {
-      color: #9aa7b4;
-      font-size: 1.08rem;
-      line-height: 1.7;
-    }
-
-    /* ── H2: left accent bar ── */
-    .post__body h2 {
-      font-size: 1.7rem;
-      margin: 54px 0 6px;
-      padding-left: 16px;
-      border-left: 4px solid #c9935a;
-    }
-
-    /* ── H3: purple secondary accent ── */
-    .post__body h3 {
-      color: #a855f7;
-      font-size: 1.18rem;
-      margin: 30px 0 4px;
-    }
-
-    /* ── Inline code ── */
-    .post__body code {
-      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
-      font-size: 0.86em;
-      background: #1c2230;
-      color: #9fe6c0;
-      padding: 2px 6px;
-      border-radius: 5px;
-      border: 1px solid #2a3340;
-    }
-
-    /* ── Code blocks (dark panel) ── */
-    .post__body pre {
-      background: #0a0e14;
-      border: 1px solid #2a3340;
-      border-radius: 10px;
-      padding: 16px 18px;
-      overflow-x: auto;
-      margin: 14px 0;
-      font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
-      font-size: 0.85rem;
-      line-height: 1.55;
-      color: #e6edf3;
-    }
-
-    /* Reset inline-code double-up inside pre */
-    .post__body pre code {
-      background: transparent;
-      padding: 0;
-      border: none;
-      font-size: inherit;
-      color: inherit;
-      border-radius: 0;
-    }
-
-    /* ── Blockquote callout (ready for future use; build.js does not emit blockquote yet) ── */
-    .post__body blockquote {
-      background: rgba(245, 158, 11, 0.08);
-      border: 1px solid rgba(245, 158, 11, 0.25);
-      border-left: 4px solid #f59e0b;
-      border-radius: 0 12px 12px 0;
-      padding: 16px 18px;
-      margin: 18px 0;
-      color: #f4e3c4;
-      font-size: 0.94rem;
-    }
-
-    /* ── Links ── */
-    .post__body a {
-      color: #38bdf8;
-    }
-
-    .post__body a:hover {
-      color: #a855f7;
-    }
-
-    /* ── Strong ── */
-    .post__body strong {
-      color: #f59e0b;
-    }
-
-    /* ── HR ── */
-    .post__body hr {
-      border: none;
-      border-top: 1px solid #2a3340;
-      margin: 32px 0;
-    }
-
-    /* ── Figures ── */
-    .post__body figure img {
-      border-radius: 12px;
-      border: 1px solid #2a3340;
-    }
-
-    .post__body figcaption {
-      color: #9aa7b4;
-      font-size: 0.85rem;
-      margin-top: 6px;
-    }
-
-    /* ── List spacing ── */
-    .post__body li {
-      margin: 4px 0;
-    }
-  </style>
-</head>
-<body>
-
-  <!-- NAV -->
-  <nav class="nav nav--scrolled" id="nav">
-    <a href="/" class="nav__logo" aria-label="Tinqs home">
-      <span class="nav__wordmark">TINQS</span>
-    </a>
-    <div class="nav__links">
-      <a href="/#game" class="nav__link">Games</a>
-      <a href="/#tech" class="nav__link">Technology</a>
-      <a href="/#about" class="nav__link">About</a>
-      <a href="/blog/" class="nav__link" style="color: var(--c-accent-l);">Blog</a>
-      <a href="/#signup" class="nav__link">Contact</a>
-      <a href="/press" class="nav__link">Press</a>
-    </div>
-    <button class="nav__burger" aria-label="Open menu" id="navBurger">
-      <span></span><span></span><span></span>
-    </button>
-  </nav>
-
-  <!-- MOBILE MENU -->
-  <div class="mobile-menu" id="mobileMenu">
-    <a href="/#game" class="mobile-menu__link">Games</a>
-    <a href="/#tech" class="mobile-menu__link">Technology</a>
-    <a href="/#about" class="mobile-menu__link">About</a>
-    <a href="/blog/" class="mobile-menu__link">Blog</a>
-    <a href="/#signup" class="mobile-menu__link">Contact</a>
-    <a href="/press" class="mobile-menu__link">Press</a>
-  </div>
-
-  <!-- POST -->
-  <article class="post">
-    <a href="/blog/" class="post__back">&larr; All Posts</a>
-    <span class="post__date">25 May 2026</span>
-    <h1 class="post__title">Pi as CI Integrator: Agents That Fix Their Own Builds</h1>
-    <p class="post__lead">Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop &mdash; agents that watch CI, read failure logs, and fix their own mistakes.</p>
-
-    <div class="post__body">
-<h2>The Gap</h2>
-<p>Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.</p>
-<p>What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% &mdash; the most tedious part &mdash; for a person.</p>
-<p>We wanted agents that finish the job.</p>
-<h2>The tinqs-ci Extension</h2>
-<p>Our <a href="https://tinqs.com/tinqs/pi" style="color: var(&ndash;c-accent-l);">Pi fork</a> has a <code>tinqs-ci</code> extension &mdash; a single TypeScript file, about 200 lines &mdash; that gives the agent three tools:</p>
-<ul>
-  <li><strong>ci_status</strong> &mdash; checks the current pipeline state for a branch (pending, running, success, failure)</li>
-  <li><strong>ci_logs</strong> &mdash; fetches the full build log from the most recent failed run</li>
-  <li><strong>ci_wait</strong> &mdash; polls the pipeline every 15 seconds until it finishes, then returns the result</li>
-</ul>
-<p>These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.</p>
-<h2>The Loop</h2>
-<p>Here's what a Pi task looks like end to end:</p>
-<pre><code>Agent receives task brief
-  → reads codebase, plans approach
-  → writes code
-  → runs local tests (bash tool)
-  → commits and pushes branch
-  → calls ci_wait
-  → CI passes → opens PR via Gitea API
-  → CI fails → calls ci_logs
-  → reads error output
-  → fixes the issue
-  → pushes again
-  → calls ci_wait again
-  → repeats until green (max 3 retries)</code></pre>
-<p>The key is that <code>ci_logs</code> returns the raw build output &mdash; compiler errors, test failures, lint violations &mdash; as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.</p>
-<p>Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry &mdash; it's usually a missing import or a type mismatch.</p>
-<h2>What This Actually Looks Like</h2>
-<p>A real run from last week. The task: add a health check endpoint to a Go service.</p>
-<ul>
-  <li><strong>Turn 1:</strong> Agent reads the codebase, writes the handler and test, pushes. CI fails &mdash; the test imports a package that doesn't exist on the runner.</li>
-  <li><strong>Turn 2:</strong> Agent reads <code>ci_logs</code>, sees the <code>go: module not found</code> error, adds the missing <code>go.mod</code> replace directive, pushes. CI passes.</li>
-  <li><strong>Turn 3:</strong> Agent opens PR with passing checks.</li>
-</ul>
-<p>Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.</p>
-<p>Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.</p>
-<h2>Why This Matters More Than You Think</h2>
-<p>CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.</p>
-<p>An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress &mdash; "the PR is up!" &mdash; while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.</p>
-<p>An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway &mdash; the fix-push-wait-check cycle that eats hours of developer time every week.</p>
-<h2>The Guardrail Problem</h2>
-<p>Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?</p>
-<p>Three safeguards:</p>
-<p><strong>Retry limit.</strong> Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.</p>
-<p><strong>Diff budget.</strong> Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.</p>
-<p><strong>Hallucination detection.</strong> The guardrail extension monitors every turn. If the agent claims "the build passed" without having called <code>ci_status</code> or <code>ci_wait</code>, it gets corrected. Agents are not allowed to guess the CI result.</p>
-<h2>The Numbers</h2>
-<p>Over three weeks of running the orchestrator:</p>
-<ul>
-  <li><strong>87 tasks</strong> completed end-to-end</li>
-  <li><strong>23 tasks</strong> needed at least one CI retry (26%)</li>
-  <li><strong>19 of those 23</strong> resolved on the first retry</li>
-  <li><strong>4 tasks</strong> hit the retry limit and escalated to a human</li>
-  <li><strong>0 tasks</strong> produced a merged PR that later broke something else</li>
-</ul>
-<p>The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number &mdash; it's the same rate you'd see from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.</p>
-<hr>
-<p><em>The CI extension is part of our <a href="https://tinqs.com/tinqs/pi" style="color: var(&ndash;c-accent-l);">Pi fork</a>, which runs inside <a href="https://tinqs.com" style="color: var(&ndash;c-accent-l);">Tinqs Studio</a> &mdash; a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.</em></p>
-
-    </div>
-
-    <div class="post__author">
-      <div class="post__author-avatar">OB</div>
-      <div class="post__author-info">
-        <span class="post__author-name">Ozan Bozkurt</span><br>
-        CTO & Developer, Tinqs
-      </div>
-    </div>
-  </article>
-
-  <!-- FOOTER -->
-  <footer class="footer">
-    <div class="footer__inner">
-      <span class="footer__wordmark">TINQS</span>
-      <div class="footer__links">
-        <a href="/#game">Games</a>
-        <a href="/#tech">Technology</a>
-        <a href="/#about">About</a>
-        <a href="/blog/">Blog</a>
-        <a href="mailto:hello@tinqs.com">hello@tinqs.com</a>
-        <a href="/press">Press Kit</a>
-      </div>
-      <p class="footer__copy">Tinqs Limited &mdash; London, est. 2020</p>
-    </div>
-  </footer>
-
-  <script>
-    const burger = document.getElementById('navBurger');
-    const mobileMenu = document.getElementById('mobileMenu');
-    burger.addEventListener('click', () => {
-      const open = mobileMenu.classList.toggle('mobile-menu--open');
-      burger.classList.toggle('nav__burger--open', open);
-      document.body.style.overflow = open ? 'hidden' : '';
-    });
-    mobileMenu.querySelectorAll('a').forEach(link => {
-      link.addEventListener('click', () => {
-        mobileMenu.classList.remove('mobile-menu--open');
-        burger.classList.remove('nav__burger--open');
-        document.body.style.overflow = '';
-      });
-    });
-  </script>
-
-</body>
-</html>
diff --git a/pi-flow-native-brain.html b/pi-flow-native-brain.html
index c48b256..a12fb10 100644
--- a/pi-flow-native-brain.html
+++ b/pi-flow-native-brain.html
@@ -4,27 +4,27 @@
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
 
-  <title>Retiring the Supervisor: Pi's Flow-Native Brain — Tinqs Blog</title>
-  <meta name="description" content="We deleted 1,050 lines of hardcoded supervisor logic and replaced it with oracle-backed pi-flows. The verify_build oracle now powers a gate-based pipeline that agents compose dynamically.">
+  <title>Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds — Tinqs Blog</title>
+  <meta name="description" content="We deleted 1,050 lines of hardcoded supervisor logic, replaced it with oracle-backed pi-flows, and gave agents the tools to watch CI and fix their own broken builds.">
   <meta name="robots" content="index, follow">
   <link rel="canonical" href="https://www.tinqs.com/blog/pi-flow-native-brain">
 
   <meta property="og:type" content="article">
   <meta property="og:url" content="https://www.tinqs.com/blog/pi-flow-native-brain">
-  <meta property="og:title" content="Retiring the Supervisor: Pi's Flow-Native Brain">
-  <meta property="og:description" content="Pi's standalone supervisor is gone — replaced by a flow-native brain with oracle-backed gates.">
+  <meta property="og:title" content="Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds">
+  <meta property="og:description" content="Pi's supervisor is gone — replaced by oracle-backed flows and CI-integrating agents that fix their own builds.">
   <meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
 
   <meta name="twitter:card" content="summary_large_image">
-  <meta name="twitter:title" content="Retiring the Supervisor: Pi's Flow-Native Brain">
-  <meta name="twitter:description" content="Pi's standalone supervisor is gone — replaced by a flow-native brain with oracle-backed gates.">
+  <meta name="twitter:title" content="Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds">
+  <meta name="twitter:description" content="Pi's supervisor is gone — replaced by oracle-backed flows and CI-integrating agents that fix their own builds.">
   <meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
 
   <script type="application/ld+json">
   {
     "@context": "https://schema.org",
     "@type": "BlogPosting",
-    "headline": "Retiring the Supervisor: Pi's Flow-Native Brain",
+    "headline": "Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds",
     "datePublished": "2026-06-03",
     "author": {
       "@type": "Person",
@@ -35,7 +35,7 @@
       "name": "Tinqs Limited",
       "url": "https://www.tinqs.com"
     },
-    "description": "We deleted 1,050 lines of hardcoded supervisor logic and replaced it with oracle-backed pi-flows. The verify_build oracle now powers a gate-based pipeline that agents compose dynamically."
+    "description": "We deleted 1,050 lines of hardcoded supervisor logic, replaced it with oracle-backed pi-flows, and gave agents the tools to watch CI and fix their own broken builds."
   }
   </script>
 
@@ -218,11 +218,13 @@
   <article class="post">
     <a href="/blog/" class="post__back">&larr; All Posts</a>
     <span class="post__date">3 June 2026</span>
-    <h1 class="post__title">Retiring the Supervisor: Pi's Flow-Native Brain</h1>
-    <p class="post__lead">The supervisor was 1,050 lines of TypeScript spread across 15 files — a hardcoded orchestration loop that ran contract-gated, verify-heavy sessions over isolated Pi processes. Today we deleted it. What replaced it is simpler, more flexible, and already passing real builds.</p>
+    <h1 class="post__title">Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds</h1>
+    <p class="post__lead">Two changes this week made Pi genuinely autonomous. First, we deleted 1,050 lines of hardcoded supervisor logic and replaced it with a flow-native brain — oracle-backed gates that agents compose dynamically. Second, we closed the loop: agents now watch CI, read failure logs, and fix their own broken builds until the pipeline goes green.</p>
 
     <div class="post__body">
-<h2>What the Supervisor Did</h2>
+<hr>
+<h2>Part 1: Retiring the Supervisor</h2>
+<h3>What the Supervisor Did</h3>
 <p>The <code>.pi/supervisor/</code> directory was the orchestration brain Pi left to us. For every task, it ran a fixed loop:</p>
 <p>1. <strong>Contract gate</strong> — skip-to-human if "done" wasn't programmatically verifiable</p>
 <p>2. <strong>TDAID phase A</strong> — a test-writer agent writes RED tests, never implementation</p>
@@ -230,11 +232,11 @@
 <p>4. <strong>Verification gate</strong> — run the build, check tests, pass or fail with a report</p>
 <p>It worked. It caught broken builds before they hit CI. It enforced the discipline of "define done before you start." But it had a structural problem: the loop was <strong>hardcoded</strong>. Every decision tree, every gate, every retry policy was baked into TypeScript. To change the workflow, you changed code. To add a new gate — vision QA, linting, asset validation — you added more code to the same monolithic loop.</p>
 <p>The supervisor was doing what <code>pi-flows</code> was designed to do, but from the wrong side of the architecture. Flows composes agents, gates, and decision points into pipelines. The supervisor reimplemented that logic in a single file. It was fighting the framework.</p>
-<h2>What Replaced It</h2>
+<h3>What Replaced It</h3>
 <p>The verify-heavy brain now runs <strong>as a pi-flows flow</strong> — a pipeline of oracle-backed gates orchestrated by the flow engine, visualized in FlowDashboard, and composable by agents themselves.</p>
 <p>The core pieces:</p>
 <ul>
-  <li><strong>Oracle-backed gates.</strong> The <code>verify_build</code> tool (<code>.pi/extensions/tinqs-verify.ts</code>) is the canonical gate. It compiles the game and sim, runs tests, and returns a structured PASS/FAIL verdict with file:line errors. Agents route through it; the gate decides whether to proceed.</li>
+  <li><strong>Oracle-backed gates.</strong> The <code>verify_build</code> tool is the canonical gate. It compiles the game and sim, runs tests, and returns a structured PASS/FAIL verdict with file:line errors. Agents route through it; the gate decides whether to proceed.</li>
   <li><strong>Agent-loop-decision Reflexion.</strong> Instead of a fixed two-phase TDAID loop, agents self-reflect on build failures. The flow engine gives them the failure report; they decide whether to fix and retry or escalate.</li>
   <li><strong>Role-split agents.</strong> G1 build, G2 tests, G3 behaviour (drives the live game), G4 feel (measured game-feel) and G5 visual (animation) are separate sub-agents, each with its own toolset and context, composed by the flow.</li>
 </ul>
@@ -267,16 +269,16 @@
     <text x="794" y="206" text-anchor="middle" fill="#d9ac7b" font-size="13.5">G5 · Visual</text>
     <line x1="475" y1="86" x2="475" y2="148" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
     <line x1="460" y1="232" x2="460" y2="276" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
-    <text x="472" y="258" fill="#6b7a8d" font-size="11">all green &#8658; done&#8195;&#183;&#8195;any fail &#8658; report</text>
+    <text x="472" y="258" fill="#6b7a8d" font-size="11">all green ⇒ done · any fail ⇒ report</text>
     <rect x="380" y="278" width="160" height="46" rx="9" fill="#1b1505" stroke="#c9935a"/>
-    <text x="460" y="306" text-anchor="middle" fill="#f3d6a0" font-size="15">Judge &#8212; honest verdict</text>
+    <text x="460" y="306" text-anchor="middle" fill="#f3d6a0" font-size="15">Judge — honest verdict</text>
     <path d="M820,150 C 908,96 716,50 556,61" fill="none" stroke="#f59e0b" stroke-width="1.8" stroke-dasharray="6 5" marker-end="url(#ahA)"/>
-    <text x="694" y="96" fill="#f59e0b" font-size="12.5">Reflexion &#183; fix &amp; retry &#8804; 3</text>
+    <text x="694" y="96" fill="#f59e0b" font-size="12.5">Reflexion · fix &amp; retry ≤ 3</text>
   </svg>
   <figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">A real in-game failure loops back to <em>implement</em> with the gate evidence (bounded to three tries); anything green — or skipped because no live instance is running — falls through to a single honest judge.</figcaption>
 </figure>
 <p>It started as three gates — build, test, vision. Gates are cheap to add, so it grew: a feature now also passes a live-game <strong>behaviour</strong> probe and a measured <strong>feel</strong> check before the judge signs off. Critically, the flow is not fixed. Agents can add gates, reorder steps, or branch on conditions. The flow engine handles orchestration; the agents handle decisions.</p>
-<h2>What We Deleted</h2>
+<h3>What We Deleted</h3>
 <p>The commit removes 1,050 lines across 15 files:</p>
 <ul>
   <li><code>runner.ts</code> (115 lines) — the main orchestration loop</li>
@@ -290,29 +292,92 @@
 </ul>
 <figure style="margin:24px 0;">
   <svg viewBox="0 0 920 180" role="img" aria-label="Lines of code: 1,050 deleted versus about 300 kept" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
-    <text x="40" y="34" fill="#9aa7b4" font-size="13">Net change: <tspan fill="#f59e0b" font-weight="600">&#8722;750 lines</tspan>, &#43; a composable pipeline</text>
+    <text x="40" y="34" fill="#9aa7b4" font-size="13">Net change: <tspan fill="#f59e0b" font-weight="600">−750 lines</tspan>, + a composable pipeline</text>
     <text x="40" y="76" fill="#f0816a" font-size="13">Deleted</text>
     <rect x="150" y="58" width="730" height="30" rx="6" fill="#2a1416" stroke="#f0816a" stroke-opacity="0.6"/>
-    <text x="868" y="78" text-anchor="end" fill="#f3b4a8" font-size="12.5">supervisor/ &#8212; 1,050 lines &#183; 15 files</text>
+    <text x="868" y="78" text-anchor="end" fill="#f3b4a8" font-size="12.5">supervisor/ — 1,050 lines · 15 files</text>
     <text x="40" y="136" fill="#34d399" font-size="13">Kept</text>
     <rect x="150" y="118" width="209" height="30" rx="6" fill="#0f2a22" stroke="#34d399" stroke-opacity="0.6"/>
-    <text x="369" y="138" fill="#9fe6c0" font-size="12.5">verify_build &#8212; ~300 lines &#183; 1 oracle</text>
+    <text x="369" y="138" fill="#9fe6c0" font-size="12.5">verify_build — ~300 lines · 1 oracle</text>
   </svg>
   <figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">The whole orchestration loop was deleted; only the build oracle survived — and it became the gate that powers the flow.</figcaption>
 </figure>
 <p>None of this was bad code. It was just the wrong layer. Flows gives us all of this — orchestration, state, gates, retry policy, event routing — as a framework primitive. We were maintaining a parallel implementation of something the framework already provided.</p>
 <p>The durable asset we kept: <code>verify_build</code>, the build oracle. It's now reused as the gate tool that powers the flow pipeline.</p>
-<h2>The Bug That Took a Day to Find</h2>
+<h3>The Bug That Took a Day to Find</h3>
 <p>Moving to flows exposed a subtle problem. Flow sub-agents run in their <strong>own extension stack</strong> — the main session's extensions don't reach them. The build-verifier and test-runner agents declared <code>verify_build</code> in their frontmatter, but the tool was never actually in their toolset.</p>
-<p>The symptom was confusing: the agents would report "oracle not available" and route to fail/report, silently skipping the test gate entirely. The build would pass, tests would never run, and the pipeline would report success. A false green.</p>
+<p>The symptom was confusing: agents reported "oracle not available" and routed to fail/report, silently skipping the test gate entirely. A false green — the build passed, tests never ran, and the pipeline reported success.</p>
 <p>The fix was a single pattern: emit <code>flow:register-tool</code> with the full tool definition at extension activation, and re-announce on <code>flow:rediscover</code>. The flow engine collects these into <code>getExtensionTools()</code> and hands them to every sub-agent that declares the tool. Three lines of orchestration, a day of debugging.</p>
 <p>Verified live: <code>game-check</code> now routes <code>context → build → build-gate(pass) → tests → tests-gate(pass) → vision</code>. Every gate fires. No false greens.</p>
-<h2>Why This Architecture Wins</h2>
+<h3>Why This Architecture Wins</h3>
 <p><strong>Composability.</strong> Agents can add gates without touching framework code. Want a linting gate? Add a sub-agent with a linter tool. Want a security scan? Same pattern. The flow engine handles routing; you just declare the gate.</p>
 <p><strong>Reusability.</strong> The <code>verify_build</code> oracle that powered the old supervisor now powers the flow gates. Same tool, same interface, different orchestration. No rewrite needed.</p>
 <p><strong>Observability.</strong> FlowDashboard visualizes the entire pipeline. You can see which gates passed, which failed, and where the agent decided to retry. The old supervisor logged to stdout.</p>
 <p><strong>Self-modification.</strong> Agents can read the flow graph, understand where they are in the pipeline, and decide what to do next. The supervisor's decision tree was opaque to the agents it was supervising. Flows makes the pipeline itself part of the agent's context.</p>
-<h2>The Stack Today</h2>
+<hr>
+<h2>Part 2: Agents That Fix Their Own Builds</h2>
+<p>Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. The flow-native brain handles verification inside the pipeline — but what about after push? What about CI?</p>
+<h3>The Gap</h3>
+<p>Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.</p>
+<p>What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% — the most tedious part — for a person.</p>
+<p>We wanted agents that finish the job.</p>
+<h3>The tinqs-ci Extension</h3>
+<p>Our <a href="https://tinqs.com/tinqs/pi" style="color: var(--c-accent-l);">Pi fork</a> has a <code>tinqs-ci</code> extension — a single TypeScript file, about 200 lines — that gives the agent three tools:</p>
+<ul>
+  <li><strong>ci_status</strong> — checks the current pipeline state for a branch (pending, running, success, failure)</li>
+  <li><strong>ci_logs</strong> — fetches the full build log from the most recent failed run</li>
+  <li><strong>ci_wait</strong> — polls the pipeline every 15 seconds until it finishes, then returns the result</li>
+</ul>
+<p>These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.</p>
+<h3>The Loop</h3>
+<p>Here's what a Pi task looks like end to end:</p>
+<pre><code>Agent receives task brief
+  → reads codebase, plans approach
+  → writes code
+  → runs local tests (bash tool)
+  → commits and pushes branch
+  → calls ci_wait
+  → CI passes → opens PR via Gitea API
+  → CI fails → calls ci_logs
+  → reads error output
+  → fixes the issue
+  → pushes again
+  → calls ci_wait again
+  → repeats until green (max 3 retries)</code></pre>
+<p>The key is that <code>ci_logs</code> returns the raw build output — compiler errors, test failures, lint violations — as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.</p>
+<p>Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry — it's usually a missing import or a type mismatch.</p>
+<h3>A Real Run</h3>
+<p>Last week. The task: add a health check endpoint to a Go service.</p>
+<ul>
+  <li><strong>Turn 1:</strong> Agent reads the codebase, writes the handler and test, pushes. CI fails — the test imports a package that doesn't exist on the runner.</li>
+  <li><strong>Turn 2:</strong> Agent reads <code>ci_logs</code>, sees the <code>go: module not found</code> error, adds the missing <code>go.mod</code> replace directive, pushes. CI passes.</li>
+  <li><strong>Turn 3:</strong> Agent opens PR with passing checks.</li>
+</ul>
+<p>Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.</p>
+<p>Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.</p>
+<h3>Why This Matters More Than You Think</h3>
+<p>CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.</p>
+<p>An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress — "the PR is up!" — while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.</p>
+<p>An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway — the fix-push-wait-check cycle that eats hours of developer time every week.</p>
+<h3>The Guardrail Problem</h3>
+<p>Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?</p>
+<p>Three safeguards:</p>
+<p><strong>Retry limit.</strong> Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.</p>
+<p><strong>Diff budget.</strong> Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.</p>
+<p><strong>Hallucination detection.</strong> The guardrail extension monitors every turn. If the agent claims "the build passed" without having called <code>ci_status</code> or <code>ci_wait</code>, it gets corrected. Agents are not allowed to guess the CI result.</p>
+<h3>The Numbers</h3>
+<p>Over three weeks of running the orchestrator:</p>
+<ul>
+  <li><strong>87 tasks</strong> completed end-to-end</li>
+  <li><strong>23 tasks</strong> needed at least one CI retry (26%)</li>
+  <li><strong>19 of those 23</strong> resolved on the first retry</li>
+  <li><strong>4 tasks</strong> hit the retry limit and escalated to a human</li>
+  <li><strong>0 tasks</strong> produced a merged PR that later broke something else</li>
+</ul>
+<p>The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number — it's the same rate you'd see from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.</p>
+<hr>
+<h2>Putting It Together: The Stack</h2>
+<p>The flow-native brain and the CI integrator are two sides of the same coin. The flow handles <strong>pre-push verification</strong> — did the code compile? do the tests pass? does the game behave correctly? The CI integrator handles <strong>post-push verification</strong> — did the CI pipeline agree? did anything break on the runner that didn't break locally?</p>
 <table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.92rem;">
   <thead>
     <tr style="text-align:left;border-bottom:1px solid #2a3340;">
@@ -324,14 +389,15 @@
   <tbody>
     <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Flow engine</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">pi-flows orchestrator</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Composes agents, gates and decision points</td></tr>
     <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Gates</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">verify_build oracle</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Compiles, tests, returns PASS/FAIL with file:line errors</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Sub-agents</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">G1 build &#183; G2 tests &#183; G3 behaviour &#183; G4 feel &#183; G5 visual</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Role-split, each with its own toolset</td></tr>
-    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Decision</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">Agent-loop Reflexion</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Self-reflect on failures, retry (&#8804;3) or escalate</td></tr>
-    <tr><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Visualization</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">FlowDashboard</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Real-time pipeline state at localhost:33634</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Sub-agents</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">G1 build · G2 tests · G3 behaviour · G4 feel · G5 visual</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Role-split, each with its own toolset</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">CI loop</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">tinqs-ci extension</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">ci_status, ci_logs, ci_wait — polls Gitea Actions, reads logs, retries</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Decision</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">Agent-loop Reflexion</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Self-reflect on failures, retry (≤3) or escalate</td></tr>
+    <tr><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Visualization</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">FlowDashboard</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Real-time pipeline state</td></tr>
   </tbody>
 </table>
 <hr>
-<p>The old supervisor was 1,050 lines of code that did one thing well: verify that agent output compiled and passed tests. The new flow-native brain does the same thing with less code, more flexibility, and a bug we'll never hit again. Sometimes the best commit is a deletion.</p>
-<p><em>The flow-native brain runs on our <a href="https://tinqs.com/tinqs/pi" style="color: var(&ndash;c-accent-l);">Pi fork</a> inside <a href="https://tinqs.com" style="color: var(&ndash;c-accent-l);">Tinqs Studio</a>. The verify_build extension is ~300 lines of TypeScript, MIT licensed, and reusable in any Pi project.</em></p>
+<p>The old supervisor was 1,050 lines of code that did one thing well: verify that agent output compiled and passed tests. The new system does the same thing with less code, more flexibility, composable gates, live CI integration, and a bug we'll never hit again. Sometimes the best commit is a deletion. Sometimes it's two.</p>
+<p><em>The flow-native brain and CI extension run on our <a href="https://tinqs.com/tinqs/pi" style="color: var(--c-accent-l);">Pi fork</a> inside <a href="https://tinqs.com" style="color: var(--c-accent-l);">Tinqs Studio</a>. The verify_build extension is ~300 lines of TypeScript, the tinqs-ci extension is ~200 lines — both MIT licensed and reusable in any Pi project.</em></p>
 
     </div>
 
diff --git a/posts/pi-ci-integrator.md b/posts/pi-ci-integrator.md
deleted file mode 100644
index b75b4af..0000000
--- a/posts/pi-ci-integrator.md
+++ /dev/null
@@ -1,103 +0,0 @@
----
-title: "Pi as CI Integrator: Agents That Fix Their Own Builds"
-slug: pi-ci-integrator
-date: "2026-05-25"
-description: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
-og_description: "Coding agents that watch CI and fix their own builds."
-og_image: "https://www.tinqs.com/img/og-cover.jpg"
-excerpt: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
-author: "Ozan Bozkurt"
-author_initials: "OB"
-author_role: "CTO & Developer, Tinqs"
----
-Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop --- agents that watch CI, read failure logs, and fix their own mistakes.
-
-## The Gap
-
-Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.
-
-What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% --- the most tedious part --- for a person.
-
-We wanted agents that finish the job.
-
-## The tinqs-ci Extension
-
-Our [Pi fork](https://tinqs.com/tinqs/pi) has a `tinqs-ci` extension --- a single TypeScript file, about 200 lines --- that gives the agent three tools:
-
-- **ci_status** --- checks the current pipeline state for a branch (pending, running, success, failure)
-- **ci_logs** --- fetches the full build log from the most recent failed run
-- **ci_wait** --- polls the pipeline every 15 seconds until it finishes, then returns the result
-
-These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.
-
-## The Loop
-
-Here's what a Pi task looks like end to end:
-
-```
-Agent receives task brief
-  → reads codebase, plans approach
-  → writes code
-  → runs local tests (bash tool)
-  → commits and pushes branch
-  → calls ci_wait
-  → CI passes → opens PR via Gitea API
-  → CI fails → calls ci_logs
-  → reads error output
-  → fixes the issue
-  → pushes again
-  → calls ci_wait again
-  → repeats until green (max 3 retries)
-```
-
-The key is that `ci_logs` returns the raw build output --- compiler errors, test failures, lint violations --- as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.
-
-Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry --- it's usually a missing import or a type mismatch.
-
-## What This Actually Looks Like
-
-A real run from last week. The task: add a health check endpoint to a Go service.
-
-- **Turn 1:** Agent reads the codebase, writes the handler and test, pushes. CI fails --- the test imports a package that doesn't exist on the runner.
-- **Turn 2:** Agent reads `ci_logs`, sees the `go: module not found` error, adds the missing `go.mod` replace directive, pushes. CI passes.
-- **Turn 3:** Agent opens PR with passing checks.
-
-Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.
-
-Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.
-
-## Why This Matters More Than You Think
-
-CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.
-
-An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress --- "the PR is up!" --- while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.
-
-An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway --- the fix-push-wait-check cycle that eats hours of developer time every week.
-
-## The Guardrail Problem
-
-Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?
-
-Three safeguards:
-
-**Retry limit.** Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.
-
-**Diff budget.** Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.
-
-**Hallucination detection.** The guardrail extension monitors every turn. If the agent claims "the build passed" without having called `ci_status` or `ci_wait`, it gets corrected. Agents are not allowed to guess the CI result.
-
-## The Numbers
-
-Over three weeks of running the orchestrator:
-
-- **87 tasks** completed end-to-end
-- **23 tasks** needed at least one CI retry (26%)
-- **19 of those 23** resolved on the first retry
-- **4 tasks** hit the retry limit and escalated to a human
-- **0 tasks** produced a merged PR that later broke something else
-
-The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number --- it's the same rate you'd see from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.
-
----
-
-*The CI extension is part of our [Pi fork](https://tinqs.com/tinqs/pi), which runs inside [Tinqs Studio](https://tinqs.com) --- a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.*
diff --git a/posts/pi-flow-native-brain.md b/posts/pi-flow-native-brain.md
new file mode 100644
index 0000000..4a554e1
--- /dev/null
+++ b/posts/pi-flow-native-brain.md
@@ -0,0 +1,259 @@
+---
+title: "Pi's Flow-Native Brain: Retiring the Supervisor, Teaching Agents to Fix Their Own Builds"
+slug: pi-flow-native-brain
+date: "2026-06-03"
+description: "We deleted 1,050 lines of hardcoded supervisor logic, replaced it with oracle-backed pi-flows, and gave agents the tools to watch CI and fix their own broken builds."
+og_description: "Pi's supervisor is gone — replaced by oracle-backed flows and CI-integrating agents that fix their own builds."
+og_image: "https://www.tinqs.com/img/og-cover.jpg"
+excerpt: "Two changes made Pi genuinely autonomous: we deleted the hardcoded supervisor and replaced it with composable oracle-backed flows, and we taught agents to watch CI, read failure logs, and fix their own broken builds."
+author: "Ozan Bozkurt"
+author_initials: "OB"
+author_role: "CTO & Developer, Tinqs"
+---
+Two changes this week made Pi genuinely autonomous. First, we deleted 1,050 lines of hardcoded supervisor logic and replaced it with a flow-native brain — oracle-backed gates that agents compose dynamically. Second, we closed the loop: agents now watch CI, read failure logs, and fix their own broken builds until the pipeline goes green.
+
+---
+
+## Part 1: Retiring the Supervisor
+
+### What the Supervisor Did
+
+The `.pi/supervisor/` directory was the orchestration brain Pi left to us. For every task, it ran a fixed loop:
+
+1. **Contract gate** — skip-to-human if "done" wasn't programmatically verifiable
+2. **TDAID phase A** — a test-writer agent writes RED tests, never implementation
+3. **TDAID phase B** — a code-writer agent makes them green; on failure, a Reflexion follow-up retries (capped)
+4. **Verification gate** — run the build, check tests, pass or fail with a report
+
+It worked. It caught broken builds before they hit CI. It enforced the discipline of "define done before you start." But it had a structural problem: the loop was **hardcoded**. Every decision tree, every gate, every retry policy was baked into TypeScript. To change the workflow, you changed code. To add a new gate — vision QA, linting, asset validation — you added more code to the same monolithic loop.
+
+The supervisor was doing what `pi-flows` was designed to do, but from the wrong side of the architecture. Flows composes agents, gates, and decision points into pipelines. The supervisor reimplemented that logic by hand across its own TypeScript files. It was fighting the framework.
+
+### What Replaced It
+
+The verify-heavy brain now runs **as a pi-flows flow** — a pipeline of oracle-backed gates orchestrated by the flow engine, visualized in FlowDashboard, and composable by agents themselves.
+
+The core pieces:
+
+- **Oracle-backed gates.** The `verify_build` tool is the canonical gate. It compiles the game and sim, runs tests, and returns a structured PASS/FAIL verdict with file:line errors. Agents route through it; the gate decides whether to proceed.
+- **Agent-loop-decision Reflexion.** Instead of a fixed two-phase TDAID loop, agents self-reflect on build failures. The flow engine gives them the failure report; they decide whether to fix and retry or escalate.
+- **Role-split agents.** G1 build, G2 tests, G3 behaviour (drives the live game), G4 feel (measured game-feel) and G5 visual (animation) are separate sub-agents, each with its own toolset and context, composed by the flow.
+
+The result is a pipeline that flows naturally — a plan, an implementation, then a ladder of oracle-backed gates:
+
+<!--raw-->
+<figure style="margin:28px 0;">
+  <svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
+    <defs>
+      <marker id="ah" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#5b6b7d"/></marker>
+      <marker id="ahA" markerWidth="10" markerHeight="10" refX="7" refY="3.2" orient="auto"><path d="M0,0 L7,3.2 L0,6.4 Z" fill="#f59e0b"/></marker>
+    </defs>
+    <rect x="40" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
+    <text x="110" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Context</text>
+    <rect x="210" y="40" width="140" height="46" rx="9" fill="#121821" stroke="#2a3340"/>
+    <text x="280" y="68" text-anchor="middle" fill="#cdd7e2" font-size="15">Plan</text>
+    <rect x="400" y="40" width="150" height="46" rx="9" fill="#15202e" stroke="#3a4656"/>
+    <text x="475" y="68" text-anchor="middle" fill="#e6edf3" font-size="15">Implement</text>
+    <line x1="180" y1="63" x2="206" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
+    <line x1="350" y1="63" x2="396" y2="63" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
+    <rect x="40" y="150" width="840" height="82" rx="12" fill="#0c1119" stroke="#2a3340"/>
+    <text x="56" y="171" fill="#6b7a8d" font-size="11" letter-spacing="1.4">VERIFY-HEAVY GATES — most compute is spent checking, not writing</text>
+    <rect x="56" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#38bdf8" stroke-opacity="0.55"/>
+    <text x="130" y="206" text-anchor="middle" fill="#38bdf8" font-size="13.5">G1 · Build</text>
+    <rect x="222" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#34d399" stroke-opacity="0.55"/>
+    <text x="296" y="206" text-anchor="middle" fill="#9fe6c0" font-size="13.5">G2 · Tests</text>
+    <rect x="388" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#a855f7" stroke-opacity="0.55"/>
+    <text x="462" y="206" text-anchor="middle" fill="#c4a0f7" font-size="13.5">G3 · Behaviour</text>
+    <rect x="554" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#f59e0b" stroke-opacity="0.55"/>
+    <text x="628" y="206" text-anchor="middle" fill="#f5b44b" font-size="13.5">G4 · Feel</text>
+    <rect x="720" y="180" width="148" height="42" rx="8" fill="#10141c" stroke="#c9935a" stroke-opacity="0.55"/>
+    <text x="794" y="206" text-anchor="middle" fill="#d9ac7b" font-size="13.5">G5 · Visual</text>
+    <line x1="475" y1="86" x2="475" y2="148" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
+    <line x1="460" y1="232" x2="460" y2="276" stroke="#5b6b7d" stroke-width="1.6" marker-end="url(#ah)"/>
+    <text x="472" y="258" fill="#6b7a8d" font-size="11">all green ⇒ done · any fail ⇒ report</text>
+    <rect x="380" y="278" width="160" height="46" rx="9" fill="#1b1505" stroke="#c9935a"/>
+    <text x="460" y="306" text-anchor="middle" fill="#f3d6a0" font-size="15">Judge — honest verdict</text>
+    <path d="M820,150 C 908,96 716,50 556,61" fill="none" stroke="#f59e0b" stroke-width="1.8" stroke-dasharray="6 5" marker-end="url(#ahA)"/>
+    <text x="694" y="96" fill="#f59e0b" font-size="12.5">Reflexion · fix & retry ≤ 3</text>
+  </svg>
+  <figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">A real in-game failure loops back to <em>implement</em> with the gate evidence (bounded to three tries); anything green — or skipped because no live instance is running — falls through to a single honest judge.</figcaption>
+</figure>
+<!--/raw-->
+
+It started as three gates — build, test, vision. Gates are cheap to add, so it grew: a feature now also passes a live-game **behaviour** probe and a measured **feel** check before the judge signs off. Critically, the flow is not fixed. Agents can add gates, reorder steps, or branch on conditions. The flow engine handles orchestration; the agents handle decisions.
+
+### What We Deleted
+
+The commit removes 1,050 lines across 15 files:
+
+- `runner.ts` (115 lines) — the main orchestration loop
+- `supervisor.ts` (119 lines) — the state machine driving sessions
+- `gates.ts` (75 lines) — hardcoded gate definitions
+- `policy.ts` (92 lines) — retry limits and decision logic
+- `store.ts` (54 lines) — session state persistence
+- `types.ts` (76 lines) — type definitions for the whole system
+- `events.ts` (47 lines) — inter-process event bus
+- Plus tests, examples, and documentation
+
+<!--raw-->
+<figure style="margin:24px 0;">
+  <svg viewBox="0 0 920 180" role="img" aria-label="Lines of code: 1,050 deleted versus about 300 kept" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
+    <text x="40" y="34" fill="#9aa7b4" font-size="13">Net change: <tspan fill="#f59e0b" font-weight="600">−750 lines</tspan>, + a composable pipeline</text>
+    <text x="40" y="76" fill="#f0816a" font-size="13">Deleted</text>
+    <rect x="150" y="58" width="730" height="30" rx="6" fill="#2a1416" stroke="#f0816a" stroke-opacity="0.6"/>
+    <text x="868" y="78" text-anchor="end" fill="#f3b4a8" font-size="12.5">supervisor/ — 1,050 lines · 15 files</text>
+    <text x="40" y="136" fill="#34d399" font-size="13">Kept</text>
+    <rect x="150" y="118" width="209" height="30" rx="6" fill="#0f2a22" stroke="#34d399" stroke-opacity="0.6"/>
+    <text x="369" y="138" fill="#9fe6c0" font-size="12.5">verify_build — ~300 lines · 1 oracle</text>
+  </svg>
+  <figcaption style="color:#9aa7b4;font-size:0.85rem;margin-top:8px;">The whole orchestration loop was deleted; only the build oracle survived — and it became the gate that powers the flow.</figcaption>
+</figure>
+<!--/raw-->
+
+None of this was bad code. It was just the wrong layer. Flows gives us all of this — orchestration, state, gates, retry policy, event routing — as a framework primitive. We were maintaining a parallel implementation of something the framework already provided.
+
+The durable asset we kept: `verify_build`, the build oracle. It's now reused as the gate tool that powers the flow pipeline.
+
+### The Bug That Took a Day to Find
+
+Moving to flows exposed a subtle problem. Flow sub-agents run in their **own extension stack** — the main session's extensions don't reach them. The build-verifier and test-runner agents declared `verify_build` in their frontmatter, but the tool was never actually in their toolset.
+
+The symptom was confusing: agents reported "oracle not available" and routed to fail/report, silently skipping the test gate entirely. A false green — the build passed, tests never ran, and the pipeline reported success.
+
+The fix was a single pattern: emit `flow:register-tool` with the full tool definition at extension activation, and re-announce on `flow:rediscover`. The flow engine collects these into `getExtensionTools()` and hands them to every sub-agent that declares the tool. Three lines of orchestration, a day of debugging.
+
+Verified live: `game-check` now routes `context → build → build-gate(pass) → tests → tests-gate(pass) → vision`. Every gate fires. No false greens.
+
+### Why This Architecture Wins
+
+**Composability.** Agents can add gates without touching framework code. Want a linting gate? Add a sub-agent with a linter tool. Want a security scan? Same pattern. The flow engine handles routing; you just declare the gate.
+
+**Reusability.** The `verify_build` oracle that powered the old supervisor now powers the flow gates. Same tool, same interface, different orchestration. No rewrite needed.
+
+**Observability.** FlowDashboard visualizes the entire pipeline. You can see which gates passed, which failed, and where the agent decided to retry. The old supervisor logged to stdout.
+
+**Self-modification.** Agents can read the flow graph, understand where they are in the pipeline, and decide what to do next. The supervisor's decision tree was opaque to the agents it was supervising. Flows makes the pipeline itself part of the agent's context.
+
+---
+
+## Part 2: Agents That Fix Their Own Builds
+
+Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. The flow-native brain handles verification inside the pipeline — but what about after push? What about CI?
+
+### The Gap
+
+Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.
+
+What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% — the most tedious part — for a person.
+
+We wanted agents that finish the job.
+
+### The tinqs-ci Extension
+
+Our [Pi fork](https://tinqs.com/tinqs/pi) has a `tinqs-ci` extension — a single TypeScript file, about 200 lines — that gives the agent three tools:
+
+- **ci_status** — checks the current pipeline state for a branch (pending, running, success, failure)
+- **ci_logs** — fetches the full build log from the most recent failed run
+- **ci_wait** — polls the pipeline every 15 seconds until it finishes, then returns the result
+
+These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.
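+
+The polling behaviour of `ci_wait` can be sketched roughly as follows. The 15-second interval and the four pipeline states come from the list above; `fetchStatus`, `sleep` and `maxPolls` are hypothetical parameters standing in for the real Gitea Actions call, which is not shown here, and the timeout is an assumption rather than something the extension is known to do.
+
+```typescript
+type CiState = "pending" | "running" | "success" | "failure";
+
+interface WaitOptions {
+  fetchStatus: () => Promise<CiState>;   // hypothetical: wraps the CI status lookup
+  sleep?: (ms: number) => Promise<void>; // injectable so the loop can be tested without waiting
+  intervalMs?: number;
+  maxPolls?: number;                     // assumption: a cap so the loop cannot poll forever
+}
+
+const defaultSleep = (ms: number): Promise<void> =>
+  new Promise<void>((resolve) => setTimeout(resolve, ms));
+
+async function ciWait(options: WaitOptions): Promise<"success" | "failure" | "timeout"> {
+  const { fetchStatus, sleep = defaultSleep, intervalMs = 15000, maxPolls = 240 } = options;
+  for (let poll = 0; poll < maxPolls; poll++) {
+    const state = await fetchStatus();
+    // Only terminal states end the wait; pending and running keep polling.
+    if (state === "success" || state === "failure") return state;
+    await sleep(intervalMs);
+  }
+  return "timeout";
+}
+```
+
+Injecting `fetchStatus` and `sleep` keeps the loop independent of any particular CI backend and lets it run instantly under test.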
+
+### The Loop
+
+Here's what a Pi task looks like end to end:
+
+```
+Agent receives task brief
+  → reads codebase, plans approach
+  → writes code
+  → runs local tests (bash tool)
+  → commits and pushes branch
+  → calls ci_wait
+      CI passes → opens PR via Gitea API
+      CI fails  → calls ci_logs
+                → reads error output
+                → fixes the issue
+                → pushes again
+                → calls ci_wait again
+                → repeats until green (max 3 retries)
+```
+
+The key is that `ci_logs` returns the raw build output — compiler errors, test failures, lint violations — as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.
+
+Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry — it's usually a missing import or a type mismatch.
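+
+As a rough sketch of that control flow, assuming the three steps are injected as plain async functions: `runCiLoop`, `LoopSteps` and `fixAndPush` are illustrative names, not the orchestrator's real interface. Only the retry limit of three and the fall-through to a human come from the text.
+
+```typescript
+type CiResult = "success" | "failure";
+
+interface LoopSteps {
+  ciWait: () => Promise<CiResult>;            // blocks until the pipeline finishes
+  ciLogs: () => Promise<string>;              // raw log of the most recent failed run
+  fixAndPush: (log: string) => Promise<void>; // the agent reads the log, edits, pushes
+}
+
+interface LoopOutcome {
+  green: boolean;
+  retriesUsed: number;
+  lastLog?: string; // kept for the PR comment when escalating
+}
+
+const MAX_RETRIES = 3;
+
+// Assumes the branch has already been pushed once before the loop starts.
+async function runCiLoop(steps: LoopSteps): Promise<LoopOutcome> {
+  let retriesUsed = 0;
+  while (true) {
+    if ((await steps.ciWait()) === "success") return { green: true, retriesUsed };
+    const lastLog = await steps.ciLogs();
+    // Hard stop: the limit is enforced here, not left to the model.
+    if (retriesUsed === MAX_RETRIES) return { green: false, retriesUsed, lastLog };
+    await steps.fixAndPush(lastLog);
+    retriesUsed++;
+  }
+}
+```
+
+With this shape, a task that never goes green costs four CI runs in total: the initial push plus three retries.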
+
+### A Real Run
+
+Last week. The task: add a health check endpoint to a Go service.
+
+- **Turn 1:** Agent reads the codebase, writes the handler and test, pushes. CI fails — the test imports a package that doesn't exist on the runner.
+- **Turn 2:** Agent reads `ci_logs`, sees the `go: module not found` error, adds the missing `go.mod` replace directive, pushes. CI passes.
+- **Turn 3:** Agent opens PR with passing checks.
+
+Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.
+
+Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.
+
+### Why This Matters More Than You Think
+
+CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.
+
+An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress — "the PR is up!" — while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.
+
+An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway — the fix-push-wait-check cycle that eats hours of developer time every week.
+
+### The Guardrail Problem
+
+Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?
+
+Three safeguards:
+
+**Retry limit.** Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.
+
+**Diff budget.** Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.
+
+**Hallucination detection.** The guardrail extension monitors every turn. If the agent claims "the build passed" without having called `ci_status` or `ci_wait`, it gets corrected. Agents are not allowed to guess the CI result.
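+
+The last two safeguards reduce to small checks. The sketch below is illustrative only: `filesOutsideBudget` and `isUnverifiedClaim` are hypothetical helpers, and the phrase match is a deliberately naive placeholder for however the real guardrail extension detects a claimed green build.
+
+```typescript
+// Diff budget: a retry may only touch files from the original changeset.
+// Returns the offending files; an empty array means the retry is within budget.
+function filesOutsideBudget(originalFiles: string[], retryFiles: string[]): string[] {
+  const allowed = new Set(originalFiles);
+  return retryFiles.filter((file) => !allowed.has(file));
+}
+
+// Tools that count as actually having checked CI.
+const CI_TOOLS = ["ci_status", "ci_wait"];
+
+// Hallucination check: a claim of a green build must be backed by a CI tool call
+// somewhere in the turn's tool history.
+function isUnverifiedClaim(message: string, toolCalls: string[]): boolean {
+  const claimsGreen = /build passed|ci passed/i.test(message);
+  const checkedCi = toolCalls.some((name) => CI_TOOLS.indexOf(name) !== -1);
+  return claimsGreen && !checkedCi;
+}
+```
+
+For example, a retry that edits a linter config alongside the original handler file would return that config path from `filesOutsideBudget`, which is the signal to fail and escalate.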
+
+### The Numbers
+
+Over three weeks of running the orchestrator:
+
+- **87 tasks** completed end-to-end
+- **23 tasks** needed at least one CI retry (26%)
+- **19 of those 23** resolved on the first retry
+- **4 tasks** hit the retry limit and escalated to a human
+- **0 tasks** produced a merged PR that later broke something else
+
+The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number — it's the same rate you'd see from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.
+
+---
+
+## Putting It Together: The Stack
+
+The flow-native brain and the CI integrator are two sides of the same coin. The flow handles **pre-push verification** — did the code compile? do the tests pass? does the game behave correctly? The CI integrator handles **post-push verification** — did the CI pipeline agree? did anything break on the runner that didn't break locally?
+
+<!--raw-->
+<table style="width:100%;border-collapse:collapse;margin:18px 0;font-size:0.92rem;">
+  <thead>
+    <tr style="text-align:left;border-bottom:1px solid #2a3340;">
+      <th style="padding:10px 12px;color:#c9935a;font-weight:600;">Layer</th>
+      <th style="padding:10px 12px;color:#c9935a;font-weight:600;">What</th>
+      <th style="padding:10px 12px;color:#c9935a;font-weight:600;">How</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Flow engine</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">pi-flows orchestrator</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Composes agents, gates and decision points</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Gates</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">verify_build oracle</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Compiles, tests, returns PASS/FAIL with file:line errors</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Sub-agents</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">G1 build · G2 tests · G3 behaviour · G4 feel · G5 visual</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Role-split, each with its own toolset</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">CI loop</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">tinqs-ci extension</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">ci_status, ci_logs, ci_wait — polls Gitea Actions, reads logs, retries</td></tr>
+    <tr style="border-bottom:1px solid #1c2230;"><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Decision</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">Agent-loop Reflexion</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Self-reflect on failures, retry (≤3) or escalate</td></tr>
+    <tr><td style="padding:9px 12px;color:#e6edf3;vertical-align:top;"><strong style="color:#f59e0b;">Visualization</strong></td><td style="padding:9px 12px;color:#cdd7e2;vertical-align:top;">FlowDashboard</td><td style="padding:9px 12px;color:#9aa7b4;vertical-align:top;">Real-time pipeline state</td></tr>
+  </tbody>
+</table>
+<!--/raw-->
+
+---
+
+The old supervisor was 1,050 lines of code that did one thing well: verify that agent output compiled and passed tests. The new system does the same thing with less code, more flexibility, composable gates, live CI integration, and a bug we'll never hit again. Sometimes the best commit is a deletion. Sometimes it's two.
+
+*The flow-native brain and CI extension run on our [Pi fork](https://tinqs.com/tinqs/pi) inside [Tinqs Studio](https://tinqs.com). The verify_build extension is ~300 lines of TypeScript, the tinqs-ci extension is ~200 lines — both MIT licensed and reusable in any Pi project.*
