<title>Pi as CI Integrator: Agents That Fix Their Own Builds — Tinqs Blog</title>
<meta name="description" content="Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green.">
<a href="/blog/" class="post__back">← All Posts</a>
<span class="post__date">25 May 2026</span>
<h1 class="post__title">Pi as CI Integrator: Agents That Fix Their Own Builds</h1>
<p class="post__lead">Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop, so agents watch CI, read failure logs, and fix their own mistakes.</p>
<div class="post__body">
<h2>The Gap</h2>
<p>Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.</p>
<p>What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% — the most tedious part — for a person.</p>
<p>We wanted agents that finish the job.</p>
<h2>The tinqs-ci Extension</h2>
<p>Our <a href="https://tinqs.com/tinqs/pi" style="color: var(--c-accent-l);">Pi fork</a> has a <code>tinqs-ci</code> extension, a single TypeScript file of about 200 lines, that gives the agent three tools:</p>
<ul>
<li><strong>ci_status</strong>: checks the current pipeline state for a branch (pending, running, success, failure)</li>
<li><strong>ci_logs</strong>: fetches the full build log from the most recent failed run</li>
<li><strong>ci_wait</strong>: polls the pipeline every 15 seconds until it finishes, then returns the result</li>
</ul>
<p>These are Gitea Actions API calls under the hood. The agent authenticates with the same personal access token (PAT) it uses for <code>git push</code>. No extra credentials, no special CI service account.</p>
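<p>To make the three tools concrete, here is a minimal sketch of the logic behind <code>ci_status</code> and <code>ci_wait</code>. It is not the <code>tinqs-ci</code> source: the <code>RunRecord</code> shape (a GitHub-style <code>status</code>/<code>conclusion</code> pair), the endpoint path, and the <code>maxPolls</code> cap are all assumptions. The header uses Gitea's <code>token</code> authorization scheme for personal access tokens. Polling and sleeping are injected as synchronous functions so the loop can run without a network; a real tool would <code>await</code> an HTTP call and a timer.</p>

```typescript
// Sketch of the logic behind ci_status and ci_wait. The RunRecord shape and the
// endpoint path are assumptions for illustration, not the tinqs-ci source.
type CiState = "pending" | "running" | "success" | "failure";

interface RunRecord {
  status: "queued" | "waiting" | "in_progress" | "completed";
  conclusion: string | null; // set once status is "completed"
}

// ci_status: collapse the most recent run for a branch into the four states.
function toCiState(run: RunRecord | null): CiState {
  if (run === null) return "pending"; // nothing has been scheduled yet
  switch (run.status) {
    case "queued":
    case "waiting":
      return "pending";
    case "in_progress":
      return "running";
    default:
      // Anything other than an explicit success (cancelled, skipped, ...) counts as failure.
      return run.conclusion === "success" ? "success" : "failure";
  }
}

// The request the tool would send. The PAT is the one already used for git push,
// sent with Gitea's "token" scheme. The URL path is a placeholder assumption.
function buildStatusRequest(baseUrl: string, owner: string, repo: string, branch: string, pat: string) {
  return {
    url: `${baseUrl}/api/v1/repos/${owner}/${repo}/actions/runs?branch=${encodeURIComponent(branch)}`,
    headers: { Authorization: `token ${pat}`, Accept: "application/json" },
  };
}

// ci_wait: poll every 15 s until the run finishes. poll/sleep are injected and
// synchronous here so the loop can be exercised without a network or real timers.
function waitForCi(
  poll: () => CiState,
  sleep: (ms: number) => void,
  intervalMs = 15000,
  maxPolls = 240, // hypothetical cap: about an hour at 15 s per poll
): CiState {
  for (let i = 0; i < maxPolls; i++) {
    const state = poll();
    if (state === "success" || state === "failure") return state;
    sleep(intervalMs);
  }
  throw new Error(`ci_wait: pipeline still unfinished after ${maxPolls} polls`);
}
```

<p>Keeping the state mapping pure means the "what counts as finished" decision can be tested on its own, separate from HTTP and timing.</p>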
<h2>The Loop</h2>
<p>Here's what a Pi task looks like end to end:</p>
<pre><code>Agent receives task brief
→ reads codebase, plans approach
→ writes code
→ runs local tests (bash tool)
→ commits and pushes branch
→ calls ci_wait
→ CI passes → opens PR via Gitea API
→ CI fails → calls ci_logs
→ reads error output
→ fixes the issue
→ pushes again
→ calls ci_wait again
→ repeats until green (max 3 retries)</code></pre>
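<p>The control flow above can be modelled as a small loop. This is a hypothetical, synchronous sketch: the <code>LoopTools</code> interface and its method names are assumptions that mirror the post's tools, and a real orchestrator would await each call. What it illustrates is where the retry cap lives: in plain code around the model, not in the prompt.</p>

```typescript
// Hypothetical, synchronous model of the orchestrator loop. Tool names mirror the
// post; the LoopTools interface itself is an assumption for this sketch.
type CiResult = "success" | "failure";

interface LoopTools {
  push(): void;                          // commit and push the branch
  ciWait(): CiResult;                    // block until the pipeline finishes
  ciLogs(): string;                      // raw log of the most recent failed run
  fix(log: string): void;                // the model edits code based on the log
  openPr(escalationNote?: string): void; // note is set only when giving up
}

interface LoopOutcome {
  green: boolean;  // did CI pass before the PR was opened?
  retries: number; // fix-and-push rounds after the first push
}

function runCiLoop(tools: LoopTools, maxRetries = 3): LoopOutcome {
  tools.push();
  let retries = 0;
  while (true) {
    if (tools.ciWait() === "success") {
      tools.openPr();
      return { green: true, retries };
    }
    const log = tools.ciLogs();
    if (retries >= maxRetries) {
      // The cap is enforced here, so the model cannot talk its way past it.
      tools.openPr(`CI still failing after ${retries} fix attempts:\n${log}`);
      return { green: false, retries };
    }
    retries++;
    tools.fix(log);
    tools.push();
  }
}
```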
<p>The key is that <code>ci_logs</code> returns the raw build output — compiler errors, test failures, lint violations — as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.</p>
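<p>The model reads the log as plain text, with no parser in between. To make the "identifies the file and line" step concrete, the sketch below pulls locations out of Go output, where compiler errors take the form <code>path.go:line:col: message</code> and <code>go test</code> failures take the form <code>file_test.go:line: message</code>. It illustrates what the model is extracting; it is not part of <code>tinqs-ci</code>, and the <code>goDiagnostics</code> name is hypothetical.</p>

```typescript
// Illustrative only: tinqs-ci hands the raw log to the model. This makes the
// "find the file and line" step explicit for Go build and test output.
interface GoDiagnostic {
  file: string;
  line: number;
  col: number | null; // go test failures carry no column
  message: string;
}

// path.go:line:col: message (compiler/vet) or file_test.go:line: message (go test)
const GO_LOCATION = /^\s*(\S+\.go):(\d+)(?::(\d+))?: (.*)$/;

function goDiagnostics(log: string): GoDiagnostic[] {
  const out: GoDiagnostic[] = [];
  for (const raw of log.split(/\r?\n/)) {
    const m = GO_LOCATION.exec(raw);
    if (m === null) continue; // headers, FAIL banners, and other noise
    out.push({
      file: m[1],
      line: parseInt(m[2], 10),
      col: m[3] !== undefined ? parseInt(m[3], 10) : null,
      message: m[4],
    });
  }
  return out;
}
```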
<p>Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry — it's usually a missing import or a type mismatch.</p>
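<p>The escalation path can be sketched as a request builder for Gitea's create-pull-request endpoint (<code>POST /api/v1/repos/{owner}/{repo}/pulls</code>, which takes <code>head</code>, <code>base</code>, <code>title</code>, and <code>body</code>). The escalation wording, the 20-line log tail, the helper names, and putting the explanation in the PR description rather than a separate comment are all assumptions for this sketch.</p>

```typescript
// Sketch of opening the PR, with or without an escalation note. Endpoint and field
// names follow Gitea's create-PR API; the note format and helper names are assumptions.
interface Escalation {
  attempts: number; // fix attempts made before giving up
  lastLog: string;  // raw output of the final ci_logs call
}

interface PrRequest {
  method: "POST";
  url: string;
  body: { title: string; head: string; base: string; body: string };
}

// Last `maxLines` lines of a log, indented so Markdown renders them as a code block.
function logTail(log: string, maxLines: number): string {
  const lines = log.replace(/\s+$/, "").split(/\r?\n/);
  return lines.slice(-maxLines).map((l) => "    " + l).join("\n");
}

function buildPrRequest(
  baseUrl: string,
  owner: string,
  repo: string,
  head: string, // the agent's branch
  base: string, // the target branch
  title: string,
  summary: string,
  escalation?: Escalation,
): PrRequest {
  let body = summary;
  if (escalation !== undefined) {
    body +=
      `\n\nCI still failing after ${escalation.attempts} fix attempts; a human needs to take over.` +
      `\n\nLast failing output:\n\n${logTail(escalation.lastLog, 20)}`;
  }
  return {
    method: "POST",
    url: `${baseUrl}/api/v1/repos/${owner}/${repo}/pulls`,
    body: { title, head, base, body },
  };
}
```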
<h2>What This Actually Looks Like</h2>
<p>A real run from last week. The task: add a health check endpoint to a Go service.</p>
<ul>
<li><strong>Turn 1:</strong> Agent reads the codebase, writes the handler and test, pushes. CI fails — the test imports a package that doesn't exist on the runner.</li>
<li><strong>Turn 2:</strong> Agent reads <code>ci_logs</code>, sees the <code>go: module not found</code> error, adds the missing <code>go.mod</code> replace directive, pushes. CI passes.</li>
<li><strong>Turn 3:</strong> Agent opens PR with passing checks.</li>
</ul>
<p>Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.</p>
<p>Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.</p>
<h2>Why This Matters More Than You Think</h2>
<p>CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.</p>
<p>An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress — "the PR is up!" — while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.</p>
<p>An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway — the fix-push-wait-check cycle that eats hours of developer time every week.</p>
<h2>The Guardrail Problem</h2>
<p>Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?</p>
<p>Three safeguards:</p>
<p><strong>Retry limit.</strong> Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.</p>
<p><strong>Diff budget.</strong> Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.</p>
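<p>The diff budget reduces to a membership check. A minimal sketch, assuming the orchestrator already has the original changeset and the retry's changed files as path lists (for example from <code>git diff --name-only</code>); the function names are hypothetical:</p>

```typescript
// Diff-budget guard: a retry may only touch files that were in the original changeset.
// Function names are hypothetical; how the path lists are gathered is up to the orchestrator.
function normalizePath(p: string): string {
  return p.replace(/^\.\//, ""); // treat "./a.go" and "a.go" as the same file
}

// Returns the paths a retry touched that fall outside the budget.
// An empty result means the retry may be pushed; anything else escalates.
function diffBudgetViolations(originalFiles: string[], retryFiles: string[]): string[] {
  const allowed = originalFiles.map(normalizePath);
  return retryFiles.filter((f) => allowed.indexOf(normalizePath(f)) === -1);
}
```

<p>A non-empty result blocks the push and escalates. The check looks only at which paths change, not at what changes inside them.</p>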
<p><strong>Hallucination detection.</strong> The guardrail extension monitors every turn. If the agent claims "the build passed" without having called <code>ci_status</code> or <code>ci_wait</code>, it gets corrected. Agents are not allowed to guess the CI result.</p>
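<p>One way to implement that check, sketched under assumptions: scan the model's message for a claim that CI passed, then verify that <code>ci_status</code> or <code>ci_wait</code> was called since the last push. The regular expression is a deliberately crude, hypothetical heuristic and the correction text is illustrative; the post does not say how the guardrail extension detects claims.</p>

```typescript
// Hypothetical heuristic for spotting "CI passed" claims in a model message:
// a CI word followed, within the same sentence, by a success word.
const PASS_CLAIM = /\b(build|ci|pipeline|checks?)\b[^.\n]*\b(pass(?:ed|es|ing)?|green|succeeded)\b/i;

// Tools whose results count as evidence about CI.
const CI_EVIDENCE_TOOLS = ["ci_status", "ci_wait"];

// Returns a correction to inject into the conversation, or null if the turn is fine.
// `toolsSinceLastPush` lists the tool names the agent called after its latest push.
function checkCiClaim(message: string, toolsSinceLastPush: string[]): string | null {
  if (!PASS_CLAIM.test(message)) return null;
  const verified = toolsSinceLastPush.some((t) => CI_EVIDENCE_TOOLS.indexOf(t) !== -1);
  if (verified) return null;
  return "You described the CI result without checking it. Call ci_wait or ci_status and report what it returns.";
}
```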
<h2>The Numbers</h2>
<p>Over three weeks of running the orchestrator:</p>
<ul>
<li><strong>23 tasks</strong> needed at least one CI retry (26% of all tasks)</li>
<li><strong>19 of those 23</strong> resolved on the first retry</li>
<li><strong>4 tasks</strong> hit the retry limit and escalated to a human</li>
<li><strong>0 tasks</strong> produced a merged PR that later broke something else</li>
</ul>
<p>The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number; it's roughly what you'd expect from a junior developer. The difference is that the agent fixes it in 30 seconds instead of 20 minutes.</p>
<hr>
<p><em>The CI extension is part of our <a href="https://tinqs.com/tinqs/pi" style="color: var(--c-accent-l);">Pi fork</a>, which runs inside <a href="https://tinqs.com" style="color: var(--c-accent-l);">Tinqs Studio</a>, a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.</em></p>