---
title: "Pi as CI Integrator: Agents That Fix Their Own Builds"
slug: pi-ci-integrator
date: "2026-05-25"
description: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
og_description: "Coding agents that watch CI and fix their own builds."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---

Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop --- agents that watch CI, read failure logs, and fix their own mistakes.

## The Gap

Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.

What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% --- the most tedious part --- for a person.

We wanted agents that finish the job.

## The tinqs-ci Extension

Our [Pi fork](https://tinqs.com/tinqs/pi) has a `tinqs-ci` extension --- a single TypeScript file, about 200 lines --- that gives the agent three tools:

- **ci_status** --- checks the current pipeline state for a branch (pending, running, success, failure)
- **ci_logs** --- fetches the full build log from the most recent failed run
- **ci_wait** --- polls the pipeline every 15 seconds until it finishes, then returns the result

These are Gitea Actions API calls under the hood. The agent authenticates with the same personal access token (PAT) it uses for `git push`. No extra credentials, no special CI service account.
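The polling behaviour of `ci_wait` can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `CiWaitDeps`, `fetchState`, `sleep`, and the `maxPolls` timeout are hypothetical names and assumptions introduced here, and the status lookup is injected rather than tied to any specific Gitea endpoint.

```typescript
// Pipeline states as named in the tool list above.
type CiState = "pending" | "running" | "success" | "failure";

// Hypothetical dependencies, injected so the loop can run without a live server.
interface CiWaitDeps {
  fetchState: (branch: string) => Promise<CiState>; // stand-in for ci_status
  sleep: (ms: number) => Promise<void>;
}

// A run is finished once it has either succeeded or failed.
const isFinished = (state: CiState): boolean =>
  state === "success" || state === "failure";

// Poll until the pipeline finishes or the poll budget runs out.
async function ciWait(
  branch: string,
  deps: CiWaitDeps,
  intervalMs = 15000, // the 15-second interval from the post
  maxPolls = 240, // assumption: give up after about an hour
): Promise<CiState | "timeout"> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const state = await deps.fetchState(branch);
    if (isFinished(state)) return state;
    await deps.sleep(intervalMs);
  }
  return "timeout";
}
```

Injecting the status lookup keeps the loop testable without a real pipeline. In a real extension, `fetchState` would presumably wrap the same call that backs `ci_status`.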

## The Loop

Here's what a Pi task looks like end to end:

```
Agent receives task brief
  → reads codebase, plans approach
  → writes code
  → runs local tests (bash tool)
  → commits and pushes branch
  → calls ci_wait
    → CI passes → opens PR via Gitea API
    → CI fails → calls ci_logs
      → reads error output
      → fixes the issue
      → pushes again
      → calls ci_wait again
      → repeats until green (max 3 retries)
```

The key is that `ci_logs` returns the raw build output --- compiler errors, test failures, lint violations --- as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.

Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry --- it's usually a missing import or a type mismatch.
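The push, wait, fix cycle with its three-retry cap might look something like the sketch below. Everything named here (`LoopSteps`, `runUntilGreen`, `fixFromLogs`, `openPr`) is hypothetical and stands in for the agent's tools and the Gitea API; it illustrates the control flow described above, not the orchestrator's real implementation.

```typescript
type CiResult = "success" | "failure";

// Hypothetical hooks standing in for the agent's tools and the Gitea API.
interface LoopSteps {
  push: () => Promise<void>;
  ciWait: () => Promise<CiResult>;
  ciLogs: () => Promise<string>;
  fixFromLogs: (logs: string) => Promise<void>; // agent edits code based on the log
  openPr: (comment?: string) => Promise<void>;
}

const MAX_RETRIES = 3; // the hard limit from the post

async function runUntilGreen(
  steps: LoopSteps,
): Promise<{ green: boolean; retries: number }> {
  await steps.push();
  let retries = 0;
  let lastLogs = "";
  while (true) {
    if ((await steps.ciWait()) === "success") {
      await steps.openPr(); // green: open the PR with passing checks
      return { green: true, retries };
    }
    lastLogs = await steps.ciLogs();
    if (retries === MAX_RETRIES) break; // out of retries: escalate
    await steps.fixFromLogs(lastLogs);
    await steps.push();
    retries++;
  }
  // Escalation path: open the PR anyway and explain what failed.
  await steps.openPr(
    `CI still failing after ${MAX_RETRIES} retries. Last log:\n${lastLogs}`,
  );
  return { green: false, retries };
}
```

Keeping the counter in plain code rather than in the prompt is what makes the limit hard: the model never gets to decide whether a fourth retry is justified.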

## What This Actually Looks Like

A real run from last week. The task: add a health check endpoint to a Go service.

- **Turn 1:** Agent reads the codebase, writes the handler and test, pushes. CI fails --- the test imports a package that doesn't exist on the runner.
- **Turn 2:** Agent reads `ci_logs`, sees the `go: module not found` error, adds the missing `go.mod` replace directive, pushes. CI passes.
- **Turn 3:** Agent opens PR with passing checks.

Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.

Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.

## Why This Matters More Than You Think

CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.

An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress --- "the PR is up!" --- while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.

An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway --- the fix-push-wait-check cycle that eats hours of developer time every week.

## The Guardrail Problem

Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?

Three safeguards:

**Retry limit.** Three retries maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.

**Diff budget.** Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.

**Hallucination detection.** The guardrail extension monitors every turn. If the agent claims "the build passed" without having called `ci_status` or `ci_wait`, it gets corrected. Agents are not allowed to guess the CI result.
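The diff budget and the unverified-claim check both reduce to small predicates. The sketch below is an assumption-heavy illustration: `outsideDiffBudget`, `claimsPassWithoutChecking`, the `Turn` shape, and the phrase-matching regex are all hypothetical, and the real guardrail extension may detect claims quite differently.

```typescript
// Files a retry touched that were not part of the original changeset.
// A non-empty result means the retry blew the diff budget and should escalate.
function outsideDiffBudget(originalFiles: string[], retryFiles: string[]): string[] {
  const allowed = new Set(originalFiles);
  return retryFiles.filter((file) => !allowed.has(file));
}

// Hypothetical view of the agent's activity since its last push.
interface Turn {
  message: string; // what the agent said
  toolCalls: string[]; // names of tools it invoked since the last push
}

const CI_TOOLS = ["ci_status", "ci_wait"];

// Assumption: a crude phrase match; a production detector would need more care.
const PASS_CLAIM =
  /\b(build|ci|pipeline|checks?)\b.*\b(passed|passes|is green|succeeded)\b/i;

// True when the agent asserts a green build without having asked CI.
function claimsPassWithoutChecking(turn: Turn): boolean {
  const checked = turn.toolCalls.some((name) => CI_TOOLS.includes(name));
  return PASS_CLAIM.test(turn.message) && !checked;
}
```

For example, a retry that edits `.golangci.yml` when the original changeset only contained `health.go` and `health_test.go` would return `[".golangci.yml"]` from `outsideDiffBudget`, which the orchestrator could treat as grounds to stop and escalate.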

## The Numbers

Over three weeks of running the orchestrator:

- **87 tasks** completed end-to-end
- **23 tasks** needed at least one CI retry (26%)
- **19 of those 23** resolved on the first retry (83%)
- **4 tasks** hit the retry limit and escalated to a human
- **0 tasks** produced a merged PR that later broke something else

The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number --- it's about the rate you'd expect from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.

---

*The CI extension is part of our [Pi fork](https://tinqs.com/tinqs/pi), which runs inside [Tinqs Studio](https://tinqs.com) --- a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.*