---
title: "Pi as CI Integrator: Agents That Fix Their Own Builds"
slug: pi-ci-integrator
date: "2026-05-25"
description: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
og_description: "Coding agents that watch CI and fix their own builds."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop --- agents that watch CI, read failure logs, and fix their own mistakes.

## The Gap

Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.

What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% --- the most tedious part --- for a person.

We wanted agents that finish the job.

## The tinqs-ci Extension

Our [Pi fork](https://tinqs.com/tinqs/pi) has a `tinqs-ci` extension --- a single TypeScript file, about 200 lines --- that gives the agent three tools:

- **ci_status** --- checks the current pipeline state for a branch (pending, running, success, failure)
- **ci_logs** --- fetches the full build log from the most recent failed run
- **ci_wait** --- polls the pipeline every 15 seconds until it finishes, then returns the result

These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.
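The following is a minimal TypeScript sketch of the `ci_wait` polling loop. It is not the actual `tinqs-ci` source. It assumes a hypothetical `StatusFetcher` callback that returns one of the four states `ci_status` reports. In a real extension, that callback would wrap the Gitea Actions API call authenticated with the agent's PAT. The `maxPolls` cap is also an assumption, added so a stuck runner cannot block the agent indefinitely.

```typescript
// Pipeline states reported by ci_status.
type CiState = "pending" | "running" | "success" | "failure";

// Hypothetical callback: returns the current pipeline state for a branch.
// In a real extension this would wrap a Gitea Actions API call made with the agent's PAT.
type StatusFetcher = (branch: string) => Promise<CiState>;

const POLL_INTERVAL_MS = 15_000; // poll every 15 seconds
const DEFAULT_MAX_POLLS = 240; // assumed cap: 240 polls x 15 s = 1 hour

function isTerminal(state: CiState): boolean {
  return state === "success" || state === "failure";
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Sketch of ci_wait: poll until the pipeline finishes, then return the final state.
async function ciWait(
  branch: string,
  fetchStatus: StatusFetcher,
  intervalMs: number = POLL_INTERVAL_MS,
  maxPolls: number = DEFAULT_MAX_POLLS,
): Promise<CiState> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const state = await fetchStatus(branch);
    if (isTerminal(state)) {
      return state;
    }
    await sleep(intervalMs);
  }
  throw new Error(`CI for ${branch} did not finish after ${maxPolls} polls`);
}
```

Injecting the fetcher keeps the loop testable without a live Gitea instance. For example, a fake that returns `"pending"`, then `"running"`, then `"success"`, combined with `intervalMs` set to `0`, resolves to `"success"` after three polls.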

## The Loop

Here's what a Pi task looks like end to end:

```
Agent receives task brief
  → reads codebase, plans approach
  → writes code
  → runs local tests (bash tool)
  → commits and pushes branch
  → calls ci_wait
  → CI passes → opens PR via Gitea API
  → CI fails → calls ci_logs
  → reads error output
  → fixes the issue
  → pushes again
  → calls ci_wait again
  → repeats until green (max 3 retries)
```

The key is that `ci_logs` returns the raw build output --- compiler errors, test failures, lint violations --- as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.

Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry --- it's usually a missing import or a type mismatch.
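The loop and its retry budget can be sketched the same way. This illustrates the technique under stated assumptions and is not Pi's orchestrator code:

- `LoopHooks` is a hypothetical interface standing in for the push, `ci_wait`, `ci_logs` and PR-opening steps.
- `fixFromLogs` stands in for the model reading the log and editing code.

The point the sketch makes is that the three-retry cap lives in ordinary control flow, outside the model.

```typescript
type CiResult = "success" | "failure";

// Hypothetical hooks standing in for the agent's tools and model calls.
interface LoopHooks {
  pushBranch(): Promise<void>;
  ciWait(): Promise<CiResult>; // blocks until the pipeline finishes
  ciLogs(): Promise<string>; // raw build output of the latest failed run
  fixFromLogs(log: string): Promise<void>; // the model reads the log and edits code
  openPr(failureNote?: string): Promise<void>;
}

// Enforced here, in ordinary control flow, rather than requested in the prompt.
const MAX_RETRIES = 3;

interface LoopOutcome {
  green: boolean;
  retries: number;
}

async function runCiLoop(hooks: LoopHooks): Promise<LoopOutcome> {
  await hooks.pushBranch();
  let result = await hooks.ciWait();
  let retries = 0;

  // Fix, push, wait: repeat until CI is green or the retry budget is spent.
  while (result === "failure" && retries < MAX_RETRIES) {
    const log = await hooks.ciLogs();
    await hooks.fixFromLogs(log);
    await hooks.pushBranch();
    retries++;
    result = await hooks.ciWait();
  }

  if (result === "success") {
    await hooks.openPr();
    return { green: true, retries };
  }

  // Budget exhausted: open the PR anyway, explain what failed, and hand off to a human.
  const finalLog = await hooks.ciLogs();
  await hooks.openPr(`CI still failing after ${MAX_RETRIES} retries:\n${finalLog}`);
  return { green: false, retries };
}
```

With fake hooks whose `ciWait` returns `"failure"` and then `"success"`, the loop pushes twice and opens a clean PR after one retry. If `ciWait` never returns `"success"`, the loop pushes four times (the initial push plus three retries) and then opens the PR with a failure note.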

## What This Actually Looks Like

A real run from last week. The task: add a health check endpoint to a Go service.

- **Turn 1:** Agent reads the codebase, writes the handler and test, pushes. CI fails --- the test imports a package that doesn't exist on the runner.
- **Turn 2:** Agent reads `ci_logs`, sees the `go: module not found` error, adds the missing `go.mod` replace directive, pushes. CI passes.
- **Turn 3:** Agent opens PR with passing checks.

Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.

Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.

## Why This Matters More Than You Think

CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.

An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress ("the PR is up!") while actually adding a task to someone's plate. Every broken PR is an interruption, and every interruption can cost 15 minutes or more of context-switching.

An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway --- the fix-push-wait-check cycle that eats hours of developer time every week.

## The Guardrail Problem

Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?

Three safeguards:

**Retry limit.** Three retries maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.

**Diff budget.** Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.
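A diff-budget check reduces to a set comparison. The sketch below assumes the orchestrator records the original changeset as a list of repository-relative paths, for example the output of `git diff --name-only` against the base branch. Both function names are hypothetical.

```typescript
// Returns the files a retry touched that were not in the original changeset.
// Paths are repository-relative, e.g. as printed by `git diff --name-only`.
function filesOutsideBudget(originalChangeset: string[], retryChangeset: string[]): string[] {
  const allowed = new Set(originalChangeset);
  return retryChangeset.filter((path) => !allowed.has(path));
}

// A retry may proceed only if every file it touches was already in the original changeset;
// otherwise the orchestrator should stop and escalate to a human.
function withinDiffBudget(originalChangeset: string[], retryChangeset: string[]): boolean {
  return filesOutsideBudget(originalChangeset, retryChangeset).length === 0;
}
```

Consider an original changeset of `internal/health/handler.go` and `internal/health/handler_test.go` (hypothetical paths). A retry that also edits `.golangci.yml` makes `filesOutsideBudget` return `[".golangci.yml"]`, so the retry escalates instead of pushing.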

**Hallucination detection.** The guardrail extension monitors every turn. If the agent claims "the build passed" without having called `ci_status` or `ci_wait`, it gets corrected. Agents are not allowed to guess the CI result.
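Here is one way such a check could work, sketched under assumptions. The guardrail sees each turn as the model's text plus the names of the tools called since the last push; the `AgentTurn` shape is hypothetical. A deliberately crude regular expression stands in for whatever claim detection the real guardrail uses.

```typescript
// Hypothetical view of one agent turn as the guardrail sees it.
interface AgentTurn {
  text: string; // what the model wrote this turn
  toolsSinceLastPush: string[]; // names of tools called since the most recent push
}

// Deliberately crude claim detector, for illustration only.
const CI_CLAIM = /\b(build|ci|pipeline|checks?)\b[^.]*\b(passed|passes|passing|green|succeeded)\b/i;
const CI_TOOLS = new Set(["ci_status", "ci_wait"]);

// Returns a correction to send back to the model, or null if the turn is fine.
function checkCiClaim(turn: AgentTurn): string | null {
  if (!CI_CLAIM.test(turn.text)) {
    return null;
  }
  const checkedCi = turn.toolsSinceLastPush.some((tool) => CI_TOOLS.has(tool));
  return checkedCi
    ? null
    : "You reported a CI result without calling ci_status or ci_wait. Check CI before claiming the build passed.";
}
```

Tracking tool calls since the last push, rather than over the whole task, means a CI result from before the latest push cannot be reused to justify a new claim.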

## The Numbers

Over three weeks of running the orchestrator:

- **87 tasks** completed end-to-end
- **23 tasks** needed at least one CI retry (26%)
- **19 of those 23** resolved on the first retry
- **4 tasks** hit the retry limit and escalated to a human
- **0 tasks** produced a merged PR that later broke something else

The 26% retry rate tells you how often agents push code that doesn't pass CI on the first try. That's not a bad number; in our experience, it's roughly the rate you'd see from a junior developer. The difference is that the agent fixes it in 30 seconds instead of 20 minutes.
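For reference, the rates behind these counts work out as follows. All figures are derived from the list above.

```latex
\begin{aligned}
\text{needed a CI retry} &= \tfrac{23}{87} \approx 26.4\% \\
\text{passed CI on the first push} &= \tfrac{87 - 23}{87} = \tfrac{64}{87} \approx 73.6\% \\
\text{fixed on the first retry} &= \tfrac{19}{23} \approx 82.6\% \text{ of retried tasks} \\
\text{escalated to a human} &= \tfrac{4}{87} \approx 4.6\% \text{ of all tasks}
\end{aligned}
```

Since 19 + 4 = 23, no task in this sample went green on a second or third retry. Each retried task either passed on the first retry or used up all three.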

---

*The CI extension is part of our [Pi fork](https://tinqs.com/tinqs/pi), which runs inside [Tinqs Studio](https://tinqs.com) --- a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.*