feat: blog build system + all HTML generated by Pi agent
build.js + templates copied from docs, 11 posts built to 14 HTML files.
Generated by local Pi orchestrator task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,156 @@
---
title: "Building a Cloud Agent Harness with DeepSeek V4 and Pi"
slug: cloud-harness
date: "2026-05-26"
description: "We forked Pi, merged a browser dashboard into the monorepo, and built a Go orchestrator inside our Gitea fork. Agents code overnight for about $0.80 --- and you can watch them from localhost:33634."
og_description: "Pi fork, merged agent dashboard, and a Go orchestrator inside Tinqs Studio."
og_image: "https://www.tinqs.com/blog/img/cloud-harness-architecture.png"
excerpt: "We forked Pi, merged a browser dashboard into the monorepo, and built a Go orchestrator inside our Gitea fork. Agents code overnight for about $0.80 --- and you can watch them from the browser."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
We spent a few sessions building something that still barely exists elsewhere: a cloud agent harness where AI coding agents are first-class citizens of the platform, not bolt-on tools. The stack is a [Pi fork](https://tinqs.com/tinqs/pi) for the brain, a Go orchestrator inside our [Gitea fork](https://tinqs.com/tinqs/studio) for overnight work, and a browser dashboard merged into Pi for the daytime. Here is how it fits together.

## The Problem

Every coding agent today --- Claude Code, Codex, Pi, Aider --- runs in your terminal. You watch it work. You close the laptop, it stops. There is no way to say "build these eight features overnight" and wake up to pull requests.

We wanted exactly that. Not a coding assistant. An autonomous workforce --- with a UI when a human needs to be in the loop.

## Why Not Just Use Claude Code or Codex?

**Cost.** Claude Code runs on Opus at $15/MTok output. Codex uses GPT 5.5. Running eight agents overnight on either would cost $50–200. DeepSeek V4 Flash costs $0.28/MTok output. Eight overnight tasks: **about $0.80**.

**Control.** Cloud tools are black boxes. We cannot add a Gitea API tool, a fal.ai image generator, or a guardrail that blocks `aws ec2 terminate-instances`. With our own harness, we add an extension and it is live.

**Platform.** We are building [Tinqs Studio](https://tinqs.com) --- a Gitea-based game development platform. Agents are not a feature we want to outsource. They are the product.

## Pi --- The Agent Brain

[Pi](https://pi.dev) is an open-source coding agent by Mario Zechner. MIT license, TypeScript, minimal by design --- four core tools (read, write, edit, bash) and an extension system.

We [forked it](https://tinqs.com/tinqs/pi). Not to rewrite the core --- to add first-party extensions:

- **tinqs-provider** --- routes DeepSeek V4 Flash and Pro through our inference proxy
- **tinqs-tools** --- Gitea REST API, fal.ai image generation, Amazon Nova Lite vision
- **tinqs-ci** --- reads CI pipeline status, logs, and polls for completion
- **tinqs-guardrail** --- 29 safety patterns that block dangerous operations

Each extension is a single TypeScript file. No extra npm dependencies on the extension side.

Pi has four output modes. The one that matters for automation is **RPC** --- a headless process that accepts JSON commands on stdin and emits JSON events on stdout. That is how the orchestrator drives it.

## DeepSeek V4 --- The LLM

We run DeepSeek V4 Flash through our own inference proxy. The API is OpenAI-compatible, so Pi treats it like any other provider. The proxy adds:

- Redis job queue (10 concurrent workers)
- Per-user usage tracking
- System prompt injection for cache hit optimization
- Gitea PAT authentication (same token as git push)

Cost per task: **$0.02–0.10** depending on complexity.
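
To see where the "about $0.80" overnight figure sits, the per-task range can be multiplied out. This is a minimal sketch: `batchCostRange` is a hypothetical helper, not part of the proxy, and its only inputs are the $0.02 and $0.10 bounds quoted above and the eight-task batch used throughout this post.

```typescript
// Rough batch-cost estimate from the per-task range quoted in the post.
// `batchCostRange` is an illustrative helper, not part of the Tinqs proxy.
interface CostRange {
  low: number; // USD
  high: number; // USD
}

function batchCostRange(tasks: number, perTask: CostRange): CostRange {
  // Work in hundredths of a cent to avoid floating-point drift.
  const toUnits = (usd: number): number => Math.round(usd * 10000);
  return {
    low: (tasks * toUnits(perTask.low)) / 10000,
    high: (tasks * toUnits(perTask.high)) / 10000,
  };
}

const overnight = batchCostRange(8, { low: 0.02, high: 0.1 });
console.log(overnight.low, overnight.high); // 0.16 0.8
```

In other words, $0.80 is the upper end for eight tasks; a batch of simple tasks would land closer to $0.16.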

## Go Orchestrator --- Overnight Batch Work

Inside `tinqs/studio` we added `modules/agents/` --- a Go worker pool that:

- Spawns Pi with `--mode rpc --no-session`
- Tracks task lifecycle (pending → running → done)
- Streams events over **SSE** to any connected UI
- Enforces guardrails at the platform layer (worker limits, timeouts)

Six HTTP endpoints, same auth as git push:

```
POST   /api/v1/agents/tasks        - submit a task
GET    /api/v1/agents/tasks        - list all tasks
GET    /api/v1/agents/tasks/{id}   - get task details
DELETE /api/v1/agents/tasks/{id}   - stop a task
GET    /api/v1/agents/stream       - SSE live events
GET    /api/v1/agents/health       - orchestrator status
```

We considered bolting on a separate orchestration SaaS and rejected it. The orchestrator lives in the same binary as the git server --- same auth, no extra service to deploy.

The intended loop:

```
Orchestrator reads task brief
  → spawns pi --mode rpc
  → Pi writes code using DeepSeek V4
  → Pi pushes branch, calls ci_wait
  → CI green → Pi opens PR via gitea_api
  → CI red → Pi reads ci_logs, fixes, retries
  → Human reviews PR, merges
```

Git worktree integration and full push/PR automation are still being wired; the API and worker pool already run locally.
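
The lifecycle and worker-limit bullets above can be sketched as a small state machine. The real orchestrator is Go code inside `modules/agents/`; the TypeScript below is only an illustration of the idea. The `failed` and `stopped` states, the `TaskQueue` class, and all field names are assumptions, since the post names only pending, running, and done.

```typescript
type TaskState = "pending" | "running" | "done" | "failed" | "stopped";

// Allowed transitions. `failed` and `stopped` are assumed terminal states.
const transitions: Record<TaskState, TaskState[]> = {
  pending: ["running", "stopped"],
  running: ["done", "failed", "stopped"],
  done: [],
  failed: [],
  stopped: [],
};

interface Task {
  id: string;
  brief: string;
  state: TaskState;
}

function move(task: Task, next: TaskState): void {
  if (transitions[task.state].indexOf(next) === -1) {
    throw new Error(`illegal transition ${task.state} -> ${next}`);
  }
  task.state = next;
}

class TaskQueue {
  private tasks: Task[] = [];

  constructor(private maxWorkers: number) {}

  submit(id: string, brief: string): Task {
    const task: Task = { id, brief, state: "pending" };
    this.tasks.push(task);
    return task;
  }

  // Start pending tasks until the worker limit is reached; returns what was started.
  schedule(): Task[] {
    let running = this.tasks.filter((t) => t.state === "running").length;
    const started: Task[] = [];
    for (const task of this.tasks) {
      if (running >= this.maxWorkers) break;
      if (task.state !== "pending") continue;
      move(task, "running");
      running++;
      started.push(task);
    }
    return started;
  }

  // Mark a running task as finished, successfully or not.
  finish(id: string, ok: boolean): void {
    const task = this.tasks.filter((t) => t.id === id)[0];
    if (!task) throw new Error(`unknown task ${id}`);
    move(task, ok ? "done" : "failed");
  }
}
```

Keeping the transition table as data means an illegal move (for example finishing a task twice) fails loudly instead of silently corrupting the queue.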

## Pi Dashboard --- Browser UI (Shipped)

The cloud orchestrator is for batch work while you sleep. During the day you want to see agents, chat with them, and spawn sessions without living in a terminal.

We merged [pi-agent-dashboard](https://github.com/BlackBeltTechnology/pi-agent-dashboard) into the Pi monorepo rather than leaving it as a second repo to install. One checkout, one command:

```bash
npm run dashboard:dev
```

Open **http://localhost:33634**. You get:

- **Live session streaming** --- watch tool calls and model output in real time
- **Interactive chat** --- send prompts, answer `ask_user` dialogs from the browser
- **Session spawning** --- start Pi in any pinned project folder
- **Cost tracking** --- per-session token usage when using Tinqs inference
- **Plugins** --- flows, subagents, workspace helpers

The dashboard talks to Pi sessions over a WebSocket bridge on port **9999**. Inference uses the same Tinqs proxy as the CLI --- register a custom provider in `~/.pi/agent/providers.json` and authenticate with your existing `tstudio` token. No separate LLM API keys.

```
Dashboard (localhost:33634)
    ↕ WebSocket (port 9999)
Pi sessions (interactive or headless)
    ↕ OpenAI-compatible API
Tinqs Studio proxy (tinqs.com/api/v1/ai)
    ↕ DeepSeek V4 Flash / Pro
```

When Studio runs locally with agents enabled, the dashboard can also talk to the orchestrator API on port 3000 --- submit tasks and watch SSE events in the same UI.

One browser tab for daytime work; the orchestrator queue for overnight runs.
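
Any UI that watches the orchestrator's SSE stream has to split the incoming text into events. Below is a minimal sketch of that framing, following the standard Server-Sent Events line format (`event:` and `data:` fields, blank line between events). The `task.update` event name and the JSON payload in the usage lines are invented for illustration; the post does not document the event schema.

```typescript
interface SseEvent {
  event: string; // defaults to "message" per the SSE format
  data: string;
}

// Parse one event block (the text between two blank lines).
function parseBlock(block: string): SseEvent | null {
  let event = "message";
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line === "" || line.charAt(0) === ":") continue; // comment or keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1);
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  // Events without data are not dispatched.
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

// Incremental parser: feed it text chunks, get back any completed events.
class SseParser {
  private buffer = "";

  feed(chunk: string): SseEvent[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const parsed = parseBlock(this.buffer.slice(0, end));
      this.buffer = this.buffer.slice(end + 2);
      if (parsed) events.push(parsed);
      end = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// Usage: an event split across two network chunks is reassembled.
const parser = new SseParser();
parser.feed('event: task.update\ndata: {"id":"42",');
const events = parser.feed('"state":"running"}\n\n');
console.log(events[0].event, events[0].data);
```

Buffering until a blank line arrives matters because a network chunk can end in the middle of an event.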

## The Guardrail

Our biggest fear: an agent hallucinating instead of using tools, or running `aws ec2 terminate-instances` at 3 AM.

The guardrail extension monitors every agent turn:

**Hallucination detection** --- if the agent claims file contents without calling `read`, it gets corrected.

**No-tool drift** --- three consecutive turns without a tool call triggers a warning.

**Command blocking** --- 29 patterns covering destructive git, AWS teardown, process killing, and production API abuse.
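
The command-blocking and no-tool-drift checks can be sketched in a few lines. The three regular expressions below are illustrative stand-ins, not the extension's actual 29 patterns, and `isBlocked` and `DriftMonitor` are hypothetical names.

```typescript
// Illustrative patterns only; the real extension ships its own list.
const blockedPatterns: RegExp[] = [
  /\bgit\s+push\s+.*--force\b/,
  /\baws\s+ec2\s+terminate-instances\b/,
  /\brm\s+-rf\s+\/(\s|$)/,
];

// True when a shell command matches any blocked pattern.
function isBlocked(command: string): boolean {
  return blockedPatterns.some((pattern) => pattern.test(command));
}

class DriftMonitor {
  private turnsWithoutTool = 0;

  constructor(private limit: number = 3) {}

  // Call once per agent turn; returns true when a warning should fire.
  recordTurn(toolCalls: number): boolean {
    this.turnsWithoutTool = toolCalls > 0 ? 0 : this.turnsWithoutTool + 1;
    return this.turnsWithoutTool >= this.limit;
  }
}
```

Checking the command string before it reaches the shell is what makes this a guardrail rather than an audit log: a blocked command never runs.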

## What It Cost to Build

A few focused sessions: about 2,000 lines of Go, 900 lines of TypeScript extensions, 52 tests, plus merging the dashboard packages into the Pi monorepo. No new servers --- Pi is a Node subprocess; the dashboard is another Node process on your machine.

## What Is Next

| Piece | Status |
|-------|--------|
| Pi fork + tinqs extensions | Shipped |
| Dashboard merged into Pi monorepo | Shipped |
| Go orchestrator + REST/SSE API | MVP, running locally |
| Git worktree + push + PR loop | In progress |
| Domain routing (game / sim / platform tasks) | Designed |

Next we are promoting studio skills from IDE playbooks into orchestrator prompt packs --- so the same Pi worker behaves like a game builder, sim maintainer, or platform engineer depending on the task. Specialized agents (planner, reviewer, asset pipeline) sit on top of this foundation.

The harness --- inference proxy, guardrails, dashboard, orchestrator API --- is in place. The work now is feeding it real tasks and hardening the git loop.

---

*Tinqs Studio is an open platform for game development --- git hosting, AI inference, asset generation, and autonomous agents. We are building [Ariki](https://arikigame.com), a survival colony sim, using the same tools we ship.*
@@ -0,0 +1,113 @@
---
title: "Fork, Don't Build: The Age of Agents Doesn't Need New Tools"
slug: fork-dont-build
date: "2026-05-25"
description: "Everyone is building new AI developer tools. We forked three existing ones --- Gitea, Pi, Godot --- and modified them from the inside. Here's why that's the better bet."
og_description: "Fork Gitea. Fork Pi. Fork Godot. Modify platforms, don't build toys."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "Everyone is building new AI developer tools. We forked three existing ones and modified them from the inside. Here's why that's the better bet."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
The AI developer tools space has a problem: everyone is building new things. New agents, new IDEs, new platforms, new wrappers around GPT. Meanwhile, the tools that actually run the world --- git servers, game engines, CI runners --- sit there unchanged, waiting for someone to open them up and let agents in. We chose to fork instead of build. Three times. Here's why.

## The Pattern

We're a four-person game studio. We don't have time to build a git platform, a coding agent, and a game engine from scratch. Nobody does. But we can take something that already works --- something with years of battle-testing, thousands of contributors, and millions of users --- and change it from the inside.

The pattern is simple:

1. Find an open-source tool that does 95% of what you need
2. Fork it
3. Add the 5% that makes it yours
4. Stay close to upstream so you get their fixes for free

We've done this three times.

## Fork 1: Gitea --- Our Git Platform

[Gitea](https://gitea.com) is a self-hosted git server. Single Go binary, MIT license, 45k GitHub stars. It handles repos, issues, pull requests, CI, LFS --- everything a team needs.

We [forked it](https://tinqs.com/tinqs/studio) and built Tinqs Studio. Our changes:

- **3D asset preview** --- click a `.glb` file and rotate the model in your browser
- **HTML file preview** --- rendered in a sandboxed iframe, not raw source
- **Agent API** --- six endpoints that let AI agents submit tasks, push code, and open PRs
- **OAuth2 SSO** --- one login for git, the game, and every tool
- **Credits system** --- monetize AI inference without hiding features behind paywalls

Total lines changed from upstream: about 2,000 out of Gitea's 500,000. That's 0.4%. We modify templates, add Go modules, and tweak CSS variables. We never touch the database schema --- we ride upstream's migrations. When Gitea releases 1.27, we rebase, fix conflicts, and ship.

The alternative was building a git platform from scratch. That's a multi-year, multi-million dollar project. Or using GitHub/GitLab and accepting their limitations. Neither option gives you the ability to embed AI agents directly into the platform.

## Fork 2: Pi --- Our Agent Runtime

[Pi](https://pi.dev) is an open-source coding agent. 51k stars, MIT license, TypeScript. Four core tools (read, write, edit, bash), a minimal system prompt, and an extension system.

We [forked it](https://tinqs.com/tinqs/pi) and added four extensions:

- **tinqs-provider** --- routes inference through our DeepSeek V4 proxy ($0.28/MTok vs Opus at $15/MTok)
- **tinqs-tools** --- Gitea API, fal.ai image generation, vision preprocessing
- **tinqs-ci** --- reads CI pipeline status and logs, polls for completion
- **tinqs-guardrail** --- 29 safety patterns blocking dangerous commands

Each extension is a single TypeScript file. No npm dependencies. The core Pi code is untouched --- we only add files.

The alternative was building our own agent from scratch. That means writing tool-calling logic, context management, streaming, retry handling, conversation threading --- months of work to reinvent what Pi already does. Or using Claude Code / Codex as a black box and accepting that you can't add a Gitea API tool or a budget cap.

## Fork 3: Godot --- Our Game Engine

[Godot](https://godotengine.org) is an open-source game engine. We forked 4.6.2 and added nine C++ modules that turn the engine into an agent-aware runtime, including:

- **agent_api** --- HTTP server inside the engine, so agents can query game state
- **agent_vision** --- screenshot capture for AI vision pipelines
- **agent_console** --- programmatic access to the engine console
- **agent_replay** --- record and replay game sessions for testing
- **agent_analytics** --- PostHog event tracking from inside the engine

These modules compile into the engine binary. A vanilla Godot user never sees them. An agent can connect to the running engine over HTTP, take a screenshot, read the scene tree, execute a console command, and capture the result --- all without touching the editor UI.

The alternative was building an engine integration from scratch. Or worse, building a custom engine. We'd still be writing a renderer instead of making a game.

## Why Forking Beats Building

### You inherit decades of work

Gitea has handled millions of git pushes. Godot renders millions of frames. Pi has processed millions of LLM tokens. That battle-testing is free when you fork. When you build from scratch, you spend your first year rediscovering bugs that were fixed upstream in 2019.

### You get free maintenance

Every upstream release brings security patches, performance improvements, and new features --- written by hundreds of contributors we don't pay. Our job is to rebase, resolve conflicts, and test. That's an afternoon, not a quarter.

### You stay focused

Building a git server from scratch means worrying about pack-file format, SSH key management, webhook delivery, and a thousand other things that have nothing to do with AI agents. Forking means you only think about the 5% that matters to you. The other 95% is someone else's problem.

### Agents work better on real platforms

An agent that pushes to a real Gitea instance --- with real CI, real code review, real permissions --- produces work that humans can actually review and ship. An agent that pushes to a toy demo platform produces demos.

The whole point of AI agents is to participate in real workflows. Real workflows run on real tools. If you want agents in your git workflow, put them in your git server. If you want agents in your game pipeline, put them in your game engine.

## The 0.5% Rule

Across all three forks combined, our total changeset is less than 0.5% of the upstream code. Tinqs Studio: 0.4% of Gitea. Pi extensions: 900 lines added next to a 15,000-line codebase (about 6% by line count, but purely additive: the core is untouched). Godot modules: 2,000 lines added to a 2-million-line engine.

This isn't a coincidence. If your fork touches more than 1% of upstream, you're doing too much. Either the upstream tool is wrong for the job, or you're not trusting it enough. The power of forking is that you don't have to understand the whole codebase. You find the extension points, add your code, and leave the rest alone.
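
The footprint figures can be checked with a few lines of arithmetic. This sketch uses only the approximate line counts quoted in this post; the `footprint` helper is illustrative.

```typescript
interface Fork {
  name: string;
  ours: number; // lines we changed or added
  upstream: number; // approximate upstream size in lines
}

const forks: Fork[] = [
  { name: "Gitea", ours: 2000, upstream: 500000 },
  { name: "Pi", ours: 900, upstream: 15000 },
  { name: "Godot", ours: 2000, upstream: 2000000 },
];

// Our changes as a percentage of the upstream codebase.
function footprint(ours: number, upstream: number): number {
  return (ours / upstream) * 100;
}

for (const f of forks) {
  console.log(`${f.name}: ${footprint(f.ours, f.upstream).toFixed(2)}%`);
}

const totalOurs = forks.reduce((sum, f) => sum + f.ours, 0);
const totalUpstream = forks.reduce((sum, f) => sum + f.upstream, 0);
console.log(`combined: ${footprint(totalOurs, totalUpstream).toFixed(2)}%`);
```

Combined, the three forks come to roughly 0.2% of upstream. Pi is the outlier by raw line count because its codebase is small, but those 900 lines are new extension files rather than edits to upstream code.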

## What We're Not Doing

We're not building a new IDE. Cursor and Claude Code exist. We're not building a new LLM. DeepSeek and Claude exist. We're not building a new cloud platform. AWS exists.

We're building the layer that connects them. The git server that speaks agent. The coding agent that speaks Gitea. The game engine that speaks HTTP. Each fork is a bridge between an existing tool and the agentic future --- not a replacement for either.

## The Bet

The age of agents doesn't need more agents. It needs better platforms. Platforms that understand agents as first-class users --- with API endpoints, safety rails, and lifecycle management. Those platforms already exist as open-source projects. They just need someone to fork them and add the wiring.

That's the bet. Fork, don't build. Modify the foundation, don't stack another layer on top. Let the upstream community handle the 99.5% while you focus on the 0.5% that makes it yours.

---

*[Tinqs Studio](https://tinqs.com) is our Gitea fork, open for game teams and indie studios. We're building [Ariki](https://arikigame.com) --- a survival colony sim --- using every tool described in this post. If you're interested in self-hosted game development with built-in AI agents, come take a look.*
@@ -0,0 +1,122 @@
---
title: "Image Generation at Every Price Point with fal.ai"
slug: image-generation-fal
date: "2026-05-25"
description: "We generate concept art, logos, icons, and trailer frames through a single API proxy. Here's how we pick between 12 models spanning $0.002 to $0.09 per image."
og_description: "One proxy, 12 models, $0.002 to $0.09 per image. How we pick."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "We generate concept art, logos, icons, and trailer frames through a single API proxy. Here's how we pick between 12 models spanning $0.002 to $0.09 per image."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
We generate every visual asset for Ariki --- concept art, app icons, trailer frames, logo variants, Steam capsules --- through a single inference proxy that routes to fal.ai. No Photoshop. No Midjourney subscription. Just API calls at prices that range from $0.002 to $0.09 per image. Here's how we decide which model gets which job.

## The Setup

Our [Tinqs Studio](https://tinqs.com) platform includes an inference proxy that sits between agents and model providers. When an agent (or a human in Cursor) says "generate an image," the proxy routes the request to fal.ai, handles authentication, tracks usage per user, and persists the result to S3. The caller doesn't need to care which model runs --- they describe what they want, and either the proxy picks a model or the caller specifies one.

```
Agent describes what it wants
  → tinqsProxy receives generate_image call
  → Routes to fal.ai with the specified model
  → Image generated, persisted to S3
  → Permanent URL returned to caller
```

One API key. One billing account. Access to every model fal.ai hosts. That's the pitch of aggregator platforms, and fal.ai delivers on it.

## The Tiers

Not every image needs the best model. A throwaway mockup doesn't justify $0.09. A final logo deserves better than a $0.002 draft model. We split our usage into four tiers.

### Best Quality --- Final Art

For images that ship --- hero art, app icons, trailer keyframes, print-ready designs --- we use three models depending on the content:

**Flux 2 Pro** ($0.03/megapixel, ~15 seconds). Our default. Best all-round quality for concept art, character illustrations, environment paintings, and anything that doesn't need text. Handles complex prompts with multiple elements well. Rarely fails.

**Ideogram v3 Quality** ($0.09, ~12 seconds). The only model that renders text reliably inside images. When we need a poster with a tagline, a sign in a game scene, or a logo with readable letters, this is what we reach for. The Quality tier is expensive but worth it --- text at lower tiers gets blurry.

**Recraft v3** ($0.04 raster, $0.08 vector, ~10 seconds). Built for commercial design. Clean lines, consistent style, and the only model on fal.ai that outputs SVG vectors. When we need brand assets, packaging mockups, or anything that might end up in print, Recraft produces work that doesn't need cleanup.

### Mid Tier --- Everyday Work

For images that are good enough for internal review, social posts, or documentation:

**Ideogram v3 Balanced** ($0.06, ~8 seconds). Typography quality between Turbo and Quality. Good for marketing materials where text matters but perfection doesn't.

**Seedream v4.5** ($0.04, ~8 seconds). ByteDance's model on fal.ai. Photorealistic scenes and product shots. Different aesthetic from Flux --- slightly more photographic, less painterly.

**Flux Dev** ($0.025, ~10 seconds). The open-weight Flux variant. Good quality, and the base for LoRA fine-tuning if you want to train on your own style. We use it when we expect to fine-tune a custom model later.

### Low Cost --- Drafts and Exploration

For iteration, A/B testing, and throwing things at the wall:

**Flux Schnell** ($0.003/megapixel, ~3 seconds). The workhorse for exploration. When we're figuring out composition, trying different camera angles, or generating 20 variants to pick one direction --- Schnell. A hundred images cost $0.30. You can afford to be wasteful.

**SDXL Lightning** (~$0.002, ~2 seconds). The absolute cheapest option. Lower quality than Schnell, but when you need 50 thumbnails to test a layout grid or generate placeholder textures, quality doesn't matter. Two cents for ten images.

### Specialised --- Editing and Post-Processing

For modifying existing images rather than generating new ones:

**Flux Kontext** (~$0.04, ~12 seconds). Context-aware editing. Give it an image and say "change the wood to marble" or "make it sunset lighting." Preserves composition while changing style or material. Useful for quick style transfers without regenerating from scratch.

**Nano Banana Edit** ($0.039, ~12 seconds). Image-to-image restyle. We use this for our logo variant pipeline --- take one carved-wood Ariki logo and produce versions in mahogany, pearl, obsidian, coral, gold. It's better than Kontext at preserving fine detail in complex images.

**BiRefNet** ($0.001, ~3 seconds). Background removal. Produces clean alpha cutouts from any image. We pair it with every logo and icon generation --- generate with a white background, then cut it out. A dollar gets you a thousand cutouts.
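
One way to encode the tiers above is a small routing function. This is a sketch of the selection logic as described in this post, not the proxy's actual code. The `Job` shape and the model identifier strings are made up for illustration and are not fal.ai endpoint names.

```typescript
type Stage = "draft" | "review" | "final";

interface Job {
  stage: Stage;
  hasText?: boolean; // readable lettering inside the image
  needsVector?: boolean; // SVG output
  editsExisting?: boolean; // restyle or modify an existing image
  removeBackground?: boolean; // alpha cutout only
}

// Pick a model for a job, checking the most specific needs first.
function pickModel(job: Job): string {
  if (job.removeBackground) return "birefnet";
  if (job.editsExisting) return "flux-kontext";
  if (job.needsVector) return "recraft-v3";
  if (job.hasText) {
    return job.stage === "final" ? "ideogram-v3-quality" : "ideogram-v3-balanced";
  }
  if (job.stage === "draft") return "flux-schnell";
  if (job.stage === "review") return "flux-dev";
  return "flux-2-pro";
}
```

The order of the checks carries the policy: special capabilities (cutouts, edits, vectors, text) win over price, and price only decides between the general-purpose Flux variants.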

## How We Actually Use Them

### The Schnell-to-Pro Pipeline

We never start with the expensive model. Every generation session follows the same pattern:

1. **Explore with Schnell** ($0.003) --- 10-20 variants, different angles, compositions, color palettes. Total: $0.03-0.06.
2. **Pick 2-3 directions.** Human looks at the grid, picks the promising ones.
3. **Refine with Flux 2 Pro** ($0.03) --- regenerate the winners at full quality with refined prompts. Total: $0.06-0.09.
4. **Post-process** --- BiRefNet for background removal ($0.001), maybe Recraft for a vector version ($0.08).

A full session --- from blank canvas to final assets --- costs under $0.20 unless we add the vector pass, which brings the worst case to about $0.23. That's the price of a single Midjourney generation on their Pro plan.
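
The session arithmetic can be written out explicitly. This sketch uses the per-image prices listed in the steps above and treats the per-megapixel prices as per-image, which assumes roughly one-megapixel outputs. The `sessionCost` helper and its field names are illustrative.

```typescript
const PRICE = {
  schnell: 0.003,
  flux2Pro: 0.03,
  birefnet: 0.001,
  recraftVector: 0.08,
};

interface Session {
  drafts: number; // Schnell explorations
  finals: number; // Flux 2 Pro regenerations
  cutouts: number; // BiRefNet background removals
  vectors: number; // optional Recraft vector versions
}

// Total session cost in USD, rounded to a tenth of a cent.
function sessionCost(s: Session): number {
  const total =
    s.drafts * PRICE.schnell +
    s.finals * PRICE.flux2Pro +
    s.cutouts * PRICE.birefnet +
    s.vectors * PRICE.recraftVector;
  return Math.round(total * 1000) / 1000;
}

const lean = sessionCost({ drafts: 10, finals: 2, cutouts: 2, vectors: 0 });
const heavy = sessionCost({ drafts: 20, finals: 3, cutouts: 3, vectors: 1 });
console.log(lean, heavy); // 0.092 0.233
```

The lean session comes to about $0.09 and the heaviest to about $0.23, so the optional vector export is what decides whether a session stays under $0.20.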

### Logo Variants at Scale

Our Ariki logo has 18 material variants --- deep mahogany, mother-of-pearl, obsidian, molten lava, bronze with verdigris, tapa cloth, and more. Each one generated with Nano Banana Edit ($0.039) + BiRefNet ($0.001) for transparency. Total cost for 18 variants: **$0.72**. A designer would quote hundreds of dollars and a week of work for the same output.

### Typography That Works

Every model except Ideogram fails at text. Flux will give you beautiful art with garbled letters. Recraft gets close but isn't consistent. SDXL doesn't try. If the image has words in it, Ideogram v3 is the only answer. We've learned to accept the $0.09 cost for text-heavy images rather than wasting $0.30 on ten failed Flux attempts.

## The Numbers

Over the past month:

| Category | Images | Total Cost | Avg Cost/Image |
|----------|--------|-----------|----------------|
| Concept art (flux-2-pro) | ~120 | $3.60 | $0.03 |
| Exploration drafts (schnell) | ~400 | $1.20 | $0.003 |
| Logo variants (nano-banana) | 18 | $0.72 | $0.04 |
| Icons (nano-banana + birefnet) | 30 | $1.20 | $0.04 |
| Typography (ideogram) | ~25 | $1.50 | $0.06 |
| Background removal (birefnet) | ~80 | $0.08 | $0.001 |
| **Total** | **~673** | **$8.30** | **$0.012** |

Nearly seven hundred images for about eight dollars. The infrastructure to route, authenticate, and persist them costs more than the generation itself.

## What We Learned

**Never iterate on expensive models.** The Schnell-to-Pro pipeline saves 10x. Most of the creative work happens at $0.003/image. The expensive model just polishes the decision you already made.

**Typography is a solved problem --- but only on one model.** Stop trying to make Flux render text. Use Ideogram v3 Quality for anything with words. Accept the cost.

**Vector output is underrated.** Recraft v3's SVG export means logos and icons scale to any size without artifacts. For anything that might end up on a billboard or a business card, pay the $0.08 for vector.

**Background removal is basically free.** At $0.001 per image, there's no reason to ever manually mask anything. Run BiRefNet on everything, keep both versions.

**Aggregation beats loyalty.** No single model is best at everything. Flux for art, Ideogram for text, Recraft for design, Nano Banana for edits, BiRefNet for masks. The proxy pattern lets us use the right tool for each job without managing five API keys and five billing accounts.

---

*Image generation is built into [Tinqs Studio](https://tinqs.com) --- our Gitea-based platform for game teams. Every model above is available through the same inference proxy that handles LLM calls, authenticated with the same Gitea token. We're building [Ariki](https://arikigame.com) with these tools, and every asset in the game touched at least one of them.*
@@ -0,0 +1,103 @@
---
title: "Pi as CI Integrator: Agents That Fix Their Own Builds"
slug: pi-ci-integrator
date: "2026-05-25"
description: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
og_description: "Coding agents that watch CI and fix their own builds."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "Most coding agents stop at git push. Our Pi fork watches CI, reads failure logs, and fixes its own code until the pipeline goes green."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
Most coding agents have a dirty secret: they don't care if the code compiles. They write, they push, they walk away. The human discovers the broken build an hour later. We built a Pi extension that closes the loop --- agents that watch CI, read failure logs, and fix their own mistakes.

## The Gap

Every agent demo looks the same. The AI writes code, commits, pushes. The presenter says "and now we have a pull request!" Cut. End of demo.

What happens next? The CI pipeline runs. Tests fail. Linting screams. The build breaks because someone forgot an import. A human opens the PR, reads the red badge, clicks into the logs, finds the error, fixes it, pushes again. The agent did 90% of the work but left the last 10% --- the most tedious part --- for a person.

We wanted agents that finish the job.

## The tinqs-ci Extension

Our [Pi fork](https://tinqs.com/tinqs/pi) has a `tinqs-ci` extension --- a single TypeScript file, about 200 lines --- that gives the agent three tools:

- **ci_status** --- checks the current pipeline state for a branch (pending, running, success, failure)
- **ci_logs** --- fetches the full build log from the most recent failed run
- **ci_wait** --- polls the pipeline every 15 seconds until it finishes, then returns the result

These are Gitea Actions API calls under the hood. The agent authenticates with the same PAT it uses for git push. No extra credentials, no special CI service account.

## The Loop

Here's what a Pi task looks like end to end:

```
Agent receives task brief
  → reads codebase, plans approach
  → writes code
  → runs local tests (bash tool)
  → commits and pushes branch
  → calls ci_wait
  → CI passes → opens PR via Gitea API
  → CI fails → calls ci_logs
    → reads error output
    → fixes the issue
    → pushes again
    → calls ci_wait again
    → repeats until green (max 3 retries)
```

The key is that `ci_logs` returns the raw build output --- compiler errors, test failures, lint violations --- as plain text in the agent's context. DeepSeek V4 is surprisingly good at reading build logs. It parses a Go compiler error, identifies the file and line, and fixes it. It reads a test assertion failure, understands what the test expected, and corrects the implementation.

Three retries is the hard limit. If the agent can't fix it in three rounds, it opens the PR anyway with a comment explaining what failed and why. A human takes over from there. In practice, most failures resolve on the first retry --- it's usually a missing import or a type mismatch.
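
The loop above, including the three-retry limit, can be sketched as a small driver. The callbacks stand in for the agent's `ci_wait` and `ci_logs` tools and for its own edit-and-push step. Their signatures are assumptions made for this example, and the real loop runs asynchronously inside the agent rather than as synchronous calls.

```typescript
type CiResult = "success" | "failure";

interface LoopSteps {
  ciWait: () => CiResult; // stands in for the ci_wait tool
  ciLogs: () => string; // stands in for the ci_logs tool
  fixAndPush: (logs: string) => void; // agent edits code and pushes again
}

interface LoopOutcome {
  green: boolean;
  retries: number;
  lastLogs?: string; // attached to the PR comment when escalating
}

function runCiLoop(steps: LoopSteps, maxRetries: number = 3): LoopOutcome {
  let retries = 0;
  for (;;) {
    if (steps.ciWait() === "success") {
      return { green: true, retries };
    }
    const logs = steps.ciLogs();
    if (retries >= maxRetries) {
      // Out of retries: open the PR anyway and hand over to a human.
      return { green: false, retries, lastLogs: logs };
    }
    steps.fixAndPush(logs);
    retries++;
  }
}
```

Counting retries in the driver rather than trusting the model to stop is the point: the limit holds even if the agent would happily keep trying.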

## What This Actually Looks Like

A real run from last week. The task: add a health check endpoint to a Go service.

- **Turn 1:** Agent reads the codebase, writes the handler and test, pushes. CI fails --- the test imports a package that doesn't exist on the runner.
- **Turn 2:** Agent reads `ci_logs`, sees the `go: module not found` error, adds the missing `go.mod` replace directive, pushes. CI passes.
- **Turn 3:** Agent opens PR with passing checks.

Total time: 4 minutes. Total cost: $0.06. No human touched the keyboard.

Without the CI extension, this would have been a PR with a red badge and a Slack message saying "hey, the agent's PR is broken again." Someone would have context-switched, opened the logs, seen the trivial error, fixed it, and lost 20 minutes of flow state.
|
||||
|
||||
## Why This Matters More Than You Think
|
||||
|
||||
CI integration isn't a feature. It's the difference between an agent that helps and an agent that creates work.
|
||||
|
||||
An agent that pushes broken code is worse than no agent at all. It creates a false sense of progress --- "the PR is up!" --- while actually adding a task to someone's plate. Every broken PR is an interruption. Every interruption costs 15 minutes of context-switching.
|
||||
|
||||
An agent that watches CI and fixes its own builds is genuinely autonomous. You submit a task, you walk away, you come back to a green PR ready for review. The agent handled the mechanical iteration that a human would have done anyway --- the fix-push-wait-check cycle that eats hours of developer time every week.
|
||||
|
||||
## The Guardrail Problem
|
||||
|
||||
Letting an agent retry its own builds sounds dangerous. What if it enters an infinite loop? What if it starts making increasingly wild changes to get the build to pass?
|
||||
|
||||
Three safeguards:
|
||||
|
||||
**Retry limit.** Three attempts maximum. After that, the agent stops and reports. This is a hard limit in the orchestrator, not a suggestion to the model.
|
||||
|
||||
**Diff budget.** Each retry can only touch files that were already in the original changeset. The agent can't "fix" a build failure by rewriting the test suite or disabling the linter. If the fix requires touching new files, it fails and escalates.
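
A diff budget of this kind reduces to a set comparison. The sketch below is illustrative Python with hypothetical function names; how the orchestrator actually collects the two file lists is not described in the post (something like `git diff --name-only` would be one way).

```python
def files_outside_budget(original_files: set[str], retry_files: set[str]) -> set[str]:
    """Files a retry touched that were not part of the original changeset."""
    return retry_files - original_files


def retry_allowed(original_files: set[str], retry_files: set[str]) -> bool:
    """A retry is within budget only if it stays inside the original file set."""
    return not files_outside_budget(original_files, retry_files)
```

For example, a changeset of `{"health.go", "health_test.go"}` allows a retry that edits only `health.go`, but rejects one that also touches `.golangci.yml`.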
|
||||
|
||||
**Hallucination detection.** The guardrail extension monitors every turn. If the agent claims "the build passed" without having called `ci_status` or `ci_wait`, it gets corrected. Agents are not allowed to guess the CI result.
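
One way such a check could work is to compare what the agent says against the tools it actually called in that turn. The post does not describe the guardrail's internals, so everything below is an assumption for illustration: the regex, the function name, and the idea that a turn exposes its message text and a list of tool names.

```python
import re

# Tool names taken from the post; a CI claim only counts if one of them ran.
CI_TOOLS = {"ci_status", "ci_wait"}

# Hypothetical heuristic: a CI-related word followed by a success word.
CLAIM = re.compile(r"\b(build|ci|checks?)\b.*\b(passed|passes|green)\b", re.IGNORECASE)


def unverified_ci_claim(message: str, tools_called: list[str]) -> bool:
    """True when a turn claims CI success without having called a CI tool."""
    claims_success = bool(CLAIM.search(message))
    verified = any(tool in CI_TOOLS for tool in tools_called)
    return claims_success and not verified
```

A text heuristic like this will miss some phrasings and over-match others, so it is best read as the shape of the check rather than a robust detector.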
|
||||
|
||||
## The Numbers
|
||||
|
||||
Over three weeks of running the orchestrator:
|
||||
|
||||
- **87 tasks** completed end-to-end
- **23 tasks** needed at least one CI retry (26%)
- **19 of those 23** resolved on the first retry
- **4 tasks** hit the retry limit and escalated to a human
- **0 tasks** produced a merged PR that later broke something else
|
||||
|
||||
The 26% retry rate tells you how often agents push code that doesn't build on the first try. That's not a bad number --- it's the same rate you'd see from a junior developer. The difference is the agent fixes it in 30 seconds instead of 20 minutes.
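
The rates behind these figures are easy to recompute from the counts listed above. The short Python check below uses only the numbers from the post; the variable names are ours.

```python
tasks = 87
retried = 23
fixed_on_first_retry = 19
escalated = 4

retry_rate = retried / tasks                          # share of tasks needing a retry
first_retry_rate = fixed_on_first_retry / retried     # share of retried tasks fixed at once
escalation_rate = escalated / tasks                   # share of tasks handed to a human

print(f"{retry_rate:.0%} retried, {first_retry_rate:.0%} of those fixed on the first retry, "
      f"{escalation_rate:.1%} escalated")
```

That works out to about 26% retried, about 83% of those fixed on the first retry, and under 5% of all tasks escalated.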
|
||||
|
||||
---
|
||||
|
||||
*The CI extension is part of our [Pi fork](https://tinqs.com/tinqs/pi), which runs inside [Tinqs Studio](https://tinqs.com) --- a Gitea-based platform for game development with built-in AI agents. The whole thing is MIT licensed.*
|
||||
@@ -0,0 +1,149 @@
|
||||
---
title: "A Pre-Commit Agent That Guards Your Secrets for $0.001"
slug: pre-commit-agent
date: "2026-05-25"
description: "We built a pre-commit hook that calls DeepSeek V4 Flash to review every commit. It catches leaked secrets, classified terms, broken URLs, and drafts announcements --- for a tenth of a cent per commit."
og_description: "A DeepSeek-powered pre-commit hook that catches leaks for $0.001/commit."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "We built a pre-commit hook that calls DeepSeek V4 Flash to review every commit. It catches leaked secrets, classified terms, and broken URLs --- for a tenth of a cent."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---
|
||||
We have a problem that every small team has: too many things to remember before hitting commit. Don't leak API keys. Don't reference the classified AI codename in public blog posts. Don't link to GitHub repos we deleted six months ago. Don't push a blog post with a 90-character title. We built a pre-commit hook that uses a cheap LLM to check all of this automatically --- and it costs less than a tenth of a cent per commit.
|
||||
|
||||
## The Problem
|
||||
|
||||
We maintain a docs repo that serves double duty. Internal files --- game design documents, architecture notes, agent configuration --- sit alongside a public blog and website. The internal side references classified codenames, machine hostnames, and internal URLs. The public side must never contain any of that.
|
||||
|
||||
We also deleted all our GitHub repos in April 2026 and moved everything to our own Gitea platform. But old links keep creeping back in --- someone copies a URL from an old document, a blog post references the wrong remote. These are invisible bugs. The blog looks fine, the build passes, and three weeks later someone notices a dead link.
|
||||
|
||||
A checklist in the README doesn't work. Humans skip checklists. Code review catches some issues but not all --- reviewers focus on logic, not whether a URL points to a deleted GitHub org. We needed something automatic, fast, and cheap enough to run on every single commit.
|
||||
|
||||
## Two Layers: Regex + Agent
|
||||
|
||||
The hook has two layers. The first is instant and free. The second is smart and nearly free.
|
||||
|
||||
### Layer 1: Local Blocklist (0ms, $0.00)
|
||||
|
||||
A text file of regex patterns, each tagged with a scope and a message:
|
||||
|
||||
```
public|\bCosmos\b|Classified codename — use "advanced colonist AI"
all|github\.com/(tinqs-ltd|tinqs)/|GitHub repos deleted — use tinqs.com
all|sk-[a-zA-Z0-9]{20,}|Possible API key leaked
all|AKIA[A-Z0-9]{16}|AWS access key leaked
public|admin\.arikigame\.com|Internal admin URL in public content
```
|
||||
|
||||
The scope field controls where the pattern is enforced. `all` means every file. `public` means only files under `website/` --- our public-facing content. This is critical. We *want* classified codenames in our internal architecture docs. We just don't want them in blog posts.
|
||||
|
||||
The blocklist runs grep against the staged diff. No network call, no API, no latency. If it finds a match, the commit is blocked immediately with a file path and explanation. This catches 80% of issues before the LLM ever wakes up.
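
To make the file format concrete, here is a small Python sketch of parsing and applying such rules. The real hook is bash and uses grep, so this is a stand-in rather than its implementation; `Rule`, `parse_rule`, and `scan` are hypothetical names, and the rule messages in the usage note are shortened. One detail worth noticing: a pattern can itself contain `|`, so a line cannot simply be split on every pipe.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    scope: str            # "all" or "public"
    pattern: re.Pattern   # compiled regex from the middle field
    message: str          # explanation shown when the commit is blocked


def parse_rule(line: str) -> Rule:
    """Parse one 'scope|regex|message' line from the blocklist file."""
    # Split the scope off the front and the message off the back, so that
    # any "|" inside the regex (e.g. an alternation) stays in the pattern.
    scope, rest = line.split("|", 1)
    pattern, message = rest.rsplit("|", 1)
    return Rule(scope, re.compile(pattern), message)


def scan(rules: list[Rule], path: str, added_text: str) -> list[str]:
    """Return the messages of all rules that match text staged for `path`."""
    is_public = path.startswith("website/")
    hits = []
    for rule in rules:
        if rule.scope == "public" and not is_public:
            continue  # public-only rules do not apply to internal files
        if rule.pattern.search(added_text):
            hits.append(rule.message)
    return hits
```

With a rule such as `parse_rule(r"public|\bCosmos\b|Classified codename")`, the word `Cosmos` is reported in `website/blog/post.md` but passes untouched in an internal architecture doc, which is the scoping behavior described above. The sketch assumes the message field never contains a `|`.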
|
||||
|
||||
### Layer 2: DeepSeek V4 Flash Review (~4s, $0.001)
|
||||
|
||||
If the commit touches public-facing files (`website/`, blog posts), the hook sends the staged diff to DeepSeek V4 Flash through our inference proxy. The system prompt tells the model exactly what to check:
|
||||
|
||||
- **Leaked secrets** --- API keys, tokens, credentials that the regex might have missed
- **Classified terms** --- codenames that aren't in the blocklist yet
- **Internal URLs** --- references to internal services that shouldn't be public
- **Blog quality** --- title length, meta description, slug consistency, missing fields
- **Broken links** --- malformed URLs, obvious typos
- **Announcements** --- if it's a new blog post, draft a one-line summary
|
||||
|
||||
The model responds with structured JSON: errors (block the commit) or warnings (inform but allow). If the API is unreachable or times out, the commit proceeds --- the hook never blocks work for infrastructure reasons.
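
The fail-open behavior is the part most worth getting right, so here is a hedged Python sketch of the decision. The post only says the reply is structured JSON with errors and warnings; the exact keys (`errors`, `warnings`, `announcement`), the `decide` function, and the choice to also fail open on unparseable output are assumptions made for illustration.

```python
import json
from typing import Optional


def decide(raw_response: Optional[str]) -> int:
    """Map the model's reply to a hook exit code: 1 blocks, 0 allows."""
    if raw_response is None:
        # API unreachable or timed out: never block work for infrastructure reasons.
        print("warning: AI review unavailable, allowing commit")
        return 0
    try:
        review = json.loads(raw_response)
    except json.JSONDecodeError:
        # Assumption: a malformed reply is treated like an outage.
        print("warning: could not parse AI review, allowing commit")
        return 0
    if not isinstance(review, dict):
        print("warning: unexpected AI review shape, allowing commit")
        return 0

    for warning in review.get("warnings", []):
        print(f"warning: {warning}")
    if review.get("announcement"):
        print(f"announcement draft: {review['announcement']}")
    errors = review.get("errors", [])
    for error in errors:
        print(f"error: {error}")
    return 1 if errors else 0
```

Only an explicit, well-formed error list blocks the commit; every other path exits 0.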
|
||||
|
||||
## Why Not Pi?
|
||||
|
||||
Our [Pi fork](https://tinqs.com/tinqs/pi) is a full coding agent with tool calling, file I/O, and context management. It's what we use for overnight autonomous coding. But for pre-commit review, it's overkill.
|
||||
|
||||
A pre-commit hook needs to finish in under 5 seconds. Pi takes 2--3 seconds just to start the Node.js process and load extensions. The review itself is a single LLM call with a system prompt and a diff --- no tools needed, no file reads, no iteration. A direct curl call to DeepSeek through the inference proxy is faster and simpler.
|
||||
|
||||
That said, the hook is designed as a stepping stone. The blocklist patterns, the review prompt, and the classification logic are all reusable. When we build a Pi-based CI reviewer that runs on pull requests --- with tool access to read the full file, check links live, and verify image URLs --- it will use the same prompt and the same patterns. The pre-commit hook is the fast, cheap first pass. Pi is the thorough second pass.
|
||||
|
||||
## The Architecture
|
||||
|
||||
```
git commit
    ↓
.githooks/pre-commit (bash)
    ↓
Phase 0: Collect staged diff + classify files
    ↓
Phase 1: Regex blocklist scan (instant, free)
    → Match found → BLOCK (exit 1)
    → Clean → continue
    ↓
Phase 2: Public files changed?
    → No → exit 0 (skip AI, no cost)
    → Yes → send diff to DeepSeek V4 Flash
    ↓
Phase 3: Parse JSON response
    → Errors → BLOCK (exit 1)
    → Warnings → print, exit 0
    → Announcement → print draft
    → API failure → warn, exit 0
```
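
The phases above compose into a short control flow. The following Python sketch mirrors the diagram under stated assumptions: the actual hook is bash, and `run_hook`, `blocklist_hits`, and `ai_review` are hypothetical names, with the two checks passed in as plain functions so the ordering is easy to see.

```python
from typing import Callable, Optional


def run_hook(
    staged: dict[str, str],                              # path -> staged text
    blocklist_hits: Callable[[str, str], list[str]],     # (path, text) -> messages
    ai_review: Callable[[str], Optional[list[str]]],     # diff -> errors, None on API failure
) -> int:
    """Return the hook's exit code: 1 blocks the commit, 0 allows it."""
    # Phase 1: the free regex scan runs on every staged file.
    for path, text in staged.items():
        hits = blocklist_hits(path, text)
        if hits:
            for message in hits:
                print(f"BLOCKED {path}: {message}")
            return 1

    # Phase 2: skip the paid review unless public-facing files changed.
    if not any(path.startswith("website/") for path in staged):
        return 0

    # Phase 3: errors block; an API failure (None) only warns.
    errors = ai_review("\n".join(staged.values()))
    if errors is None:
        print("warning: AI review unavailable, allowing commit")
        return 0
    return 1 if errors else 0
```

The ordering carries the cost argument: the model is only reached when the blocklist is clean and a public file is in the commit.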
|
||||
|
||||
The hook lives in `.githooks/` inside the repo --- committed, version-controlled, shared by the whole team. A setup script configures `git config core.hooksPath` to point there. The LFS pre-push hook sits in the same directory.
|
||||
|
||||
## What It Costs
|
||||
|
||||
The system prompt is ~500 tokens. An average diff is 2,000--4,000 tokens. The response is ~200 tokens. At DeepSeek V4 Flash rates:
|
||||
|
||||
| Item | Tokens | Cost |
|------|--------|------|
| Input (prompt + diff) | ~4,000 | $0.00056 |
| Output (JSON response) | ~200 | $0.00006 |
| **Per commit** | | **$0.00062** |
|
||||
|
||||
Round it up and call it a tenth of a cent. Twenty reviewed commits a day across the team comes to **$0.012/day**, or about **$0.40/month**.
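
The arithmetic can be reproduced from the table alone. In the Python sketch below, the per-token rates are derived by dividing each table cost by its token count; they are implied by the post's figures rather than quoted from a price list, and the 30-day month is our assumption.

```python
# Per-token rates implied by the table (cost divided by tokens).
INPUT_RATE = 0.00056 / 4_000    # dollars per input token
OUTPUT_RATE = 0.00006 / 200     # dollars per output token


def commit_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one AI review call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


per_commit = commit_cost(4_000, 200)
per_day = per_commit * 20       # twenty reviewed commits a day
per_month = per_day * 30        # assuming a 30-day month

print(f"${per_commit:.5f} per commit, ${per_day:.4f}/day, ${per_month:.2f}/month")
```

Over 30 days this comes to roughly $0.37, which the post rounds to $0.40. The same function gives a quick estimate for larger diffs by changing the input token count.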
|
||||
|
||||
Commits that only touch internal files (architecture docs, agent config, game design) skip the AI review entirely. Zero cost. The hook only calls DeepSeek when public-facing content changes.
|
||||
|
||||
## What It Catches
|
||||
|
||||
In the first week:
|
||||
|
||||
- **2 classified codename leaks** in draft blog posts --- caught by the blocklist before the AI even ran
- **1 GitHub URL** that crept back in from a copy-paste --- caught by the blocklist
- **3 blog SEO warnings** --- titles over 60 characters, missing og_description --- caught by the AI review
- **1 announcement draft** generated automatically when a new blog post was committed
|
||||
|
||||
Zero false positives on the blocklist (the patterns are specific enough). Two false positives from the AI, which flagged an internal URL in a code example that was clearly illustrative, not a real link. We added a note to the prompt to ignore URLs inside fenced code blocks.
|
||||
|
||||
## Setup
|
||||
|
||||
One command per machine:
|
||||
|
||||
```bash
bash scripts/setup-hooks.sh
```
|
||||
|
||||
Or on Windows:
|
||||
|
||||
```powershell
.\scripts\setup-hooks.ps1
```
|
||||
|
||||
Set your inference token:
|
||||
|
||||
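```bash
# Replace the placeholder with your own token. The quotes keep the shell
# from treating < and > as redirections if the line is pasted as-is.
export TINQS_HOOK_TOKEN="<your-gitea-pat>"
```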
|
||||
|
||||
That's it. Every `git commit` now runs the two-layer review. Bypass with `git commit --no-verify` when you need to (emergencies, known false positives).
|
||||
|
||||
## The Pattern: Guard Rails at the Edge
|
||||
|
||||
This is the same pattern we apply everywhere: put the guard rail where the action happens. Don't rely on a human checklist. Don't wait for code review. Don't hope someone remembers.
|
||||
|
||||
The pre-commit hook is $0.001 worth of prevention. A leaked API key in a public blog post is hours of rotation, revocation, and audit. A classified codename in a public post is a confidentiality breach. A dead GitHub link is a broken user experience that nobody notices for weeks.
|
||||
|
||||
The tools exist. DeepSeek V4 Flash is cheap enough to call on every commit. The hook is 150 lines of bash. The blocklist is a text file. The total infrastructure cost is zero --- it runs on the developer's machine, calls an API we already pay for, and adds 4 seconds to the commit flow.
|
||||
|
||||
The age of agents doesn't just mean agents that write code. It means agents that watch the code you write.
|
||||
|
||||
---
|
||||
|
||||
*The pre-commit hook is part of [Tinqs Studio](https://tinqs.com), our open platform for game development. The inference proxy, the blocklist patterns, and the review prompt are all open and reusable. We're building [Ariki](https://arikigame.com) with these tools --- every commit in the game repo runs through the same guard.*