agent: Write two new blog posts as markdown files in posts/. (1) posts/l

2026-05-26 10:41:06 +01:00
4 changed files with 272 additions and 0 deletions
@@ -6,6 +6,8 @@ We're building Tinqs Studio while using it to make our own game --- a survival c

 ## Posts

+- [From Local to Cloud: Agents That Run While You Sleep](posts/cloud-agents.md) (2026-05-26)
+- [Running AI Agents from Your Browser](posts/local-agent-dashboard.md) (2026-05-26)
 - [How a Small Game Studio Runs on AI Agents](posts/agentic-workflow.md) (2026-03-06)
 - [One Binary to Rule Them All: Building a Studio CLI](posts/studio-cli.md) (2026-05-18)
 - [Why We Forked Gitea and Built Tinqs Studio](posts/forking-gitea.md) (2026-05-20)
@@ -351,6 +351,16 @@
        <div class="post-title">One Binary to Rule Them All: Building a Studio CLI</div>
        <div class="post-excerpt">A single Go binary that gives AI agents context about who you are, what machine you're on, and what services are reachable.</div>
      </a>
+      <a class="post-card" href="posts/cloud-agents.md">
+        <div class="post-date">2026-05-26</div>
+        <div class="post-title">From Local to Cloud: Agents That Run While You Sleep</div>
+        <div class="post-excerpt">A Go orchestrator inside Tinqs Studio spawns coding agents in RPC mode, manages worker pools, and delivers PRs overnight.</div>
+      </a>
+      <a class="post-card" href="posts/local-agent-dashboard.md">
+        <div class="post-date">2026-05-26</div>
+        <div class="post-title">Running AI Agents from Your Browser</div>
+        <div class="post-excerpt">A local dashboard at localhost:33634 turns agent sessions into trackable jobs with handoffs, code generation, testing, and PR delivery.</div>
+      </a>
      <a class="post-card" href="posts/agentic-workflow.md">
        <div class="post-date">2026-03-06</div>
        <div class="post-title">How a Small Game Studio Runs on AI Agents</div>
@@ -0,0 +1,129 @@
+---
+title: "From Local to Cloud: Agents That Run While You Sleep"
+slug: cloud-agents
+date: "2026-05-26"
+description: "A Go orchestrator inside Tinqs Studio spawns coding agents in RPC mode, manages worker pools, creates git worktrees, waits for CI, and opens PRs — all while you sleep."
+og_description: "Cloud agents: a Go orchestrator that spawns coding agents, manages worker pools, and delivers PRs overnight."
+og_image: "https://www.tinqs.com/img/og-cover.jpg"
+excerpt: "A Go orchestrator inside Tinqs Studio that spawns coding agents in RPC mode, manages worker pools, creates git worktrees, waits for CI, and opens PRs — all while you're asleep."
+author: "Ozan Bozkurt"
+author_initials: "OB"
+author_role: "CTO & Developer, Tinqs"
+---
+We built an orchestrator in Go that runs coding agents on our servers, on our repos, while nobody is watching. It spawns agents in RPC mode, manages worker pools, creates isolated git worktrees, waits for CI to pass, and opens pull requests. Your backlog runs overnight. You wake up to PRs.
+
+## Why the Dashboard Wasn't Enough
+
+The local dashboard is great for workday use. Queue up tasks while you're at your desk, monitor them in a browser tab, review PRs as they land. But it has a hard limit: your machine has to be on. When you go home, the agents stop. When you close your laptop, the queue freezes. When Windows decides it's time for an update and reboots at 3am, your overnight batch of 12 tasks is gone.
+
+More importantly, the local dashboard doesn't integrate with your team's workflow. It doesn't know about your CI pipeline, your branch protection rules, your code review assignments. It's a personal tool, not a team tool.
+
+We needed agents that run **on infrastructure, not on laptops.** Agents that spawn when there's work and shut down when there isn't. Agents that understand the full lifecycle from issue assignment to merged PR, not just "generate this code and stop."
+
+So we built an orchestrator.
+
+## The Orchestrator: Go, RPC, and Worker Pools
+
+The orchestrator is a Go service that runs inside Tinqs Studio. It manages a pool of coding agents and feeds them work. Here's how it fits together.
+
+### RPC Mode: Agents as Subprocesses
+
+The coding agent supports an RPC mode --- a JSON protocol over stdin/stdout. No terminal UI, no interactive prompts, no human in the loop. You send a JSON command on stdin, the agent processes it, and you get JSON responses and streaming events on stdout.
+
+```json
+{"type": "prompt", "message": "Fix the null pointer in VegetationGrid.ActivateChunk() and add a unit test"}
+```
+
+The agent reads the codebase, plans the change, calls tools, runs tests, and streams progress back as JSON events. When it's done, the session stops. No human needed.
+
+The orchestrator runs one agent subprocess per active job. Each agent is isolated: its own process, its own working directory, its own session state. If an agent crashes, the orchestrator captures the error, cleans up, and marks the job as failed. Nothing leaks between jobs.
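+
The orchestrator side of this exchange might look roughly like the Go sketch below, assuming the protocol is newline-delimited JSON. Only the `type` and `message` fields come from the example above. The `Event` shape, the `tool_call` and `done` event types, and all function names are hypothetical, and canned text stands in for the agent's stdout so the sketch runs without the agent binary.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Command mirrors the prompt message shown above.
type Command struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

// Event is a hypothetical shape for one line of streamed agent output.
type Event struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

// encodePrompt returns one JSON command terminated by a newline,
// ready to be written to the agent's stdin.
func encodePrompt(message string) ([]byte, error) {
	b, err := json.Marshal(Command{Type: "prompt", Message: message})
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

// readEvents decodes one JSON event per line. It stops at end of stream
// or after an event of type "done" (an assumed terminator).
func readEvents(r io.Reader) ([]Event, error) {
	var events []Event
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev Event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return events, fmt.Errorf("bad event %q: %w", line, err)
		}
		events = append(events, ev)
		if ev.Type == "done" {
			break
		}
	}
	return events, sc.Err()
}

// demoEvents feeds canned output through readEvents in place of a real agent.
func demoEvents() ([]Event, error) {
	canned := `{"type":"tool_call","text":"read file"}
{"type":"done"}
`
	return readEvents(strings.NewReader(canned))
}

func main() {
	cmd, err := encodePrompt("Fix the null pointer in VegetationGrid.ActivateChunk()")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(cmd))

	events, err := demoEvents()
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Println(ev.Type, ev.Text)
	}
}
```

With a real subprocess, the bytes from `encodePrompt` would go to the pipe returned by `exec.Cmd.StdinPipe`, and `readEvents` would read from `StdoutPipe`.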
+
+### Worker Pools: Don't Spin Up One Agent per Task
+
+Spawning a new process for every task would be slow and wasteful. The orchestrator maintains a pool of idle agent processes, ready to accept work.
+
+When a task comes in, the orchestrator picks an idle worker from the pool, assigns the task, and the worker goes to work. When the task completes, the worker returns to the pool. If there's no idle worker and the pool isn't at capacity, the orchestrator spawns a new one. If the pool is full, the task queues.
+
+The pool size is configurable. We default to 3 concurrent workers --- enough to parallelize independent tasks without overwhelming CI or hitting rate limits. For teams with heavier workloads, bump it to 5 or 8. The limit is your CI throughput and your API budget, not the orchestrator.
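+
The scheduling rule above (reuse an idle worker, else spawn if under capacity, else queue) can be sketched as a small state machine. This is an illustration only: the `Pool` type and its methods are hypothetical names, and the sketch tracks counts rather than real subprocesses.

```go
package main

import "fmt"

// Pool models only the scheduling decision, not subprocess management.
type Pool struct {
	max   int      // maximum number of workers (the default above is 3)
	idle  int      // spawned workers waiting for work
	busy  int      // workers currently running a task
	queue []string // task IDs waiting for a free worker
}

func NewPool(max int) *Pool { return &Pool{max: max} }

// Submit applies the three rules in order: reuse an idle worker,
// spawn a new one if under capacity, otherwise queue the task.
func (p *Pool) Submit(taskID string) string {
	switch {
	case p.idle > 0:
		p.idle--
		p.busy++
		return "assigned"
	case p.idle+p.busy < p.max:
		p.busy++
		return "spawned"
	default:
		p.queue = append(p.queue, taskID)
		return "queued"
	}
}

// Complete frees one busy worker. If a task is queued, the worker takes
// it over immediately and its ID is returned; otherwise the worker
// returns to the idle pool.
func (p *Pool) Complete() (next string, ok bool) {
	if p.busy == 0 {
		return "", false
	}
	if len(p.queue) > 0 {
		next, p.queue = p.queue[0], p.queue[1:]
		return next, true
	}
	p.busy--
	p.idle++
	return "", false
}

func main() {
	p := NewPool(3)
	for _, id := range []string{"t1", "t2", "t3", "t4"} {
		fmt.Println(id, p.Submit(id))
	}
	next, _ := p.Complete()
	fmt.Println("worker freed, now running:", next)
}
```

A real pool would hold process handles instead of counters and would guard this state with a mutex or a channel, since jobs complete concurrently.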
+
+### Git Worktrees: Isolation Without Forking
+
+Every job gets its own git worktree. Not a clone --- a worktree. A clone copies the entire repository, which for our game repo means copying 12GB of LFS objects. A worktree checks out a new working directory on a new branch, sharing the same `.git` directory as the main clone. It takes under a second.
+
+The workflow:
+
+1. Orchestrator receives a task with a repository ID
+2. Creates a worktree on a new branch named `agent/<task-id>`
+3. Spawns or assigns a worker pointed at that worktree
+4. Worker runs the task --- reads, writes, commits, pushes
+5. When done (or failed), worktree is cleaned up
+
+Worktrees solve the isolation problem without the overhead of full clones. Two agents can work on the same repo simultaneously, on different branches, with zero file conflicts. The orchestrator handles the lifecycle --- create the worktree before the job starts, remove it after the job ends.
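+
As a rough sketch, the create and remove steps map onto two git commands, `git worktree add -b <branch> <path>` and `git worktree remove <path>`. The Go below only builds those commands. The directory layout and function names are assumptions for illustration, not the orchestrator's actual code.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// branchFor returns the branch naming scheme described above: agent/<task-id>.
func branchFor(taskID string) string { return "agent/" + taskID }

// addWorktreeCmd builds `git worktree add -b agent/<task-id> <dir>`.
// repoDir is the main clone; baseDir is where worktrees live (hypothetical layout).
func addWorktreeCmd(repoDir, baseDir, taskID string) *exec.Cmd {
	cmd := exec.Command("git", "worktree", "add", "-b", branchFor(taskID),
		filepath.Join(baseDir, taskID))
	cmd.Dir = repoDir // run inside the main clone so the worktree shares its .git
	return cmd
}

// removeWorktreeCmd builds `git worktree remove --force <dir>`.
// --force also removes a worktree that still has uncommitted changes.
func removeWorktreeCmd(repoDir, baseDir, taskID string) *exec.Cmd {
	cmd := exec.Command("git", "worktree", "remove", "--force",
		filepath.Join(baseDir, taskID))
	cmd.Dir = repoDir
	return cmd
}

func main() {
	add := addWorktreeCmd("/srv/repos/game", "/srv/worktrees", "1234")
	rm := removeWorktreeCmd("/srv/repos/game", "/srv/worktrees", "1234")
	fmt.Println(add.Args)
	fmt.Println(rm.Args)
	// To execute for real: out, err := add.CombinedOutput()
}
```

Removing a worktree leaves its branch in place, which is what a pushed `agent/<task-id>` branch with an open PR needs.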
+
+## The Full Lifecycle: From Issue to Merged PR
+
+A cloud agent job doesn't end when the code is written. The orchestrator shepherds the task from assignment to merge. Here's the full lifecycle:
+
+**1. Task intake.** A task is created --- either manually through the dashboard, automatically from an issue, or scheduled from a backlog. The task specifies the repository, the prompt, and optionally constraints (target branch, reviewers, labels).
+
+**2. Worktree creation.** The orchestrator creates an isolated worktree on a branch named `agent/<task-id>`. If the repo's CI is set up to run on pushes to `agent/*` branches, the first push to this branch triggers the pipeline.
+
+**3. Agent execution.** The worker receives the prompt via RPC, reads the codebase, writes code, and runs tests locally. If local tests fail, the agent iterates until they pass. If they still fail after a configured number of attempts, the job fails early: there is no point pushing broken code.
+
+**4. Push and CI.** When local tests pass, the agent commits and pushes. The orchestrator detects the push and starts polling CI status. It doesn't just fire and forget --- it waits.
+
+**5. CI monitoring.** The orchestrator polls the CI pipeline every 30 seconds. If CI fails, it feeds the failure logs back to the agent. The agent reads the logs, fixes the issue, commits, and pushes again. This loop repeats until CI passes or the agent exhausts its retry budget.
+
+**6. PR creation.** When CI is green, the orchestrator opens a pull request with the agent's summary --- what changed, why, and testing notes. It assigns reviewers based on repo configuration and adds labels.
+
+**7. Cleanup.** The worktree is removed. The worker returns to the pool. The job status updates to "awaiting review."
+
+All of this happens without human intervention. A task submitted at 11pm can be an open PR, green on CI and awaiting review, by the time you check your notifications at 8am.
+
+## CI Integration: Closing the Loop
+
+The CI loop is where cloud agents earn their keep. A local agent can write code, run tests, and push, but it doesn't stay around to watch CI. It can't see that the Windows build failed because of a path separator issue while the Linux build passed. It can't adjust and try again.
+
+The orchestrator can. When CI fails, the orchestrator:
+
+1. Fetches the failure logs from the CI system
+2. Appends them to the agent's session as context
+3. Sends a follow-up prompt: "CI failed with these errors. Fix them."
+4. The agent reads the logs, identifies the issue, and makes a fix
+5. Commit, push, poll CI again
+
+We've seen agents fix CI failures that would take a human 20 minutes to diagnose --- missing imports in test files, platform-specific path handling, version mismatches in dependency files. The agent doesn't know these things innately, but given the CI logs and the codebase, it figures them out.
+
+The retry budget is configurable. We default to 3 CI retries per task. If CI fails 3 times and the agent can't fix it, the job is marked "needs human" with the full failure context. A human takes over from there, so the agent doesn't burn credits looping on a problem it can't fix.
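+
The poll, feed back, and retry loop could be sketched as follows. The `CI` and `Agent` interfaces, the status names, and the scripted fakes are all hypothetical stand-ins. The sketch reads the budget as "give up once CI has failed that many times", which is one interpretation of the description above.

```go
package main

import (
	"fmt"
	"time"
)

type CIStatus string

const (
	Pending CIStatus = "pending"
	Passed  CIStatus = "passed"
	Failed  CIStatus = "failed"
)

// CI and Agent are hypothetical interfaces standing in for the real
// CI client and the agent's RPC session.
type CI interface {
	Status() CIStatus
	FailureLogs() string
}

type Agent interface {
	// Fix sends a follow-up prompt; the agent is expected to commit and push.
	Fix(prompt string)
}

// waitForCI polls until the pipeline leaves the pending state.
func waitForCI(ci CI, interval time.Duration) CIStatus {
	for {
		if s := ci.Status(); s != Pending {
			return s
		}
		time.Sleep(interval)
	}
}

// runCILoop returns "green" when CI passes, or "needs human" once CI
// has failed budget times.
func runCILoop(ci CI, agent Agent, budget int, interval time.Duration) string {
	failures := 0
	for {
		if waitForCI(ci, interval) == Passed {
			return "green"
		}
		failures++
		if failures >= budget {
			return "needs human"
		}
		agent.Fix("CI failed with these errors. Fix them.\n\n" + ci.FailureLogs())
	}
}

// scriptedCI replays a fixed list of statuses, repeating the last one.
type scriptedCI struct {
	statuses []CIStatus
	i        int
}

func (c *scriptedCI) Status() CIStatus {
	s := c.statuses[c.i]
	if c.i < len(c.statuses)-1 {
		c.i++
	}
	return s
}

func (c *scriptedCI) FailureLogs() string { return "example failure log" }

// countingAgent records how many fix prompts it received.
type countingAgent struct{ fixes int }

func (a *countingAgent) Fix(prompt string) { a.fixes++ }

func main() {
	ci := &scriptedCI{statuses: []CIStatus{Pending, Failed, Pending, Passed}}
	agent := &countingAgent{}
	result := runCILoop(ci, agent, 3, 0) // a real run would pass 30*time.Second
	fmt.Println(result, "after", agent.fixes, "fix prompt(s)")
}
```

Injecting the interval keeps the loop testable: the demo polls with no delay, while production code would use the 30-second interval.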
+
+## The Agents Tab: Coming to tinqs.com
+
+All of this is managed through a new Agents tab, coming to every repository on Tinqs Studio. It's a single view for everything agent-related on your repo:
+
+- **Job queue** --- pending, running, completed, failed tasks
+- **Agent sessions** --- live session output with file diffs
+- **Cost tracking** --- per-task and aggregate API spending
+- **Configuration** --- pool size, model selection, CI retry budget, reviewer assignments
+- **History** --- past jobs with full logs and outcomes
+
+The Agents tab turns the orchestrator from a developer tool into a team tool. Any team member can submit tasks, monitor progress, and review results. The PM can queue up backlog items. The designer can request asset changes. The lead developer can review PRs the agents opened overnight.
+
+It's not launching as a separate product. It's a tab on your repo, next to Issues, Pull Requests, and Actions. Agents are part of the development workflow, not a bolt-on.
+
+## What We've Learned
+
+**RPC mode is the right abstraction for headless agents.** Early versions of the orchestrator tried to script the interactive agent --- sending keystrokes, parsing terminal output. It was fragile and slow. RPC mode is a proper API: structured input, structured output, streaming events. The orchestrator and the agent speak the same language.
+
+**Worker pools save cold-start time.** Spawning a new agent process for every task adds 2--3 seconds of startup overhead --- loading config, warming caches, reading the soul file. With a pool of idle workers, tasks start instantly. The overhead matters when you're running 50 tasks overnight.
+
+**CI polling is more reliable than webhooks.** We started with webhook-based CI integration --- CI notifies the orchestrator when a build completes. Webhooks get dropped, delayed, or delivered out of order. Polling every 30 seconds is simpler, more reliable, and the latency difference is irrelevant for overnight batch work.
+
+**Git worktrees are a superpower.** Before worktrees, parallel agents meant multiple clones or complex branching strategies. Worktrees give you isolation for free. One command, no copy, no wait. When a job finishes, the worktree disappears. When a new job starts, a fresh worktree appears. No stale state, no cleanup burden.
+
+**Cloud agents complement local agents; they don't replace them.** The dashboard is for interactive work during the day: pair-programming, quick fixes, exploration. The orchestrator is for batch work overnight: backlog processing, large-scale refactors, scheduled maintenance. They're two modes of the same tool, not competitors.
+
+---
+
+Cloud agents aren't science fiction. They're a Go service, a worker pool, and an RPC protocol running on our infrastructure, doing real work on real repos. The local dashboard made agents useful during the day. The cloud orchestrator makes them useful around the clock.
+
+We're building all of this as part of [Tinqs Studio](https://tinqs.com) --- a game development platform that brings together git hosting, AI agent tools, and creative workflows. The Agents tab is rolling out to repositories this quarter. If you want agents working on your backlog while you sleep, that's where they'll live.
@@ -0,0 +1,131 @@
+---
+title: "Running AI Agents from Your Browser"
+slug: local-agent-dashboard
+date: "2026-05-26"
+description: "A local dashboard at localhost:33634 turns AI agent tasks into a job queue. Handoffs become jobs, agents code and test and deliver PRs, all visible from your browser."
+og_description: "A local dashboard turns AI agent sessions into trackable jobs. Handoffs, PR delivery, and $0.02/task."
+og_image: "https://www.tinqs.com/img/og-cover.jpg"
+excerpt: "A local dashboard at localhost:33634 turns AI agent sessions into trackable jobs with handoffs, code generation, testing, and PR delivery — all visible from your browser."
+author: "Ozan Bozkurt"
+author_initials: "OB"
+author_role: "CTO & Developer, Tinqs"
+---
+We built a local dashboard that turns AI agent sessions into a job queue you can watch from your browser. Open localhost:33634, type a task, and watch the agent plan, code, test, and deliver a PR --- while you work on something else.
+
+## The Problem: Agent Sessions Are Ephemeral
+
+When you use an AI coding agent in your terminal, you're staring at a scrolling wall of text. The agent thinks, calls tools, writes files, runs tests. If you look away for 30 seconds, you miss something. If you have three tasks queued up, you can't see which one is running, which one is blocked, and which one shipped a PR twenty minutes ago.
+
+The terminal is great for interactive sessions --- pair-programming with the agent, asking questions, iterating on a design. It's terrible for **batch work**. When you have five issues to triage, two features to implement, and a PR to review, you don't want to shepherd each one through a terminal session. You want to hand them off and check back later.
+
+So we built a dashboard.
+
+## What the Dashboard Is
+
+The dashboard runs locally at `localhost:33634`. It's a web UI that sits on top of your local coding agent, turning each task into a visible, trackable job. Open it in any browser, type a task, and the agent goes to work.
+
+The dashboard shows:
+
+- **Active sessions** --- which agent sessions are running, what they're doing right now, live streaming their output
+- **Job queue** --- tasks you've submitted, in order, with status: queued, running, done, failed
+- **Session history** --- past sessions with full logs, file diffs, and cost breakdowns
+- **Git state** --- which branch each session is on, what files changed, whether there's uncommitted work
+
+It's the difference between `ps aux | grep agent` and a proper job manager.
+
+## How Handoffs Become Jobs
+
+The most important design decision: **every handoff is a job.** In terminal mode, you hand off to the agent by typing a prompt and watching. In dashboard mode, you type a prompt in the browser and the agent accepts it as a discrete job.
+
+A handoff might be "fix the null pointer in `VegetationGrid.ActivateChunk()` and add a unit test." The dashboard creates a new session for that task, spawns the agent in a fresh git worktree, and lets it run. You don't need to watch. You get a notification when it's done.
+
+Here's what happens under the hood:
+
+1. You submit a task through the dashboard UI
+2. The dashboard creates a new agent session with the task as the opening prompt
+3. The agent reads the codebase, plans the change, and starts working
+4. Tool calls --- file reads, writes, bash commands --- stream to the dashboard in real time
+5. When the agent is done, it summarizes what changed and proposes next steps
+6. The job status updates: done, failed, or needs human review
+
+Multiple jobs can run in parallel on different git worktrees. One agent fixing a UI bug on `tree-ui-fix`, another implementing a save system on `tree-save-system`. No conflicts, no cross-contamination. Each job has its own branch, its own diff, its own results.
+
+## The Agent's Job: Code, Test, Deliver
+
+We've tuned the agent to follow a specific workflow on every task:
+
+**Read first.** Before writing a single line, the agent reads the files it needs to understand. It traces call chains, checks imports, reads test files. This sounds obvious, but most AI coding tools jump straight to writing code on incomplete context. Our agent is instructed to load context first, then act.
+
+**Test as it goes.** After making a change, the agent runs the relevant tests immediately. Not "run all tests at the end." Not "assume it works and move on." Every change gets verified before the next one. If tests fail, the agent diagnoses and fixes before proceeding.
+
+**Git discipline.** Every job creates a branch off main. Commits are small and descriptive. The agent writes proper commit messages --- not "fix stuff" but "fix null pointer in VegetationGrid.ActivateChunk when chunk is out of bounds." If the task spans multiple logical changes, the agent splits them into multiple commits.
+
+**PR delivery.** When the agent is done, it pushes the branch and opens a pull request with a description of what changed, why, and what was tested. The human reviews and merges. The agent never merges its own code.
+
+This workflow --- read, code, test, commit, PR --- runs end-to-end without human intervention. The human's job is to review the PR, not to babysit the agent.
+
+## The Task Lifecycle
+
+Every task in the dashboard moves through a defined lifecycle:
+
+| Stage | What's happening |
+|-------|-----------------|
+| **Queued** | Task is in the queue, waiting for a free agent slot |
+| **Planning** | Agent is reading the codebase, understanding the problem, forming a plan |
+| **Working** | Agent is writing code, running tests, iterating on fixes |
+| **Testing** | Agent is running the full test suite, verifying nothing broke |
+| **Delivering** | Agent is committing, pushing, and opening a PR |
+| **Done** | Task completed successfully, PR is open for review |
+| **Failed** | Something went wrong --- tests didn't pass, agent couldn't fix, PR rejected |
+| **Needs Human** | Agent hit an ambiguity and is asking for clarification |
+
+You can see at a glance where every task is. No scrolling through terminal output trying to figure out if the agent is still working or stuck in a loop.
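+
One way to keep a lifecycle like this honest in code is an explicit transition table. The stage names below come from the table above. The allowed transitions are an assumption about how the stages connect, not something the dashboard is documented to enforce.

```go
package main

import "fmt"

type Stage string

const (
	Queued     Stage = "Queued"
	Planning   Stage = "Planning"
	Working    Stage = "Working"
	Testing    Stage = "Testing"
	Delivering Stage = "Delivering"
	Done       Stage = "Done"
	Failed     Stage = "Failed"
	NeedsHuman Stage = "Needs Human"
)

// transitions lists the stages each stage may move to (assumed).
// Done and Failed are terminal, so they have no entries.
var transitions = map[Stage][]Stage{
	Queued:     {Planning},
	Planning:   {Working, NeedsHuman, Failed},
	Working:    {Testing, NeedsHuman, Failed},
	Testing:    {Working, Delivering, Failed}, // failing tests send the job back to Working
	Delivering: {Done, Failed},
	NeedsHuman: {Planning, Working, Failed}, // resumes once a human answers
}

// canMove reports whether a job may go from one stage to another.
func canMove(from, to Stage) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canMove(Queued, Planning))
	fmt.Println(canMove(Testing, Working))
	fmt.Println(canMove(Done, Working))
}
```

Rejecting any move that is not in the table makes a stuck or looping job show up as an error instead of a silently wrong status.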
+
+## Costs: $0.02 to $0.10 per Task
+
+The economics of AI agents change when you're running them in batch mode. We track cost per task obsessively, and the numbers are surprisingly low.
+
+Most tasks cost between **$0.02 and $0.10** in API credits. Here's the breakdown for a small implementation task, a bug fix with a test:
+
+- **Planning phase** (~3,000 tokens in, ~1,000 tokens out): $0.005
+- **Code phase** (~8,000 tokens in, ~3,000 tokens out): $0.015
+- **Test/fix iteration** (~5,000 tokens in, ~2,000 tokens out): $0.010
+- **PR writeup** (~2,000 tokens in, ~1,000 tokens out): $0.005
+
+Total: ~$0.035 for a small bug fix with a test. Larger features with multiple iterations run $0.08--0.15. Triage and investigation tasks (read-only, no code changes) are the cheapest at $0.01--0.02.
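+
The total can be checked by summing the four phases. The sketch below uses the token counts and per-phase costs from the list, stored as thousandths of a dollar to avoid floating-point drift. The 50-task batch is a hypothetical workload added for illustration.

```go
package main

import "fmt"

// phase holds the approximate figures from the breakdown above.
type phase struct {
	name      string
	tokensIn  int
	tokensOut int
	milliUSD  int // cost in thousandths of a dollar
}

var phases = []phase{
	{"planning", 3000, 1000, 5},
	{"code", 8000, 3000, 15},
	{"test/fix iteration", 5000, 2000, 10},
	{"PR writeup", 2000, 1000, 5},
}

// totals sums tokens and cost across all phases of one task.
func totals() (tokensIn, tokensOut, milliUSD int) {
	for _, p := range phases {
		tokensIn += p.tokensIn
		tokensOut += p.tokensOut
		milliUSD += p.milliUSD
	}
	return
}

// batchMilliUSD estimates the cost of n tasks of this size.
func batchMilliUSD(n int) int {
	_, _, perTask := totals()
	return n * perTask
}

func main() {
	in, out, cost := totals()
	fmt.Printf("%d tokens in, %d tokens out, $%.3f per task\n", in, out, float64(cost)/1000)
	fmt.Printf("50 tasks of this size: $%.2f\n", float64(batchMilliUSD(50))/1000)
}
```

That works out to about 18,000 input tokens and 7,000 output tokens per task, and under two dollars for a batch of 50 tasks of this size.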
+
+The key to keeping costs down: **use the right model for the right phase.** We run DeepSeek V4 for coding tasks --- it's fast, cheap, and excellent at structured code generation. For tasks that require visual understanding --- screenshots, UI review, diagram analysis --- we route to Gemini Flash, which handles images at a fraction of the cost of other vision-capable models.
+
+Model selection is automatic. The dashboard routes based on the task type. You don't need to think about it.
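+
The routing rule can be as small as a single function. In the sketch below, the `Task` type, the `Attachments` field, and the image-extension check are assumptions about how a task might be classified. Only the two model names come from the text above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Task is a hypothetical task shape: a prompt plus optional attached files.
type Task struct {
	Prompt      string
	Attachments []string
}

// needsVision reports whether any attachment looks like an image.
func needsVision(t Task) bool {
	for _, a := range t.Attachments {
		switch strings.ToLower(filepath.Ext(a)) {
		case ".png", ".jpg", ".jpeg", ".webp":
			return true
		}
	}
	return false
}

// modelFor routes visual tasks to the vision model and everything else
// to the coding model.
func modelFor(t Task) string {
	if needsVision(t) {
		return "Gemini Flash"
	}
	return "DeepSeek V4"
}

func main() {
	fmt.Println(modelFor(Task{Prompt: "Fix the null pointer"}))
	fmt.Println(modelFor(Task{Prompt: "Review this UI", Attachments: []string{"inventory.png"}}))
}
```

Keeping the rule in one place makes it easy to add a third tier later, such as a premium model for large refactors.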
+
+## Setup in 5 Minutes
+
+Getting the dashboard running takes longer to read about than to do:
+
+1. **Install the agent CLI.** One command, one binary. Works on Mac, Windows, and Linux.
+2. **Start the dashboard.** `agent dashboard` --- that's it. It binds to `localhost:33634`.
+3. **Open your browser.** Navigate to the URL, and you'll see an empty job queue.
+4. **Type your first task.** "Add input validation to the login form and write tests." Press enter.
+5. **Watch it work.** The agent spawns, reads the codebase, and starts coding. You'll see tool calls stream in real time.
+
+That's the whole setup. No Docker, no Kubernetes, no cloud configuration, no API gateway. One binary, one port, one browser tab. If you already have an API key configured for your coding agent, the dashboard picks it up automatically.
+
+Advanced configuration --- custom models, thinking levels, parallel job limits, git worktree locations --- is all done through a config file. But the defaults work for 90% of use cases. You can go from zero to running your first agent job in the time it takes to brew a coffee.
+
+## What We've Learned
+
+**The browser is a better job manager than the terminal.** This sounds obvious in retrospect, but it took us months to realize. The terminal is for interaction. The browser is for monitoring. When you have multiple agents running, you want tabs, not tmux panes.
+
+**Batch mode changes how you think about agents.** In interactive mode, you ask the agent to do one thing and wait. In batch mode, you queue up everything you need and come back to a pile of open PRs. The mental model shifts from "assistant" to "worker pool."
+
+**Cheap models are good enough for most work.** DeepSeek V4 handles 80% of our coding tasks with quality indistinguishable from premium models, at a tenth of the cost. The remaining 20% --- complex refactors, architectural decisions, anything requiring deep reasoning --- still benefit from premium models. But routing every task to the most expensive model is like commuting in a Formula 1 car.
+
+**Visibility reduces trust issues.** When you can't see what the agent is doing, you don't trust the output. The dashboard's live streaming of tool calls --- every file read, every command run, every test result --- builds confidence. You can see the agent's reasoning, not just its conclusions.
+
+**Git worktrees are the missing primitive.** Without them, you can only run one agent job at a time --- or you get file conflicts. With them, each job gets its own isolated workspace on a dedicated branch. Zero coordination overhead, zero merge conflicts between concurrent jobs. When the agent is done, the worktree is cleaned up automatically.
+
+---
+
+The dashboard is part of how we use AI agents in our own development. It's not a product we're selling --- it's the tool we built because we needed it, and it's available to anyone using the agent. If you're running AI coding agents and you're still staring at terminal output for every task, give the dashboard a try. Open `localhost:33634`, queue up your backlog, and watch the PRs roll in.
+
+We're building all of this as part of [Tinqs Studio](https://tinqs.com) --- a game development platform that brings together git hosting, AI agent tools, and creative workflows for game teams.