blog: refresh posts to current infra state (JS flows, tinqs CLI, steering + human-in-the-loop)

2026-06-10 21:47:13 +01:00
parent aaa788b29f
commit 6cba781083
6 changed files with 83 additions and 85 deletions
+1 -1
@@ -163,7 +163,7 @@
<a href="pi-flow-native-brain" class="blog-card"> <a href="pi-flow-native-brain" class="blog-card">
<span class="blog-card__date">4 June 2026</span> <span class="blog-card__date">4 June 2026</span>
<h2 class="blog-card__title">How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows</h2> <h2 class="blog-card__title">How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows</h2>
<p class="blog-card__excerpt">We type a slash command, agents fan out through five oracle gates, the game-builder fixes 19 red tests while vision judges check the live game — and it all runs as one autonomous flow.</p> <p class="blog-card__excerpt">A flow spawns, agents fan out through five oracle gates, the game-builder fixes 19 red tests while vision judges check the live game — and it all runs as one autonomous flow.</p>
<span class="blog-card__read">Read &rarr;</span> <span class="blog-card__read">Read &rarr;</span>
</a> </a>
{{CARDS}} {{CARDS}}
+2 -2
@@ -271,7 +271,7 @@
<p><strong>Identity.</strong> Who the agent is, what it values, how it should behave. Not "you are a helpful assistant" — that's generic and unmoored. A soul file that says "you're working on Ariki, a survival colony sim. The team is four people. Never push to main without review. Prefer existing conventions." Identity creates consistency across sessions.</p> <p><strong>Identity.</strong> Who the agent is, what it values, how it should behave. Not "you are a helpful assistant" — that's generic and unmoored. A soul file that says "you're working on Ariki, a survival colony sim. The team is four people. Never push to main without review. Prefer existing conventions." Identity creates consistency across sessions.</p>
<p><strong>Memory.</strong> What happened last session. What decisions were made. What failed and why. Without memory, every conversation is a cold start — "let me explain the project..." Memory stored as markdown in git means it's version-controlled, diffable, and human-readable. When something goes wrong, you <code>git log</code> instead of debugging a vector database.</p> <p><strong>Memory.</strong> What happened last session. What decisions were made. What failed and why. Without memory, every conversation is a cold start — "let me explain the project..." Memory stored as markdown in git means it's version-controlled, diffable, and human-readable. When something goes wrong, you <code>git log</code> instead of debugging a vector database.</p>
<p><strong>Tools.</strong> What the agent can actually do beyond generating text. A CLI that takes screenshots, checks service health, and loads project context. API wrappers for git, CI, image generation. Without tools, the agent is a very articulate oracle that can't touch anything.</p> <p><strong>Tools.</strong> What the agent can actually do beyond generating text. A CLI that takes screenshots, checks service health, and loads project context. API wrappers for git, CI, image generation. Without tools, the agent is a very articulate oracle that can't touch anything.</p>
<p><strong>Context.</strong> Which project this is. Who's asking. What machine they're on. What services are reachable. A single CLI call — <code>tstudio identity</code> — returns all of this in 100ms. No re-reading the README. No "what repo are we in?"</p> <p><strong>Context.</strong> Which project this is. Who's asking. What machine they're on. What services are reachable. A single CLI call — <code>tinqs identity</code> — returns all of this in 100ms. No re-reading the README. No "what repo are we in?"</p>
<p><strong>Guardrails.</strong> What the agent must never do. No merging to main without review. No pushing to public repos without approval. No running destructive commands. The harness enforces these at the platform layer, not in the prompt. Prompts can be ignored. Platform gates cannot.</p> <p><strong>Guardrails.</strong> What the agent must never do. No merging to main without review. No pushing to public repos without approval. No running destructive commands. The harness enforces these at the platform layer, not in the prompt. Prompts can be ignored. Platform gates cannot.</p>
<h2>Why generic harnesses fail for game dev</h2> <h2>Why generic harnesses fail for game dev</h2>
<p>LangChain, CrewAI, and AutoGen are built for web apps. They assume text-in, text-out. Game development is different in ways that break those assumptions:</p> <p>LangChain, CrewAI, and AutoGen are built for web apps. They assume text-in, text-out. Game development is different in ways that break those assumptions:</p>
@@ -281,7 +281,7 @@
<p><strong>The team is small and cross-functional.</strong> Four people. No dedicated DevOps, no dedicated artist, no dedicated PM. The harness fills all those gaps, not just one.</p> <p><strong>The team is small and cross-functional.</strong> Four people. No dedicated DevOps, no dedicated artist, no dedicated PM. The harness fills all those gaps, not just one.</p>
<h2>The toolchain that makes it work</h2> <h2>The toolchain that makes it work</h2>
<p>Our harness runs on <a href="https://tinqs.com" style="color: var(--c-accent-l);">Tinqs Studio</a>, built on a Gitea fork with game-specific features. The key pieces:</p> <p>Our harness runs on <a href="https://tinqs.com" style="color: var(--c-accent-l);">Tinqs Studio</a>, built on a Gitea fork with game-specific features. The key pieces:</p>
<p><strong>The CLI</strong> — a single Go binary. One command (<code>tstudio identity</code>) gives the agent full project context in 100ms. Screenshots, cloud vision, health checks — all subcommands of the same binary.</p> <p><strong>The CLI</strong> — a single Go binary. One command (<code>tinqs identity</code>) gives the agent full project context in 100ms. Screenshots, cloud vision, health checks — all subcommands of the same binary.</p>
<p><strong>The soul file</strong> — a markdown document in the repo root. The agent reads it on session start. It defines values, scope, and behavioural rules. The same soul file works in Cursor, Claude Code, or any tool that reads markdown.</p> <p><strong>The soul file</strong> — a markdown document in the repo root. The agent reads it on session start. It defines values, scope, and behavioural rules. The same soul file works in Cursor, Claude Code, or any tool that reads markdown.</p>
<p><strong>Skills</strong> — markdown playbooks for specific workflows. Image generation, concept art pipeline, 3D model creation, video generation. Each skill is a procedure the agent follows. Write once, use forever.</p> <p><strong>Skills</strong> — markdown playbooks for specific workflows. Image generation, concept art pipeline, 3D model creation, video generation. Each skill is a procedure the agent follows. Write once, use forever.</p>
<p><strong>3D preview</strong> — click a <code>.glb</code> file in a PR and rotate the model in your browser. 22 formats supported. This alone transformed our review process — nobody approves a binary diff blind anymore.</p> <p><strong>3D preview</strong> — click a <code>.glb</code> file in a PR and rotate the model in your browser. 22 formats supported. This alone transformed our review process — nobody approves a binary diff blind anymore.</p>
+68 -70
@@ -372,11 +372,11 @@
<div class="callout"> <div class="callout">
<span class="callout__kicker">The Kitchen ↔ Flows Analogy</span> <span class="callout__kicker">The Kitchen ↔ Flows Analogy</span>
<p><strong>The kitchen</strong> = Pi (the agent harness). <strong>The recipe</strong> = a flow YAML (the DAG). <strong>The line cooks</strong> = agents (each with a station and tools). <strong>The pass</strong> = the flow engine (routes finished work). <strong>The head chef's inspection</strong> = the five gates. <strong>The order ticket</strong> = a slash command. <strong>"Send it back!"</strong> = the fix loop.</p> <p><strong>The kitchen</strong> = Pi (the agent harness). <strong>The recipe</strong> = a JavaScript flow (<code>.flow.mjs</code>). <strong>The line cooks</strong> = agents (each with a station and tools). <strong>The pass</strong> = the flow engine (routes finished work). <strong>The head chef's inspection</strong> = the five gates. <strong>The order ticket</strong> = a spawn task or <code>tinqs flow run</code>. <strong>"Send it back!"</strong> = the fix loop.</p>
</div> </div>
<h2>What Happens When You Type a Slash Command</h2> <h2>What Happens When You Spawn a Flow</h2>
<p>You type <code>/game-feature add a double-jump with cooldown</code> and hit enter. The ticket hits the kitchen. What follows is not one agent doing everything — it's a brigade running their stations.</p> <p>You run <code>tinqs flow run game-feature --task 'add a double-jump with cooldown'</code> or click Run Flow on the dashboard. The ticket hits the kitchen. What follows is not one agent doing everything — it's a brigade running their stations.</p>
<figure style="margin:28px 0;"> <figure style="margin:28px 0;">
<svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;"> <svg viewBox="0 0 920 350" role="img" aria-label="The verify-heavy flow: context, plan, implement, five gates, a Reflexion loop, and one judge" style="width:100%;height:auto;display:block;background:#0a0e14;border:1px solid #2a3340;border-radius:12px;font-family:'IBM Plex Sans',system-ui,sans-serif;">
@@ -467,7 +467,7 @@
</div> </div>
<h2>Composability: Adding a New Station</h2> <h2>Composability: Adding a New Station</h2>
<p>A kitchen doesn't redesign the whole line when they add a new dish. They add a station. Same in flows. Started with three gates — build, test, vision. Behaviour and feel came later, each a single-file extension. Gates aren't hardcoded. They're sub-agents declared in YAML. Want a linting gate? Add a sub-agent with a linter. Security scan? Same pattern. Asset bundle size check? Write the tool, declare the agent, wire it in.</p> <p>A kitchen doesn't redesign the whole line when they add a new dish. They add a station. Same in flows. Started with three gates — build, test, vision. Behaviour and feel came later, each a single-file extension. Gates aren't hardcoded. They're sub-agents called from JavaScript flows. Want a linting gate? Add an <code>agent()</code> call with a linter. Security scan? Same pattern. Asset bundle size check? Write the tool, declare the agent, wire it in.</p>
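The "add a station" idea can be sketched in plain JavaScript. This is a minimal illustration under assumptions, not the flow engine's real API: `gateList`, `runAllGates`, and the toy lint rule are hypothetical names, and each gate is modelled as an async function that returns a `{ gate, verdict }` object.

```javascript
// Hypothetical sketch: gates are plain async functions, not real Pi agents.
const gateList = [
  async () => ({ gate: "build", verdict: "pass" }),
  async () => ({ gate: "tests", verdict: "pass" }),
];

// Adding a station means appending one more function; the runner is unchanged.
// The lint rule here is a toy stand-in for a real linter.
gateList.push(async (source) => ({
  gate: "lint",
  verdict: source.includes("var ") ? "fail" : "pass",
}));

// Run every gate concurrently; Promise.all keeps results in declaration order.
async function runAllGates(source) {
  return Promise.all(gateList.map((gate) => gate(source)));
}

runAllGates("let speed = 4;").then((results) => {
  for (const { gate, verdict } of results) console.log(gate, verdict);
});
```

The point of the design is that the runner never names individual gates, so a new check is a one-line addition rather than a change to the pipeline.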
<div class="callout callout--purple"> <div class="callout callout--purple">
<span class="callout__kicker">Self-Improving Kitchen</span> <span class="callout__kicker">Self-Improving Kitchen</span>
@@ -538,17 +538,17 @@
<div class="callout callout--amber"> <div class="callout callout--amber">
<span class="callout__kicker">Flow 1 · 4 June, 18:32</span> <span class="callout__kicker">Flow 1 · 4 June, 18:32</span>
<p><strong>/deep-implement</strong> — "Build the tinqs-gitea-read extension: list_org_repos, read_repo_file, list_repo_dir, search_repos." Nine steps, 14 minutes. Verdict: <span class="gate gate--test">PASS</span>. 31/31 vitest tests green, zero new TypeScript errors, session-level caching, path traversal protection. Every <code>execute()</code> body fully wired — no stubs, no placeholders. Like a saucier who doesn't just list ingredients but actually makes the sauce.</p> <p><strong>deep-implement</strong> — "Build the tinqs-gitea-read extension: list_org_repos, read_repo_file, list_repo_dir, search_repos." Nine steps, 14 minutes. Verdict: <span class="gate gate--test">PASS</span>. 31/31 vitest tests green, zero new TypeScript errors, session-level caching, path traversal protection. Every <code>execute()</code> body fully wired — no stubs, no placeholders. Like a saucier who doesn't just list ingredients but actually makes the sauce.</p>
</div> </div>
<div class="callout callout--purple"> <div class="callout callout--purple">
<span class="callout__kicker">Flow 2 · 4 June, 19:04</span> <span class="callout__kicker">Flow 2 · 4 June, 19:04</span>
<p><strong>/game-feature</strong> — "Make the player jump." Build: <span class="gate gate--build">PASS</span>. Tests: <span class="gate gate--test">PASS</span>. Behaviour/Feel/Visual: <span style="color:#f59e0b;">NOT RUN</span> — no live game instance was reachable. The flow didn't silently skip the visual gate. It <strong>hard-stopped</strong> and reported honestly: "FAIL — the feature has not been verified in-game." This is the kitchen saying: "The dish is cooked, but nobody tasted it. I'm not sending it out."</p> <p><strong>game-feature</strong> — "Make the player jump." Build: <span class="gate gate--build">PASS</span>. Tests: <span class="gate gate--test">PASS</span>. Behaviour/Feel/Visual: <span style="color:#f59e0b;">NOT RUN</span> — no live game instance was reachable. The flow didn't silently skip the visual gate. It <strong>hard-stopped</strong> and reported honestly: "FAIL — the feature has not been verified in-game." This is the kitchen saying: "The dish is cooked, but nobody tasted it. I'm not sending it out."</p>
</div> </div>
<div class="callout callout--amber"> <div class="callout callout--amber">
<span class="callout__kicker">Flow 3 · 4 June, 19:49</span> <span class="callout__kicker">Flow 3 · 4 June, 19:49</span>
<p><strong>/cto-infra</strong> — "Synthesize cost, stability, and VCS research into an AWS architecture decision." Four research streams fed into one CTO agent. Output: 14 requirements mapped to specific decisions, cost-vs-stability tradeoffs resolved with dollar figures, EC2+EBS over Fargate+EFS, RDS Multi-AZ mandatory, S3+CloudFront for LFS. Like an executive chef reading four menu proposals, reconciling them into one service, and pricing every plate.</p> <p><strong>cto-infra</strong> — "Synthesize cost, stability, and VCS research into an AWS architecture decision." Four research streams fed into one CTO agent. Output: 14 requirements mapped to specific decisions, cost-vs-stability tradeoffs resolved with dollar figures, EC2+EBS over Fargate+EFS, RDS Multi-AZ mandatory, S3+CloudFront for LFS. Like an executive chef reading four menu proposals, reconciling them into one service, and pricing every plate.</p>
</div> </div>
<hr class="accent"> <hr class="accent">
@@ -556,61 +556,59 @@
<h2>Dinner Rush Recovery: The Crash That Interrupted Service</h2> <h2>Dinner Rush Recovery: The Crash That Interrupted Service</h2>
<p>Earlier today, a machine crash cut off a flow mid-stream — the kitchen lost power during dinner rush. Nineteen tests were left red. Contracts written, implementation half-done. Half-cooked dishes on every station.</p> <p>Earlier today, a machine crash cut off a flow mid-stream — the kitchen lost power during dinner rush. Nineteen tests were left red. Contracts written, implementation half-done. Half-cooked dishes on every station.</p>
<p>I typed one slash command — the expediter reassembled the brigade:</p> <p>I spawned the same flow with a different task:</p>
<pre><code>/game-feature Finish the leftover jump & locomotion animation work make the 19 FAILING tests GREEN.</code></pre> <pre><code>tinqs flow run game-feature --task 'Finish the leftover jump & locomotion animation work -- make the 19 FAILING tests GREEN.'</code></pre>
<p>What happened next: the team picked up exactly where the crash left off. Here's the recipe — the exact YAML that runs in production:</p> <p>What happened next: the team picked up exactly where the crash left off. Here's the recipe — the exact JavaScript that runs in production:</p>
<pre><code>name: game-feature <pre><code>// .pi/flows/flows/game-feature.flow.mjs
description: Build a PLAYABLE game feature and prove it in the LIVE game. export const meta = {
task_required: true name: &quot;game-feature&quot;,
description: &quot;Build a PLAYABLE game feature and prove it in the LIVE game.&quot;,
task_required: true
};
steps: export default async function run({ task, flow }) {
# G0: Pre-flight — validate vision CAN run before any build work // G0: Pre-flight — validate vision CAN run before any build work
- id: preflight await flow.agent(&quot;vision-preflight&quot;, {
agent: vision-preflight task: &quot;Check GEMINI_API_KEY is set AND game_frames reaches a live instance.&quot;
task: Check GEMINI_API_KEY is set AND game_frames reaches a live instance. });
If EITHER fails, STOP — vision is not optional.
# Context + plan // Context + plan
- id: context const context = await flow.agent(&quot;project-context-reader&quot;);
agent: project-context-reader const plan = await flow.agent(&quot;feature-planner&quot;, { context });
blockedBy: [preflight]
- id: plan // TDD: write tests FIRST (different agent than implementer)
agent: feature-planner const testSuite = await flow.agent(&quot;test-author&quot;, { plan });
blockedBy: [context]
# TDD: write tests FIRST (different agent than implementer) // Implement
- id: test-author const source = await flow.agent(&quot;game-builder&quot;, { testSuite, plan });
agent: test-author
blockedBy: [plan]
- id: implement // G1–G5: Oracle gates run via parallel for speed
agent: game-builder const gates = await flow.parallel([
blockedBy: [test-author] flow.agent(&quot;build-verifier&quot;, { source }),
flow.agent(&quot;test-runner&quot;, { source }),
flow.agent(&quot;behavioral-prober&quot;, { source }),
flow.agent(&quot;feel-judge&quot;, { source }),
flow.agent(&quot;animation-vision-judge&quot;, { source })
]);
# G1–G5: Oracle gates (build, tests, behaviour, feel, visual) // Self-recurring fix-loop: bounded loop back to implement with evidence
- id: build → agent: build-verifier const MAX_RETRIES = 3;
- id: tests → agent: test-runner for (let attempt = 1; attempt &lt;= MAX_RETRIES; attempt++) {
- id: behavior → agent: behavioral-prober (drives LIVE game via drive_game) const decision = await flow.agent(&quot;flow-decision&quot;, { gates });
- id: feel → agent: feel-judge (apex, airtime, latency, rise/fall) if (decision.verdict === &quot;pass&quot;) break;
- id: visual → agent: animation-vision-judge (multimodal gemini-2.5-flash) if (attempt === MAX_RETRIES) {
const fixed = await flow.agent(&quot;game-builder&quot;, { source, failures: decision.evidence });
}
}
# Self-recurring fix-loop: bounded loop back to implement with evidence // Final judge: one honest verdict
- id: fix-loop return flow.agent(&quot;game-judge&quot;);
type: agent-loop-decision }</code></pre>
agent: flow-decision
loop_target: implement
exit_target: report
max_iterations: 3
# Final judge: one honest verdict <p>Eight logical steps, seven cooks, five inspection points, one head chef. Triggered by a single spawn.</p>
- id: report
agent: game-judge</code></pre>
<p>Eighteen steps, seven cooks, five inspection points, one head chef. Triggered by a single order ticket.</p>
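The bounded fix loop can be sketched independently of the flow engine. This is a minimal illustration under stated assumptions: `runWithFixLoop`, `runGates`, and `fix` are hypothetical stand-ins for the gate agents and the game-builder, not the real `flow.agent()` API. In this sketch the builder is re-invoked while retries remain, and every gate is re-run after each fix so the final verdict reflects the latest source.

```javascript
// Hypothetical sketch: `runGates` and `fix` are stubs supplied by the caller.
async function runWithFixLoop({ runGates, fix, maxRetries = 3 }) {
  let source = "initial";
  let gates = await runGates(source);
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const failures = gates.filter((g) => g.verdict !== "pass");
    if (failures.length === 0) {
      // `attempts` counts how many fixes were applied before all gates passed.
      return { verdict: "pass", attempts: attempt - 1, source };
    }
    // Hand the failure evidence back to the builder, then re-run every gate.
    source = await fix(source, failures);
    gates = await runGates(source);
  }
  // Retries exhausted: report whatever the last gate run says.
  const remaining = gates.filter((g) => g.verdict !== "pass");
  return {
    verdict: remaining.length === 0 ? "pass" : "fail",
    attempts: maxRetries,
    source,
    failures: remaining,
  };
}

// Stub kitchen: the "tests" gate fails until the source has been fixed twice.
const demo = {
  runGates: async (source) => [
    { gate: "build", verdict: "pass" },
    { gate: "tests", verdict: source === "initial+fix+fix" ? "pass" : "fail" },
  ],
  fix: async (source) => `${source}+fix`,
};

runWithFixLoop(demo).then((result) => console.log(result.verdict, result.attempts));
```

Capping the loop with `maxRetries` keeps a stubborn failure from looping forever: after the last allowed fix, the function returns a `fail` verdict together with the remaining evidence.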
<p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>GEMINI_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p> <p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>GEMINI_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p>
@@ -624,29 +622,29 @@ steps:
<div class="callout"> <div class="callout">
<span class="callout__kicker">Not a Demo</span> <span class="callout__kicker">Not a Demo</span>
<p>This flow is a file at <code>.pi/flows/flows/game-feature.yaml</code>. I trigger it by typing <code>/game-feature</code> in Pi. It dispatches agents, runs gates, loops on failures, reports a verdict. There is no dashboard with drag-and-drop. There is a YAML file and a slash command. That's the whole product.</p> <p>This flow is a file at <code>.pi/flows/flows/game-feature.flow.mjs</code>. I trigger it by running <code>tinqs flow run game-feature</code> or clicking Run Flow on the dashboard. It dispatches agents, runs gates, loops on failures, reports a verdict. The dashboard at <code>:33634</code> is the control plane — spawn, steer mid-run, inspect state. That's the whole product.</p>
</div> </div>
<hr class="accent"> <hr class="accent">
<h2>The Menu: Flows Are Slash Commands</h2> <h2>The Menu: Flows at Your Fingertips</h2>
<p>Every flow becomes a slash command — the menu you read to the expediter. <code>.pi/flows/flows/game-feature.yaml</code> → <code>/game-feature</code>. You don't invoke a pipeline from a terminal. You order a dish in conversation.</p> <p>Every flow lives in <code>.pi/flows/flows/*.flow.mjs</code> and is spawnable by name. You run <code>tinqs flow run &lt;name&gt; [task]</code> or click Run Flow on the dashboard.</p>
<p>"Add wall-running" is not a CLI flag. It's natural language. The flow reads it, wires it through the agents, routes it through the gates. The YAML is the recipe. The conversation is the context.</p> <p>"Add wall-running" becomes the task argument. The flow reads it, wires it through the agents, routes it through the gates. The JavaScript is the recipe. The conversation provides the context.</p>
<p>The menu I call from daily:</p> <p>The menu I call from daily:</p>
<ul> <ul>
<li><strong>/game-feature</strong> — "add a double-jump" or "fix the 19 red tests" → brigade assembles, cooks, inspects, plates</li> <li><strong>game-feature</strong> — "add a double-jump" or "fix the 19 red tests" → brigade assembles, cooks, inspects, plates</li>
<li><strong>/deep-implement</strong> — "build the gitea-read extension" → research → plan → implement → test → review → judge</li> <li><strong>deep-implement</strong> — "build the gitea-read extension" → research → plan → implement → test → review → judge</li>
<li><strong>/cto-infra</strong> — "reconcile cost, stability, and VCS research into architecture decisions" → 4 research streams → 1 synthesis agent → 14 requirements mapped to decisions</li> <li><strong>cto-infra</strong> — "reconcile cost, stability, and VCS research into architecture decisions" → 4 research streams → 1 synthesis agent → 14 requirements mapped to decisions</li>
<li><strong>/flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects cooks, designs the recipe, writes the YAML</li> <li><strong>flows:new</strong> — "I need a flow that..." → the Flow Architect reads the agent catalog, selects cooks, designs the recipe, writes the <code>.flow.mjs</code></li>
</ul> </ul>
<h2>The Pass: How Agents Hand Off Work</h2> <h2>The Pass: How Agents Hand Off Work</h2>
<p>In a real kitchen, cooks don't shout instructions across the room. They place finished plates on the pass. The expediter reads the ticket, checks the plate, routes it to the next station or to the dining room. Nobody yells. Nobody grabs someone else's pan.</p> <p>In a real kitchen, cooks don't shout instructions across the room. They place finished plates on the pass. The expediter reads the ticket, checks the plate, routes it to the next station or to the dining room. Nobody yells. Nobody grabs someone else's pan.</p>
<p>Flows work the same way. Agents never talk to each other directly. When the game-builder finishes, it doesn't ping the test-runner. It calls <code>finish({ summary: "...", artifacts: "...", files: "..." })</code> — placing its work on the pass. The flow engine — the expediter — records it and routes it. The next agent receives exactly the inputs wired in the YAML: <code>${{result.game-builder.summary}}</code>, <code>${{result.game-builder.files}}</code>.</p> <p>Flows work the same way. Agents never talk to each other directly. When the game-builder finishes, it returns a result object — placing its work on the pass. The flow engine — the expediter — records it and routes it. The next agent receives the return value directly from <code>await flow.agent("game-builder")</code>.</p>
<div class="kitchen-grid"> <div class="kitchen-grid">
<div class="kitchen-col"> <div class="kitchen-col">
@@ -655,11 +653,11 @@ steps:
</div> </div>
<div class="kitchen-col"> <div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">What Actually Happens</span> <span class="kitchen-col__title kitchen-col__title--reality">What Actually Happens</span>
<p>Agent A <code>finish({verdict: "pass", findings: ["coyote_time=100ms"]})</code> → engine records → Agent B receives <code>${{result.A.findings}}</code> via <code>inputs:</code> block. No chatter. Structured handoff.</p> <p>Agent A returns <code>{ verdict: "pass", findings: ["coyote_time=100ms"] }</code> → flow engine records it → Agent B receives the result as a direct return value of <code>await flow.agent("A")</code>. No chatter. Structured handoff.</p>
</div> </div>
</div> </div>
<p>Why? Because unstructured chatter is how hallucination cascades start. Agent A confidently states something wrong. Agent B builds on it. Agent C compounds it. Three agents later, they're collectively wrong about a file that doesn't exist, and nobody can trace where the error came from. The pass — structured result-passing with typed outputs — makes every handoff auditable, verifiable, and debuggable.</p> <p>Why? Because unstructured chatter is how hallucination cascades start. Agent A confidently states something wrong. Agent B builds on it. Agent C compounds it. Three agents later, they're collectively wrong about a file that doesn't exist, and nobody can trace where the error came from. The pass — structured result-passing via typed return values from each <code>agent()</code> call — makes every handoff auditable, verifiable, and debuggable.</p>
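One way to make a handoff checkable is to validate the shape of each result before the next agent sees it. The sketch below is an assumption for illustration only: `assertHandoff` is a hypothetical helper, not part of Pi or the flow engine, and the field names mirror the `verdict` and `findings` example above.

```javascript
// Hypothetical helper: rejects a result that is missing any expected field,
// so a malformed handoff fails at the pass instead of three agents later.
function assertHandoff(agentName, result, requiredFields) {
  if (result === null || typeof result !== "object") {
    throw new TypeError(`${agentName} returned a non-object result`);
  }
  const missing = requiredFields.filter((field) => !(field in result));
  if (missing.length > 0) {
    throw new Error(`${agentName} result is missing: ${missing.join(", ")}`);
  }
  return result;
}

// Agent A's plate is inspected before Agent B receives it.
const fromA = assertHandoff(
  "A",
  { verdict: "pass", findings: ["coyote_time=100ms"] },
  ["verdict", "findings"]
);
console.log(fromA.findings[0]);
```

Because the error message names the agent and the missing fields, a bad handoff can be traced to the station that produced it.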
<p>Pi itself is built for solo interactive work: you ask, it does, you review. The orchestration layer I wrote on top inverts that. Pi becomes the kitchen. The flow engine becomes the expediter. Agents become line cooks who place plates on the pass, never shouting across the room.</p> <p>Pi itself is built for solo interactive work: you ask, it does, you review. The orchestration layer I wrote on top inverts that. Pi becomes the kitchen. The flow engine becomes the expediter. Agents become line cooks who place plates on the pass, never shouting across the room.</p>
@@ -700,17 +698,17 @@ outputs: [summary, files]
You are a game developer. Task: ${{task}} You are a game developer. Task: ${{task}}
Context: ${{input.context}}</code></pre> Context: ${{input.context}}</code></pre>
<p><strong style="color:#f59e0b;">Flows</strong> are YAML DAGs that wire agents together. I have about <strong>15–20 flows</strong> running across different domains:</p> <p><strong style="color:#f59e0b;">Flows</strong> are JavaScript modules (<code>.flow.mjs</code>) that coordinate agents with real control flow. I have about <strong>15–20 flows</strong> running across different domains:</p>
<ul> <ul>
<li><strong>Game dev:</strong> /game-feature, /review, /bug-hunt, /refactor</li> <li><strong>Game dev:</strong> game-feature, review, bug-hunt, refactor</li>
<li><strong>Design:</strong> /concept-art, /sound-design (plans → ElevenLabs generation → judge evaluates with other models)</li> <li><strong>Design:</strong> concept-art, sound-design (plans → ElevenLabs generation → judge evaluates with other models)</li>
<li><strong>Marketing:</strong> /brand-image, /trailer-clip (Sora 2 video generation → vision judge)</li> <li><strong>Marketing:</strong> brand-image, trailer-clip (Sora 2 video generation → vision judge)</li>
<li><strong>Infra:</strong> /ci-fix, /deploy-check, /tstudio-jobs (action runners on AWS Lambda, workspace management)</li> <li><strong>Infra:</strong> ci-fix, deploy-check, tinqs-jobs (action runners on AWS Lambda, workspace management)</li>
<li><strong>Meta:</strong> A flow that periodically reads and improves the other flows — yes, flows that edit flows</li> <li><strong>Meta:</strong> A flow that periodically reads and improves the other flows — yes, flows that edit flows</li>
</ul> </ul>
<p>The setup is not a product you install. It's a stack: Pi as the agent harness, custom extensions as the tool layer, markdown agents as the role layer, YAML flows as the orchestration layer. The whole thing lives in <code>.pi/flows/</code>. Version-controlled. CI-tested. Slash-command invoked.</p> <p>The setup is not a product you install. It's a stack: Pi as the agent harness, custom extensions as the tool layer, markdown agents as the role layer, JavaScript flows as the orchestration layer. The whole thing lives in <code>.pi/flows/</code>. Version-controlled. CI-tested. Spawned via <code>tinqs flow run</code> or the dashboard.</p>
<h2>The Recipe vs. The Technique</h2> <h2>The Recipe vs. The Technique</h2>
<p>"Do you define the process with these trees, or do the agents freestyle?" Both. The recipe says what to make and in what order. The technique is how each cook executes their station.</p> <p>"Do you define the process with these trees, or do the agents freestyle?" Both. The recipe says what to make and in what order. The technique is how each cook executes their station.</p>
@@ -718,7 +716,7 @@ Context: ${{input.context}}</code></pre>
<div class="kitchen-grid"> <div class="kitchen-grid">
<div class="kitchen-col"> <div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--kitchen">The Recipe (Rigid)</span> <span class="kitchen-col__title kitchen-col__title--kitchen">The Recipe (Rigid)</span>
<p>The flow YAML is the recipe. It says: first the prep cook dices onions, then the saucier makes the base, then the grill cook sears the protein. After every station, the plate hits the pass for inspection. <strong>This order is not negotiable.</strong> A cook cannot skip the inspection because they feel confident. The inspection runs. Period.</p> <p>The flow's JavaScript is the recipe. It says: first the prep cook dices onions, then the saucier makes the base, then the grill cook sears the protein. After every station, the plate hits the pass for inspection. <strong>This order is not negotiable.</strong> A cook cannot skip the inspection because they feel confident. The inspection runs. Period.</p>
</div> </div>
<div class="kitchen-col"> <div class="kitchen-col">
<span class="kitchen-col__title kitchen-col__title--reality">The Technique (Autonomous)</span> <span class="kitchen-col__title kitchen-col__title--reality">The Technique (Autonomous)</span>
@@ -730,7 +728,7 @@ Context: ${{input.context}}</code></pre>
<div class="callout callout--purple"> <div class="callout callout--purple">
<span class="callout__kicker">The Meta-Kitchen</span> <span class="callout__kicker">The Meta-Kitchen</span>
<p>And when a recipe is wrong? Another flow improves it. A meta-flow reads performance data, spots bottlenecks — "the feel gate keeps failing because the cook doesn't know the jump velocity threshold" — edits the YAML to wire that threshold into the builder's inputs, and commits the change. <strong>Flows that edit flows.</strong> The kitchen that renovates itself between services.</p> <p>And when a recipe is wrong? Another flow improves it. A meta-flow reads performance data, spots bottlenecks — "the feel gate keeps failing because the cook doesn't know the jump velocity threshold" — edits the <code>.flow.mjs</code> to pass that threshold into the builder's inputs, and commits the change. <strong>Flows that edit flows.</strong> The kitchen that renovates itself between services.</p>
</div> </div>
<hr class="accent"> <hr class="accent">
+2 -2
View File
@@ -24,7 +24,7 @@ Every agent harness, regardless of domain, needs five things:
**Tools.** What the agent can actually do beyond generating text. A CLI that takes screenshots, checks service health, and loads project context. API wrappers for git, CI, image generation. Without tools, the agent is a very articulate oracle that can't touch anything. **Tools.** What the agent can actually do beyond generating text. A CLI that takes screenshots, checks service health, and loads project context. API wrappers for git, CI, image generation. Without tools, the agent is a very articulate oracle that can't touch anything.
**Context.** Which project this is. Who's asking. What machine they're on. What services are reachable. A single CLI call — `tstudio identity` — returns all of this in 100ms. No re-reading the README. No "what repo are we in?" **Context.** Which project this is. Who's asking. What machine they're on. What services are reachable. A single CLI call — `tinqs identity` — returns all of this in 100ms. No re-reading the README. No "what repo are we in?"
**Guardrails.** What the agent must never do. No merging to main without review. No pushing to public repos without approval. No running destructive commands. The harness enforces these at the platform layer, not in the prompt. Prompts can be ignored. Platform gates cannot. **Guardrails.** What the agent must never do. No merging to main without review. No pushing to public repos without approval. No running destructive commands. The harness enforces these at the platform layer, not in the prompt. Prompts can be ignored. Platform gates cannot.
@@ -44,7 +44,7 @@ LangChain, CrewAI, and AutoGen are built for web apps. They assume text-in, text
Our harness runs on [Tinqs Studio](https://tinqs.com), built on a Gitea fork with game-specific features. The key pieces: Our harness runs on [Tinqs Studio](https://tinqs.com), built on a Gitea fork with game-specific features. The key pieces:
**The CLI** — a single Go binary. One command (`tstudio identity`) gives the agent full project context in 100ms. Screenshots, cloud vision, health checks — all subcommands of the same binary. **The CLI** — a single Go binary. One command (`tinqs identity`) gives the agent full project context in 100ms. Screenshots, cloud vision, health checks — all subcommands of the same binary.
**The soul file** — a markdown document in the repo root. The agent reads it on session start. It defines values, scope, and behavioural rules. The same soul file works in Cursor, Claude Code, or any tool that reads markdown. **The soul file** — a markdown document in the repo root. The agent reads it on session start. It defines values, scope, and behavioural rules. The same soul file works in Cursor, Claude Code, or any tool that reads markdown.
+5 -5
View File
@@ -12,11 +12,11 @@ author_role: "CTO & Developer, Tinqs"
--- ---
Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context. Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context.
Our CLI solves this in 100ms. One command — `tstudio identity` — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio. Our CLI solves this in 100ms. One command — `tinqs identity` — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio.
## The identity command (100ms) ## The identity command (100ms)
When an agent starts, the first thing it calls is `tstudio identity`. The output: When an agent starts, the first thing it calls is `tinqs identity`. The output:
- **Soul file** — the agent's persistent identity, values, operating principles - **Soul file** — the agent's persistent identity, values, operating principles
- **Company context** — team members, roles, what the company does - **Company context** — team members, roles, what the company does
@@ -26,7 +26,7 @@ When an agent starts, the first thing it calls is `tstudio identity`. The output
This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second. This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second.
This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with `tstudio identity`. Without it, every conversation begins with "let me explain the project." With it, the agent already knows. This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with `tinqs identity`. Without it, every conversation begins with "let me explain the project." With it, the agent already knows.
## Screenshots and cloud vision ## Screenshots and cloud vision
@@ -38,7 +38,7 @@ This is how you file bugs without typing. Look at the game, tell the agent what'
## Health checks ## Health checks
`tstudio doctor` runs a comprehensive check: `tinqs doctor` runs a comprehensive check:
- Is the git platform reachable and authenticated? - Is the git platform reachable and authenticated?
- Is the game server running? - Is the game server running?
@@ -55,7 +55,7 @@ Cross-compilation is trivial. We build Windows, Mac (arm64 + amd64), and Linux b
## What we learned ## What we learned
**The CLI is the API for AI agents.** What started as a human convenience tool became the primary interface for agents. Every session starts with `tstudio identity`. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary. **The CLI is the API for AI agents.** What started as a human convenience tool became the primary interface for agents. Every session starts with `tinqs identity`. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary.
**One binary beats ten scripts.** Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12. **One binary beats ten scripts.** Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12.
+5 -5
View File
@@ -265,9 +265,9 @@
<p class="post__lead">Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context.</p> <p class="post__lead">Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context.</p>
<div class="post__body"> <div class="post__body">
<p>Our CLI solves this in 100ms. One command — <code>tstudio identity</code> — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio.</p> <p>Our CLI solves this in 100ms. One command — <code>tinqs identity</code> — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio.</p>
<h2>The identity command (100ms)</h2> <h2>The identity command (100ms)</h2>
<p>When an agent starts, the first thing it calls is <code>tstudio identity</code>. The output:</p> <p>When an agent starts, the first thing it calls is <code>tinqs identity</code>. The output:</p>
<ul> <ul>
<li><strong>Soul file</strong> — the agent's persistent identity, values, operating principles</li> <li><strong>Soul file</strong> — the agent's persistent identity, values, operating principles</li>
<li><strong>Company context</strong> — team members, roles, what the company does</li> <li><strong>Company context</strong> — team members, roles, what the company does</li>
@@ -276,13 +276,13 @@
<li><strong>Service status</strong> — which URLs are live and reachable</li> <li><strong>Service status</strong> — which URLs are live and reachable</li>
</ul> </ul>
<p>This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second.</p> <p>This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second.</p>
<p>This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with <code>tstudio identity</code>. Without it, every conversation begins with "let me explain the project." With it, the agent already knows.</p> <p>This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with <code>tinqs identity</code>. Without it, every conversation begins with "let me explain the project." With it, the agent already knows.</p>
<h2>Screenshots and cloud vision</h2> <h2>Screenshots and cloud vision</h2>
<p>The CLI can capture any window from outside the process. No in-game overlay, no rendering pipeline integration. OS-level capture — GDI+ on Windows, screencapture on Mac.</p> <p>The CLI can capture any window from outside the process. No in-game overlay, no rendering pipeline integration. OS-level capture — GDI+ on Windows, screencapture on Mac.</p>
<p>A <code>photo</code> command sends the screenshot to a cloud vision model. The agent says "take a photo of the game" and gets back: "The player character is standing near a half-built hut. Three palm trees to the left. The terrain has a visible seam between two biomes."</p> <p>A <code>photo</code> command sends the screenshot to a cloud vision model. The agent says "take a photo of the game" and gets back: "The player character is standing near a half-built hut. Three palm trees to the left. The terrain has a visible seam between two biomes."</p>
<p>This is how you file bugs without typing. Look at the game, tell the agent what's wrong. It takes a screenshot, describes what it sees, and creates an issue with both the description and the image attached. Keyboard-free bug reporting.</p> <p>This is how you file bugs without typing. Look at the game, tell the agent what's wrong. It takes a screenshot, describes what it sees, and creates an issue with both the description and the image attached. Keyboard-free bug reporting.</p>
<h2>Health checks</h2> <h2>Health checks</h2>
<p><code>tstudio doctor</code> runs a comprehensive check:</p> <p><code>tinqs doctor</code> runs a comprehensive check:</p>
<ul> <ul>
<li>Is the git platform reachable and authenticated?</li> <li>Is the git platform reachable and authenticated?</li>
<li>Is the game server running?</li> <li>Is the game server running?</li>
@@ -294,7 +294,7 @@
<p>Go compiles to a single static binary. No Python virtualenvs, no Node.js version managers, no DLL hell on Windows. The same binary runs on a gaming PC, a designer's MacBook, and a CI runner in AWS.</p> <p>Go compiles to a single static binary. No Python virtualenvs, no Node.js version managers, no DLL hell on Windows. The same binary runs on a gaming PC, a designer's MacBook, and a CI runner in AWS.</p>
<p>Cross-compilation is trivial. We build Windows, Mac (arm64 + amd64), and Linux binaries from a single CI workflow. Push a tag, CI builds all three, uploads to S3. The binary is 15MB, starts in under 100ms, has zero runtime dependencies.</p> <p>Cross-compilation is trivial. We build Windows, Mac (arm64 + amd64), and Linux binaries from a single CI workflow. Push a tag, CI builds all three, uploads to S3. The binary is 15MB, starts in under 100ms, has zero runtime dependencies.</p>
<h2>What we learned</h2> <h2>What we learned</h2>
<p><strong>The CLI is the API for AI agents.</strong> What started as a human convenience tool became the primary interface for agents. Every session starts with <code>tstudio identity</code>. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary.</p> <p><strong>The CLI is the API for AI agents.</strong> What started as a human convenience tool became the primary interface for agents. Every session starts with <code>tinqs identity</code>. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary.</p>
<p><strong>One binary beats ten scripts.</strong> Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12.</p> <p><strong>One binary beats ten scripts.</strong> Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12.</p>
<p><strong>Cloud vision is underrated for game dev.</strong> Sending a screenshot to a vision model sounds gimmicky. In practice, it's the fastest way to document visual bugs. "The tree is floating 2m above the terrain" is much faster to communicate when the AI is looking at the same screen.</p> <p><strong>Cloud vision is underrated for game dev.</strong> Sending a screenshot to a vision model sounds gimmicky. In practice, it's the fastest way to document visual bugs. "The tree is floating 2m above the terrain" is much faster to communicate when the AI is looking at the same screen.</p>
<p><strong>Agent cold starts are the real problem.</strong> Without the identity system, every session starts with the agent asking "what project is this?" With it, the agent knows everything in 100ms. That's the difference between an AI assistant and an AI team member.</p> <p><strong>Agent cold starts are the real problem.</strong> Without the identity system, every session starts with the agent asking "what project is this?" With it, the agent knows everything in 100ms. That's the difference between an AI assistant and an AI team member.</p>