post: GPU-skinned herds — agent_skinned renderer + engine private, builds public

2026-06-14 01:19:46 +01:00
parent f762ad52a3
commit b8c3fc473b
17 changed files with 1259 additions and 185 deletions
@@ -0,0 +1,133 @@
+---
+title: "Flows Are Sessions, Not Pipelines: Why We Moved Our Agent Orchestrator from YAML to JavaScript"
+slug: flows-are-sessions
+date: "2026-06-11"
+description: "We killed the static YAML DAG and rewrote our agent orchestration in 200 lines of JavaScript. Now a flow IS a session — you chat it, steer it, and it pauses for you at a human gate."
+og_description: "YAML DAGs are dead. We rewrote our agent orchestration in JavaScript, made every flow a live session, and added a human-in-the-loop gate. The operator is the co-pilot, not the babysitter."
+og_image: "https://www.tinqs.com/img/og-cover.jpg"
+excerpt: "We killed the static YAML DAG and rewrote our agent orchestration in 200 lines of JavaScript. Now a flow IS a session — you chat it, steer it, and it pauses for you at a human gate."
+author: "Ozan Bozkurt"
+author_initials: "OB"
+author_role: "CTO & Developer, Tinqs"
+---
+
+Our YAML flow engine had seven bespoke node types just to fake a `while` loop. We threw it out and rewrote everything in 200 lines of JavaScript. The flow engine is gone. The flow IS the session. Here's what we learned.
+
+## The YAML Was a Compiler for a Language Nobody Wanted
+
+The old system was a static DAG. You defined nodes and edges in YAML, the engine walked them top-to-bottom, and when it finished it was done. No mid-run interaction. No branching. No retry. If you wanted a loop, you didn't use a `while` statement — you used an `agent-loop-decision` node type. In YAML.
+
+```yaml
+# The old way: a "loop" was a bespoke node type
+steps:
+  - agent-task:   "generate document"
+  - agent-loop-decision:
+      condition:  "check if quality > 0.8"
+      if-true:    "continue"
+      if-false:   "repeat-step: agent-task"
+```
+
+That's not configuration. That's a compiler for a language nobody wanted to write. Every orchestrator ends up here: GitHub Actions expressions, GitLab CI `rules`, Airflow's `BranchPythonOperator`. They all start as a simple declarative graph and grow node types until they're Turing-complete nightmares held together by schema patches.
+
+We had seven: `agent-task`, `agent-loop-decision`, `fork`, `conditional`, `agent-join`, `pipeline-stage`, `human-review`. Each one existed because YAML can't express control flow. You weren't writing a flow. You were filing paperwork to describe a flow.
+
+The moment we knew it was wrong: someone asked "can I retry this step three times if it fails?" and the honest answer was "we'd need a new node type." When your config format needs an RFC to add a `for` loop, you've built a programming language by accident. Delete it.
+
+## 200 Lines of JavaScript Replaced Seven Node Types
+
+A flow is now an ES module. You export `default async (flow) => {}` and the runtime calls it. The API surface is five calls:
+
+- `agent(prompt, options)` — run one agent with a task
+- `parallel(thunks)` — run many agents concurrently, await all
+- `pipeline(items, ...stages)` — push items through stages
+- `phase(name)` — label progress for the dashboard
+- `human(config)` — pause and wait for a person
+
+Here's a real flow. It audits routes with parallel researchers, then stops at a human gate. A dozen lines, comment included.
+
+```js
+// .pi/flows/flows/review-routes.flow.mjs
+export const meta = { name: "review-routes", description: "audit routes for missing auth" };
+export default async (flow) => {
+  const { agent, parallel, phase, human, task } = flow;
+  phase("find");
+  const findings = await parallel(["auth", "input"].map((d) => () =>
+    agent(`Review ${d}: ${task}`, { agent: "researcher", model: "@planning" })
+  ));
+  phase("review");
+  const gate = await human({ title: "Eyeball it", prompt: "anything to fix before I finish?" });
+  return { findings, gate };
+};
+```
+
+Why JavaScript wins is boring and fundamental: `while`, `if`, and `try/catch` are language keywords, and `Promise.all` ships in the standard library. The things our YAML needed custom schema types to fake come with the language. Bounded concurrency is a one-liner. Error recovery is a `try` block. A loop is `while (!approved)`. No plugin, no RFC, no new node type.
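+
+To make that concrete, here's a minimal sketch of the "retry this step three times" request as plain JavaScript. `retry` and `makeFlakyAgent` are hypothetical names for illustration, not part of the flow API, and the flaky agent is only a stub standing in for `agent()`.
+
+```js
+// Retry an async step up to `attempts` times with a plain loop and try/catch.
+async function retry(step, attempts = 3) {
+  let lastError;
+  for (let i = 1; i <= attempts; i += 1) {
+    try {
+      return await step(i); // success: stop retrying
+    } catch (err) {
+      lastError = err; // remember the failure and try again
+    }
+  }
+  throw lastError; // every attempt failed
+}
+
+// Hypothetical stand-in for flow.agent(): fails on its first two calls, then succeeds.
+function makeFlakyAgent() {
+  let calls = 0;
+  return async (prompt) => {
+    calls += 1;
+    if (calls < 3) throw new Error(`transient failure ${calls}`);
+    return `done: ${prompt}`;
+  };
+}
+```
+
+Inside a flow, the same helper would wrap a real call, along the lines of `await retry(() => agent("generate document"))`.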
+
+We migrated all flows YAML-to-JS on 2026-06-10. One-way conversion script, every flow reviewed and running in 24 hours. The YAML parser was deleted — not deprecated, not kept for backwards compatibility, deleted. There's no config path left to reach for.
+
+The plan lives in code, not config. Config is for things that don't change. Agent orchestration changes every run.
+
+## A Flow IS a Session
+
+The old model had a phantom problem. A flow was a card. A session was a separate card. The operator watched the flow run from a different window. There was a "New Flow" button that created a flow card, and a "Continue" button that attached a session to it, and the disconnect between "the thing running your work" and "the place you talk to it" was baked into the UI.
+
+We killed that architecture. Every spawn is a session.
+
+When you call `POST /api/flows/spawn {cwd, task, flowName}`, the session runs the flow inside itself with the `flow_run` tool. Steps stream inline into the chat — `flow:steps` injects progress into the session's own message stream. The session turns purple in the dashboard. It becomes the flow. One card. One identity. Persistent after the run finishes.
+
+No "New Flow" button. No "Continue" button. You spawn a session; with a `flowName` it runs that flow, without one it opens an interactive operator session that designs a flow with you first. The dashboard branding is Tinqs Studio. The control surface is the host card, live agent cards, run history, and a chat to steer the run.
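+
+As a rough sketch, the spawn call could be assembled like this. The endpoint and the three fields come from the description above; the base URL, the JSON encoding, and the `buildSpawnRequest` helper are assumptions for illustration.
+
+```js
+// Build the request for POST /api/flows/spawn.
+// Assumptions: JSON body, placeholder base URL. Neither is confirmed by the post.
+function buildSpawnRequest({ cwd, task, flowName }, baseUrl = "http://localhost:3000") {
+  const body = { cwd, task };
+  if (flowName) body.flowName = flowName; // omit to get an interactive operator session
+  return {
+    url: `${baseUrl}/api/flows/spawn`,
+    init: {
+      method: "POST",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify(body),
+    },
+  };
+}
+
+// Usage (not executed here):
+//   const { url, init } = buildSpawnRequest({ cwd: ".", task: "audit routes", flowName: "review-routes" });
+//   await fetch(url, init);
+```
+
+Leaving `flowName` out keeps the request identical otherwise, which matches the "one entry point, two behaviours" idea.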
+
+There's no phantom card. No disconnect between "the thing running your work" and "the place you talk to it." It's one session. You're in the room.
+
+## The Human Gate: Pause, Take Over, Approve, Continue
+
+Agents make mistakes. They guess when they shouldn't. They take irreversible actions because the prompt said "proceed." The model gets stuck and you sit there wondering — do I wait or abort?
+
+`flow.human()` is our answer.
+
+When a flow hits a human gate, it stops. The dashboard shows the gate prompt: what to review, what to decide. The host session switches to takeover mode — coding tools are unblocked (normally the host is hands-off) and the system prompt becomes "flow is paused, help the operator finish this." You open files. You edit. You verify. You run commands.
+
+To release the gate, reply `approve` or `done` or `lgtm`. The flow resumes. Any other message is a work instruction — the takeover session executes it but does not release the gate. The flow loops on `notes` until the human says go.
+
+```js
+let approved = false;
+while (!approved) {
+  const { notes } = await human({
+    title: "Review before push",
+    prompt: "Check the diff. Approve or tell me what to fix.",
+  });
+  if (notes.match(/^(approve|done|lgtm|looks good)/i)) approved = true;
+  else await agent(`Fix: ${notes}`, { agent: "implementer" });
+}
+```
+
+Two patterns this enables. First: review-approve-before-push gates, where nobody ships untested code because nobody set the `auto-approve` flag to true. Second: the "agent is stuck" hand-off. The flow pauses, you take over the exact same workspace, fix the problem, type `approve`, and the flow keeps going.
+
+The flow waits instead of guessing. This isn't a feature. It's an admission that some decisions shouldn't be automated.
+
+## The Operator Is Your Co-Pilot
+
+The old way was a one-shot generator. Paste an objective, click Generate, get a YAML blob, pray it's right, run it, discover it's wrong 20 minutes in with no way to steer. We'd watch flows fail and think "I could have told it that before it started."
+
+The new flow operator doesn't write the flow for you and walk away. It designs it with you.
+
+Spawn a session without a `flowName`. A DeepSeek session opens: the same model that powers the dashboard, cheap at $0.28/MTok, steerable in natural language. It proposes a draft `.flow.mjs`, shows you the agents and phases and any human gate, and explains why. You tell it what to change. It does NOT launch until you say go. When you approve, it writes the flow file and spawns a separate host session, then attaches to monitor and report progress.
+
+The operator is still in the chat when the human gate fires. It's still there when you want to change the plan mid-run. It doesn't go away after launch. Co-pilot, not autopilot.
+
+There are three runner faces for the same engine: **pi/dashboard** (DeepSeek, cheap, steerable — the default), **Claude Code** (Workflow tool, one-shot fan-out for heavy research), and a **cloud agent** (remote deploy, clone, AWS). Pick by granularity and cost. The flow file is the same in all three modes.
+
+## What We Learned
+
+**Numbers first.** 43 out of 43 unit tests green. All flows migrated. The supervisor inbox — steering messages sent between steps — was silently dropping operator messages before 2026-06-10. You'd type "focus on the auth routes" and the flow never saw it. That's fixed now. Chat reaches the inbox, the inbox drains between `agent()` calls, and steering works.
+
+**The inbox rule.** The supervisor inbox is applied between `agent()` calls, never mid-step. Steering mid-step is undefined behaviour. We learned this the hard way — early versions tried to inject mid-agent and got corrupted state, partial outputs, agents that forgot what they were doing. Between steps is the right boundary. Respect it.
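+
+A minimal sketch of that boundary, assuming a simple in-memory queue. `createSupervisor`, `send`, and the way notes are appended to the prompt are all hypothetical; the real runtime may apply steering differently. The point is only where the drain happens.
+
+```js
+// Hypothetical sketch: steering messages queue up and are applied only between steps.
+function createSupervisor(runAgent) {
+  const inbox = [];    // messages sent by the operator, possibly while a step is running
+  const steering = []; // messages already applied to the flow
+
+  return {
+    // Called from the chat side at any time, including mid-step.
+    send(message) {
+      inbox.push(message);
+    },
+    // Wraps the underlying agent call: drain the inbox first, then run the step.
+    async agent(prompt) {
+      steering.push(...inbox.splice(0)); // drain at the step boundary, never mid-step
+      const notes = steering.length ? `\nOperator notes: ${steering.join("; ")}` : "";
+      return runAgent(prompt + notes);
+    },
+  };
+}
+```
+
+A message sent while step one is running is never injected into step one; it shows up in the prompt for step two.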
+
+**Economics matter.** DeepSeek V4 Pro at $0.28/MTok handles the routine steps. A per-step model override lets you swap in a premium reasoning model ($15/MTok) for the one critical call that needs it. $0.28 for routine, $15 for the hard parts. The three-tier strategy from our agent daemon applies at flow granularity too.
+
+**What's next.** Richer on-card flow display — a pinned step strip so you can see progress without opening the session. Attachable asset and agent-structure viewers in the flow card. Run replay for finished sessions after a page reload (the session persists, but you can't rewatch the stream yet).
+
+But the principle is settled. A flow isn't a pipeline. A pipeline runs blind and reports back later. A flow is a pair-programming session where one of the pair happens to be code.
+
+---
+
+*[Tinqs Studio](https://tinqs.com) is our agent-native development platform — git hosting, AI agents, and the flow engine described here. [Ariki](https://arikigame.com) is the survival colony sim we're building with it.*
@@ -0,0 +1,112 @@
+---
+title: "GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot"
+slug: gpu-skinned-herds
+date: "2026-06-14"
+description: "Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU skinned-instance renderer into Tinqs Engine that does — 25 crocodiles verified, 1,000+ projected. Pre-built binaries for macOS and Windows."
+og_description: "One draw call, 1,000 animated characters. GPU-skinned herd renderer built into the Tinqs Engine fork of Godot."
+og_image: "https://www.tinqs.com/img/og-cover.jpg"
+excerpt: "Godot can't batch-render 1,000 animated characters. We built a GPU skinned-instance herd renderer into the engine itself — already driving crocodile herds in Ariki. Pre-built editor binaries for macOS and Windows."
+author: "Ozan Bozkurt"
+author_initials: "OB"
+author_role: "CTO & Developer, Tinqs"
+---
+Godot gives you one `Skeleton3D` per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 `AnimationPlayer` ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second.
+
+We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads once, and draws every instance in one call. 25 crocodiles on screen right now. 1,000+ projected. Same bone count, same animation fidelity — a tiny fraction of the cost.
+
+## Why the engine needs to change
+
+The standard Godot approach — one `Skeleton3D` + one `MeshInstance3D` per character — works for a handful of animated entities. It breaks down hard at crowd scale:
+
+- **CPU bone transforms.** Computing `global_pose` for 200 skeletons × 100 bones each = 20,000 matrix multiplies per frame, all on the main thread.
+- **Draw call explosion.** Each `MeshInstance3D` is its own draw call. Even with MultiMesh, there's no built-in path for skinned meshes — `MultiMeshInstance3D` only handles static geometry.
+- **AnimationPlayer sprawl.** Each skeleton needs its own `AnimationPlayer` and its own `_process()` tick.
+
+The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour.
+
+What we need is simpler: **share the skeleton, drive per-instance poses from a single animation, batch the draw call.** That's what `agent_skinned` does.
+
+## How it works: two classes, one texture
+
+The module lives in `modules/agent_skinned/` inside [Tinqs Engine](https://tinqs.com/tinqs/engine). Two classes, one job:
+
+### `MultiSkinnedMeshInstance3D` — the data plane
+
+Holds the CPU-side bone matrices. Allocates an `ImageTexture` of size `[4 × max_bones, max_instances]` in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances:
+
+```
+Texture: 520 × 256 RGBA32F ≈ 2 MB
+```
+
+That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple:
+
+```gdscript
+var data := MultiSkinnedMeshInstance3D.new()
+data.set_mesh(crocodile_mesh)
+data.set_skeleton(skeleton)       # rest pose + bone hierarchy
+data.set_max_instances(256)
+data.set_max_bones(130)
+
+# Each frame: push poses from the animated skeleton
+for instance in herd_positions:
+    data.set_instance_pose_bones(instance.id, bone_transforms)
+data.update()   # upload only dirty instances, not the whole texture
+```
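+
+The 2 MB figure quoted above is plain arithmetic. A small sketch, in JavaScript just for the numbers; `boneTextureSize` is a hypothetical helper, not an engine API:
+
+```js
+// Size of a bone texture laid out as [4 * maxBones, maxInstances] in RGBA32F.
+function boneTextureSize(maxBones, maxInstances) {
+  const width = 4 * maxBones;   // four texels (matrix columns) per bone
+  const height = maxInstances;  // one row of texels per instance
+  const bytesPerTexel = 4 * 4;  // RGBA, one 32-bit float per channel
+  return { width, height, bytes: width * height * bytesPerTexel };
+}
+
+const croc = boneTextureSize(130, 256);
+console.log(croc.width, croc.height, (croc.bytes / 2 ** 20).toFixed(2) + " MiB");
+// prints: 520 256 2.03 MiB
+```
+
+Memory scales linearly in both dimensions, so doubling `max_instances` doubles the texture.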
+
+### `MultiSkinnedInstance3D` — the renderer
+
+A `MultiMeshInstance3D` subclass. Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call `refresh()` — it uploads the bone texture into the shader material's `bone_matrices_tex` uniform and the mesh is drawn in one call.
+
+The shader does 4-bone linear-blend skinning on the GPU:
+
+```glsl
+mat4 get_bone(int b) {
+    return mat4(
+        texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0),
+        texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0),
+        texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0),
+        texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0)
+    );
+}
+```
+
+`INSTANCE_ID` is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader.
+
+## Two bugs we shipped and fixed
+
+The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong.
+
+**Bug 1: Shader compile failure.** The default skinning shader treated `TANGENT` as `vec4`. Godot 4 exposes it as `vec3`. Fixed in one line, and added an `albedo_tex` uniform so herds texture out of the box.
+
+**Bug 2: Bone matrices stored transposed.** The data plane wrote basis rows (standard Godot `Transform3D.basis` is row-major), but the shader unpacked as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.
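+
+To make the mismatch concrete, here's a sketch in JavaScript rather than the engine's own code. All three function names are hypothetical. Matrices are given as four rows, and the "shader" side reads each texel as a column, the way `mat4(c0, c1, c2, c3)` does.
+
+```js
+// Column-major packing: texel k holds column k of the 4x4 matrix.
+function packColumnMajor(rows) {
+  const out = new Float32Array(16);
+  for (let col = 0; col < 4; col += 1) {
+    for (let row = 0; row < 4; row += 1) {
+      out[col * 4 + row] = rows[row][col];
+    }
+  }
+  return out;
+}
+
+// Row-major packing: texel k holds row k (the layout that caused the bug).
+function packRowMajor(rows) {
+  return Float32Array.from(rows.flat());
+}
+
+// Mimic the shader: treat each texel as a column, return the matrix as rows.
+function unpackAsColumns(texels) {
+  return [0, 1, 2, 3].map((row) => [0, 1, 2, 3].map((col) => texels[col * 4 + row]));
+}
+
+// A pure translation by (5, 6, 7), written as rows.
+const translate = [
+  [1, 0, 0, 5],
+  [0, 1, 0, 6],
+  [0, 0, 1, 7],
+  [0, 0, 0, 1],
+];
+const good = unpackAsColumns(packColumnMajor(translate)); // round-trips unchanged
+const bad = unpackAsColumns(packRowMajor(translate));     // comes back transposed
+```
+
+In `bad`, the translation ends up in the bottom row instead of the last column, which is the kind of silent corruption a round-trip get/set test on the CPU side can't see.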
+
+The lesson: doctests catch logic. Rendering catches truth. You need both.
+
+## What's driving it
+
+In [Ariki](https://www.arikigame.com), the sim tracks animal migration across a 12km archipelago. `AnimalHerdRenderer.cs` groups sim `ViewerState.animals` by type, feeds positions to `skinned_herd.gd` (a reusable per-type herd backend), which drives the renderer. One `AnimationPlayer` animates a single driver skeleton; poses propagate to every instance.
+
+The crocodile herd scene is 25 instances, one draw call. The same pipeline projects to 200–1,000 before the GPU budget even notices.
+
+## What's deliberately not here
+
+- **No C# wrapper.** Instantiate from GDScript via `ClassDB.instantiate()` — the binding surface is small and stable.
+- **No automatic `AnimationPlayer` integration.** You drive poses. We give you the texture. Freedom to animate however you want.
+- **No GPU occlusion or LOD.** That's the game's job. The engine provides the tool; the game decides what to draw.
+
+## Get the build
+
+Pre-built editor binaries with `agent_skinned` baked in — no engine compile required:
+
+| Platform | Binary | Engine commit |
+|----------|--------|---------------|
+| **macOS ARM64** | [`tinqs.macos.editor.arm64.mono`](https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono) | `4fe1323` (4.6.4, Xcode 26.3) |
+| **Windows x64** | [`tinqs.windows.editor.x86_64.mono.exe`](https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe) | `64fb5cc` (4.6.4, MSVC 2022) |
+
+All builds live in the public [`tinqs/builds`](https://tinqs.com/tinqs/builds) repo — engine source is private, but the binaries are yours. See [`manifest.json`](https://tinqs.com/tinqs/builds/src/branch/main/manifest.json) for checksums and build details.
+
+The engine source lives in [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). Module docs: `modules/agent_skinned/README.md` and `.agents/wiki/agent-skinned-gpu-herd.md`.
+
+---
+
+**Related:** [Fork, Don't Build](fork-dont-build) — why we modify existing platforms instead of building new ones. [Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation streaming layers that work alongside this.