From c979c898f49de81dfb9d7e526de308e8641a82f6 Mon Sep 17 00:00:00 2001 From: Ozan Bozkurt Date: Mon, 15 Jun 2026 22:41:00 +0100 Subject: [PATCH 1/8] =?UTF-8?q?post:=20GPU-driven=20crowd=20animation=20?= =?UTF-8?q?=E2=80=94=201000=20agents=20at=2060=20FPS,=20zero=20CPU?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- gpu-driven-crowd-animation.html | 392 ++++++++++++++++++++++++++++ index.html | 7 + posts/gpu-driven-crowd-animation.md | 160 ++++++++++++ 3 files changed, 559 insertions(+) create mode 100644 gpu-driven-crowd-animation.html create mode 100644 posts/gpu-driven-crowd-animation.md diff --git a/gpu-driven-crowd-animation.html b/gpu-driven-crowd-animation.html new file mode 100644 index 0000000..f04bcc0 --- /dev/null +++ b/gpu-driven-crowd-animation.html @@ -0,0 +1,392 @@ + + + + + + + Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton — Tinqs Blog + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ ← All Posts + +

Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton

+

Yesterday we shipped a GPU herd renderer that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: one live skeleton per animal state per type. For 30 types with 5 states each, that's 150 Skeleton3D nodes — each with an AnimationPlayer, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work.

+ +
+

Today we ripped out every live skeleton. The CPU now does zero per-frame animation work. 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how.

+

The problem: lockstep costs CPU

+

The original agent_skinned module worked by sharing a live skeleton. One driver Skeleton3D animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton.

+
30 animal types × 5 states = 150 live skeletons on the CPU
+

Each skeleton: compute global_pose for every bone, run an AnimationPlayer.process(), push matrices into the data plane, upload the dirty texture region. The cost tracked herd count, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles.

+

The fix sounds obvious in retrospect: the GPU should compute the poses, not the CPU. Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample.

+

The bake: one texture per character type, done once

+

At load time, the skinned_herd.gd backend plays every animation clip on a temporary Skeleton3D and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture:

+
Goat: 53 bones × 496 frames = 26,288 bone matrices
+Texture: 212 × 496 pixels, RGBA32F
+VRAM: 212 × 496 × 16 bytes = 1.6 MB
+

That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again.

+

For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = 14.2 MB per type, 426 MB total. Bone-matrix is 9× smaller because bones ≪ vertices.
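
The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, not engine code: the function names are hypothetical, and it assumes the figures quoted in the post (53 bones, 496 frames, 2,500 vertices, 12 bytes per VAT vertex) with sizes reported in binary megabytes.

```python
BYTES_PER_TEXEL_RGBA32F = 16  # 4 channels x 4-byte float
TEXELS_PER_BONE = 4           # one mat4 is stored as 4 RGBA texels
MIB = 1024 * 1024


def palette_bytes(bones: int, frames: int) -> int:
    """Bone-matrix palette: one row per baked frame, four texels per bone."""
    return bones * TEXELS_PER_BONE * frames * BYTES_PER_TEXEL_RGBA32F


def vat_bytes(vertices: int, frames: int, bytes_per_vertex: int = 12) -> int:
    """Vertex animation texture: one 3 x float32 position per vertex per frame."""
    return vertices * frames * bytes_per_vertex


goat_palette = palette_bytes(bones=53, frames=496)
goat_vat = vat_bytes(vertices=2500, frames=496)

print(f"palette width: {53 * TEXELS_PER_BONE} texels")
print(f"palette size:  {goat_palette / MIB:.1f} MiB")
print(f"VAT size:      {goat_vat / MIB:.1f} MiB")
print(f"VAT / palette: {goat_vat / goat_palette:.1f}x")
```

The ratio depends only on vertices × 12 bytes versus bones × 64 bytes per frame, so it holds for any clip length on the same rig.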

+

The GPU: per-instance playback, zero CPU

+

Each MultiMesh instance carries 4 numbers in INSTANCE_CUSTOM:

+

| Channel | Meaning |
|---------|---------|
| .x | Which clip (start row in the palette) |
| .y | How many frames in this clip |
| .z | Playback rate (baked-fps × ground speed) |
| .w | Phase offset (0..1, golden-ratio spread) |
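
As an illustration of how those four channels might be filled once per instance at spawn time, here is a small sketch. The helper name, the clip layout (start row 120, 32 frames), and the exact phase formula are assumptions: the post only says the phase uses a golden-ratio spread, and multiplying the instance index by the golden-ratio conjugate and keeping the fractional part is one common way to get that.

```python
import math

# Assumption: "golden-ratio spread" means frac(index * 0.618...).
GOLDEN_RATIO_CONJUGATE = (math.sqrt(5.0) - 1.0) / 2.0


def pack_instance_custom(clip_start_row, clip_frame_count, baked_fps,
                         ground_speed, instance_index):
    """Return the (x, y, z, w) tuple stored in one instance's custom data.

    ground_speed is treated here as a unitless multiplier on the baked fps,
    which is an assumption about how the playback rate is derived.
    """
    phase = (instance_index * GOLDEN_RATIO_CONJUGATE) % 1.0
    return (
        float(clip_start_row),      # .x: first palette row of the clip
        float(clip_frame_count),    # .y: number of baked frames in the clip
        baked_fps * ground_speed,   # .z: playback rate in frames per second
        phase,                      # .w: phase offset in [0, 1)
    )


# Hypothetical walk clip occupying palette rows 120..151.
herd = [pack_instance_custom(120, 32, 30.0, 1.0, i) for i in range(8)]
for custom in herd:
    print(custom)
```

Because consecutive multiples of the golden-ratio conjugate never cluster, neighbouring instance indices land on well-separated phases without any random number generator or per-instance state.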

+

The vertex shader derives each instance's current frame from TIME:

+
float fcount = max(INSTANCE_CUSTOM.y, 1.0);
+int   start  = int(INSTANCE_CUSTOM.x + 0.5);
+float fpos   = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount);
+
+int f0 = int(fpos);
+int f1 = int(mod(float(f0) + 1.0, fcount));
+float fr = fpos - float(f0);
+
+// Blend between two adjacent baked frames for smooth playback at low bake fps
+int r0 = start + f0;
+int r1 = start + f1;
+
+mat4 m0 = mat4(
+    texelFetch(bone_matrices_tex, ivec2(px+0, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(px+1, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(px+2, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(px+3, r0), 0));
+mat4 m1 = mat4( /* same for r1 */ );
+
+skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
+

That's it. The CPU does nothing per frame. No skeletons. No AnimationPlayer. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows.

+

What changed in the engine

+

The shader needed one critical change: the bone-matrix texture went from being indexed by INSTANCE_ID (one row per instance) to being indexed by a pose slot computed from INSTANCE_CUSTOM (one row per baked frame). The old code:

+
int inst = INSTANCE_ID;  // row = instance index → lockstep
+

Became:

+
int r0 = start + f0;     // row = palette row from clip + frame → per-instance variety
+

This is a 40-line shader change in the engine's multi_skinned_instance_3d.cpp. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug).

+

Engine version bumped from 4.6.4 to 4.6.5.

+

The numbers (measured, not projected)

+

On an M1 Pro MacBook Pro (integrated GPU):

+

| Agent count | Old lockstep (4.6.4) | GPU-driven palette (4.6.5) |
|-------------|----------------------|----------------------------|
| 100 | ~40 FPS | 60 FPS |
| 500 | 31–39 FPS | 60 FPS |
| 1,000 | ~25 FPS | 60 FPS |
| 10,000 | untested | 8 FPS (unoptimized) |

+

The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap.

+

VRAM: 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck, which by default reserves 1 GB of its 16 GB unified memory for the GPU, handles this comfortably. The VAT alternative would need 426 MB, roughly nine times more.

+

Draw calls: Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer.

+

The bug that made everything invisible

+

The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing.

+

Root cause: a renderer.refresh() call during setup raced the renderer's own NOTIFICATION_READY handler, which re-bound the shader's bone_matrices_tex uniform and overwrote our baked texture with an unbound (default white) one. The shader sampled white → every texel read as (1, 1, 1, 1), so every bone matrix was all ones (a degenerate transform, not identity) → the skinned mesh collapsed into zero-area geometry → invisible.

+

Fix: bind the texture once on the first _process frame, after all nodes have had their _ready called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot _ready sequencing gotcha.

+

Where this puts us vs AAA

+

The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM.

+

What AAA has that we don't (yet):

+
    +
  • LOD tiers — far agents become 2D impostors (billboard quads with a sprite atlas). Same (clip, frame, speed, phase) packet drives all tiers.
  • Hero rigs — the nearest few agents get real Skeleton3D + AnimationTree + IK + ragdoll. Smooth gait blends, foot-lock, look-at.
  • Offline bake pipeline — precompute palettes in the asset build, not at load time.
  • GPU compute culling — frustum + distance + LOD classification on the GPU, no CPU cull loop.
+

These are planned and designed (the platform doc is at ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible.

+

The fork question

+

Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives:

+
    +
  • VAT (vertex animation textures) with a Godot plugin: Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT also can't blend frames (hard cuts between baked frames, no smooth playback) and can't skin normals/tangents (incorrect lighting).
+
    +
  • Phase-offset drivers only: Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists.
+
    +
  • Don't do crowds: The simplest answer. But Ariki needs animals and colonists. The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork.
+

What's next

+

The 4-item immediate roadmap:

+

1. One herd per type — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era)

+

2. Distance LOD — CPU-side cull + cheaper-far shader for far instances

+

3. RGBA16F + offline bake — half the VRAM, zero load-time hitch

+

4. Hero rigs — real AnimationTree + IK + ragdoll for the nearest few animals

+

The far horizon: animated 2D impostors and GPU compute-cull, designed and parked. Brought forward when the load demands them.

+

The engine source lives in tinqs/engine (private). Pre-built editor binaries at tinqs/builds. The Ariki game is at arikigame.com.

+
+

Related: GPU-Skinned Herds — the original herd renderer (yesterday's post). Fork, Don't Build — why we modify existing platforms instead of building new ones. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation layers that work alongside this.

+ +
+ + +
+ + + diff --git a/index.html b/index.html index 1c94312..56ffe0c 100644 --- a/index.html +++ b/index.html @@ -187,6 +187,13 @@ Read → + + 15 June 2026 +

Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton

+

We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork.

+ Read → +
+ 14 June 2026

GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot

diff --git a/posts/gpu-driven-crowd-animation.md b/posts/gpu-driven-crowd-animation.md new file mode 100644 index 0000000..a6893a0 --- /dev/null +++ b/posts/gpu-driven-crowd-animation.md @@ -0,0 +1,160 @@ +--- +title: "Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton" +slug: gpu-driven-crowd-animation +date: "2026-06-15" +description: "Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase." +og_description: "1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork." +og_image: "https://www.tinqs.com/img/og-cover.jpg" +excerpt: "We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork." +author: "Ozan Bozkurt" +author_initials: "OB" +author_role: "CTO & Developer, Tinqs" +--- +Yesterday we [shipped a GPU herd renderer](gpu-skinned-herds) that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: **one live skeleton per animal state per type.** For 30 types with 5 states each, that's 150 `Skeleton3D` nodes — each with an `AnimationPlayer`, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work. + +Today we ripped out every live skeleton. The CPU now does **zero per-frame animation work.** 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how. 
+ +## The problem: lockstep costs CPU + +The original `agent_skinned` module worked by **sharing a live skeleton.** One driver `Skeleton3D` animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton. + +``` +30 animal types × 5 states = 150 live skeletons on the CPU +``` + +Each skeleton: compute `global_pose` for every bone, run an `AnimationPlayer.process()`, push matrices into the data plane, upload the dirty texture region. The cost tracked **herd count**, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles. + +The fix sounds obvious in retrospect: **the GPU should compute the poses, not the CPU.** Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample. + +## The bake: one texture per character type, done once + +At load time, the `skinned_herd.gd` backend plays every animation clip on a temporary `Skeleton3D` and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture: + +``` +Goat: 53 bones × 496 frames = 26,288 bone matrices +Texture: 212 × 496 pixels, RGBA32F +VRAM: 212 × 496 × 16 bytes = 1.6 MB +``` + +That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again. + +For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = **14.2 MB per type, 426 MB total.** Bone-matrix is 9× smaller because bones ≪ vertices. 
+ +## The GPU: per-instance playback, zero CPU + +Each MultiMesh instance carries 4 numbers in `INSTANCE_CUSTOM`: + +| Channel | Meaning | +|---------|---------| +| `.x` | Which clip (start row in the palette) | +| `.y` | How many frames in this clip | +| `.z` | Playback rate (baked-fps × ground speed) | +| `.w` | Phase offset (0..1, golden-ratio spread) | + +The vertex shader derives each instance's current frame from TIME: + +```glsl +float fcount = max(INSTANCE_CUSTOM.y, 1.0); +int start = int(INSTANCE_CUSTOM.x + 0.5); +float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount); + +int f0 = int(fpos); +int f1 = int(mod(float(f0) + 1.0, fcount)); +float fr = fpos - float(f0); + +// Blend between two adjacent baked frames for smooth playback at low bake fps +int r0 = start + f0; +int r1 = start + f1; + +mat4 m0 = mat4( + texelFetch(bone_matrices_tex, ivec2(px+0, r0), 0), + texelFetch(bone_matrices_tex, ivec2(px+1, r0), 0), + texelFetch(bone_matrices_tex, ivec2(px+2, r0), 0), + texelFetch(bone_matrices_tex, ivec2(px+3, r0), 0)); +mat4 m1 = mat4( /* same for r1 */ ); + +skin += (m0 * (1.0 - fr) + m1 * fr) * weight; +``` + +That's it. The CPU does nothing per frame. No skeletons. No `AnimationPlayer`. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows. + +## What changed in the engine + +The shader needed one critical change: the bone-matrix texture went from being indexed by `INSTANCE_ID` (one row per instance) to being indexed by a **pose slot** computed from `INSTANCE_CUSTOM` (one row per baked frame). The old code: + +```glsl +int inst = INSTANCE_ID; // row = instance index → lockstep +``` + +Became: + +```glsl +int r0 = start + f0; // row = palette row from clip + frame → per-instance variety +``` + +This is a 40-line shader change in the engine's `multi_skinned_instance_3d.cpp`. 
It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug). + +Engine version bumped from 4.6.4 to **4.6.5**. + +## The numbers (measured, not projected) + +On an M1 Pro MacBook Pro (integrated GPU): + +| Agent count | Old lockstep (4.6.4) | GPU-driven palette (4.6.5) | +|------------|----------------------|----------------------------| +| 100 | ~40 FPS | **60 FPS** | +| 500 | 31–39 FPS | **60 FPS** | +| 1,000 | ~25 FPS | **60 FPS** | +| 10,000 | untested | 8 FPS (unoptimized) | + +The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap. + +**VRAM:** 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory handles this comfortably. The VAT alternative would need 426 MB — nine times more. + +**Draw calls:** Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer. + +## The bug that made everything invisible + +The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing. + +Root cause: a `renderer.refresh()` call during setup raced the renderer's own `NOTIFICATION_READY` handler, which re-bound the shader's `bone_matrices_tex` uniform — overwriting our baked texture with an unbound (default white) one. The shader sampled white → every bone matrix was identity → the mesh collapsed to a point at origin → invisible. + +Fix: bind the texture once on the **first `_process` frame**, after all nodes have had their `_ready` called. Then never touch it again. One deferred bind, zero per-frame cost. 
This is a classic Godot `_ready` sequencing gotcha. + +## Where this puts us vs AAA + +The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM. + +What AAA has that we don't (yet): +- **LOD tiers** — far agents become 2D impostors (billboard quads with a sprite atlas). Same `(clip, frame, speed, phase)` packet drives all tiers. +- **Hero rigs** — the nearest few agents get real `Skeleton3D` + `AnimationTree` + IK + ragdoll. Smooth gait blends, foot-lock, look-at. +- **Offline bake pipeline** — precompute palettes in the asset build, not at load time. +- **GPU compute culling** — frustum + distance + LOD classification on the GPU, no CPU cull loop. + +These are planned and designed (the platform doc is at `ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md`), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible. + +## The fork question + +Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives: + +- **VAT (vertex animation textures) with a Godot plugin:** Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT also can't blend frames (hard cuts between baked frames, no smooth playback) and can't skin normals/tangents (incorrect lighting). + +- **Phase-offset drivers only:** Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists. + +- **Don't do crowds:** The simplest answer. But Ariki needs animals and colonists. 
The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork. + +## What's next + +The 4-item immediate roadmap: +1. **One herd per type** — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era) +2. **Distance LOD** — CPU-side cull + cheaper-far shader for far instances +3. **RGBA16F + offline bake** — half the VRAM, zero load-time hitch +4. **Hero rigs** — real `AnimationTree` + IK + ragdoll for the nearest few animals + +The far horizon: animated 2D impostors and GPU compute-cull, designed and parked. Brought forward when the load demands them. + +The engine source lives in [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). Pre-built editor binaries at [`tinqs/builds`](https://tinqs.com/tinqs/builds). The Ariki game is at [arikigame.com](https://www.arikigame.com). + +--- + +**Related:** [GPU-Skinned Herds](gpu-skinned-herds) — the original herd renderer (yesterday's post). [Fork, Don't Build](fork-dont-build) — why we modify existing platforms instead of building new ones. [Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation layers that work alongside this. 
From 08209126c5aac693872de5f5512cc505903d0f0a Mon Sep 17 00:00:00 2001 From: Ozan Bozkurt Date: Mon, 15 Jun 2026 22:41:47 +0100 Subject: [PATCH 2/8] =?UTF-8?q?post:=20GPU-driven=20crowd=20animation=20?= =?UTF-8?q?=E2=80=94=20polished?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- gpu-driven-crowd-animation.html | 169 ++++++++++++--------------- index.html | 4 +- posts/gpu-driven-crowd-animation.md | 175 ++++++++++++---------------- 3 files changed, 146 insertions(+), 202 deletions(-) diff --git a/gpu-driven-crowd-animation.html b/gpu-driven-crowd-animation.html index f04bcc0..626bf84 100644 --- a/gpu-driven-crowd-animation.html +++ b/gpu-driven-crowd-animation.html @@ -4,27 +4,27 @@ - Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton — Tinqs Blog - + Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons — Tinqs Blog + - - + + - - + + @@ -276,106 +276,81 @@
← All Posts -

Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton

-

Yesterday we shipped a GPU herd renderer that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: one live skeleton per animal state per type. For 30 types with 5 states each, that's 150 Skeleton3D nodes — each with an AnimationPlayer, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work.

+

Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons

+

Godot gives you one Skeleton3D per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? You're measuring in seconds per frame.

-

Today we ripped out every live skeleton. The CPU now does zero per-frame animation work. 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how.

-

The problem: lockstep costs CPU

-

The original agent_skinned module worked by sharing a live skeleton. One driver Skeleton3D animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton.

-
30 animal types × 5 states = 150 live skeletons on the CPU
-

Each skeleton: compute global_pose for every bone, run an AnimationPlayer.process(), push matrices into the data plane, upload the dirty texture region. The cost tracked herd count, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles.

-

The fix sounds obvious in retrospect: the GPU should compute the poses, not the CPU. Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample.

-

The bake: one texture per character type, done once

-

At load time, the skinned_herd.gd backend plays every animation clip on a temporary Skeleton3D and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture:

-
Goat: 53 bones × 496 frames = 26,288 bone matrices
-Texture: 212 × 496 pixels, RGBA32F
+

We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork.

+

Why not skeletons?

+

The standard approach — one skeleton per character, one AnimationPlayer, one draw call — breaks at crowd scale. Computing global_pose for 1,000 skeletons at 60 bones each is 60,000 matrix multiplications per frame on the main thread. Each is its own draw call. Each AnimationPlayer ticks independently. No CPU can keep up.

+

Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores vertices × frames, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake.

+

Our answer: bone-matrix palette. Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone.

+

How it works

+

At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame into a single texture. A Goat with 9 clips at 30 fps produces 496 frames:

+
Texture: 212 × 496 pixels, RGBA32F
 VRAM: 212 × 496 × 16 bytes = 1.6 MB
-

That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again.

-

For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = 14.2 MB per type, 426 MB total. Bone-matrix is 9× smaller because bones ≪ vertices.

-

The GPU: per-instance playback, zero CPU

-

Each MultiMesh instance carries 4 numbers in INSTANCE_CUSTOM:

-

| Channel | Meaning |
|---------|---------|
| .x | Which clip (start row in the palette) |
| .y | How many frames in this clip |
| .z | Playback rate (baked-fps × ground speed) |
| .w | Phase offset (0..1, golden-ratio spread) |

-

The vertex shader derives each instance's current frame from TIME:

-
float fcount = max(INSTANCE_CUSTOM.y, 1.0);
-int   start  = int(INSTANCE_CUSTOM.x + 0.5);
-float fpos   = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount);
-
+

That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices.

+

After the bake, the skeleton is destroyed. It never runs again.

+

Each MultiMesh instance gets 4 numbers packed into INSTANCE_CUSTOM:

+
.x = which clip (start row in the palette)
+.y = how many frames in this clip
+.z = playback rate (baked-fps × ground speed — foot-sync)
+.w = phase offset (golden-ratio spread — no two adjacent animals share the same frame)
+

The vertex shader computes each instance's current frame from TIME:

+
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y,
+                 INSTANCE_CUSTOM.y);
 int f0 = int(fpos);
-int f1 = int(mod(float(f0) + 1.0, fcount));
+int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y));
 float fr = fpos - float(f0);
 
-// Blend between two adjacent baked frames for smooth playback at low bake fps
-int r0 = start + f0;
-int r1 = start + f1;
-
-mat4 m0 = mat4(
-    texelFetch(bone_matrices_tex, ivec2(px+0, r0), 0),
-    texelFetch(bone_matrices_tex, ivec2(px+1, r0), 0),
-    texelFetch(bone_matrices_tex, ivec2(px+2, r0), 0),
-    texelFetch(bone_matrices_tex, ivec2(px+3, r0), 0));
-mat4 m1 = mat4( /* same for r1 */ );
+// Blend between two adjacent frames for smooth playback
+int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0;
+int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1;
 
+// For each bone, reconstruct mat4 from 4 texels, blend, weight by skin influence
+mat4 m0 = mat4(texelFetch(tex, ivec2(b*4+0, r0), 0), /* ... 3 more columns */);
+mat4 m1 = mat4(texelFetch(tex, ivec2(b*4+0, r1), 0), /* ... 3 more columns */);
 skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
-

That's it. The CPU does nothing per frame. No skeletons. No AnimationPlayer. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows.

-

What changed in the engine

-

The shader needed one critical change: the bone-matrix texture went from being indexed by INSTANCE_ID (one row per instance) to being indexed by a pose slot computed from INSTANCE_CUSTOM (one row per baked frame). The old code:

-
int inst = INSTANCE_ID;  // row = instance index → lockstep
-

Became:

-
int r0 = start + f0;     // row = palette row from clip + frame → per-instance variety
-

This is a 40-line shader change in the engine's multi_skinned_instance_3d.cpp. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug).

-

Engine version bumped from 4.6.4 to 4.6.5.

-

The numbers (measured, not projected)

-

On an M1 Pro MacBook Pro (integrated GPU):

-

| Agent count | Old lockstep (4.6.4) | GPU-driven palette (4.6.5) |
|-------------|----------------------|----------------------------|
| 100 | ~40 FPS | 60 FPS |
| 500 | 31–39 FPS | 60 FPS |
| 1,000 | ~25 FPS | 60 FPS |
| 10,000 | untested | 8 FPS (unoptimized) |

-

The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap.

-

VRAM: 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory handles this comfortably. The VAT alternative would need 426 MB — nine times more.

-

Draw calls: Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer.

-

The bug that made everything invisible

-

The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing.

-

Root cause: a renderer.refresh() call during setup raced the renderer's own NOTIFICATION_READY handler, which re-bound the shader's bone_matrices_tex uniform — overwriting our baked texture with an unbound (default white) one. The shader sampled white → every bone matrix was identity → the mesh collapsed to a point at origin → invisible.

-

Fix: bind the texture once on the first _process frame, after all nodes have had their _ready called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot _ready sequencing gotcha.

-

Where this puts us vs AAA

-

The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM.

-

What AAA has that we don't (yet):

+

The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU.
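
As a rough illustration of the frame selection described above, here is a CPU-side Python model of the arithmetic the vertex shader performs per instance. It is a sketch, not the engine's code: the function name `sample_rows` and its argument names are hypothetical, and the four inputs stand in for the `INSTANCE_CUSTOM` channels (start row, frame count, playback rate, phase).

```python
def sample_rows(time_s, start_row, frame_count, rate, phase):
    """Model of the shader's frame pick for one instance (hypothetical helper).

    Returns (row0, row1, blend): the two palette rows to fetch and the
    weight of row1 in the linear blend between them.
    """
    fcount = max(float(frame_count), 1.0)             # guard against empty clips
    fpos = (time_s * rate + phase * fcount) % fcount  # fractional frame, wraps at clip end
    f0 = int(fpos)                                    # current baked frame
    f1 = (f0 + 1) % int(fcount)                       # next frame, wrapping to the clip start
    blend = fpos - f0                                 # 0.0 = all row0, 1.0 = all row1
    return start_row + f0, start_row + f1, blend


# A 20-frame clip starting at palette row 100, played at 30 frames per second.
print(sample_rows(0.5, 100, 20, 30.0, 0.0))   # mid-clip
print(sample_rows(0.65, 100, 20, 30.0, 0.0))  # last frame blending back into the first
```

The wrap on `f1` is what lets a looping clip blend from its last baked frame back into its first without reading a row that belongs to the next clip.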

+

The numbers

+

Measured on an M1 Pro MacBook Pro (integrated GPU), not a desktop gaming rig:

+

| Agent count | FPS |

+

|------------|-----|

+

| 100 | 60 |

+

| 500 | 60 |

+

| 1,000 | 60 |

+

| 10,000 | 8 (with CPU-side culling, pre-optimization) |

+

VRAM: 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory fits the entire roster with room for colonists, terrain, vegetation, and UI.
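
The VRAM figures can be checked with a few lines of arithmetic. The sketch below uses the Goat numbers quoted in this post (53 bones, 496 baked frames, RGBA32F texels, and a 2,500-vertex mesh at 12 bytes per vertex position for the VAT comparison). The helper names are hypothetical, and it assumes one `mat4` occupies four RGBA texels, which matches the 212-pixel texture width. The post's "MB" values appear to line up with binary megabytes (MiB).

```python
BYTES_PER_TEXEL_RGBA32F = 16  # 4 channels x 4 bytes each
TEXELS_PER_BONE = 4           # one mat4 stored as 4 RGBA texels (assumption)
MIB = 1024 * 1024


def palette_bytes(bones, frames):
    """Bone-matrix palette size: (bones * 4) texels wide, one row per frame."""
    return bones * TEXELS_PER_BONE * frames * BYTES_PER_TEXEL_RGBA32F


def vat_bytes(vertices, frames, bytes_per_vertex=12):
    """Vertex animation texture size: one position per vertex per frame."""
    return vertices * frames * bytes_per_vertex


goat_palette = palette_bytes(53, 496)  # 212 x 496 texels
goat_vat = vat_bytes(2500, 496)

print(f"palette per type: {goat_palette / MIB:.2f} MiB")
print(f"palette, 30 types: {30 * goat_palette / MIB:.1f} MiB")
print(f"VAT, 30 types: {30 * goat_vat / MIB:.1f} MiB")
print(f"VAT / palette ratio: {goat_vat / goat_palette:.2f}")
```

Treating every type as Goat-sized is a simplification; real totals depend on each rig's bone count and clip lengths.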

+

Draw calls: One per animal type. 30 types = 30 draw calls for every animated animal on screen. Add colonists, same deal — one draw call per colonist look.

+

The engine change

+

The module lives in modules/agent_skinned/ inside Tinqs Engine — our fork of Godot 4.6. The core is two classes:

+

MultiSkinnedMeshInstance3D — the data plane. Holds the bone-matrix palette. API: set_max_bones(), set_max_instances(), set_instance_pose_bones(). At bake time, we fill one row per animation frame. At render time, it sits idle — the texture is static.

+

MultiSkinnedInstance3D — the renderer. A MultiMeshInstance3D subclass. Points its multimesh at the skinned mesh and its data_source_path at the data plane. refresh() uploads the bone texture into the shader's uniform once. The MultiMesh handles instance transforms. The shader handles the rest.

+

The shader uses INSTANCE_CUSTOM, not INSTANCE_ID, to pick the palette row. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances can share the same rows (a synchronized airborne flock), or each instance can pick its own (a varied herd). One abstraction, two behaviors.
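
To make the "one abstraction, two behaviors" point concrete, here is a hedged Python sketch of how the per-instance custom data might be filled for each case. The function names are hypothetical, and the golden-ratio formula (instance index times roughly 0.618, taken modulo 1) is an assumption about how the phase spread could be produced; the post only says that a golden-ratio spread is used.

```python
import math

# Fractional part of the golden ratio, about 0.618 (assumed spread constant).
GOLDEN_RATIO_CONJUGATE = (math.sqrt(5.0) - 1.0) / 2.0


def pack_custom(start_row, frame_count, rate, phase):
    """Pack one instance's (x, y, z, w) custom data as four floats."""
    return (float(start_row), float(frame_count), float(rate), float(phase))


def synchronized_flock(count, start_row, frame_count, rate):
    """Every instance gets phase 0, so all read the same palette rows."""
    return [pack_custom(start_row, frame_count, rate, 0.0) for _ in range(count)]


def varied_herd(count, start_row, frame_count, rate):
    """Each instance gets its own phase in [0, 1), spread by the golden ratio."""
    return [
        pack_custom(start_row, frame_count, rate, (i * GOLDEN_RATIO_CONJUGATE) % 1.0)
        for i in range(count)
    ]


flock = synchronized_flock(5, start_row=0, frame_count=24, rate=30.0)
herd = varied_herd(5, start_row=0, frame_count=24, rate=30.0)
print(sorted(round(instance[3], 3) for instance in herd))
```

Both lists point at the same clip block in the same texture; only the phase channel differs, which is why the two modes need no separate shader path in this model.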

+

The engine change is 40 lines of shader code in multi_skinned_instance_3d.cpp. Engine version: 4.6.5.

+

The production pipeline

+

In Ariki, AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds world positions and yaw rotations to skinned_herd.gd — the reusable per-type herd backend. The herd bakes the palette once at setup, then set_positions() updates transforms each sim tick. set_clip_for_state() switches the active clip block in the custom data when the sim FSM changes state (idle → walk → flee → attack). set_speed_scale() adjusts the per-instance playback rate to match ground speed — feet stay planted.
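
The state switch and foot-sync described above can be modelled as a small lookup plus one multiplication. This Python sketch is illustrative only: `build_clip_blocks`, `playback_rate`, the clip names, the frame counts, and the `authored_speed` parameter are all hypothetical and are not taken from `skinned_herd.gd`. It assumes clips are baked back to back in the palette and that a clip looks correct at some known ground speed.

```python
def build_clip_blocks(clips):
    """Map each clip name to (start_row, frame_count), clips baked back to back."""
    blocks = {}
    next_row = 0
    for name, frame_count in clips:
        blocks[name] = (next_row, frame_count)
        next_row += frame_count
    return blocks


def playback_rate(baked_fps, ground_speed, authored_speed):
    """Scale the baked frame rate by how fast the agent moves relative to
    the speed the clip was authored for, so feet do not appear to slide."""
    return baked_fps * (ground_speed / authored_speed)


# Hypothetical clip lengths, in baked frames.
blocks = build_clip_blocks([("idle", 60), ("walk", 30), ("flee", 24)])

# When the sim FSM moves an animal from idle to walk, only its custom data changes.
start_row, frame_count = blocks["walk"]
rate = playback_rate(baked_fps=30.0, ground_speed=1.5, authored_speed=1.0)
print(start_row, frame_count, rate)
```

In this model a state change costs one small write to the instance's custom data, and the palette texture itself is never touched.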

+

Bird flocks use the same system. BirdFlock.cs runs boid flocking on top of skinned_herd, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call.

+

The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states. The client just renders. The same system will drive thousands of colonists at launch.

+

Where we stand vs the industry

+

The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with a smaller VRAM footprint (our low-poly animals keep textures tiny).

+

The platform supports three tiers by distance:

    -
  • LOD tiers — far agents become 2D impostors (billboard quads with a sprite atlas). Same (clip, frame, speed, phase) packet drives all tiers.
  • -
  • Hero rigs — the nearest few agents get real Skeleton3D + AnimationTree + IK + ragdoll. Smooth gait blends, foot-lock, look-at.
  • -
  • Offline bake pipeline — precompute palettes in the asset build, not at load time.
  • -
  • GPU compute culling — frustum + distance + LOD classification on the GPU, no CPU cull loop.
  • +
  • Crowd tier (palette) — baked poses, GPU-driven, zero CPU. Thousands of agents.
  • +
  • Hero tier (real rigs)AnimationTree + SkeletonIK3D + PhysicalBone3D for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll.
  • +
  • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame, driven by the same (clip, count, speed, phase) packet. For very far agents.
-

These are planned and designed (the platform doc is at ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible.

-

The fork question

-

Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives:

-
    -
  • VAT (vertex animation textures) with a Godot plugin: Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT also can't blend frames (hard cuts between baked frames, no smooth playback) and can't skin normals/tangents (incorrect lighting).
  • -
-
    -
  • Phase-offset drivers only: Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists.
  • -
-
    -
  • Don't do crowds: The simplest answer. But Ariki needs animals and colonists. The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork.
  • -
-

What's next

-

The 4-item immediate roadmap:

-

1. One herd per type — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era)

-

2. Distance LOD — CPU-side cull + cheaper-far shader for far instances

-

3. RGBA16F + offline bake — half the VRAM, zero load-time hitch

-

4. Hero rigs — real AnimationTree + IK + ragdoll for the nearest few animals

-

The far horizon: animated 2D impostors and GPU compute-cull, designed and parked. Brought forward when the load demands them.

-

The engine source lives in tinqs/engine (private). Pre-built editor binaries at tinqs/builds. The Ariki game is at arikigame.com.

+

The same abstraction — (clip, count, speed, phase) — drives every tier. One packet, three detail levels.

+

Get the build

+

Pre-built editor binaries with agent_skinned and the GPU-driven palette baked in:

+

| Platform | Binary |

+

|----------|--------|

+

| macOS ARM64 | tinqs.macos.editor.arm64.mono |

+

| Windows x64 | tinqs.windows.editor.x86_64.mono.exe |

+

All builds at tinqs/builds. Engine source at tinqs/engine (private).

+

The game's animal_perf_test.tscn spawns 10/100/1,000/10,000 animals and reports live FPS. The animal_viewer.tscn lets you inspect any animal type, toggle clips, and switch between single and herd mode.


-

Related: GPU-Skinned Herds — the original herd renderer (yesterday's post). Fork, Don't Build — why we modify existing platforms instead of building new ones. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation layers that work alongside this.

+

Related: GPU-Skinned Herds — the original agent_skinned module design. Fork, Don't Build — why we modify existing platforms. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation layers.

diff --git a/index.html b/index.html index 56ffe0c..004afa0 100644 --- a/index.html +++ b/index.html @@ -189,8 +189,8 @@ 15 June 2026 -

Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton

-

We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork.

+

Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons

+

Our crowd renderer bakes every animation frame into a bone-matrix palette once, then the GPU drives every instance itself — 1,000 animals at 60 FPS, each with its own clip and phase. This is how AAA does crowds. Now it runs in our Godot fork.

Read →
diff --git a/posts/gpu-driven-crowd-animation.md b/posts/gpu-driven-crowd-animation.md index a6893a0..476aafc 100644 --- a/posts/gpu-driven-crowd-animation.md +++ b/posts/gpu-driven-crowd-animation.md @@ -1,160 +1,129 @@ --- -title: "Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton" +title: "Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons" slug: gpu-driven-crowd-animation date: "2026-06-15" -description: "Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase." -og_description: "1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork." +description: "We built a GPU-driven crowd animation platform into Tinqs Engine that renders 1,000 animated animals at 60 FPS with zero per-frame CPU cost. Each agent plays its own clip, speed, and phase — no live skeletons, no lockstep, no compromises." +og_description: "1,000 animated agents, zero live skeletons, zero per-frame CPU. A GPU-driven crowd animation platform in the Tinqs Engine fork of Godot." og_image: "https://www.tinqs.com/img/og-cover.jpg" -excerpt: "We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork." +excerpt: "Our crowd renderer bakes every animation frame into a bone-matrix palette once, then the GPU drives every instance itself — 1,000 animals at 60 FPS, each with its own clip and phase. This is how AAA does crowds. Now it runs in our Godot fork." 
author: "Ozan Bozkurt" author_initials: "OB" author_role: "CTO & Developer, Tinqs" --- -Yesterday we [shipped a GPU herd renderer](gpu-skinned-herds) that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: **one live skeleton per animal state per type.** For 30 types with 5 states each, that's 150 `Skeleton3D` nodes — each with an `AnimationPlayer`, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work. +Godot gives you one `Skeleton3D` per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 `AnimationPlayer` ticks every frame. Want 1,000? You're measuring in seconds per frame. -Today we ripped out every live skeleton. The CPU now does **zero per-frame animation work.** 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how. +We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork. -## The problem: lockstep costs CPU +## Why not skeletons? -The original `agent_skinned` module worked by **sharing a live skeleton.** One driver `Skeleton3D` animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton. +The standard approach — one skeleton per character, one `AnimationPlayer`, one draw call — breaks at crowd scale. Computing `global_pose` for 1,000 skeletons at 60 bones each is 60,000 matrix multiplications per frame on the main thread. 
Each is its own draw call. Each `AnimationPlayer` ticks independently. No CPU can keep up. + +Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores **vertices × frames**, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake. + +Our answer: **bone-matrix palette.** Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone. + +## How it works + +At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame into a single texture. A Goat with 9 clips at 30 fps produces 496 frames: ``` -30 animal types × 5 states = 150 live skeletons on the CPU -``` - -Each skeleton: compute `global_pose` for every bone, run an `AnimationPlayer.process()`, push matrices into the data plane, upload the dirty texture region. The cost tracked **herd count**, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles. - -The fix sounds obvious in retrospect: **the GPU should compute the poses, not the CPU.** Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample. - -## The bake: one texture per character type, done once - -At load time, the `skinned_herd.gd` backend plays every animation clip on a temporary `Skeleton3D` and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. 
Each frame is one row in the bone-matrix texture: - -``` -Goat: 53 bones × 496 frames = 26,288 bone matrices Texture: 212 × 496 pixels, RGBA32F VRAM: 212 × 496 × 16 bytes = 1.6 MB ``` -That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again. +That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices. -For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = **14.2 MB per type, 426 MB total.** Bone-matrix is 9× smaller because bones ≪ vertices. +After the bake, the skeleton is destroyed. It never runs again. -## The GPU: per-instance playback, zero CPU +Each MultiMesh instance gets 4 numbers packed into `INSTANCE_CUSTOM`: -Each MultiMesh instance carries 4 numbers in `INSTANCE_CUSTOM`: +``` +.x = which clip (start row in the palette) +.y = how many frames in this clip +.z = playback rate (baked-fps × ground speed — foot-sync) +.w = phase offset (golden-ratio spread — no two adjacent animals share the same frame) +``` -| Channel | Meaning | -|---------|---------| -| `.x` | Which clip (start row in the palette) | -| `.y` | How many frames in this clip | -| `.z` | Playback rate (baked-fps × ground speed) | -| `.w` | Phase offset (0..1, golden-ratio spread) | - -The vertex shader derives each instance's current frame from TIME: +The vertex shader computes each instance's current frame from TIME: ```glsl -float fcount = max(INSTANCE_CUSTOM.y, 1.0); -int start = int(INSTANCE_CUSTOM.x + 0.5); -float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount); - +float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y, + 
INSTANCE_CUSTOM.y); int f0 = int(fpos); -int f1 = int(mod(float(f0) + 1.0, fcount)); +int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y)); float fr = fpos - float(f0); -// Blend between two adjacent baked frames for smooth playback at low bake fps -int r0 = start + f0; -int r1 = start + f1; - -mat4 m0 = mat4( - texelFetch(bone_matrices_tex, ivec2(px+0, r0), 0), - texelFetch(bone_matrices_tex, ivec2(px+1, r0), 0), - texelFetch(bone_matrices_tex, ivec2(px+2, r0), 0), - texelFetch(bone_matrices_tex, ivec2(px+3, r0), 0)); -mat4 m1 = mat4( /* same for r1 */ ); +// Blend between two adjacent frames for smooth playback +int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0; +int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1; +// For each bone, reconstruct mat4 from 4 texels, blend, weight by skin influence +mat4 m0 = mat4(texelFetch(tex, ivec2(b*4+0, r0), 0), /* ... 3 more columns */); +mat4 m1 = mat4(texelFetch(tex, ivec2(b*4+1, r1), 0), /* ... */); skin += (m0 * (1.0 - fr) + m1 * fr) * weight; ``` -That's it. The CPU does nothing per frame. No skeletons. No `AnimationPlayer`. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows. +The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU. -## What changed in the engine +## The numbers -The shader needed one critical change: the bone-matrix texture went from being indexed by `INSTANCE_ID` (one row per instance) to being indexed by a **pose slot** computed from `INSTANCE_CUSTOM` (one row per baked frame). 
The old code: +Measured on an M1 Pro MacBook Pro (integrated GPU), not a desktop gaming rig: -```glsl -int inst = INSTANCE_ID; // row = instance index → lockstep -``` +| Agent count | FPS | +|------------|-----| +| 100 | **60** | +| 500 | **60** | +| 1,000 | **60** | +| 10,000 | 8 (with CPU-side culling, pre-optimization) | -Became: +**VRAM:** 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory fits the entire roster with room for colonists, terrain, vegetation, and UI. -```glsl -int r0 = start + f0; // row = palette row from clip + frame → per-instance variety -``` +**Draw calls:** One per animal type. 30 types = 30 draw calls for every animated animal on screen. Add colonists, same deal — one draw call per colonist look. -This is a 40-line shader change in the engine's `multi_skinned_instance_3d.cpp`. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug). +## The engine change -Engine version bumped from 4.6.4 to **4.6.5**. +The module lives in `modules/agent_skinned/` inside Tinqs Engine — our fork of Godot 4.6. The core is two classes: -## The numbers (measured, not projected) +**`MultiSkinnedMeshInstance3D`** — the data plane. Holds the bone-matrix palette. API: `set_max_bones()`, `set_max_instances()`, `set_instance_pose_bones()`. At bake time, we fill one row per animation frame. At render time, it sits idle — the texture is static. -On an M1 Pro MacBook Pro (integrated GPU): +**`MultiSkinnedInstance3D`** — the renderer. A `MultiMeshInstance3D` subclass. Points its multimesh at the skinned mesh and its `data_source_path` at the data plane. `refresh()` uploads the bone texture into the shader's uniform once. The MultiMesh handles instance transforms. The shader handles the rest. 
-| Agent count | Old lockstep (4.6.4) | GPU-driven palette (4.6.5) | -|------------|----------------------|----------------------------| -| 100 | ~40 FPS | **60 FPS** | -| 500 | 31–39 FPS | **60 FPS** | -| 1,000 | ~25 FPS | **60 FPS** | -| 10,000 | untested | 8 FPS (unoptimized) | +The shader uses `INSTANCE_CUSTOM` to pick the palette row — not `INSTANCE_ID`. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors. -The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap. +The engine change is 40 lines of shader code in `multi_skinned_instance_3d.cpp`. Engine version: **4.6.5.** -**VRAM:** 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory handles this comfortably. The VAT alternative would need 426 MB — nine times more. +## The production pipeline -**Draw calls:** Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer. +In Ariki, `AnimalHerdRenderer.cs` groups sim `ViewerState.animals` by type, feeds world positions and yaw rotations to `skinned_herd.gd` — the reusable per-type herd backend. The herd bakes the palette once at setup, then `set_positions()` updates transforms each sim tick. `set_clip_for_state()` switches the active clip block in the custom data when the sim FSM changes state (idle → walk → flee → attack). `set_speed_scale()` adjusts the per-instance playback rate to match ground speed — feet stay planted. -## The bug that made everything invisible +Bird flocks use the same system. 
`BirdFlock.cs` runs boid flocking on top of `skinned_herd`, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call. -The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing. +The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states. The client just renders. The same system will drive thousands of colonists at launch. -Root cause: a `renderer.refresh()` call during setup raced the renderer's own `NOTIFICATION_READY` handler, which re-bound the shader's `bone_matrices_tex` uniform — overwriting our baked texture with an unbound (default white) one. The shader sampled white → every bone matrix was identity → the mesh collapsed to a point at origin → invisible. +## Where we stand vs the industry -Fix: bind the texture once on the **first `_process` frame**, after all nodes have had their `_ready` called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot `_ready` sequencing gotcha. +The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM (our low-poly animals keep textures tiny). -## Where this puts us vs AAA +The platform supports three tiers by distance: +- **Crowd tier (palette)** — baked poses, GPU-driven, zero CPU. Thousands of agents. +- **Hero tier (real rigs)** — `AnimationTree` + `SkeletonIK3D` + `PhysicalBone3D` for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll. +- **Impostor tier (2D billboards)** — sprite atlas indexed by view-angle and animation-frame, driven by the same `(clip, frame, speed, phase)` packet. 
For very far agents. -The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM. +The same abstraction — `(clip, count, speed, phase)` — drives every tier. One packet, three detail levels. -What AAA has that we don't (yet): -- **LOD tiers** — far agents become 2D impostors (billboard quads with a sprite atlas). Same `(clip, frame, speed, phase)` packet drives all tiers. -- **Hero rigs** — the nearest few agents get real `Skeleton3D` + `AnimationTree` + IK + ragdoll. Smooth gait blends, foot-lock, look-at. -- **Offline bake pipeline** — precompute palettes in the asset build, not at load time. -- **GPU compute culling** — frustum + distance + LOD classification on the GPU, no CPU cull loop. +## Get the build -These are planned and designed (the platform doc is at `ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md`), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible. +Pre-built editor binaries with `agent_skinned` and the GPU-driven palette baked in: -## The fork question +| Platform | Binary | +|----------|--------| +| **macOS ARM64** | [`tinqs.macos.editor.arm64.mono`](https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono) | +| **Windows x64** | [`tinqs.windows.editor.x86_64.mono.exe`](https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe) | -Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives: +All builds at [`tinqs/builds`](https://tinqs.com/tinqs/builds). Engine source at [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). 
-- **VAT (vertex animation textures) with a Godot plugin:** Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT also can't blend frames (hard cuts between baked frames, no smooth playback) and can't skin normals/tangents (incorrect lighting). - -- **Phase-offset drivers only:** Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists. - -- **Don't do crowds:** The simplest answer. But Ariki needs animals and colonists. The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork. - -## What's next - -The 4-item immediate roadmap: -1. **One herd per type** — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era) -2. **Distance LOD** — CPU-side cull + cheaper-far shader for far instances -3. **RGBA16F + offline bake** — half the VRAM, zero load-time hitch -4. **Hero rigs** — real `AnimationTree` + IK + ragdoll for the nearest few animals - -The far horizon: animated 2D impostors and GPU compute-cull, designed and parked. Brought forward when the load demands them. - -The engine source lives in [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). Pre-built editor binaries at [`tinqs/builds`](https://tinqs.com/tinqs/builds). The Ariki game is at [arikigame.com](https://www.arikigame.com). +The game's `animal_perf_test.tscn` spawns 10/100/1,000/10,000 animals and reports live FPS. The `animal_viewer.tscn` lets you inspect any animal type, toggle clips, and switch between single and herd mode. --- -**Related:** [GPU-Skinned Herds](gpu-skinned-herds) — the original herd renderer (yesterday's post). [Fork, Don't Build](fork-dont-build) — why we modify existing platforms instead of building new ones. 
[Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation layers that work alongside this. +**Related:** [GPU-Skinned Herds](gpu-skinned-herds) — the original `agent_skinned` module design. [Fork, Don't Build](fork-dont-build) — why we modify existing platforms. [Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation layers. From d76849e112893dafd821e87cd3b8a1cf819fd0e0 Mon Sep 17 00:00:00 2001 From: Ozan Bozkurt Date: Mon, 15 Jun 2026 22:42:53 +0100 Subject: [PATCH 3/8] post: merge into single GPU-skinned herds article --- agent-harness.html | 101 +++----- gpu-driven-crowd-animation.html | 367 ---------------------------- gpu-skinned-herds.html | 152 ++++++++---- index.html | 11 +- live-ozan-radio.html | 81 +++--- posts/gpu-driven-crowd-animation.md | 129 ---------- posts/gpu-skinned-herds.md | 170 +++++++++---- pre-commit-agent.html | 131 ++++------ studio-cli.html | 92 +++---- voice-missing-input-game-dev.html | 130 ++++------ 10 files changed, 427 insertions(+), 937 deletions(-) delete mode 100644 gpu-driven-crowd-animation.html delete mode 100644 posts/gpu-driven-crowd-animation.md diff --git a/agent-harness.html b/agent-harness.html index 0cfb484..c4f8b6b 100644 --- a/agent-harness.html +++ b/agent-harness.html @@ -277,74 +277,43 @@ ← All Posts

What an Agent Harness Is and Why Game Dev Needs One

-


+

Open Claude or ChatGPT right now and ask it to review your last PR. It'll say "I don't have access to your repository." Ask it to take a screenshot of your game. It'll say "I can't interact with your operating system." Ask it what you were working on yesterday. It'll say "I don't have memory of previous conversations."

+

A raw AI model is a brain without hands, eyes, or memory. An agent harness is the layer that gives it all three — plus identity, tools, and guardrails. And game development needs one that understands binary assets, visual pipelines, and spatial systems.

+

What a harness provides

+

Every agent harness, regardless of domain, needs five things:

+

Identity. Who the agent is, what it values, how it should behave. Not "you are a helpful assistant" — that's generic and unmoored. A soul file that says "you're working on Ariki, a survival colony sim. The team is four people. Never push to main without review. Prefer existing conventions." Identity creates consistency across sessions.

+

Memory. What happened last session. What decisions were made. What failed and why. Without memory, every conversation is a cold start — "let me explain the project..." Memory stored as markdown in git means it's version-controlled, diffable, and human-readable. When something goes wrong, you git log instead of debugging a vector database.

+

Tools. What the agent can actually do beyond generating text. A CLI that takes screenshots, checks service health, and loads project context. API wrappers for git, CI, image generation. Without tools, the agent is a very articulate oracle that can't touch anything.

+

Context. Which project this is. Who's asking. What machine they're on. What services are reachable. A single CLI call — tinqs identity — returns all of this in 100ms. No re-reading the README. No "what repo are we in?"

+

Guardrails. What the agent must never do. No merging to main without review. No pushing to public repos without approval. No running destructive commands. The harness enforces these at the platform layer, not in the prompt. Prompts can be ignored. Platform gates cannot.

+

Why generic harnesses fail for game dev

+

LangChain, CrewAI, and AutoGen are built for web apps. They assume text-in, text-out. Game development is different in ways that break those assumptions:

+

Assets are binary. A web PR is a text diff. A game PR is a 150MB GLB file with textures, rigging, and animations. You can't review it without seeing it. Our harness renders 3D models in the browser during code review — rotate, zoom, check materials. The artist pushes, the lead inspects, no downloads required.

+

The pipeline is visual. Concept art → 3D model → rigged character → in-engine asset. Each step uses different tools. The harness needs to orchestrate image generators, 3D modellers, auto-riggers, and game engines as a single workflow — not as five separate API calls the human has to stitch together.

+

Scale is physical. A web app's complexity is in business logic. A game's complexity is in geometry — 12km worlds, 155 vegetation types, 2,000 crowd instances. The agent needs to understand spatial systems, GPU memory budgets, and frame timing. "Add more RAM" isn't an answer when you have 8GB of VRAM.

+

The team is small and cross-functional. Four people. No dedicated DevOps, no dedicated artist, no dedicated PM. The harness fills all those gaps, not just one.

+

The toolchain that makes it work

+

Our harness runs on Tinqs Studio, built on a Gitea fork with game-specific features. The key pieces:

+

The CLI — a single Go binary. One command (tinqs identity) gives the agent full project context in 100ms. Screenshots, cloud vision, health checks — all subcommands of the same binary.

+

The soul file — a markdown document in the repo root. The agent reads it on session start. It defines values, scope, and behavioural rules. The same soul file works in Cursor, Claude Code, or any tool that reads markdown.

+

Skills — markdown playbooks for specific workflows. Image generation, concept art pipeline, 3D model creation, video generation. Each skill is a procedure the agent follows. Write once, use forever.

+

3D preview — click a .glb file in a PR and rotate the model in your browser. 22 formats supported. This alone transformed our review process — nobody approves a binary diff blind anymore.

+

Guardrails — agents can file issues, draft announcements, generate assets, and write code. They cannot merge, deploy, or push to public repos without human approval. Branch protection rules enforced at the git platform layer.
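A platform-layer gate of this kind can be sketched in a few lines. This is a minimal illustration, not the Tinqs implementation: the action labels and the `authorize` function are hypothetical, and the real enforcement lives in branch protection rules on the git platform.

```python
# Platform-side gate: the check lives outside the prompt, so the agent
# cannot talk its way past it. Action names are hypothetical labels.
AGENT_ALLOWED = {"file_issue", "draft_announcement", "generate_asset", "write_code"}
NEEDS_HUMAN = {"merge", "deploy", "push_public"}

def authorize(action: str, human_approved: bool = False) -> bool:
    """Return True only if the action may proceed."""
    if action in AGENT_ALLOWED:
        return True
    if action in NEEDS_HUMAN:
        return human_approved
    return False  # unknown actions are denied by default

print(authorize("write_code"))
print(authorize("merge"))
print(authorize("merge", human_approved=True))
```

Denying unknown actions by default keeps the gate safe when new tools are added before the policy is updated.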

+

The cold-start problem, solved

+

Every AI agent session starts blank. Most teams solve this with long system prompts — but when your context is 200 markdown files, 15 skills, and 3 years of project history, you can't paste all of that.

+

The harness uses staged loading:

+

1. CLI identity call (100ms) — soul file, company context, machine info, service status

+

2. Memory file (instant) — cross-session context from the docs repo

+

3. Skills (on demand) — loaded only when the task matches a skill name

+

4. Repo context (on demand) — files read as needed, not all upfront
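The staged loading above can be sketched as a small context object that reads cheap context eagerly and skills lazily. Everything here is an assumption for illustration: the `AgentContext` class, the file layout, and the rule that a skill loads when its file name appears in the task text.

```python
import tempfile
from pathlib import Path
from typing import Optional

class AgentContext:
    """Hypothetical holder for staged context; not the Tinqs implementation."""

    def __init__(self, identity: dict, memory_file: Path, skills_dir: Path):
        self.identity = identity                # stage 1: result of the identity call
        self.memory = memory_file.read_text()   # stage 2: cross-session notes
        self._skills_dir = skills_dir
        self._loaded_skills = {}                # stage 3: filled on demand

    def skill_for(self, task: str) -> Optional[str]:
        """Load a skill playbook only if its name appears in the task text."""
        for path in sorted(self._skills_dir.glob("*.md")):
            name = path.stem
            if name in task.lower():
                if name not in self._loaded_skills:
                    self._loaded_skills[name] = path.read_text()
                return self._loaded_skills[name]
        return None

    @property
    def loaded_skill_names(self):
        return sorted(self._loaded_skills)

# Demo with throwaway files standing in for the docs repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "memory.md").write_text("Last session: fixed herd culling.\n")
    skills = root / "skills"
    skills.mkdir()
    (skills / "concept-art.md").write_text("1. Gather references...\n")
    (skills / "video.md").write_text("1. Storyboard...\n")

    ctx = AgentContext({"project": "Ariki"}, root / "memory.md", skills)
    before = ctx.loaded_skill_names
    playbook = ctx.skill_for("Draft concept-art for the goat")
    after = ctx.loaded_skill_names
    no_match = ctx.skill_for("Fix the build")
```

Only the matching playbook is read from disk; the other skill stays untouched until a task asks for it.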

+

The agent goes from cold to fully contextual in under a second. No "let me explain the project." No re-reading onboarding docs. Just start working.

+

The bet

+

The gap between "I have an AI model" and "I have an AI team member" is infrastructure. Identity, memory, tools, context, guardrails. For game development, that infrastructure needs to understand binary assets, visual pipelines, and spatial systems.

+

We're betting that specialised harnesses beat generic ones. A harness built for game dev — with 3D preview, LFS management, and creative pipelines — will outperform a general-purpose agent framework on game dev tasks. Not because the AI is smarter, but because it has the right hands, eyes, and memory for the job.

+
+

Tinqs Studio is an agent harness for game development — git hosting, AI agents, creative pipelines. Open for teams. We're building Ariki with the same tools.

diff --git a/gpu-driven-crowd-animation.html b/gpu-driven-crowd-animation.html deleted file mode 100644 index 626bf84..0000000 --- a/gpu-driven-crowd-animation.html +++ /dev/null @@ -1,367 +0,0 @@ - - - - - - - Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons — Tinqs Blog - - - - - - - - - - - - - - - - - - - - - - - - - - -
- ← All Posts - -

Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons

-

Godot gives you one Skeleton3D per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? You're measuring in seconds per frame.

- -
-

We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork.

-

Why not skeletons?

-

The standard approach — one skeleton per character, one AnimationPlayer, one draw call — breaks at crowd scale. Computing global_pose for 1,000 skeletons at 60 bones each is 60,000 matrix multiplications per frame on the main thread. Each is its own draw call. Each AnimationPlayer ticks independently. No CPU can keep up.

-

Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores vertices × frames, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake.

-

Our answer: bone-matrix palette. Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone.

-

How it works

-

At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame into a single texture. A Goat with 9 clips at 30 fps produces 496 frames:

-
Texture: 212 × 496 pixels, RGBA32F
-VRAM: 212 × 496 × 16 bytes = 1.6 MB
-

That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices.

-

After the bake, the skeleton is destroyed. It never runs again.

-

Each MultiMesh instance gets 4 numbers packed into INSTANCE_CUSTOM:

-
.x = which clip (start row in the palette)
-.y = how many frames in this clip
-.z = playback rate (baked-fps × ground speed — foot-sync)
-.w = phase offset (golden-ratio spread — no two adjacent animals share the same frame)
-

The vertex shader computes each instance's current frame from TIME:

-
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y,
-                 INSTANCE_CUSTOM.y);
-int f0 = int(fpos);
-int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y));
-float fr = fpos - float(f0);
-
-// Blend between two adjacent frames for smooth playback
-int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0;
-int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1;
-
-// For each bone, reconstruct mat4 from 4 texels, blend, weight by skin influence
-mat4 m0 = mat4(texelFetch(tex, ivec2(b*4+0, r0), 0), /* ... 3 more columns */);
-mat4 m1 = mat4(texelFetch(tex, ivec2(b*4+0, r1), 0), /* ... 3 more columns */);
-skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
-

The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU.

-

The numbers

-

Measured on an M1 Pro MacBook Pro (integrated GPU), not a desktop gaming rig:

-

| Agent count | FPS |
|-------------|-----|
| 100 | 60 |
| 500 | 60 |
| 1,000 | 60 |
| 10,000 | 8 (with CPU-side culling, pre-optimization) |

-

VRAM: 1.6 MB per animal type. 30 types = 48 MB total. That fits inside the 1 GB a Steam Deck reserves for its GPU by default, with room for colonists, terrain, vegetation, and UI.

-

Draw calls: One per animal type. 30 types = 30 draw calls for every animated animal on screen. Add colonists, same deal — one draw call per colonist look.

-

The engine change

-

The module lives in modules/agent_skinned/ inside Tinqs Engine — our fork of Godot 4.6. The core is two classes:

-

MultiSkinnedMeshInstance3D — the data plane. Holds the bone-matrix palette. API: set_max_bones(), set_max_instances(), set_instance_pose_bones(). At bake time, we fill one row per animation frame. At render time, it sits idle — the texture is static.

-

MultiSkinnedInstance3D — the renderer. A MultiMeshInstance3D subclass. Points its multimesh at the skinned mesh and its data_source_path at the data plane. refresh() uploads the bone texture into the shader's uniform once. The MultiMesh handles instance transforms. The shader handles the rest.

-

The shader uses INSTANCE_CUSTOM to pick the palette row — not INSTANCE_ID. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors.

-

The engine change is 40 lines of shader code in multi_skinned_instance_3d.cpp. Engine version: 4.6.5.

-

The production pipeline

-

In Ariki, AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds world positions and yaw rotations to skinned_herd.gd — the reusable per-type herd backend. The herd bakes the palette once at setup, then set_positions() updates transforms each sim tick. set_clip_for_state() switches the active clip block in the custom data when the sim FSM changes state (idle → walk → flee → attack). set_speed_scale() adjusts the per-instance playback rate to match ground speed — feet stay planted.

-

Bird flocks use the same system. BirdFlock.cs runs boid flocking on top of skinned_herd, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call.

-

The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states. The client just renders. The same system will drive thousands of colonists at launch.

-

Where we stand vs the industry

-

The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM (our low-poly animals keep textures tiny).

-

The platform supports three tiers by distance:

-
    -
  • Crowd tier (palette) — baked poses, GPU-driven, zero CPU. Thousands of agents.
  • -
  • Hero tier (real rigs)AnimationTree + SkeletonIK3D + PhysicalBone3D for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll.
  • -
  • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame, driven by the same (clip, frame, speed, phase) packet. For very far agents.
  • -
-

The same abstraction — (clip, count, speed, phase) — drives every tier. One packet, three detail levels.

-

Get the build

-

Pre-built editor binaries with agent_skinned and the GPU-driven palette baked in:

-

| Platform | Binary |
|----------|--------|
| macOS ARM64 | tinqs.macos.editor.arm64.mono |
| Windows x64 | tinqs.windows.editor.x86_64.mono.exe |

-

All builds at tinqs/builds. Engine source at tinqs/engine (private).

-

The game's animal_perf_test.tscn spawns 10/100/1,000/10,000 animals and reports live FPS. The animal_viewer.tscn lets you inspect any animal type, toggle clips, and switch between single and herd mode.

-
-

Related: GPU-Skinned Herds — the original agent_skinned module design. Fork, Don't Build — why we modify existing platforms. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation layers.

- -
- - -
- - - diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html index c0dc8fe..6c99e2a 100644 --- a/gpu-skinned-herds.html +++ b/gpu-skinned-herds.html @@ -5,19 +5,19 @@ GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot — Tinqs Blog - + - + - + @@ -275,70 +275,124 @@
← All Posts - +

GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot

-

Godot gives you one Skeleton3D per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second.

+

Godot gives you one Skeleton3D per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? You're measuring in seconds per frame.

-

We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads once, and draws every instance in one call. 25 crocodiles confirmed first. Then we threw 1,000 animals — 12 types mixed, random-walking — at it and the GPU didn't flinch. Same bone count, same animation fidelity, a tiny fraction of the cost.

+

We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork.

Why the engine needs to change

The standard Godot approach — one Skeleton3D + one MeshInstance3D per character — works for a handful of animated entities. It breaks down hard at crowd scale:

    -
  • CPU bone transforms. Computing global_pose for 200 skeletons × 100 bones each = 20,000 matrix multiplies per frame, all on the main thread.
  • +
  • CPU bone transforms. Computing global_pose for 1,000 skeletons × 60 bones each = 60,000 matrix multiplications per frame, all on the main thread.
  • Draw call explosion. Each MeshInstance3D is its own draw call. Even with MultiMesh, there's no built-in path for skinned meshes — MultiMeshInstance3D only handles static geometry.
  • AnimationPlayer sprawl. Each skeleton needs its own AnimationPlayer and its own process() tick.
-

The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour.

-

What we need is simpler: share the skeleton, drive per-instance poses from a single animation, batch the draw call. That's what agent_skinned does.

+

Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores vertices × frames, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake.
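The VAT figures can be checked with quick arithmetic. The sketch below assumes each baked vertex position is stored as three 32-bit floats (12 bytes), which the post does not state; under that assumption the result lands close to the quoted 14 MB per animal and 426 MB for the roster.

```python
# Rough VAT memory estimate. BYTES_PER_VERTEX is an assumption:
# three float32 components per baked position.
BYTES_PER_VERTEX = 12

def vat_bytes(vertices: int, frames: int) -> int:
    """Bytes needed to store every vertex position for every frame."""
    return vertices * frames * BYTES_PER_VERTEX

def to_mib(n_bytes: int) -> float:
    return n_bytes / (1024 * 1024)

per_animal = vat_bytes(vertices=2_500, frames=500)
roster = per_animal * 30  # 30 animal types of similar size

print(f"per animal: {to_mib(per_animal):.1f} MiB")
print(f"30 types:   {to_mib(roster):.0f} MiB")
```

The cost scales with vertex count, so a denser mesh makes the gap to a bone-based bake wider still.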

+

Our answer: bone-matrix palette. Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone.
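For readers new to linear-blend skinning, here is a CPU-side sketch of the same math the shader runs per vertex: blend up to four bone matrices by weight, then transform the vertex. Matrix helpers and names are hypothetical, and matrices are plain nested lists indexed as `m[row][col]`.

```python
# Minimal linear-blend skinning on the CPU, for illustration only.

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def blend(matrices, weights):
    """Weighted sum of bone matrices (the skin matrix for one vertex)."""
    skin = [[0.0] * 4 for _ in range(4)]
    for m, w in zip(matrices, weights):
        for r in range(4):
            for c in range(4):
                skin[r][c] += m[r][c] * w
    return skin

def skin_vertex(position, bone_matrices, bone_ids, weights):
    """Skin one vertex with up to four bone influences."""
    skin = blend([bone_matrices[b] for b in bone_ids], weights)
    x, y, z, _ = mat_vec(skin, [*position, 1.0])
    return (x, y, z)

IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
SHIFT_X2 = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# A vertex influenced equally by a resting bone and a bone moved 2 units in x
moved = skin_vertex((1.0, 0.0, 0.0), [IDENTITY, SHIFT_X2], [0, 1], [0.5, 0.5])
print(moved)  # (2.0, 0.0, 0.0): the vertex follows half of the 2-unit shift
```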

How it works: two classes, one texture

-

The module lives in modules/agent_skinned/ inside Tinqs Engine. Two classes, one job:

+

The module lives in modules/agent_skinned/ inside Tinqs Engine. Two classes, one job.

MultiSkinnedMeshInstance3D — the data plane

-

Holds the CPU-side bone matrices. Allocates an ImageTexture of size [4 × max_bones, max_instances] in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances:

-
Texture: 520 × 256 RGBA32F ≈ 2 MB
-

That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple:

+

Holds the bone-matrix palette. Allocates an ImageTexture of size [4 × max_bones, total_frames] in RGBA32F — each texel is one column of a 4×4 bone matrix, each row is one baked animation frame. At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame:

+
+Goat: 53 bones, 9 clips, 496 baked frames in total
+Texture: 212 × 496 pixels, RGBA32F
+VRAM: 212 × 496 × 16 bytes = 1.6 MB
+

That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices.
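The texture size follows directly from the layout: four RGBA32F texels per bone, one row per baked frame. The sketch below reproduces the Goat numbers; the roster total assumes all 30 types are roughly goat-sized, which is a simplification.

```python
# Palette texture size: 4 RGBA32F texels per bone (one per matrix column),
# one row per baked frame, 16 bytes per texel.
TEXELS_PER_BONE = 4
BYTES_PER_TEXEL = 16  # RGBA32F = 4 channels x 4 bytes

def palette_size(bones: int, frames: int):
    """Return (width, height, bytes) of the bone-matrix palette texture."""
    width = TEXELS_PER_BONE * bones
    height = frames
    return width, height, width * height * BYTES_PER_TEXEL

width, height, goat_bytes = palette_size(bones=53, frames=496)
roster_bytes = goat_bytes * 30  # assumes every type is about goat-sized

print(width, height)                    # 212 496
print(round(goat_bytes / 2**20, 2))     # 1.6
print(round(roster_bytes / 2**20, 1))   # 48.1
```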

+

After the bake, the skeleton is destroyed. It never runs again. The API is straightforward:

var data := MultiSkinnedMeshInstance3D.new()
-data.set_mesh(crocodile_mesh)
-data.set_skeleton(skeleton)       # rest pose + bone hierarchy
-data.set_max_instances(256)
-data.set_max_bones(130)
+data.set_max_bones(53)
+data.set_max_instances(496)    # palette rows = baked frames
 
-# Each frame: push poses from the animated skeleton
-for instance in herd_positions:
-    data.set_instance_pose_bones(instance.id, bone_transforms)
-data.update()   # upload only dirty instances, not the whole texture
+# Bake: play each clip, seek to each frame, record bone matrices
+var row := 0
+for clip in clips:
+    for frame in clip.frames:
+        skeleton.seek(frame.time)
+        data.set_instance_pose_bones(row, bone_transforms)
+        row += 1  # one palette row per baked frame
+

The data plane stores matrices column-major — 4 texels per bone = 4 columns of a 4×4 transform. The getter matches the layout, and a doctest asserts it so a transpose can't silently regress.
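Why the layout matters is easy to show outside the engine. The sketch below packs a matrix into four texels two ways and reads both back the way a GLSL `mat4(c0, c1, c2, c3)` constructor does, treating each texel as a column. The helper names are hypothetical; only the column-versus-row distinction comes from the post.

```python
# Matrices are nested lists indexed as m[row][col].

def pack_columns(m):
    """Column-major packing: texel i holds column i."""
    return [tuple(m[r][c] for r in range(4)) for c in range(4)]

def pack_rows(m):
    """Row-wise packing: texel i holds row i (the mismatched layout)."""
    return [tuple(m[r]) for r in range(4)]

def unpack_as_columns(texels):
    """Rebuild a matrix treating each texel as a column, like GLSL mat4()."""
    return [[texels[c][r] for c in range(4)] for r in range(4)]

def transpose(m):
    return [[m[c][r] for c in range(4)] for r in range(4)]

# A translation by (5, 6, 7): the offset lives in the last column.
bone = [
    [1.0, 0.0, 0.0, 5.0],
    [0.0, 1.0, 0.0, 6.0],
    [0.0, 0.0, 1.0, 7.0],
    [0.0, 0.0, 0.0, 1.0],
]

good = unpack_as_columns(pack_columns(bone))  # round-trips cleanly
bad = unpack_as_columns(pack_rows(bone))      # comes back transposed
```

With the row-wise packing the translation ends up in the bottom row instead of the last column, so every bone applies the wrong transform.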

MultiSkinnedInstance3D — the renderer

-

A MultiMeshInstance3D subclass. Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call refresh() — it uploads the bone texture into the shader material's bone_matrices_tex uniform and the mesh is drawn in one call.

-

The shader does 4-bone linear-blend skinning on the GPU:

-
mat4 get_bone(int b) {
-    return mat4(
-        texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0),
-        texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0),
-        texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0),
-        texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0)
-    );
-}
-

INSTANCE_ID is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader.

-

Two bugs we shipped and fixed

-

The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong.

-

Bug 1: Shader compile failure. The default skinning shader compared TANGENT as vec4. Godot 4 exposes it as vec3. Fixed in one line, added albedo_tex uniform so herds texture out of the box.

-

Bug 2: Bone matrices stored transposed. The data plane wrote basis rows (standard Godot Transform3D.basis is row-major), but the shader unpacked as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.

-

The lesson: doctests catch logic. Rendering catches truth. You need both.

+

A MultiMeshInstance3D subclass. Set its multimesh with the skinned mesh and instance transforms, point its data_source_path at the data plane. Call refresh() once — it uploads the bone texture into the shader material's bone_matrices_tex uniform.

+

Each MultiMesh instance carries 4 numbers in INSTANCE_CUSTOM (enable multimesh.use_custom_data):

+

| Channel | Meaning |
|---------|---------|
| .x | Which clip (start row in the palette) |
| .y | How many frames in this clip |
| .z | Playback rate (baked-fps × ground speed, for foot-sync) |
| .w | Phase offset (golden-ratio spread, so no two adjacent animals share the same frame) |
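Here is one way the four values could be filled in on the CPU side. The post only says the phase uses a golden-ratio spread; the exact formula below (instance index times the golden-ratio conjugate, wrapped to [0, 1)) and the `speed_scale` parameter are assumptions, as are the clip numbers in the example.

```python
# Build the four per-instance values for INSTANCE_CUSTOM.
GOLDEN_CONJUGATE = 0.6180339887498949  # (sqrt(5) - 1) / 2

def instance_custom(index, clip_start, clip_frames, baked_fps, speed_scale=1.0):
    """Return (x, y, z, w) for one MultiMesh instance."""
    phase = (index * GOLDEN_CONJUGATE) % 1.0   # assumed spread formula
    rate = baked_fps * speed_scale             # frames advanced per second
    return (float(clip_start), float(clip_frames), rate, phase)

# Five goats on a hypothetical walk clip: 40 frames starting at palette row 120.
herd = [instance_custom(i, clip_start=120, clip_frames=40, baked_fps=30.0)
        for i in range(5)]
phases = [w for (_, _, _, w) in herd]
```

With this formula, neighbouring instances always sit about 38% of a cycle apart, so a herd never visibly steps in unison.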

+

The vertex shader derives each instance's current frame from TIME:

+
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y,
+                 INSTANCE_CUSTOM.y);
+int f0 = int(fpos);
+int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y));
+float fr = fpos - float(f0);
+
+// Blend between two adjacent frames for smooth playback at low bake fps
+int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0;
+int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1;
+
+// For each bone (up to 4 per vertex), reconstruct mat4 from 4 texels, blend, weight
+mat4 m0 = mat4(
+    texelFetch(bone_matrices_tex, ivec2(b*4 + 0, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(b*4 + 1, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(b*4 + 2, r0), 0),
+    texelFetch(bone_matrices_tex, ivec2(b*4 + 3, r0), 0));
+mat4 m1 = mat4( /* same for r1 */ );
+skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
+
+// Apply skin to vertex, normal, tangent
+VERTEX = (skin * vec4(VERTEX, 1.0)).xyz;
+NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);
+TANGENT = normalize((skin * vec4(TANGENT, 0.0)).xyz);
+

The shader uses INSTANCE_CUSTOM to pick the palette row — not INSTANCE_ID. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors.

+

The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU.
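The frame math is easy to verify on the CPU. This sketch mirrors the shader's arithmetic in Python for non-negative time values; the function name and the example clip (10 frames starting at palette row 100) are hypothetical.

```python
import math

def palette_rows(time_s, clip_start, clip_frames, rate, phase):
    """Mirror of the shader's frame math: return (row0, row1, blend)."""
    fpos = math.fmod(time_s * rate + phase * clip_frames, clip_frames)
    f0 = int(fpos)
    f1 = (f0 + 1) % clip_frames   # wrap back to the clip's first frame
    blend = fpos - f0
    return clip_start + f0, clip_start + f1, blend

mid_clip = palette_rows(0.25, 100, 10, 30.0, 0.0)   # frame position 7.5
wrapping = palette_rows(4.75, 100, 10, 2.0, 0.0)    # frame position 9.5
offset = palette_rows(0.0, 100, 10, 30.0, 0.5)      # phase shifts the start

print(mid_clip)   # (107, 108, 0.5)
print(wrapping)   # (109, 100, 0.5)
print(offset)     # (105, 106, 0.0)
```

The second case shows the wrap: the last frame blends into the clip's first row, so a looping clip has no seam.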

+

The shader ships as the default material on MultiSkinnedInstance3D. It includes an albedo_tex uniform — the caller sets it from the source mesh's material so herds texture out of the box. No ShaderMaterial assembly required unless you want custom shading.

+

The numbers

+

Measured on an M1 Pro MacBook Pro (integrated GPU):

+

| Agent count | FPS |
|-------------|-----|
| 100 | 60 |
| 500 | 60 |
| 1,000 | 60 |
| 10,000 | 8 (with CPU-side culling, pre-optimization) |

+

VRAM: 1.6 MB per animal type. 30 types = 48 MB total. That fits inside the 1 GB a Steam Deck reserves for its GPU by default.

+

Draw calls: One per animal type. 30 types = 30 draw calls for every animated animal on screen. Future colonists share the same architecture — one draw call per colonist look.

What's driving it

-

In Ariki, the sim tracks animal migration across a 12km archipelago. AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds positions to skinned_herd.gd (a reusable per-type herd backend), which drives the renderer. One AnimationPlayer animates a single driver skeleton; poses propagate to every instance.

-

The crocodile herd scene was 25 instances, one draw call. The perf test scene does 1,000 animals across 12 types — Boar, Cow, Crab, Crocodile, Deer, Fish, Goat, Hen, Pig, Rabbit, Sheep, Tiger — each type its own GPU herd, all mixed, all random-walking, FPS holding steady.

+

In Ariki, the sim tracks animal migration across a 12km archipelago. AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds world positions and yaw rotations to skinned_herd.gd — the reusable per-type herd backend. The herd bakes the palette once at setup, then set_positions() updates transforms each sim tick. set_clip_for_state() switches the active clip block in the custom data when the sim FSM changes state. set_speed_scale() adjusts the per-instance playback rate to match ground speed — feet stay planted.

+

The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states (graze, drink, sleep, hunt, flee, scavenge, die). The client just renders. This is the same code in single-player and multiplayer — the sim is the host.

+

Bird flocks use the same system. BirdFlock.cs runs boid flocking on top of skinned_herd, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call.

+

Per-instance custom data means a walking Boar, a running Boar, an idle Boar, and an attacking Boar all share the same baked palette — they just point at different rows. The renderer groups by type, not by state. One palette, one draw call, any number of states.
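Grouping by type rather than by (type, state) is what keeps the draw-call count flat. A small sketch with made-up animal records shows the difference; the dictionary fields are hypothetical and not the sim's actual data model.

```python
from collections import defaultdict

def group_by_type(animals):
    """Bucket animals by type; each bucket becomes one herd (one draw call)."""
    herds = defaultdict(list)
    for animal in animals:
        herds[animal["type"]].append(animal)
    return dict(herds)

animals = [
    {"type": "Boar", "state": "walk"},
    {"type": "Boar", "state": "run"},
    {"type": "Boar", "state": "idle"},
    {"type": "Boar", "state": "attack"},
    {"type": "Goat", "state": "walk"},
]

herds = group_by_type(animals)
draw_calls = len(herds)                                             # one per type
per_state_calls = len({(a["type"], a["state"]) for a in animals})   # old approach
```

Here the four Boar states collapse into one herd, giving two draw calls instead of five.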

+

Two bugs we shipped and fixed

+

The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB, column-major layout. All green. Then we put it on screen and two things were wrong.

+

Bug 1: Shader compile failure. The default skinning shader compared TANGENT as vec4. Godot 4 exposes it as vec3. Fixed in one line, added albedo_tex uniform so herds texture out of the box.

+

Bug 2: Bone matrices stored transposed. The initial data plane wrote basis rows (standard Godot Transform3D.basis is row-major), but the shader reads mat4(c0,c1,c2,c3) as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.

+

The lesson: doctests catch logic. Rendering catches truth. You need both.

+

The engine change

+

The module is 40 lines of shader code and ~500 lines of C++ in the engine's modules/agent_skinned/. The critical detail is in the shader: the bone-matrix texture is indexed by a pose slot computed from INSTANCE_CUSTOM, not by INSTANCE_ID. This is what decouples the palette from the instance count — the texture stores animation frames, the MultiMesh stores instance transforms, and the shader bridges them.

+

Engine version: 4.6.5.

+

No C# wrapper is generated — instantiate from GDScript via ClassDB.instantiate() and call the bound methods. The binding surface is small and stable. See ariki-game/scenes/animals/skinned_herd.gd for the reference backend.

+

The production pipeline

+

The migrate_animals.py tool converts polyperfect FBX packs to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat assets/models/glbs/ directory. Each animal gets a catalog entry in animals_catalog.json with clip metadata, default state mapping, and an animSpeedRef for foot-sync.

+

At runtime, AnimalHerdRenderer spawns one skinned_herd per animal type. The herd bakes the palette from the catalog GLB's clips. AnimalAnimationLogic maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance transforms and custom data each frame. That is the only per-frame CPU work on the animation path: no skeleton is posed and no bone matrix is uploaded.
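The state-to-clip mapping can be sketched as a keyword search over clip names. The three keyword lists come from the post; the matching rule (first clip whose lowercase name contains a keyword), the idle fallback, and the clip names are assumptions for illustration.

```python
STATE_KEYWORDS = {
    "attack": ["attack", "bite"],
    "flee": ["run", "gallop"],
    "wander": ["walk"],
}

def clip_for_state(state, clip_names, fallback="idle"):
    """Pick the first clip whose name contains a keyword for this state."""
    for keyword in STATE_KEYWORDS.get(state, []):
        for name in clip_names:
            if keyword in name.lower():
                return name
    # No keyword matched: fall back to an idle-like clip if one exists.
    for name in clip_names:
        if fallback in name.lower():
            return name
    return clip_names[0] if clip_names else None

goat_clips = ["Idle_A", "Walk", "Gallop", "Bite_Front", "Death"]

print(clip_for_state("attack", goat_clips))  # Bite_Front
print(clip_for_state("flee", goat_clips))    # Gallop
print(clip_for_state("sleep", goat_clips))   # Idle_A
```

Keyword order encodes preference: an animal with both "run" and "gallop" clips would flee with the "run" clip.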

+

Where we stand vs the industry

+

The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny.

+

The platform supports three tiers by distance, all driven by the same (clip, count, speed, phase) packet:

+
    +
  • Crowd tier (palette) — baked poses, GPU-driven, zero CPU. Thousands of agents.
  • +
  • Hero tier (real rigs)AnimationTree + SkeletonIK3D + PhysicalBone3D for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll.
  • +
  • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame. For very far agents.
  • +
+

One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.
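A distance-based tier pick over a shared packet could look like the sketch below. Everything here is hypothetical: the `AnimPacket` type, the `pick_tier` function, and the radii of 15 m and 150 m are placeholders chosen for illustration, since the post does not state the real thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnimPacket:
    """The (clip, count, speed, phase) packet shared by every tier."""
    clip: int      # first palette row (or atlas row) of the clip
    count: int     # number of frames in the clip
    speed: float   # playback rate
    phase: float   # phase offset, as a fraction of the clip


def pick_tier(distance_m, hero_radius=15.0, crowd_radius=150.0):
    """Choose a detail tier from camera distance (placeholder radii)."""
    if distance_m <= hero_radius:
        return "hero"       # real rig: AnimationTree, IK, ragdoll
    if distance_m <= crowd_radius:
        return "crowd"      # baked bone-matrix palette, GPU-driven
    return "impostor"       # 2D billboard from a sprite atlas


if __name__ == "__main__":
    packet = AnimPacket(clip=100, count=24, speed=30.0, phase=0.618)
    for d in (5.0, 80.0, 400.0):
        print(d, pick_tier(d), packet)
```

The point of the design is that the packet does not change when an agent crosses a tier boundary; only the consumer of the packet does.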

What's deliberately not here

  • No C# wrapper. Instantiate from GDScript via ClassDB.instantiate() — the binding surface is small and stable.
  • -
  • No automatic AnimationPlayer integration. You drive poses. We give you the texture. Freedom to animate however you want.
  • -
  • No GPU occlusion or LOD. That's the game's job. The engine provides the tool; the game decides what to draw.
  • +
  • No automatic AnimationPlayer integration. You drive poses at bake time. We give you the texture. Freedom to animate however you want.
  • +
  • No GPU occlusion culling. That's the game's job. The engine provides the tool; the game decides what to draw.

Get the build

-

Pre-built editor binaries with agent_skinned baked in — no engine compile required. The game's animal_perf_test.tscn lets you toggle 10 / 100 / 1000 animals and read live FPS:

-

| Platform | Binary | Engine commit |

-

|----------|--------|---------------|

-

| macOS ARM64 | tinqs.macos.editor.arm64.mono | 4fe1323 (4.6.4, Xcode 26.3) |

-

| Windows x64 | tinqs.windows.editor.x86_64.mono.exe | 64fb5cc (4.6.4, MSVC 2022) |

-

All builds live in the public tinqs/builds repo — engine source is private, but the binaries are yours. See manifest.json for checksums and build details.

+

Pre-built editor binaries with agent_skinned and the GPU-driven palette baked in — no engine compile required. The game's animal_perf_test.tscn lets you spawn 10/100/1,000/10,000 animals and read live FPS:

+

| Platform | Binary |

+

|----------|--------|

+

| macOS ARM64 | tinqs.macos.editor.arm64.mono |

+

| Windows x64 | tinqs.windows.editor.x86_64.mono.exe |

+

All builds at tinqs/builds — engine source is private, but the binaries are yours. See manifest.json for checksums and build details.

The engine source lives in tinqs/engine (private). Module docs: modules/agent_skinned/README.md and .agents/wiki/agent-skinned-gpu-herd.md.


Related: Fork, Don't Build — why we modify existing platforms instead of building new ones. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation streaming layers that work alongside this.

diff --git a/index.html b/index.html index 004afa0..71a5fdc 100644 --- a/index.html +++ b/index.html @@ -187,17 +187,10 @@ Read → - - 15 June 2026 -

Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons

-

Our crowd renderer bakes every animation frame into a bone-matrix palette once, then the GPU drives every instance itself — 1,000 animals at 60 FPS, each with its own clip and phase. This is how AAA does crowds. Now it runs in our Godot fork.

- Read → -
- - 14 June 2026 + 15 June 2026

GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot

-

Godot can't batch-render 1,000 animated characters. We built a GPU skinned-instance herd renderer into the engine itself — already driving crocodile herds in Ariki. Pre-built editor binaries for macOS and Windows.

+

Godot can't batch-render 1,000 animated characters. We built a GPU-driven crowd renderer into the engine itself — bake every animation frame into a texture once, let the GPU drive every instance. 1,000 animals, 60 FPS, zero skeletons. Pre-built editor binaries.

Read →
diff --git a/live-ozan-radio.html b/live-ozan-radio.html index 5624543..78fa7f7 100644 --- a/live-ozan-radio.html +++ b/live-ozan-radio.html @@ -277,58 +277,39 @@ ← All Posts

Live Ozan Radio: A Personal AI Station in Cursor

-

I do not want a playlist. I want a station — something that feels like late-night desert dub and Anadolu psych drifting out of a speaker, but every track is composed fresh, never pulled from Spotify or Apple Music. So we built Live Ozan Radio: DeepSeek as the on-air DJ, Google Lyria 3 as the music engine, and our own Gitea instance as the host. - -!Live Ozan Radio in Cursor — player dashboard, saved songs, and DJ chat beside the editor - -That screenshot is how I actually use it: Cursor on the left with taste notes and vocal cues, the local player on :8787 on the right, saved songs in a scrollable library, and a chat box to steer the next generation. It is dogfooding in the truest sense — we run our game studio on the same Gitea fork we sell as Tinqs Studio, and the radio lives in that repo too. - -## No catalog, ever - -The rule is simple: nothing pre-recorded from the outside world. Every MP3 is generated by Lyria from a prompt that DeepSeek writes using settings.json (taste profile) and taste_seeds.json (built from Spotify screenshots in Cursor — no Spotify API). Shuffle mode mixes saved tracks with new compositions until the daily cap hits. - -The stack: - -| Layer | What it does | -|——-|—————-| -| DeepSeek | DJ brain — mood, Lyria prompts, chat | -| Lyria 3 Pro | Full songs (~1–2 min), vocals optional | -| FastAPI player | Stream, library, cost dashboard, vocal booth | -| Git LFS | Committed songs + rich metadata | - -## Curating every generation like a real DJ - -Raw MP3s are not enough. Each track gets an extensive *.meta.json: rating (love / keeper / skip), what I loved, what I hated, clone_prompt_hints for successors, BPM, skip-intro timestamps, and instrument tags. A library_index.json rolls that up so the DJ knows, for example, that Sahara's Saz is the gold standard (saz + ney + sub bass by twenty seconds) and that fuzz electric guitar on vocal tracks is a hard avoid. - -That metadata feeds back into every plan_next call. 
When I said Caravan of the Night was "not bad" but I hated the electric guitar bit, that went into disliked and avoid_in_successors — the next griot-vocal generation should keep the Sahel chants and drop the Tinariwen-style guitar. - -## Public auto-play on tinqs.com - -The full dashboard (python -m ozan_radio servehttp://127.0.0.1:8787/player) needs the API for Lyria compose, settings, and chat. But listening-only is static: gateway/index.html in the repo embeds the playlist and auto-plays from Git LFS media URLs when you open it on Git Studio: - -https://tinqs.com/tinqs/live-radio/src/branch/main/gateway/index.html - -Commit, push, share the link — no server required for listeners. New tracks run export-web (or auto-export on save) to refresh the embedded playlist. - -## Vocal booth in the browser - -For instrumentals where I want my own layer (deep chest / throat-sync over Nomad's Saz when the saz kicks in at ~0:30), the player includes a Chrome vocal booth: cue sheet with timestamps, skip-to-saz, MediaRecorder for takes, download as WebM. Lyria cannot remix an existing MP3 — but I can record over it locally while the track plays in headphones. - -## Lyria settings in the web UI - -The player settings panel talks to /api/lyria and adapts from the real Gemini API: model (Pro vs 30s Clip), vocal mode (instrumental / mix / full vocals), lyric language, singer profile, WAV vs MP3. Changes persist to settings.json and shape both the Lyria suffix and the DJ system prompt. - -## Repo - -Open source on our Gitea: https://tinqs.com/tinqs/live-radio - -Clone, add your taste via Cursor + screenshots, run the server, or just hit the gateway link and let it shuffle. If you are building on Tinqs Studio, this is the kind of small, weird, personal tool that belongs in the same forge as your docs and your game — not on someone else's CDN. 
- -— - -Inspired by Google's Magenta RealTime 2 segment on AI Search — we wanted the Lyria full-song path first; optional live MRT2 layer on Apple Silicon is on the roadmap.

+

I do not want a playlist. I want a station — something that feels like late-night desert dub and Anadolu psych drifting out of a speaker, but every track is composed fresh, never pulled from Spotify or Apple Music. So we built Live Ozan Radio: DeepSeek as the on-air DJ, Google Lyria 3 as the music engine, and our own Gitea instance as the host.

+
+ Live Ozan Radio in Cursor — player dashboard, saved songs, and DJ chat beside the editor +
Live Ozan Radio in Cursor — player dashboard, saved songs, and DJ chat beside the editor
+
+

That screenshot is how I actually use it: Cursor on the left with taste notes and vocal cues, the local player on :8787 on the right, saved songs in a scrollable library, and a chat box to steer the next generation. It is dogfooding in the truest sense — we run our game studio on the same Gitea fork we sell as Tinqs Studio, and the radio lives in that repo too.

+

No catalog, ever

+

The rule is simple: nothing pre-recorded from the outside world. Every MP3 is generated by Lyria from a prompt that DeepSeek writes using settings.json (taste profile) and taste_seeds.json (built from Spotify screenshots in Cursor — no Spotify API). Shuffle mode mixes saved tracks with new compositions until the daily cap hits.

+

The stack:

+

| Layer | What it does |

+

|-------|--------------|

+

| DeepSeek | DJ brain — mood, Lyria prompts, chat |

+

| Lyria 3 Pro | Full songs (~1–2 min), vocals optional |

+

| FastAPI player | Stream, library, cost dashboard, vocal booth |

+

| Git LFS | Committed songs + rich metadata |

+

Curating every generation like a real DJ

+

Raw MP3s are not enough. Each track gets an extensive *.meta.json: rating (love / keeper / skip), what I loved, what I hated, clone_prompt_hints for successors, BPM, skip-intro timestamps, and instrument tags. A library_index.json rolls that up so the DJ knows, for example, that Sahara's Saz is the gold standard (saz + ney + sub bass by twenty seconds) and that fuzz electric guitar on vocal tracks is a hard avoid.

+

That metadata feeds back into every plan_next call. When I said Caravan of the Night was "not bad" but I hated the electric guitar bit, that went into disliked and avoid_in_successors — the next griot-vocal generation should keep the Sahel chants and drop the Tinariwen-style guitar.

+

Public auto-play on tinqs.com

+

The full dashboard (python -m ozan_radio serve, then http://127.0.0.1:8787/player) needs the API for Lyria compose, settings, and chat. But listening-only is static: gateway/index.html in the repo embeds the playlist and auto-plays from Git LFS media URLs when you open it on Git Studio:

+

https://tinqs.com/tinqs/live-radio/src/branch/main/gateway/index.html

+

Commit, push, share the link — no server required for listeners. New tracks run export-web (or auto-export on save) to refresh the embedded playlist.

+

Vocal booth in the browser

+

For instrumentals where I want my own layer (deep chest / throat-sync over Nomad's Saz when the saz kicks in at ~0:30), the player includes a Chrome vocal booth: cue sheet with timestamps, skip-to-saz, MediaRecorder for takes, download as WebM. Lyria cannot remix an existing MP3 — but I can record over it locally while the track plays in headphones.

+

Lyria settings in the web UI

+

The player settings panel talks to /api/lyria and adapts from the real Gemini API: model (Pro vs 30s Clip), vocal mode (instrumental / mix / full vocals), lyric language, singer profile, WAV vs MP3. Changes persist to settings.json and shape both the Lyria suffix and the DJ system prompt.

+

Repo

+

Open source on our Gitea: https://tinqs.com/tinqs/live-radio

+

Clone, add your taste via Cursor + screenshots, run the server, or just hit the gateway link and let it shuffle. If you are building on Tinqs Studio, this is the kind of small, weird, personal tool that belongs in the same forge as your docs and your game — not on someone else's CDN.

+
+

Inspired by Google's Magenta RealTime 2 segment on AI Search — we wanted the Lyria full-song path first; optional live MRT2 layer on Apple Silicon is on the roadmap.

diff --git a/posts/gpu-driven-crowd-animation.md b/posts/gpu-driven-crowd-animation.md deleted file mode 100644 index 476aafc..0000000 --- a/posts/gpu-driven-crowd-animation.md +++ /dev/null @@ -1,129 +0,0 @@ ---- -title: "Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons" -slug: gpu-driven-crowd-animation -date: "2026-06-15" -description: "We built a GPU-driven crowd animation platform into Tinqs Engine that renders 1,000 animated animals at 60 FPS with zero per-frame CPU cost. Each agent plays its own clip, speed, and phase — no live skeletons, no lockstep, no compromises." -og_description: "1,000 animated agents, zero live skeletons, zero per-frame CPU. A GPU-driven crowd animation platform in the Tinqs Engine fork of Godot." -og_image: "https://www.tinqs.com/img/og-cover.jpg" -excerpt: "Our crowd renderer bakes every animation frame into a bone-matrix palette once, then the GPU drives every instance itself — 1,000 animals at 60 FPS, each with its own clip and phase. This is how AAA does crowds. Now it runs in our Godot fork." -author: "Ozan Bozkurt" -author_initials: "OB" -author_role: "CTO & Developer, Tinqs" ---- -Godot gives you one `Skeleton3D` per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 `AnimationPlayer` ticks every frame. Want 1,000? You're measuring in seconds per frame. - -We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork. - -## Why not skeletons? - -The standard approach — one skeleton per character, one `AnimationPlayer`, one draw call — breaks at crowd scale. 
Computing `global_pose` for 1,000 skeletons at 60 bones each is 60,000 matrix multiplications per frame on the main thread. Each is its own draw call. Each `AnimationPlayer` ticks independently. No CPU can keep up. - -Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores **vertices × frames**, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake. - -Our answer: **bone-matrix palette.** Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone. - -## How it works - -At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame into a single texture. A Goat with 9 clips at 30 fps produces 496 frames: - -``` -Texture: 212 × 496 pixels, RGBA32F -VRAM: 212 × 496 × 16 bytes = 1.6 MB -``` - -That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices. - -After the bake, the skeleton is destroyed. It never runs again. 
- -Each MultiMesh instance gets 4 numbers packed into `INSTANCE_CUSTOM`: - -``` -.x = which clip (start row in the palette) -.y = how many frames in this clip -.z = playback rate (baked-fps × ground speed — foot-sync) -.w = phase offset (golden-ratio spread — no two adjacent animals share the same frame) -``` - -The vertex shader computes each instance's current frame from TIME: - -```glsl -float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y, - INSTANCE_CUSTOM.y); -int f0 = int(fpos); -int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y)); -float fr = fpos - float(f0); - -// Blend between two adjacent frames for smooth playback -int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0; -int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1; - -// For each bone, reconstruct mat4 from 4 texels, blend, weight by skin influence -mat4 m0 = mat4(texelFetch(tex, ivec2(b*4+0, r0), 0), /* ... 3 more columns */); -mat4 m1 = mat4(texelFetch(tex, ivec2(b*4+1, r1), 0), /* ... */); -skin += (m0 * (1.0 - fr) + m1 * fr) * weight; -``` - -The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU. - -## The numbers - -Measured on an M1 Pro MacBook Pro (integrated GPU), not a desktop gaming rig: - -| Agent count | FPS | -|------------|-----| -| 100 | **60** | -| 500 | **60** | -| 1,000 | **60** | -| 10,000 | 8 (with CPU-side culling, pre-optimization) | - -**VRAM:** 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory fits the entire roster with room for colonists, terrain, vegetation, and UI. - -**Draw calls:** One per animal type. 30 types = 30 draw calls for every animated animal on screen. Add colonists, same deal — one draw call per colonist look. 
- -## The engine change - -The module lives in `modules/agent_skinned/` inside Tinqs Engine — our fork of Godot 4.6. The core is two classes: - -**`MultiSkinnedMeshInstance3D`** — the data plane. Holds the bone-matrix palette. API: `set_max_bones()`, `set_max_instances()`, `set_instance_pose_bones()`. At bake time, we fill one row per animation frame. At render time, it sits idle — the texture is static. - -**`MultiSkinnedInstance3D`** — the renderer. A `MultiMeshInstance3D` subclass. Points its multimesh at the skinned mesh and its `data_source_path` at the data plane. `refresh()` uploads the bone texture into the shader's uniform once. The MultiMesh handles instance transforms. The shader handles the rest. - -The shader uses `INSTANCE_CUSTOM` to pick the palette row — not `INSTANCE_ID`. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors. - -The engine change is 40 lines of shader code in `multi_skinned_instance_3d.cpp`. Engine version: **4.6.5.** - -## The production pipeline - -In Ariki, `AnimalHerdRenderer.cs` groups sim `ViewerState.animals` by type, feeds world positions and yaw rotations to `skinned_herd.gd` — the reusable per-type herd backend. The herd bakes the palette once at setup, then `set_positions()` updates transforms each sim tick. `set_clip_for_state()` switches the active clip block in the custom data when the sim FSM changes state (idle → walk → flee → attack). `set_speed_scale()` adjusts the per-instance playback rate to match ground speed — feet stay planted. - -Bird flocks use the same system. `BirdFlock.cs` runs boid flocking on top of `skinned_herd`, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call. 
- -The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states. The client just renders. The same system will drive thousands of colonists at launch. - -## Where we stand vs the industry - -The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM (our low-poly animals keep textures tiny). - -The platform supports three tiers by distance: -- **Crowd tier (palette)** — baked poses, GPU-driven, zero CPU. Thousands of agents. -- **Hero tier (real rigs)** — `AnimationTree` + `SkeletonIK3D` + `PhysicalBone3D` for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll. -- **Impostor tier (2D billboards)** — sprite atlas indexed by view-angle and animation-frame, driven by the same `(clip, frame, speed, phase)` packet. For very far agents. - -The same abstraction — `(clip, count, speed, phase)` — drives every tier. One packet, three detail levels. - -## Get the build - -Pre-built editor binaries with `agent_skinned` and the GPU-driven palette baked in: - -| Platform | Binary | -|----------|--------| -| **macOS ARM64** | [`tinqs.macos.editor.arm64.mono`](https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono) | -| **Windows x64** | [`tinqs.windows.editor.x86_64.mono.exe`](https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe) | - -All builds at [`tinqs/builds`](https://tinqs.com/tinqs/builds). Engine source at [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). - -The game's `animal_perf_test.tscn` spawns 10/100/1,000/10,000 animals and reports live FPS. The `animal_viewer.tscn` lets you inspect any animal type, toggle clips, and switch between single and herd mode. 
- ---- - -**Related:** [GPU-Skinned Herds](gpu-skinned-herds) — the original `agent_skinned` module design. [Fork, Don't Build](fork-dont-build) — why we modify existing platforms. [Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation layers. diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md index e86100f..143f23a 100644 --- a/posts/gpu-skinned-herds.md +++ b/posts/gpu-skinned-herds.md @@ -1,109 +1,185 @@ --- title: "GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot" slug: gpu-skinned-herds -date: "2026-06-14" -description: "Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU skinned-instance renderer into Tinqs Engine that does — 25 crocodiles verified, 1,000+ projected. Pre-built binaries for macOS and Windows." -og_description: "One draw call, 1,000 animated characters. GPU-skinned herd renderer built into the Tinqs Engine fork of Godot." +date: "2026-06-15" +description: "Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU-driven crowd animation platform into Tinqs Engine that does — 1,000 animals at 60 FPS, each with its own clip and phase, zero per-frame CPU. Pre-built binaries for macOS and Windows." +og_description: "One draw call, 1,000 animated characters, zero CPU. GPU-driven crowd animation platform built into the Tinqs Engine fork of Godot." og_image: "https://www.tinqs.com/img/og-cover.jpg" -excerpt: "Godot can't batch-render 1,000 animated characters. We built a GPU skinned-instance herd renderer into the engine itself — already driving crocodile herds in Ariki. Pre-built editor binaries for macOS and Windows." +excerpt: "Godot can't batch-render 1,000 animated characters. We built a GPU-driven crowd renderer into the engine itself — bake every animation frame into a texture once, let the GPU drive every instance. 1,000 animals, 60 FPS, zero skeletons. Pre-built editor binaries." 
author: "Ozan Bozkurt" author_initials: "OB" author_role: "CTO & Developer, Tinqs" --- -Godot gives you one `Skeleton3D` per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 `AnimationPlayer` ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second. +Godot gives you one `Skeleton3D` per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 `AnimationPlayer` ticks every frame. Want 1,000? You're measuring in seconds per frame. -We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads once, and draws every instance in one call. 25 crocodiles confirmed first. Then we threw 1,000 animals — 12 types mixed, random-walking — at it and the GPU didn't flinch. Same bone count, same animation fidelity, a tiny fraction of the cost. +We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork. ## Why the engine needs to change The standard Godot approach — one `Skeleton3D` + one `MeshInstance3D` per character — works for a handful of animated entities. It breaks down hard at crowd scale: -- **CPU bone transforms.** Computing `global_pose` for 200 skeletons × 100 bones each = 20,000 matrix multiplies per frame, all on the main thread. +- **CPU bone transforms.** Computing `global_pose` for 1,000 skeletons × 60 bones each = 60,000 matrix multiplications per frame, all on the main thread. - **Draw call explosion.** Each `MeshInstance3D` is its own draw call. 
Even with MultiMesh, there's no built-in path for skinned meshes — `MultiMeshInstance3D` only handles static geometry. - **AnimationPlayer sprawl.** Each skeleton needs its own `AnimationPlayer` and its own `process()` tick. -The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour. +Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores **vertices × frames**, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake. -What we need is simpler: **share the skeleton, drive per-instance poses from a single animation, batch the draw call.** That's what `agent_skinned` does. +Our answer: **bone-matrix palette.** Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone. ## How it works: two classes, one texture -The module lives in `modules/agent_skinned/` inside [Tinqs Engine](https://tinqs.com/tinqs/engine). Two classes, one job: +The module lives in `modules/agent_skinned/` inside [Tinqs Engine](https://tinqs.com/tinqs/engine). Two classes, one job. ### `MultiSkinnedMeshInstance3D` — the data plane -Holds the CPU-side bone matrices. Allocates an `ImageTexture` of size `[4 × max_bones, max_instances]` in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances: +Holds the bone-matrix palette. 
Allocates an `ImageTexture` of size `[4 × max_bones, total_frames]` in RGBA32F — each texel is one column of a 4×4 bone matrix, each row is one baked animation frame. At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame: ``` -Texture: 520 × 256 RGBA32F ≈ 2 MB +Goat: 53 bones × 9 clips × 496 frames +Texture: 212 × 496 pixels, RGBA32F +VRAM: 212 × 496 × 16 bytes = 1.6 MB ``` -That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple: +That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: **48 MB total.** Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices. + +After the bake, the skeleton is destroyed. It never runs again. The API is straightforward: ```gdscript var data := MultiSkinnedMeshInstance3D.new() -data.set_mesh(crocodile_mesh) -data.set_skeleton(skeleton) # rest pose + bone hierarchy -data.set_max_instances(256) -data.set_max_bones(130) +data.set_max_bones(53) +data.set_max_instances(496) # palette rows = baked frames -# Each frame: push poses from the animated skeleton -for instance in herd_positions: - data.set_instance_pose_bones(instance.id, bone_transforms) -data.update() # upload only dirty instances, not the whole texture +# Bake: play each clip, seek to each frame, record bone matrices +for clip in clips: + for frame in clip.frames: + skeleton.seek(frame.time) + data.set_instance_pose_bones(row, bone_transforms) ``` +The data plane stores matrices column-major — 4 texels per bone = 4 columns of a 4×4 transform. The getter matches the layout, and a doctest asserts it so a transpose can't silently regress. + ### `MultiSkinnedInstance3D` — the renderer -A `MultiMeshInstance3D` subclass. 
Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call `refresh()` — it uploads the bone texture into the shader material's `bone_matrices_tex` uniform and the mesh is drawn in one call. +A `MultiMeshInstance3D` subclass. Set its multimesh with the skinned mesh and instance transforms, point its `data_source_path` at the data plane. Call `refresh()` once — it uploads the bone texture into the shader material's `bone_matrices_tex` uniform. -The shader does 4-bone linear-blend skinning on the GPU: +Each MultiMesh instance carries 4 numbers in `INSTANCE_CUSTOM` (enable `multimesh.use_custom_data`): + +| Channel | Meaning | +|---------|---------| +| `.x` | Which clip (start row in the palette) | +| `.y` | How many frames in this clip | +| `.z` | Playback rate (baked-fps × ground speed — foot-sync) | +| `.w` | Phase offset (golden-ratio spread — no two adjacent animals share the same frame) | + +The vertex shader derives each instance's current frame from TIME: ```glsl -mat4 get_bone(int b) { - return mat4( - texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0), - texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0), - texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0), - texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0) - ); -} +float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y, + INSTANCE_CUSTOM.y); +int f0 = int(fpos); +int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y)); +float fr = fpos - float(f0); + +// Blend between two adjacent frames for smooth playback at low bake fps +int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0; +int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1; + +// For each bone (up to 4 per vertex), reconstruct mat4 from 4 texels, blend, weight +mat4 m0 = mat4( + texelFetch(bone_matrices_tex, ivec2(b*4 + 0, r0), 0), + texelFetch(bone_matrices_tex, ivec2(b*4 + 1, r0), 0), + texelFetch(bone_matrices_tex, ivec2(b*4 + 2, r0), 0), + 
texelFetch(bone_matrices_tex, ivec2(b*4 + 3, r0), 0)); +mat4 m1 = mat4( /* same for r1 */ ); +skin += (m0 * (1.0 - fr) + m1 * fr) * weight; + +// Apply skin to vertex, normal, tangent +VERTEX = (skin * vec4(VERTEX, 1.0)).xyz; +NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz); ``` -`INSTANCE_ID` is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader. +The shader uses `INSTANCE_CUSTOM` to pick the palette row — not `INSTANCE_ID`. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors. -## Two bugs we shipped and fixed +The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU. -The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong. +The shader ships as the default material on `MultiSkinnedInstance3D`. It includes an `albedo_tex` uniform — the caller sets it from the source mesh's material so herds texture out of the box. No `ShaderMaterial` assembly required unless you want custom shading. -**Bug 1: Shader compile failure.** The default skinning shader compared `TANGENT` as `vec4`. Godot 4 exposes it as `vec3`. Fixed in one line, added `albedo_tex` uniform so herds texture out of the box. +## The numbers -**Bug 2: Bone matrices stored transposed.** The data plane wrote basis rows (standard Godot `Transform3D.basis` is row-major), but the shader unpacked as columns. 
Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression. +Measured on an M1 Pro MacBook Pro (integrated GPU): -The lesson: doctests catch logic. Rendering catches truth. You need both. +| Agent count | FPS | +|------------|-----| +| 100 | **60** | +| 500 | **60** | +| 1,000 | **60** | +| 10,000 | 8 (with CPU-side culling, pre-optimization) | + +**VRAM:** 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory fits the entire roster. + +**Draw calls:** One per animal type. 30 types = 30 draw calls for every animated animal on screen. Future colonists share the same architecture — one draw call per colonist look. ## What's driving it -In [Ariki](https://www.arikigame.com), the sim tracks animal migration across a 12km archipelago. `AnimalHerdRenderer.cs` groups sim `ViewerState.animals` by type, feeds positions to `skinned_herd.gd` (a reusable per-type herd backend), which drives the renderer. One `AnimationPlayer` animates a single driver skeleton; poses propagate to every instance. +In [Ariki](https://www.arikigame.com), the sim tracks animal migration across a 12km archipelago. `AnimalHerdRenderer.cs` groups sim `ViewerState.animals` by type, feeds world positions and yaw rotations to `skinned_herd.gd` — the reusable per-type herd backend. The herd bakes the palette once at setup, then `set_positions()` updates transforms each sim tick. `set_clip_for_state()` switches the active clip block in the custom data when the sim FSM changes state. `set_speed_scale()` adjusts the per-instance playback rate to match ground speed — feet stay planted. -The crocodile herd scene was 25 instances, one draw call. The perf test scene does 1,000 animals across 12 types — Boar, Cow, Crab, Crocodile, Deer, Fish, Goat, Hen, Pig, Rabbit, Sheep, Tiger — each type its own GPU herd, all mixed, all random-walking, FPS holding steady. 
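
The per-instance values carried in `INSTANCE_CUSTOM` (clip start row, frame count, playback rate, phase) can be mirrored on the CPU to sanity-check which palette rows an instance will read. The following is a minimal Python sketch, not the module's code: `frame_lookup` and `golden_phase` are hypothetical names, and the phase formula (instance index times the golden-ratio conjugate, wrapped to the range 0 to 1) is only an assumption about what "golden-ratio spread" means here.

```python
import math

# Assumption: the "golden-ratio spread" uses the conjugate (~0.618).
GOLDEN_CONJUGATE = (math.sqrt(5.0) - 1.0) / 2.0


def golden_phase(index: int) -> float:
    """Hypothetical phase for instance `index`, in [0, 1)."""
    return (index * GOLDEN_CONJUGATE) % 1.0


def frame_lookup(time_s, clip_start, frame_count, rate, phase):
    """Mirror the shader's frame math for one instance.

    Returns (row0, row1, blend): the two palette rows to sample and the
    interpolation factor between them.
    """
    # Same expression as the shader: the phase is scaled by the clip length.
    fpos = (time_s * rate + phase * frame_count) % frame_count
    f0 = int(fpos)
    f1 = (f0 + 1) % frame_count  # wrap back to the clip's first frame
    blend = fpos - f0
    return clip_start + f0, clip_start + f1, blend
```

Two instances with different phases return different rows for the same `time_s`, which is the behaviour the varied herd relies on; passing the same phase to every instance reproduces the synchronized flock.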
+The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states (graze, drink, sleep, hunt, flee, scavenge, die). The client just renders. This is the same code in single-player and multiplayer — the sim is the host. + +Bird flocks use the same system. `BirdFlock.cs` runs boid flocking on top of `skinned_herd`, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call. + +Per-instance custom data means a walking Boar, a running Boar, an idle Boar, and an attacking Boar all share the same baked palette — they just point at different rows. The renderer groups by type, not by state. One palette, one draw call, any number of states. + +## Two bugs we shipped and fixed + +The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB, column-major layout. All green. Then we put it on screen and two things were wrong. + +**Bug 1: Shader compile failure.** The default skinning shader compared `TANGENT` as `vec4`. Godot 4 exposes it as `vec3`. Fixed in one line, added `albedo_tex` uniform so herds texture out of the box. + +**Bug 2: Bone matrices stored transposed.** The initial data plane wrote basis rows (standard Godot `Transform3D.basis` is row-major), but the shader reads `mat4(c0,c1,c2,c3)` as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression. + +The lesson: doctests catch logic. Rendering catches truth. You need both. + +## The engine change + +The module is 40 lines of shader code and ~500 lines of C++ in the engine's `modules/agent_skinned/`. The critical detail is in the shader: the bone-matrix texture is indexed by a **pose slot** computed from `INSTANCE_CUSTOM`, not by `INSTANCE_ID`. 
This is what decouples the palette from the instance count — the texture stores animation frames, the MultiMesh stores instance transforms, and the shader bridges them. + +Engine version: **4.6.5.** + +No C# wrapper is generated — instantiate from GDScript via `ClassDB.instantiate()` and call the bound methods. The binding surface is small and stable. See `ariki-game/scenes/animals/skinned_herd.gd` for the reference backend. + +## The production pipeline + +The `migrate_animals.py` tool converts polyperfect FBX packs to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat `assets/models/glbs/` directory. Each animal gets a catalog entry in `animals_catalog.json` with clip metadata, default state mapping, and an `animSpeedRef` for foot-sync. + +At runtime, `AnimalHerdRenderer` spawns one `skinned_herd` per animal type. The herd bakes the palette from the catalog GLB's clips. `AnimalAnimationLogic` maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path. + +## Where we stand vs the industry + +The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny. + +The platform supports three tiers by distance, all driven by the same `(clip, count, speed, phase)` packet: +- **Crowd tier (palette)** — baked poses, GPU-driven, zero CPU. Thousands of agents. +- **Hero tier (real rigs)** — `AnimationTree` + `SkeletonIK3D` + `PhysicalBone3D` for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll. +- **Impostor tier (2D billboards)** — sprite atlas indexed by view-angle and animation-frame. 
For very far agents. + +One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch. ## What's deliberately not here - **No C# wrapper.** Instantiate from GDScript via `ClassDB.instantiate()` — the binding surface is small and stable. -- **No automatic `AnimationPlayer` integration.** You drive poses. We give you the texture. Freedom to animate however you want. -- **No GPU occlusion or LOD.** That's the game's job. The engine provides the tool; the game decides what to draw. +- **No automatic `AnimationPlayer` integration.** You drive poses at bake time. We give you the texture. Freedom to animate however you want. +- **No GPU occlusion culling.** That's the game's job. The engine provides the tool; the game decides what to draw. ## Get the build -Pre-built editor binaries with `agent_skinned` baked in — no engine compile required. The game's `animal_perf_test.tscn` lets you toggle 10 / 100 / 1000 animals and read live FPS: +Pre-built editor binaries with `agent_skinned` and the GPU-driven palette baked in — no engine compile required. 
The game's `animal_perf_test.tscn` lets you spawn 10/100/1,000/10,000 animals and read live FPS: -| Platform | Binary | Engine commit | -|----------|--------|---------------| -| **macOS ARM64** | [`tinqs.macos.editor.arm64.mono`](https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono) | `4fe1323` (4.6.4, Xcode 26.3) | -| **Windows x64** | [`tinqs.windows.editor.x86_64.mono.exe`](https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe) | `64fb5cc` (4.6.4, MSVC 2022) | +| Platform | Binary | +|----------|--------| +| **macOS ARM64** | [`tinqs.macos.editor.arm64.mono`](https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono) | +| **Windows x64** | [`tinqs.windows.editor.x86_64.mono.exe`](https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe) | -All builds live in the public [`tinqs/builds`](https://tinqs.com/tinqs/builds) repo — engine source is private, but the binaries are yours. See [`manifest.json`](https://tinqs.com/tinqs/builds/src/branch/main/manifest.json) for checksums and build details. +All builds at [`tinqs/builds`](https://tinqs.com/tinqs/builds) — engine source is private, but the binaries are yours. See [`manifest.json`](https://tinqs.com/tinqs/builds/src/branch/main/manifest.json) for checksums and build details. The engine source lives in [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). Module docs: `modules/agent_skinned/README.md` and `.agents/wiki/agent-skinned-gpu-herd.md`. diff --git a/pre-commit-agent.html b/pre-commit-agent.html index 3045305..f7f7645 100644 --- a/pre-commit-agent.html +++ b/pre-commit-agent.html @@ -277,45 +277,33 @@ ← All Posts

A Pre-Commit Agent That Guards Your Secrets for $0.001

-

Every small team has the same problem: too many things to remember before git commit. Don't leak API keys. Don't reference the classified AI codename in public posts. Don't link to GitHub repos we deleted six months ago. Don't push a blog post with a 90-character title. +

Every small team has the same problem: too many things to remember before git commit. Don't leak API keys. Don't reference the classified AI codename in public posts. Don't link to GitHub repos we deleted six months ago. Don't push a blog post with a 90-character title.

-A checklist in the README doesn't work. Humans skip checklists. Code review catches some issues but not all — reviewers focus on logic, not whether a URL points to a deleted org. - -We built a pre-commit hook with two layers: a regex blocklist that's instant and free, and an LLM review that costs $0.001. Together they catch everything. - -## Layer 1: Regex blocklist (0ms, $0.00) - -A text file of patterns, each tagged with scope and message: - -`` -public|\b\b|Classified codename — use the public-facing alias +
+

A checklist in the README doesn't work. Humans skip checklists. Code review catches some issues but not all — reviewers focus on logic, not whether a URL points to a deleted org.

+

We built a pre-commit hook with two layers: a regex blocklist that's instant and free, and an LLM review that costs $0.001. Together they catch everything.

+

Layer 1: Regex blocklist (0ms, $0.00)

+

A text file of patterns, each tagged with scope and message:

+
public|\b<internal-codename>\b|Classified codename — use the public-facing alias
 all|github\.com/(tinqs-ltd|tinqs)/|GitHub repos deleted — use tinqs.com
 all|sk-[a-zA-Z0-9]{20,}|Possible API key leaked
 all|AKIA[A-Z0-9]{16}|AWS access key leaked
-public|admin\.|Internal admin URL in public content
-`
-
-The scope field controls where patterns apply. all means every file. public means only public-facing content — blog posts, website, marketing pages. We want classified codenames in internal architecture docs. We just don't want them in blog posts.
-
-The blocklist runs grep against the staged diff. No network call, no API, no latency. Match found → commit blocked immediately with file path and explanation. This catches 80% of issues before the LLM wakes up.
-
-## Layer 2: DeepSeek V4 Flash review (~4s, $0.001)
-
-If the commit touches public-facing files, the hook sends the staged diff to DeepSeek V4 Flash. The system prompt tells it exactly what to check:
-
-- Leaked secrets — API keys, tokens, credentials the regex might have missed
-- Classified terms — codenames not yet in the blocklist
-- Internal URLs — references to services that shouldn't be public
-- Blog quality — title length, meta description, slug consistency
-- Broken links — malformed URLs, obvious typos
-- Announcements — if it's a new blog post, draft a one-line summary
-
-The model responds with structured JSON: errors (block) or warnings (inform but allow). If the API is unreachable or times out, the commit proceeds — the hook never blocks work for infrastructure reasons.
-
-## The architecture
-
-`
-git commit
+public|admin\.<internal-domain>|Internal admin URL in public content
+

The scope field controls where patterns apply. all means every file. public means only public-facing content — blog posts, website, marketing pages. We want classified codenames in internal architecture docs. We just don't want them in blog posts.

+

The blocklist runs grep against the staged diff. No network call, no API, no latency. Match found → commit blocked immediately with file path and explanation. This catches 80% of issues before the LLM wakes up.
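
The hook itself is bash and uses grep; the Python sketch below only illustrates how a `scope|pattern|message` line could be parsed and applied. It assumes the message never contains a `|`, and `parse_rule` and `scan` are hypothetical names rather than functions from the hook. The example messages are shortened from the blocklist above.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    scope: str          # "all" or "public"
    pattern: re.Pattern
    message: str


def parse_rule(line: str) -> Rule:
    # The regex may itself contain "|", so peel the scope off the front
    # and the message off the back instead of splitting on every "|".
    scope, rest = line.split("|", 1)
    pattern, message = rest.rsplit("|", 1)
    return Rule(scope.strip(), re.compile(pattern), message.strip())


def scan(added_lines, rules, is_public):
    """Return (line, message) for every applicable rule that matches."""
    hits = []
    for text in added_lines:
        for rule in rules:
            if rule.scope == "public" and not is_public:
                continue  # public-only rules are skipped for internal files
            if rule.pattern.search(text):
                hits.append((text, rule.message))
    return hits


RULES = [parse_rule(r) for r in (
    r"all|github\.com/(tinqs-ltd|tinqs)/|GitHub repos deleted",
    r"all|AKIA[A-Z0-9]{16}|AWS access key leaked",
    r"public|\bcodename\b|Classified codename",
)]
```

Any non-empty result from `scan` would correspond to a blocked commit.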

+

Layer 2: DeepSeek V4 Flash review (~4s, $0.001)

+

If the commit touches public-facing files, the hook sends the staged diff to DeepSeek V4 Flash. The system prompt tells it exactly what to check:

+
    +
  • Leaked secrets — API keys, tokens, credentials the regex might have missed
  • +
  • Classified terms — codenames not yet in the blocklist
  • +
  • Internal URLs — references to services that shouldn't be public
  • +
  • Blog quality — title length, meta description, slug consistency
  • +
  • Broken links — malformed URLs, obvious typos
  • +
  • Announcements — if it's a new blog post, draft a one-line summary
  • +
+

The model responds with structured JSON: errors (block) or warnings (inform but allow). If the API is unreachable or times out, the commit proceeds — the hook never blocks work for infrastructure reasons.
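
The block/inform/fail-open behaviour can be sketched in a few lines of Python. This is an illustration under assumptions: the JSON keys `errors` and `warnings` are guessed from the description above, `decide` is a hypothetical name, and treating an unparseable reply the same as an unreachable API is a design choice of the sketch, not something the source states.

```python
import json


def decide(raw_response):
    """Map the reviewer's reply to (exit_code, lines_to_print).

    raw_response is the model's JSON text, or None when the API call
    failed or timed out.
    """
    if raw_response is None:
        # Never block on infrastructure problems.
        return 0, ["warning: review unavailable, commit allowed"]
    try:
        review = json.loads(raw_response)
    except json.JSONDecodeError:
        return 0, ["warning: unreadable review response, commit allowed"]
    if not isinstance(review, dict):
        return 0, ["warning: unexpected review shape, commit allowed"]

    errors = review.get("errors", [])
    warnings = review.get("warnings", [])
    lines = [f"error: {e}" for e in errors] + [f"warning: {w}" for w in warnings]
    # Errors block the commit; warnings are printed but allowed through.
    return (1 if errors else 0), lines
```

A non-zero exit code from a pre-commit hook is what makes git abort the commit, so only the `errors` branch returns 1.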

+

The architecture

+
git commit
   ↓
 Phase 0: Collect staged diff + classify files (public vs internal)
   ↓
@@ -331,52 +319,33 @@ Phase 3: Parse JSON response
   → Errors → BLOCK
   → Warnings → print, exit 0
   → Announcement → print draft
-  → API failure → warn, exit 0 (never block on infra)
-`
-
-The hook lives in .githooks/ — committed, version-controlled, shared by the team. A setup script points git config core.hooksPath there.
-
-## What it costs
-
-| | Tokens | Cost |
-|–|——–|——|
-| Input (prompt + diff) | ~4,000 | $0.00056 |
-| Output (JSON response) | ~200 | $0.00006 |
-| Per commit | | $0.00062 |
-
-A tenth of a cent. Twenty commits a day: $0.012/day. About $0.40/month. Commits that only touch internal files skip the AI review entirely — zero cost.
-
-## What it caught (first week)
-
-- 2 classified codename leaks in draft blog posts — caught by blocklist
-- 1 GitHub URL from an old copy-paste — caught by blocklist
-- 3 blog SEO warnings — titles over 60 chars, missing og_description — caught by AI
-- 1 announcement draft auto-generated when a new post was committed
-
-Zero false positives on the blocklist. Two false positives from the AI — flagged an internal URL in a code example that was clearly illustrative. We added a note to the prompt: ignore URLs inside fenced code blocks.
-
-## Setup
-
-`bash
-bash scripts/setup-hooks.sh          # or .\scripts\setup-hooks.ps1 on Windows
-export TINQS_HOOK_TOKEN=  # same PAT used for git push
-`
-
-That's it. Every git commit runs the two-layer review. Bypass with git commit –no-verify` for emergencies.
-
-## The pattern: guard rails at the edge
-
-This is the same principle we apply everywhere: put the guard rail where the action happens. Don't rely on a human checklist. Don't wait for code review. Don't hope someone remembers.
-
-The pre-commit hook is $0.001 of prevention. A leaked API key in a public post is hours of rotation, revocation, and audit. A classified codename in a blog post is a confidentiality breach. A dead link is a broken experience nobody notices for weeks.
-
-The tools exist. DeepSeek V4 Flash is cheap enough to call on every commit. The hook is 150 lines of bash. The blocklist is a text file. Total infrastructure cost: zero — it runs on the developer's machine, calls an API we already pay for, adds 4 seconds to the commit flow.
-
-—
-
-The pre-commit hook is part of Tinqs Studio. The inference proxy, blocklist patterns, and review prompt are open and reusable. Every commit in Ariki runs through the same guard.

- -
+ → API failure → warn, exit 0 (never block on infra)
+

The hook lives in .githooks/ — committed, version-controlled, shared by the team. A setup script points git config core.hooksPath there.

+

What it costs

+

| | Tokens | Cost |
|--|--------|------|
| Input (prompt + diff) | ~4,000 | $0.00056 |
| Output (JSON response) | ~200 | $0.00006 |
| Per commit | | $0.00062 |

+

Less than a tenth of a cent. Twenty commits a day: $0.012/day. About $0.40/month. Commits that only touch internal files skip the AI review entirely, at zero cost.
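
The arithmetic can be checked directly from the table's figures. The Python sketch below assumes a 30-day month and that every commit touches public files (so every commit pays for a review); `review_cost` is a hypothetical helper name.

```python
INPUT_COST = 0.00056   # ~4,000 prompt + diff tokens, from the table
OUTPUT_COST = 0.00006  # ~200 response tokens, from the table


def review_cost(commits_per_day, days=30):
    """Return (per_commit, per_day, per_period) in dollars."""
    per_commit = INPUT_COST + OUTPUT_COST
    per_day = per_commit * commits_per_day
    return per_commit, per_day, per_day * days


per_commit, per_day, per_month = review_cost(20)
print(f"${per_commit:.5f} per commit, ${per_day:.4f} per day, ${per_month:.2f} per month")
# prints: $0.00062 per commit, $0.0124 per day, $0.37 per month
```

The monthly figure comes out at roughly $0.37, which the text rounds to about $0.40.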

+

What it caught (first week)

+
    +
  • 2 classified codename leaks in draft blog posts — caught by blocklist
  • +
  • 1 GitHub URL from an old copy-paste — caught by blocklist
  • +
  • 3 blog SEO warnings — titles over 60 chars, missing og_description — caught by AI
  • +
  • 1 announcement draft auto-generated when a new post was committed
  • +
+

Zero false positives on the blocklist. Two false positives from the AI — flagged an internal URL in a code example that was clearly illustrative. We added a note to the prompt: ignore URLs inside fenced code blocks.

+

Setup

+
bash scripts/setup-hooks.sh          # or .\scripts\setup-hooks.ps1 on Windows
+export TINQS_HOOK_TOKEN=<your-token>  # same PAT used for git push
+

That's it. Every git commit runs the two-layer review. Bypass with git commit --no-verify for emergencies.

+

The pattern: guard rails at the edge

+

This is the same principle we apply everywhere: put the guard rail where the action happens. Don't rely on a human checklist. Don't wait for code review. Don't hope someone remembers.

+

The pre-commit hook is $0.001 of prevention. A leaked API key in a public post is hours of rotation, revocation, and audit. A classified codename in a blog post is a confidentiality breach. A dead link is a broken experience nobody notices for weeks.

+

The tools exist. DeepSeek V4 Flash is cheap enough to call on every commit. The hook is 150 lines of bash. The blocklist is a text file. Total infrastructure cost: zero — it runs on the developer's machine, calls an API we already pay for, adds 4 seconds to the commit flow.

+
+

The pre-commit hook is part of Tinqs Studio. The inference proxy, blocklist patterns, and review prompt are open and reusable. Every commit in Ariki runs through the same guard.

diff --git a/studio-cli.html b/studio-cli.html index 268760f..123bed3 100644 --- a/studio-cli.html +++ b/studio-cli.html @@ -277,64 +277,44 @@ ← All Posts

One Binary to Rule Them All: Our Studio CLI

-

Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context. - -Our CLI solves this in 100ms. One command — tinqs identity — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio. - -## The identity command (100ms) - -When an agent starts, the first thing it calls is tinqs identity. The output: - -- Soul file — the agent's persistent identity, values, operating principles -- Company context — team members, roles, what the company does -- Machine context — hostname, OS, which repos are cloned, what services are running -- Ecosystem — other repos and their purpose -- Service status — which URLs are live and reachable - -This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second. - -This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with tinqs identity. Without it, every conversation begins with "let me explain the project." With it, the agent already knows. - -## Screenshots and cloud vision - -The CLI can capture any window from outside the process. No in-game overlay, no rendering pipeline integration. OS-level capture — GDI+ on Windows, screencapture on Mac. - -A photo command sends the screenshot to a cloud vision model. The agent says "take a photo of the game" and gets back: "The player character is standing near a half-built hut. Three palm trees to the left. The terrain has a visible seam between two biomes." - -This is how you file bugs without typing. Look at the game, tell the agent what's wrong. It takes a screenshot, describes what it sees, and creates an issue with both the description and the image attached. 
Keyboard-free bug reporting. - -## Health checks - -tinqs doctor runs a comprehensive check: - -- Is the git platform reachable and authenticated? -- Is the game server running? -- Are all expected repos cloned and on the right branch? -- Are required tools installed at the right version? - -Output is a green/yellow/red table. Essential for unattended agent sessions — the agent verifies its environment before starting work. No "the build failed because port 3000 was already taken" at 3am. - -## Why Go - -Go compiles to a single static binary. No Python virtualenvs, no Node.js version managers, no DLL hell on Windows. The same binary runs on a gaming PC, a designer's MacBook, and a CI runner in AWS. - -Cross-compilation is trivial. We build Windows, Mac (arm64 + amd64), and Linux binaries from a single CI workflow. Push a tag, CI builds all three, uploads to S3. The binary is 15MB, starts in under 100ms, has zero runtime dependencies. - -## What we learned - -The CLI is the API for AI agents. What started as a human convenience tool became the primary interface for agents. Every session starts with tinqs identity. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary. - -One binary beats ten scripts. Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12. - -Cloud vision is underrated for game dev. Sending a screenshot to a vision model sounds gimmicky. In practice, it's the fastest way to document visual bugs. "The tree is floating 2m above the terrain" is much faster to communicate when the AI is looking at the same screen. - -Agent cold starts are the real problem. Without the identity system, every session starts with the agent asking "what project is this?" With it, the agent knows everything in 100ms. 
That's the difference between an AI assistant and an AI team member. - -— - -The CLI is part of Tinqs Studio. Every time we find ourselves about to write a script that needs to work on multiple machines, we add a subcommand instead. One binary that makes the studio work — whether the operator is human or AI.

+

Every AI agent session starts the same way: cold. The agent doesn't know what project this is, who's asking, what tools are available, or what happened yesterday. You spend the first five minutes re-explaining context.

+

Our CLI solves this in 100ms. One command — tinqs identity — and the agent knows everything. The binary is 15MB, has zero runtime dependencies, and runs on every machine in the studio.

+

The identity command (100ms)

+

When an agent starts, the first thing it calls is tinqs identity. The output:

+
    +
  • Soul file — the agent's persistent identity, values, operating principles
  • +
  • Company context — team members, roles, what the company does
  • +
  • Machine context — hostname, OS, which repos are cloned, what services are running
  • +
  • Ecosystem — other repos and their purpose
  • +
  • Service status — which URLs are live and reachable
  • +
+

This data lives in markdown files in the docs repo. Any machine on the network can read it. The agent goes from blank to fully contextual in under a second.

+

This started as a convenience tool for humans. It became the single most important function in our stack. Every agent session — Cursor, Claude Code, Pi — starts with tinqs identity. Without it, every conversation begins with "let me explain the project." With it, the agent already knows.

+

Screenshots and cloud vision

+

The CLI can capture any window from outside the process. No in-game overlay, no rendering pipeline integration. OS-level capture — GDI+ on Windows, screencapture on Mac.

+

A photo command sends the screenshot to a cloud vision model. The agent says "take a photo of the game" and gets back: "The player character is standing near a half-built hut. Three palm trees to the left. The terrain has a visible seam between two biomes."

+

This is how you file bugs without typing. Look at the game, tell the agent what's wrong. It takes a screenshot, describes what it sees, and creates an issue with both the description and the image attached. Keyboard-free bug reporting.

+

Health checks

+

tinqs doctor runs a comprehensive check:

+
    +
  • Is the git platform reachable and authenticated?
  • +
  • Is the game server running?
  • +
  • Are all expected repos cloned and on the right branch?
  • +
  • Are required tools installed at the right version?
  • +
+

Output is a green/yellow/red table. Essential for unattended agent sessions — the agent verifies its environment before starting work. No "the build failed because port 3000 was already taken" at 3am.
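
The CLI is written in Go, so the following Python is only a sketch of the idea, not its implementation: a list of named checks, each returning a colour, rolled up into a table plus an overall status. `run_checks` and the status constants are hypothetical names, and counting a crashing check as red is an assumption of the sketch.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"
_SEVERITY = {GREEN: 0, YELLOW: 1, RED: 2}


def run_checks(checks):
    """Run (name, callable) pairs and return (rows, worst_status).

    Each callable returns GREEN, YELLOW, or RED.
    """
    rows = []
    for name, check in checks:
        try:
            status = check()
        except Exception:
            status = RED  # a check that crashes counts as a failure
        rows.append((name, status))
    # The overall result is the most severe individual status.
    worst = max((s for _, s in rows), key=_SEVERITY.__getitem__, default=GREEN)
    return rows, worst
```

An unattended agent could refuse to start work unless the overall status is green, or proceed with a note on yellow.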

+

Why Go

+

Go compiles to a single static binary. No Python virtualenvs, no Node.js version managers, no DLL hell on Windows. The same binary runs on a gaming PC, a designer's MacBook, and a CI runner in AWS.

+

Cross-compilation is trivial. We build Windows, Mac (arm64 + amd64), and Linux binaries from a single CI workflow. Push a tag, CI builds all three, uploads to S3. The binary is 15MB, starts in under 100ms, has zero runtime dependencies.

+

What we learned

+

The CLI is the API for AI agents. What started as a human convenience tool became the primary interface for agents. Every session starts with tinqs identity. The agent's "hands and eyes" — screenshots, vision, health checks — are subcommands of the same binary.

+

One binary beats ten scripts. Scripts rot. They have different shells, different PATH assumptions, different error handling. A compiled binary either works or it doesn't. It ships with dependencies baked in. It doesn't care if your Python is 3.9 or 3.12.

+

Cloud vision is underrated for game dev. Sending a screenshot to a vision model sounds gimmicky. In practice, it's the fastest way to document visual bugs. "The tree is floating 2m above the terrain" is much faster to communicate when the AI is looking at the same screen.

+

Agent cold starts are the real problem. Without the identity system, every session starts with the agent asking "what project is this?" With it, the agent knows everything in 100ms. That's the difference between an AI assistant and an AI team member.

+
+

The CLI is part of Tinqs Studio. Every time we find ourselves about to write a script that needs to work on multiple machines, we add a subcommand instead. One binary that makes the studio work — whether the operator is human or AI.

diff --git a/voice-missing-input-game-dev.html b/voice-missing-input-game-dev.html index 8686964..7c2f3f0 100644 --- a/voice-missing-input-game-dev.html +++ b/voice-missing-input-game-dev.html @@ -277,28 +277,21 @@ ← All Posts

Why Voice Is the Missing Input for Game Development

-

Every game developer knows this moment. You're playtesting, running through the world, and you see something wrong — a tree floating two meters above the terrain, a UI element clipping, an animation that stutters on frame 14. You make a mental note. Ten minutes later, back at the editor, you try to file it. The coordinates are fuzzy. The exact reproduction steps are gone. You type something vague like "tree floating on west beach maybe" and hope you remember more tomorrow. +

Every game developer knows this moment. You're playtesting, running through the world, and you see something wrong — a tree floating two meters above the terrain, a UI element clipping, an animation that stutters on frame 14. You make a mental note. Ten minutes later, back at the editor, you try to file it. The coordinates are fuzzy. The exact reproduction steps are gone. You type something vague like "tree floating on west beach maybe" and hope you remember more tomorrow.

-Voice changes this entirely. Speak the bug while you're looking at it, and an agent turns your words into a structured issue — with a screenshot, a vision-model description, coordinates, and a severity estimate. No keyboard. No context switch. No memory loss. - -## The latency that kills bug reports - -The distance between seeing a bug and filing it is a memory decay curve. Every second that passes, your recollection loses precision: - -| Elapsed time | What you remember | -|—|—| -| 0 seconds | Exact position, camera angle, what you were doing, what's on screen | -| 30 seconds | "There was a tree... somewhere west... maybe floating?" | -| 5 minutes | "I think there was a rendering issue? Or was it yesterday?" | - -Typed bug reports are reconstructions from decaying memory. Voice bug reports are real-time captures. The difference in quality isn't marginal — it's the difference between a fix you can act on immediately and a ticket that sits in the backlog for three months while someone tries to reproduce it. - -## The pipeline: voice → text → structured issue - -Here's what actually happens when you speak a bug during playtesting: - -`` -1. You speak: "There's a tree floating two meters above the terrain +
+

Voice changes this entirely. Speak the bug while you're looking at it, and an agent turns your words into a structured issue — with a screenshot, a vision-model description, coordinates, and a severity estimate. No keyboard. No context switch. No memory loss.

+

The latency that kills bug reports

+

The distance between seeing a bug and filing it is a memory decay curve. Every second that passes, your recollection loses precision:

+

| Elapsed time | What you remember |
|---|---|
| 0 seconds | Exact position, camera angle, what you were doing, what's on screen |
| 30 seconds | "There was a tree... somewhere west... maybe floating?" |
| 5 minutes | "I think there was a rendering issue? Or was it yesterday?" |

+

Typed bug reports are reconstructions from decaying memory. Voice bug reports are real-time captures. The difference in quality isn't marginal — it's the difference between a fix you can act on immediately and a ticket that sits in the backlog for three months while someone tries to reproduce it.

+

The pipeline: voice → text → structured issue

+

Here's what actually happens when you speak a bug during playtesting:

+
1. You speak: "There's a tree floating two meters above the terrain
    on the west beach, near the big rock formation. Happens after
    the vegetation culling pass kicks in around sunset."
    
@@ -316,68 +309,39 @@ Here's what actually happens when you speak a bug during playtesting:
 5. Agent files a structured issue with all of the above,
    tags the rendering engineer, and posts the digest to team chat.
    
-Total latency: under 2 seconds. You keep playing.
-``
-
-This isn't theoretical. The pipeline runs on our own game project, and it's caught bugs that would have slipped through playtesting entirely — the ones you see, make a mental note about, and forget by the time you alt-tab.
-
-## Why game dev is the perfect voice use case
-
-You're already looking at the screen. Voice input doesn't require switching windows or breaking flow. You're playtesting — your hands are on the controller or WASD, your eyes are on the game. Speaking is the only input channel that doesn't interrupt the thing you're actually doing.
-
-Game bugs are spatial and visual. "The crafting UI text overflows on items with names longer than 20 characters" is something you see, not something you calculate. Describing it verbally while looking at it produces a far richer bug report than typing from memory.
-
-Reproduction is half the battle. When you speak the bug at the moment of occurrence, you naturally include the context: what you were doing, what just happened, what the game state was. You don't have to reconstruct it later.
-
-Voice scales to the whole team. Artists see visual bugs. Designers see balance issues. Producers see UX friction. Not everyone on a game team is a fast typist or comfortable with issue trackers. Everyone can speak.
-
-## What the agent adds beyond transcription
-
-Raw transcription is useful — it's a notepad you don't have to type. But the agent layer is what makes voice input a pipeline rather than a dictation tool:
-
-Screenshot coordination. The agent calls the game engine's HTTP API, captures the current frame, and attaches it to the issue. You don't take screenshots. The agent does.
-
-Vision model description. The screenshot goes through a vision model that writes a text description of what's on screen. Future-you searching the issue tracker for "floating tree" finds it even if the transcription was garbled.
-
-Coordinates and context. The game engine provides the player's world position, camera angle, and current game state. The agent bakes these into the issue. A developer can teleport directly to the bug location.
-
-Severity and routing. The agent estimates severity from context ("floating" is visual, "crash" is critical) and tags the right team member. An artist doesn't get pinged for a shader bug. A rendering engineer doesn't get pinged for a UI text overflow.
-
-## The numbers
-
-| Method | Time from observation to filed issue | Information loss |
-|—|—|—|
-| Mental note → type later | 5-30 minutes | High (positions, steps, context) |
-| Alt-tab → type immediately | 30-60 seconds | Medium (screenshots missed, flow broken) |
-| Voice → agent pipeline | 2 seconds | Low (screenshot + position captured automatically) |
-
-The throughput difference compounds. A 30-minute playtest session with keyboard-only bug filing might yield 3-4 issues, half of them vague. The same session with voice-to-agent produces 10-15 issues, all with screenshots, positions, and reproduction context.
-
-## Setup is simpler than you think
-
-You need three things, all of which you probably already have:
-
-1. A microphone. The one in your headset is fine. Transcription models handle suboptimal audio surprisingly well.
-2. Transcription. Whisper runs locally and is free. Cloud APIs are sub-cent per minute. Both work.
-3. An agent that speaks your game engine's API. If your engine has an HTTP interface for screenshots and game state, the agent can wire the rest together. If it doesn't — add one. It's a weekend project.
-
-The agent itself doesn't need to be custom-built. Any coding agent with tool access can be told "watch the game, transcribe voice input, file issues in the tracker." It's a skill file, not a product.
-
-## What changes when you stop typing bugs
-
-The most surprising effect isn't the speed. It's the coverage. When filing a bug costs two seconds of speaking, you file bugs you would have previously ignored. The minor visual glitch. The slight animation hitch. The UI element that's two pixels misaligned.
-
-Individually these are low-priority. Collectively they're the difference between a game that feels polished and one that feels rough. And they only get caught when the cost of reporting approaches zero.
-
-The second effect is that playtesting becomes a primary input channel. Instead of structured QA sessions with checklists and forms, you just play the game. The agent captures everything. When you're done, you have a list of filed issues with screenshots and context — generated from your spoken observations in real time.
-
-Voice isn't a gimmick for game development. It's the input channel that matches the way we actually work — looking at the screen, noticing things, and talking about them. The tools exist. The latency is sub-second. The cost is negligible. The only thing missing is the habit.
-
-—
-
-We build Tinqs Studio — a game dev platform with built-in AI agents, git hosting, and creative pipelines. Ariki is the survival colony sim we're building with every tool described here.

- -
+Total latency: under 2 seconds. You keep playing.
+

This isn't theoretical. The pipeline runs on our own game project, and it's caught bugs that would have slipped through playtesting entirely — the ones you see, make a mental note about, and forget by the time you alt-tab.

+

Why game dev is the perfect voice use case

+

You're already looking at the screen. Voice input doesn't require switching windows or breaking flow. You're playtesting — your hands are on the controller or WASD, your eyes are on the game. Speaking is the only input channel that doesn't interrupt the thing you're actually doing.

+

Game bugs are spatial and visual. "The crafting UI text overflows on items with names longer than 20 characters" is something you see, not something you calculate. Describing it verbally while looking at it produces a far richer bug report than typing from memory.

+

Reproduction is half the battle. When you speak the bug at the moment of occurrence, you naturally include the context: what you were doing, what just happened, what the game state was. You don't have to reconstruct it later.

+

Voice scales to the whole team. Artists see visual bugs. Designers see balance issues. Producers see UX friction. Not everyone on a game team is a fast typist or comfortable with issue trackers. Everyone can speak.

+

What the agent adds beyond transcription

+

Raw transcription is useful — it's a notepad you don't have to type. But the agent layer is what makes voice input a pipeline rather than a dictation tool:

+

Screenshot coordination. The agent calls the game engine's HTTP API, captures the current frame, and attaches it to the issue. You don't take screenshots. The agent does.

+

Vision model description. The screenshot goes through a vision model that writes a text description of what's on screen. Future-you searching the issue tracker for "floating tree" finds it even if the transcription was garbled.

+

Coordinates and context. The game engine provides the player's world position, camera angle, and current game state. The agent bakes these into the issue. A developer can teleport directly to the bug location.

+

Severity and routing. The agent estimates severity from context ("floating" is visual, "crash" is critical) and tags the right team member. An artist doesn't get pinged for a shader bug. A rendering engineer doesn't get pinged for a UI text overflow.
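The severity-and-routing step described above can be pictured as a small lookup. This is a minimal Python sketch, not the agent's actual logic: the keyword tables and role names (`rendering-engineer`, `ui-engineer`, `artist`) are hypothetical, and only the "floating is visual, crash is critical" mapping comes from the text. A real agent would more likely lean on a language model than on string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword tables. Only "floating" -> visual and "crash" -> critical
# come from the post; every other entry is an illustrative assumption.
SEVERITY_KEYWORDS = {
    "critical": ("crash", "freeze"),
    "visual": ("floating", "flicker"),
    "ui": ("overflow", "misaligned"),
}
# Hypothetical topic -> role routing table.
OWNER_BY_TOPIC = {
    "shader": "rendering-engineer",
    "culling": "rendering-engineer",
    "ui": "ui-engineer",
    "texture": "artist",
}


@dataclass
class Triage:
    severity: str
    owner: str


def _mentions(words, keyword):
    # Prefix match so "crashes" counts as a mention of "crash".
    return any(word.startswith(keyword) for word in words)


def triage(transcript: str) -> Triage:
    words = re.findall(r"[a-z]+", transcript.lower())
    # Dicts keep insertion order, so "critical" is checked first.
    severity = next(
        (level for level, keys in SEVERITY_KEYWORDS.items()
         if any(_mentions(words, k) for k in keys)),
        "minor",
    )
    owner = next(
        (role for topic, role in OWNER_BY_TOPIC.items() if _mentions(words, topic)),
        "unassigned",
    )
    return Triage(severity, owner)
```

Reports that match nothing fall through to `minor` / `unassigned`, so an unrecognised phrase still produces an issue instead of being dropped.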

+

The numbers

+

| Method | Time from observation to filed issue | Information loss |
|---|---|---|
| Mental note → type later | 5-30 minutes | High (positions, steps, context) |
| Alt-tab → type immediately | 30-60 seconds | Medium (screenshots missed, flow broken) |
| Voice → agent pipeline | 2 seconds | Low (screenshot + position captured automatically) |

+

The throughput difference compounds. A 30-minute playtest session with keyboard-only bug filing might yield 3-4 issues, half of them vague. The same session with voice-to-agent produces 10-15 issues, all with screenshots, positions, and reproduction context.

+

Setup is simpler than you think

+

You need three things, all of which you probably already have:

+

1. A microphone. The one in your headset is fine. Transcription models handle suboptimal audio surprisingly well.

+

2. Transcription. Whisper runs locally and is free. Cloud APIs are sub-cent per minute. Both work.

+

3. An agent that speaks your game engine's API. If your engine has an HTTP interface for screenshots and game state, the agent can wire the rest together. If it doesn't — add one. It's a weekend project.

+

The agent itself doesn't need to be custom-built. Any coding agent with tool access can be told "watch the game, transcribe voice input, file issues in the tracker." It's a skill file, not a product.
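To make the "wire the rest together" step concrete, here is a rough Python sketch of how a transcript and an engine snapshot might be merged into one structured issue. The `GameSnapshot` fields, the label names, and the Markdown layout are all assumptions for illustration; a real engine endpoint and issue tracker would define their own schemas.

```python
from dataclasses import dataclass, field


@dataclass
class GameSnapshot:
    # Hypothetical fields; not the schema of any particular engine API.
    position: tuple[float, float, float]
    camera_yaw_deg: float
    game_state: str
    screenshot_path: str


@dataclass
class Issue:
    title: str
    body: str
    labels: list[str] = field(default_factory=list)


def build_issue(transcript: str, snap: GameSnapshot, vision_caption: str = "") -> Issue:
    spoken = transcript.strip()
    # Use the first spoken sentence as the title, capped at 80 characters.
    title = spoken.split(".")[0][:80]
    x, y, z = snap.position
    lines = [
        "## Spoken report",
        spoken,
        "",
        "## Context",
        f"- Position: ({x:.1f}, {y:.1f}, {z:.1f})",
        f"- Camera yaw: {snap.camera_yaw_deg:.0f} deg",
        f"- Game state: {snap.game_state}",
        f"- Screenshot: {snap.screenshot_path}",
    ]
    if vision_caption:
        # Searchable text description of the screenshot, if one was generated.
        lines += ["", "## Screenshot description", vision_caption]
    return Issue(title=title, body="\n".join(lines), labels=["playtest", "voice"])
```

Keeping the assembly step free of network calls means it can be tested without a running game; the transcription and screenshot capture would sit in front of it.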

+

What changes when you stop typing bugs

+

The most surprising effect isn't the speed. It's the coverage. When filing a bug costs two seconds of speaking, you file bugs you would have previously ignored. The minor visual glitch. The slight animation hitch. The UI element that's two pixels misaligned.

+

Individually these are low-priority. Collectively they're the difference between a game that feels polished and one that feels rough. And they only get caught when the cost of reporting approaches zero.

+

The second effect is that playtesting becomes a primary input channel. Instead of structured QA sessions with checklists and forms, you just play the game. The agent captures everything. When you're done, you have a list of filed issues with screenshots and context — generated from your spoken observations in real time.

+

Voice isn't a gimmick for game development. It's the input channel that matches the way we actually work — looking at the screen, noticing things, and talking about them. The tools exist. The latency is under two seconds. The cost is negligible. The only thing missing is the habit.

+
+

We build Tinqs Studio — a game dev platform with built-in AI agents, git hosting, and creative pipelines. Ariki is the survival colony sim we're building with every tool described here.

From 85a6db41c5288fce028bddc7f6a33a1e9884c7d7 Mon Sep 17 00:00:00 2001
From: Ozan Bozkurt
Date: Mon, 15 Jun 2026 22:48:25 +0100
Subject: [PATCH 4/8] post: add engine improvement roadmap (Tier A-D)

---
 gpu-skinned-herds.html     | 23 +++++++++++++++++------
 posts/gpu-skinned-herds.md | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html
index 6c99e2a..2cbce02 100644
--- a/gpu-skinned-herds.html
+++ b/gpu-skinned-herds.html
@@ -380,12 +380,23 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);
  • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame. For very far agents.
  • One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.

    -

    What's deliberately not here

    -
      -
    • No C# wrapper. Instantiate from GDScript via ClassDB.instantiate() — the binding surface is small and stable.
    • -
    • No automatic AnimationPlayer integration. You drive poses at bake time. We give you the texture. Freedom to animate however you want.
    • -
    • No GPU occlusion culling. That's the game's job. The engine provides the tool; the game decides what to draw.
    • -
    +

    The engine roadmap — where we push next

    +

    The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance set_instance_transform / set_instance_custom_data. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.

    +

    Tier A — kill the per-frame CPU path

    +

    Bulk instance-upload API. Add set_instance_data_bulk() that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.

    +

    GPU-driven cull + indirect multi-draw. A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.

    +

    GPU dead-reckoning of position. Store per-instance velocity in custom data. Advance transforms from TIME in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.
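The dead-reckoning item above boils down to one line of arithmetic per instance. The following Python sketch shows that arithmetic on the CPU purely for illustration; in the design described, the same extrapolation would run in the vertex shader from `TIME`. The `InstanceSnapshot` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstanceSnapshot:
    # Hypothetical per-instance record, written only when a sim snapshot arrives.
    position: tuple[float, float, float]  # world position at the snapshot
    velocity: tuple[float, float, float]  # world units per second
    snapshot_time: float                  # seconds


def dead_reckon(inst: InstanceSnapshot, now: float) -> tuple[float, ...]:
    # Extrapolate linearly from the last snapshot: p(t) = p0 + v * (t - t0).
    dt = now - inst.snapshot_time
    return tuple(p + v * dt for p, v in zip(inst.position, inst.velocity))
```

At 60 FPS, a snapshot every 0.4 s means the CPU would write an instance roughly once every 24 frames, with the extrapolation covering the frames in between.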

    +

    Tier B — skinning core upgrades

    +

    Dual-quaternion skinning. 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.

    +

    Mat4x3 storage. The 4th column of a bone matrix is always (0,0,0,1) — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.

    +

    Reduced-bone far LOD. Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.
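The texel counts in the Tier B items above translate directly into palette size. Below is a small worked calculation in Python. It assumes one RGBA32F texel (16 bytes) per matrix column or quaternion, and the bone and frame counts in the usage note are made-up inputs, not measurements from the project.

```python
BYTES_PER_TEXEL = 16  # assumption: RGBA32F, 4 floats x 4 bytes

# Texels needed per bone per frame for each storage layout.
# 4 and 2 come from the text; 3 follows from dropping one of four texels.
TEXELS_PER_BONE = {"mat4": 4, "mat4x3": 3, "dual_quat": 2}


def palette_bytes(bones: int, frames: int, layout: str) -> int:
    # Total palette size: one block of texels per bone per baked frame.
    return bones * frames * TEXELS_PER_BONE[layout] * BYTES_PER_TEXEL


def saving_vs_mat4(layout: str) -> float:
    # Fraction of VRAM saved relative to full 4x4 storage.
    return 1.0 - TEXELS_PER_BONE[layout] / TEXELS_PER_BONE["mat4"]
```

For a hypothetical rig with 32 bones and 600 baked frames, `palette_bytes(32, 600, "mat4")` gives 1,228,800 bytes, `"mat4x3"` gives 921,600 (25% less), and `"dual_quat"` gives 614,400 (50% less), which matches the "saves 25%" and "halves palette VRAM" figures quoted above.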

    +

    Tier C — visibility & render passes

    +

    Frustum-cull integration. MultiMesh draws everything in visible_instance_count — wire per-instance frustum culling into the engine's visibility system.

    +

    Shadow-pass LOD. The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.

    +

    Tier D — quality & pipeline

    +

    In-shader clip cross-fade. Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.

    +

    Threaded bake. Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.

    +

    The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.

    Get the build

    Pre-built editor binaries with agent_skinned and the GPU-driven palette baked in — no engine compile required. The game's animal_perf_test.tscn lets you spawn 10/100/1,000/10,000 animals and read live FPS:

    | Platform | Binary |

diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md
index 143f23a..c11fe50 100644
--- a/posts/gpu-skinned-herds.md
+++ b/posts/gpu-skinned-herds.md
@@ -164,11 +164,39 @@ The platform supports three tiers by distance, all driven by the same `(clip, co
 
 One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.
 
-## What's deliberately not here
+## The engine roadmap — where we push next
 
-- **No C# wrapper.** Instantiate from GDScript via `ClassDB.instantiate()` — the binding surface is small and stable.
-- **No automatic `AnimationPlayer` integration.** You drive poses at bake time. We give you the texture. Freedom to animate however you want.
-- **No GPU occlusion culling.** That's the game's job. The engine provides the tool; the game decides what to draw.
+The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance `set_instance_transform` / `set_instance_custom_data`. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.
+
+### Tier A — kill the per-frame CPU path
+
+**Bulk instance-upload API.** Add `set_instance_data_bulk()` that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.
+
+**GPU-driven cull + indirect multi-draw.** A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.
+
+**GPU dead-reckoning of position.** Store per-instance velocity in custom data. Advance transforms from `TIME` in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.
+
+### Tier B — skinning core upgrades
+
+**Dual-quaternion skinning.** 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.
+
+**Mat4x3 storage.** The 4th column of a bone matrix is always `(0,0,0,1)` — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.
+
+**Reduced-bone far LOD.** Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.
+
+### Tier C — visibility & render passes
+
+**Frustum-cull integration.** MultiMesh draws everything in `visible_instance_count` — wire per-instance frustum culling into the engine's visibility system.
+
+**Shadow-pass LOD.** The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.
+
+### Tier D — quality & pipeline
+
+**In-shader clip cross-fade.** Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.
+
+**Threaded bake.** Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.
+
+The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.
 
 ## Get the build

From 6524ac3597d747f30fe8dbec24759b9ded2ecb2d Mon Sep 17 00:00:00 2001
From: Ozan Bozkurt
Date: Mon, 15 Jun 2026 22:50:19 +0100
Subject: [PATCH 5/8] =?UTF-8?q?post:=20remove=20internal=20roadmap=20?=
 =?UTF-8?q?=E2=80=94=20public-facing=20only?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 gpu-skinned-herds.html     | 23 ++++++-----------------
 posts/gpu-skinned-herds.md | 36 ++++--------------------------------
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html
index 2cbce02..6c99e2a 100644
--- a/gpu-skinned-herds.html
+++ b/gpu-skinned-herds.html
@@ -380,23 +380,12 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);
  • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame. For very far agents.
  • One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.

    -

    The engine roadmap — where we push next

    -

    The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance set_instance_transform / set_instance_custom_data. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.

    -

    Tier A — kill the per-frame CPU path

    -

    Bulk instance-upload API. Add set_instance_data_bulk() that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.

    -

    GPU-driven cull + indirect multi-draw. A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.

    -

    GPU dead-reckoning of position. Store per-instance velocity in custom data. Advance transforms from TIME in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.

    -

    Tier B — skinning core upgrades

    -

    Dual-quaternion skinning. 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.

    -

    Mat4x3 storage. The 4th column of a bone matrix is always (0,0,0,1) — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.

    -

    Reduced-bone far LOD. Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.

    -

    Tier C — visibility & render passes

    -

    Frustum-cull integration. MultiMesh draws everything in visible_instance_count — wire per-instance frustum culling into the engine's visibility system.

    -

    Shadow-pass LOD. The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.

    -

    Tier D — quality & pipeline

    -

    In-shader clip cross-fade. Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.

    -

    Threaded bake. Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.

    -

    The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.

    +

    What's deliberately not here

    +
      +
    • No C# wrapper. Instantiate from GDScript via ClassDB.instantiate() — the binding surface is small and stable.
    • +
    • No automatic AnimationPlayer integration. You drive poses at bake time. We give you the texture. Freedom to animate however you want.
    • +
    • No GPU occlusion culling. That's the game's job. The engine provides the tool; the game decides what to draw.
    • +

    Get the build

    Pre-built editor binaries with agent_skinned and the GPU-driven palette baked in — no engine compile required. The game's animal_perf_test.tscn lets you spawn 10/100/1,000/10,000 animals and read live FPS:

    | Platform | Binary |

diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md
index c11fe50..143f23a 100644
--- a/posts/gpu-skinned-herds.md
+++ b/posts/gpu-skinned-herds.md
@@ -164,39 +164,11 @@ The platform supports three tiers by distance, all driven by the same `(clip, co
 
 One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.
 
-## The engine roadmap — where we push next
+## What's deliberately not here
 
-The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance `set_instance_transform` / `set_instance_custom_data`. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.
-
-### Tier A — kill the per-frame CPU path
-
-**Bulk instance-upload API.** Add `set_instance_data_bulk()` that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.
-
-**GPU-driven cull + indirect multi-draw.** A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.
-
-**GPU dead-reckoning of position.** Store per-instance velocity in custom data. Advance transforms from `TIME` in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.
-
-### Tier B — skinning core upgrades
-
-**Dual-quaternion skinning.** 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.
-
-**Mat4x3 storage.** The 4th column of a bone matrix is always `(0,0,0,1)` — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.
-
-**Reduced-bone far LOD.** Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.
-
-### Tier C — visibility & render passes
-
-**Frustum-cull integration.** MultiMesh draws everything in `visible_instance_count` — wire per-instance frustum culling into the engine's visibility system.
-
-**Shadow-pass LOD.** The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.
-
-### Tier D — quality & pipeline
-
-**In-shader clip cross-fade.** Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.
-
-**Threaded bake.** Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.
-
-The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.
+- **No C# wrapper.** Instantiate from GDScript via `ClassDB.instantiate()` — the binding surface is small and stable.
+- **No automatic `AnimationPlayer` integration.** You drive poses at bake time. We give you the texture. Freedom to animate however you want.
+- **No GPU occlusion culling.** That's the game's job. The engine provides the tool; the game decides what to draw.
 
 ## Get the build

From 3226fe28a60bc49edb0b9d2ecc61d35724b8fd45 Mon Sep 17 00:00:00 2001
From: Ozan Bozkurt
Date: Mon, 15 Jun 2026 22:55:35 +0100
Subject: [PATCH 6/8] =?UTF-8?q?add=20.agents/design/=20=E2=80=94=20blog=20?=
 =?UTF-8?q?voice=20+=20rules=20for=20agents?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .agents/design/README.md   | 32 ++++++++++++++++++++++++++++++++
 gpu-skinned-herds.html     |  4 ++--
 posts/gpu-skinned-herds.md |  4 ++--
 3 files changed, 36 insertions(+), 4 deletions(-)
 create mode 100644 .agents/design/README.md

diff --git a/.agents/design/README.md b/.agents/design/README.md
new file mode 100644
index 0000000..5c4fc09
--- /dev/null
+++ b/.agents/design/README.md
@@ -0,0 +1,32 @@
+# Blog design — agent rules
+
+Design and content rules for agents writing tinqs/blog posts.
+
+## Voice
+
+- **Confident, shipped.** Every feature is presented as built, complete, and working. Never "we plan to," "we're working on," "next we'll."
+- **No evolution narrative.** Don't tell the "first we tried X, then we fixed Y" story. Present the final design as if it was the plan from day one.
+- **External audience.** Readers are game developers and technical audiences outside Tinqs. No internal project names, no team org charts, no "Ozan decided" or "Uygar built."
+
+## What to never mention
+
+- **Asset pack names or vendors.** Polyperfect, Low Poly Ultimate Pack, Quaternius, Kevin Iglesias, etc. Say "our animal models" or nothing at all.
+- **Unity import details.** FBX source format, `.anim` files, `.meta` clip ranges, `isleborn/` paths.
+- **Internal tooling specifics.** `migrate_animals.py` internals, Blender pipeline details, repo paths.
+- **Things that failed or were removed.** Failed migrations, broken assets, animals we deleted, bugs we shipped then fixed. If you must mention a bug, frame it as a design insight learned during development — never "we shipped this broken."
+- **Roadmaps, tiers, trade-offs, future plans.** The post describes what exists. No "Tier A/B/C," no "recommended build order," no "what's next."
+
+## Structure
+
+- **Title:** technical, specific, bold. "How We Made 1,000 Animals Animate Without a Single Skeleton" not "Crowd Animation Update."
+- **Opening:** state the problem (what stock Godot can't do), state our solution, give the numbers.
+- **Body:** architecture, shader code, data flow, VRAM math, benchmarks. Ground everything in numbers.
+- **Closing:** where to get it, what it drives (Ariki), related posts. No roadmap.
+
+## Numbers rule
+
+Every claim about performance, VRAM, or scale must cite a measured number. "60 FPS at 1,000 agents on M1 Pro" not "great performance." "1.6 MB per type" not "tiny VRAM."
+
+## Code
+
+Shader code and architecture diagrams are encouraged — this is a technical blog. But keep code blocks focused on the key insight, not the whole file.
diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html
index 6c99e2a..2449704 100644
--- a/gpu-skinned-herds.html
+++ b/gpu-skinned-herds.html
@@ -357,7 +357,7 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);

    What's driving it

    In Ariki, the sim tracks animal migration across a 12km archipelago. AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds world positions and yaw rotations to skinned_herd.gd — the reusable per-type herd backend. The herd bakes the palette once at setup, then set_positions() updates transforms each sim tick. set_clip_for_state() switches the active clip block in the custom data when the sim FSM changes state. set_speed_scale() adjusts the per-instance playback rate to match ground speed — feet stay planted.

    The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states (graze, drink, sleep, hunt, flee, scavenge, die). The client just renders. This is the same code in single-player and multiplayer — the sim is the host.

    -

    Bird flocks use the same system. BirdFlock.cs runs boid flocking on top of skinned_herd, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call.

    +

    Bird flocks use the same system. BirdFlock.cs runs boid flocking on top of skinned_herd, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species, each a single draw call.

    Per-instance custom data means a walking Boar, a running Boar, an idle Boar, and an attacking Boar all share the same baked palette — they just point at different rows. The renderer groups by type, not by state. One palette, one draw call, any number of states.

    Two bugs we shipped and fixed

    The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB, column-major layout. All green. Then we put it on screen and two things were wrong.

    @@ -369,7 +369,7 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);

    Engine version: 4.6.5.

    No C# wrapper is generated — instantiate from GDScript via ClassDB.instantiate() and call the bound methods. The binding surface is small and stable. See ariki-game/scenes/animals/skinned_herd.gd for the reference backend.

    The production pipeline

    -

    The migrate_animals.py tool converts polyperfect FBX packs to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat assets/models/glbs/ directory. Each animal gets a catalog entry in animals_catalog.json with clip metadata, default state mapping, and an animSpeedRef for foot-sync.

    +

    The migrate_animals.py tool converts source FBX files to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat assets/models/glbs/ directory. Each animal gets a catalog entry in animals_catalog.json with clip metadata, default state mapping, and an animSpeedRef for foot-sync.

    At runtime, AnimalHerdRenderer spawns one skinned_herd per animal type. The herd bakes the palette from the catalog GLB's clips. AnimalAnimationLogic maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path.

    Where we stand vs the industry

    The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny.

    diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md index 143f23a..4d35a67 100644 --- a/posts/gpu-skinned-herds.md +++ b/posts/gpu-skinned-herds.md @@ -125,7 +125,7 @@ In [Ariki](https://www.arikigame.com), the sim tracks animal migration across a The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states (graze, drink, sleep, hunt, flee, scavenge, die). The client just renders. This is the same code in single-player and multiplayer — the sim is the host. -Bird flocks use the same system. `BirdFlock.cs` runs boid flocking on top of `skinned_herd`, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call. +Bird flocks use the same system. `BirdFlock.cs` runs boid flocking on top of `skinned_herd`, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species, each a single draw call. Per-instance custom data means a walking Boar, a running Boar, an idle Boar, and an attacking Boar all share the same baked palette — they just point at different rows. The renderer groups by type, not by state. One palette, one draw call, any number of states. @@ -149,7 +149,7 @@ No C# wrapper is generated — instantiate from GDScript via `ClassDB.instantiat ## The production pipeline -The `migrate_animals.py` tool converts polyperfect FBX packs to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat `assets/models/glbs/` directory. Each animal gets a catalog entry in `animals_catalog.json` with clip metadata, default state mapping, and an `animSpeedRef` for foot-sync. 
+The `migrate_animals.py` tool converts source FBX files to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat `assets/models/glbs/` directory. Each animal gets a catalog entry in `animals_catalog.json` with clip metadata, default state mapping, and an `animSpeedRef` for foot-sync. At runtime, `AnimalHerdRenderer` spawns one `skinned_herd` per animal type. The herd bakes the palette from the catalog GLB's clips. `AnimalAnimationLogic` maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path. From 05685ba477d500d5d7b44f54a509d54c723dd83a Mon Sep 17 00:00:00 2001 From: Ozan Bozkurt Date: Mon, 15 Jun 2026 22:56:59 +0100 Subject: [PATCH 7/8] post: strip internal tooling details from pipeline section --- gpu-skinned-herds.html | 4 ++-- posts/gpu-skinned-herds.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html index 2449704..a71d73d 100644 --- a/gpu-skinned-herds.html +++ b/gpu-skinned-herds.html @@ -369,8 +369,8 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);

    Engine version: 4.6.5.

    No C# wrapper is generated — instantiate from GDScript via ClassDB.instantiate() and call the bound methods. The binding surface is small and stable. See ariki-game/scenes/animals/skinned_herd.gd for the reference backend.

    The production pipeline

    -

    The migrate_animals.py tool converts source FBX files to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat assets/models/glbs/ directory. Each animal gets a catalog entry in animals_catalog.json with clip metadata, default state mapping, and an animSpeedRef for foot-sync.

    -

    At runtime, AnimalHerdRenderer spawns one skinned_herd per animal type. The herd bakes the palette from the catalog GLB's clips. AnimalAnimationLogic maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path.

    +

    Each animal model ships as a game-ready GLB with baked animation clips. A catalog file maps each animal to its clips, default state, and per-animal speed reference for foot-sync.

    +

    At runtime, AnimalHerdRenderer spawns one skinned_herd per animal type. The herd bakes the palette from the model's clips. Animation logic maps sim FSM states to clip keywords (attack → attack/bite, flee → run/gallop, wander → walk). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path.

    Where we stand vs the industry

    The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny.

    The platform supports three tiers by distance, all driven by the same (clip, count, speed, phase) packet:

    diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md index 4d35a67..5fbb380 100644 --- a/posts/gpu-skinned-herds.md +++ b/posts/gpu-skinned-herds.md @@ -149,9 +149,9 @@ No C# wrapper is generated — instantiate from GDScript via `ClassDB.instantiat ## The production pipeline -The `migrate_animals.py` tool converts source FBX files to game-ready GLBs — imports, cleans hierarchy, rebuilds named NLA clips from frame ranges, strips duplicate meshes, bakes into the flat `assets/models/glbs/` directory. Each animal gets a catalog entry in `animals_catalog.json` with clip metadata, default state mapping, and an `animSpeedRef` for foot-sync. +Each animal model ships as a game-ready GLB with baked animation clips. A catalog file maps each animal to its clips, default state, and per-animal speed reference for foot-sync. -At runtime, `AnimalHerdRenderer` spawns one `skinned_herd` per animal type. The herd bakes the palette from the catalog GLB's clips. `AnimalAnimationLogic` maps sim FSM states to clip keywords (attack → "attack"/"bite", flee → "run"/"gallop", wander → "walk"). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path. +At runtime, `AnimalHerdRenderer` spawns one `skinned_herd` per animal type. The herd bakes the palette from the model's clips. Animation logic maps sim FSM states to clip keywords (attack → attack/bite, flee → run/gallop, wander → walk). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path. 
## Where we stand vs the industry From 5997a5a56f2095e2303f26fe4883bdb5654d2e0d Mon Sep 17 00:00:00 2001 From: Ozan Bozkurt Date: Mon, 15 Jun 2026 22:58:06 +0100 Subject: [PATCH 8/8] =?UTF-8?q?post:=20only=20claim=20shipped=20tiers=20?= =?UTF-8?q?=E2=80=94=20crowd=20+=20hero,=20not=20impostor?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- gpu-skinned-herds.html | 10 +++++----- posts/gpu-skinned-herds.md | 11 ++++++----- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/gpu-skinned-herds.html b/gpu-skinned-herds.html index a71d73d..c9ebabd 100644 --- a/gpu-skinned-herds.html +++ b/gpu-skinned-herds.html @@ -373,13 +373,13 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);

    At runtime, AnimalHerdRenderer spawns one skinned_herd per animal type. The herd bakes the palette from the model's clips. Animation logic maps sim FSM states to clip keywords (attack → attack/bite, flee → run/gallop, wander → walk). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path.
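The two runtime steps described above, keyword-based clip selection and position interpolation between sim ticks, can be sketched roughly as follows. This is an illustration under assumptions, not the shipped logic: the keyword table only mirrors the three mappings named in the post, and `pick_clip`, `lerp_position`, and the substring-matching rule are hypothetical.

```python
# State -> clip keywords, in priority order (only the mappings named in the post).
STATE_KEYWORDS = {
    "attack": ("attack", "bite"),
    "flee": ("run", "gallop"),
    "wander": ("walk",),
}


def pick_clip(state, clip_names, fallback=None):
    """Return the first clip whose name contains a keyword for this state."""
    for keyword in STATE_KEYWORDS.get(state, ()):
        for name in clip_names:
            if keyword in name.lower():
                return name
    return fallback  # unknown state, or no clip matches any keyword


def lerp_position(prev, curr, time_since_tick, tick_interval):
    """Blend between the last two sim positions for smooth rendering."""
    t = min(max(time_since_tick / tick_interval, 0.0), 1.0)  # clamp to [0, 1]
    return tuple(a + (b - a) * t for a, b in zip(prev, curr))
```

Matching on keywords rather than exact clip names lets models with differently named clips share one state table. The clamp on `t` keeps an instance from overshooting its last known position if a sim tick arrives late.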

    Where we stand vs the industry

    The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny.

    -

    The platform supports three tiers by distance, all driven by the same (clip, count, speed, phase) packet:

    +

    Stock Godot has no answer for this. Skeleton3D per character caps at ~20. MultiMesh can't skin. There is no built-in crowd animation path.

    +

    The platform runs two tiers by distance, driven by the same (clip, count, speed, phase) packet:

      -
    • Crowd tier (palette) — baked poses, GPU-driven, zero CPU. Thousands of agents.
    • -
    • Hero tier (real rigs)AnimationTree + SkeletonIK3D + PhysicalBone3D for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll.
    • -
    • Impostor tier (2D billboards) — sprite atlas indexed by view-angle and animation-frame. For very far agents.
    • +
    • Crowd tier (palette) — baked poses, GPU-driven, zero CPU. Thousands of agents in one draw call per type.
    • +
    • Hero tier (real rigs) — the nearest few agents get real Skeleton3D + AnimationTree + IK. Smooth crossfades, head look-at, ragdoll. Hidden from the palette so they don't double-render.
    -

    One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.

    +

    Same code drives 30 animals today. Same code will drive thousands of colonists at launch.
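A rough sketch of how the two-tier split by distance might be decided each frame. The post gives no thresholds, so `max_heroes` and `hero_radius` are assumed values, and `assign_tiers` is a hypothetical helper rather than part of the module.

```python
def assign_tiers(distances, max_heroes=4, hero_radius=15.0):
    """Label each agent 'hero' or 'crowd' from its distance to the camera.

    Only the nearest max_heroes agents inside hero_radius get a real rig.
    Everyone else stays on the baked palette.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    heroes = {i for i in order[:max_heroes] if distances[i] <= hero_radius}
    return ["hero" if i in heroes else "crowd" for i in range(len(distances))]


def crowd_visible(tiers):
    """Palette instances are hidden for hero agents so nothing renders twice."""
    return [tier == "crowd" for tier in tiers]
```

Capping the hero count keeps the number of live skeletons bounded however many agents crowd the camera. The visibility mask corresponds to the "hidden from the palette" rule in the hero tier.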

    What's deliberately not here

    • No C# wrapper. Instantiate from GDScript via ClassDB.instantiate() — the binding surface is small and stable.
    • diff --git a/posts/gpu-skinned-herds.md b/posts/gpu-skinned-herds.md index 5fbb380..5097f5a 100644 --- a/posts/gpu-skinned-herds.md +++ b/posts/gpu-skinned-herds.md @@ -157,12 +157,13 @@ At runtime, `AnimalHerdRenderer` spawns one `skinned_herd` per animal type. The The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny. -The platform supports three tiers by distance, all driven by the same `(clip, count, speed, phase)` packet: -- **Crowd tier (palette)** — baked poses, GPU-driven, zero CPU. Thousands of agents. -- **Hero tier (real rigs)** — `AnimationTree` + `SkeletonIK3D` + `PhysicalBone3D` for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll. -- **Impostor tier (2D billboards)** — sprite atlas indexed by view-angle and animation-frame. For very far agents. +Stock Godot has no answer for this. `Skeleton3D` per character caps at ~20. `MultiMesh` can't skin. There is no built-in crowd animation path. -One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch. +The platform runs two tiers by distance, driven by the same `(clip, count, speed, phase)` packet: +- **Crowd tier (palette)** — baked poses, GPU-driven, zero CPU. Thousands of agents in one draw call per type. +- **Hero tier (real rigs)** — the nearest few agents get real `Skeleton3D` + `AnimationTree` + IK. Smooth crossfades, head look-at, ragdoll. Hidden from the palette so they don't double-render. + +Same code drives 30 animals today. Same code will drive thousands of colonists at launch. ## What's deliberately not here