<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton — Tinqs Blog</title>
<meta name="description" content="Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase.">
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://www.tinqs.com/blog/gpu-driven-crowd-animation">
<meta property="og:type" content="article">
<meta property="og:url" content="https://www.tinqs.com/blog/gpu-driven-crowd-animation">
<meta property="og:title" content="Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton">
<meta property="og:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork.">
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton">
<meta name="twitter:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork.">
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BlogPosting",
"headline": "Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton",
"datePublished": "2026-06-15",
"author": {
"@type": "Person",
"name": "Ozan Bozkurt"
},
"publisher": {
"@type": "Organization",
"name": "Tinqs Limited",
"url": "https://www.tinqs.com"
},
"description": "Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase."
}
</script>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">
<style>
/* ── Tinqs Studio brand — post styles ── */
:root {
/* Studio near-black base */
--c-bg: #0B0C0E;
--c-bg-raised: #15171A;
/* Foreground */
--c-fg: #ECEEF1;
--c-muted: #8A95A3;
/* Family accents */
--c-lime: #B6FF3C;
--c-violet: #7C5CFF;
/* Borders */
--c-border: rgba(255,255,255,.07);
--c-border-strong: rgba(255,255,255,.12);
}
*, *::before, *::after { box-sizing: border-box; }
html { background: var(--c-bg); }
body {
margin: 0;
padding: 0;
background: var(--c-bg);
color: var(--c-fg);
font-family: 'Inter', system-ui, -apple-system, sans-serif;
font-size: 16px;
line-height: 1.6;
-webkit-font-smoothing: antialiased;
}
/* ── Post container ── */
.post {
background: var(--c-bg);
max-width: 720px;
margin: 0 auto;
padding: 48px 24px 60px;
}
/* ── Back link ── */
.post__back {
color: var(--c-muted);
text-decoration: none;
font-size: 0.875rem;
display: inline-block;
margin-bottom: 24px;
transition: color 0.15s;
}
.post__back:hover { color: var(--c-lime); }
/* ── Gradient title — lime → violet ── */
.post__title {
font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
background: linear-gradient(90deg, var(--c-lime), var(--c-violet));
-webkit-background-clip: text;
background-clip: text;
color: transparent;
font-weight: 700;
font-size: 2.2rem;
line-height: 1.2;
margin: 0 0 16px;
}
/* ── Date pill ── */
.post__date {
display: inline-block;
font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
font-size: 0.72rem;
letter-spacing: 0.18em;
text-transform: uppercase;
color: var(--c-muted);
border: 1px solid var(--c-border);
border-radius: 999px;
padding: 4px 14px;
margin-bottom: 16px;
}
/* ── Lead ── */
.post__lead {
color: var(--c-muted);
font-size: 1.08rem;
line-height: 1.7;
}
/* ── Body ── */
.post__body { font-size: 1rem; line-height: 1.7; }
.post__body p { margin: 14px 0; }
.post__body h2 {
font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
font-weight: 600;
font-size: 1.6rem;
margin: 54px 0 6px;
padding-left: 16px;
border-left: 4px solid var(--c-lime);
line-height: 1.3;
}
.post__body h3 {
font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
font-weight: 500;
color: var(--c-violet);
font-size: 1.15rem;
margin: 30px 0 4px;
}
.post__body h4, .post__body h5, .post__body h6 {
margin: 20px 0 4px;
}
/* ── Inline code ── */
.post__body code {
font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
font-size: 0.84em;
background: var(--c-bg-raised);
color: var(--c-lime);
padding: 2px 6px;
border-radius: 4px;
border: 1px solid var(--c-border);
}
/* ── Code blocks ── */
.post__body pre {
background: var(--c-bg);
border: 1px solid var(--c-border);
border-radius: 8px;
padding: 16px 18px;
overflow-x: auto;
margin: 14px 0;
font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
font-size: 0.83rem;
line-height: 1.55;
color: var(--c-fg);
}
.post__body pre code {
background: transparent;
padding: 0;
border: none;
font-size: inherit;
color: inherit;
border-radius: 0;
}
/* ── Blockquote ── */
.post__body blockquote {
background: rgba(124, 92, 255, 0.06);
border: 1px solid rgba(124, 92, 255, 0.15);
border-left: 4px solid var(--c-violet);
border-radius: 0 8px 8px 0;
padding: 16px 18px;
margin: 18px 0;
color: var(--c-fg);
font-size: 0.94rem;
}
/* ── Links ── */
.post__body a { color: var(--c-lime); text-decoration: underline; text-underline-offset: 3px; }
.post__body a:hover { color: var(--c-violet); }
/* ── Strong ── */
.post__body strong { color: var(--c-lime); font-weight: 600; }
/* ── HR ── */
.post__body hr {
border: none;
border-top: 1px solid var(--c-border);
margin: 32px 0;
}
/* ── Figures ── */
.post__body figure { margin: 20px 0; }
.post__body figure img {
max-width: 100%;
border-radius: 12px;
border: 1px solid var(--c-border);
}
.post__body figcaption {
color: var(--c-muted);
font-size: 0.85rem;
margin-top: 6px;
}
/* ── Lists ── */
.post__body ul, .post__body ol { padding-left: 1.5em; margin: 10px 0; }
.post__body li { margin: 4px 0; }
/* ── Author ── */
.post__author {
display: flex;
align-items: center;
gap: 14px;
margin-top: 48px;
padding-top: 24px;
border-top: 1px solid var(--c-border);
}
.post__author-avatar {
width: 48px;
height: 48px;
border-radius: 50%;
background: var(--c-violet);
color: #fff;
display: flex;
align-items: center;
justify-content: center;
font-weight: 700;
font-size: 0.85rem;
flex-shrink: 0;
}
.post__author-info {
font-size: 0.85rem;
color: var(--c-muted);
line-height: 1.4;
}
.post__author-name {
color: var(--c-fg);
font-weight: 600;
}
</style>
</head>
<body>
<!-- POST -->
<article class="post">
<a href="/blog/" class="post__back">&larr; All Posts</a>
<span class="post__date">15 June 2026</span>
<h1 class="post__title">Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton</h1>
<p class="post__lead">Yesterday we <a href="gpu-skinned-herds" style="color: var(--c-lime);">shipped a GPU herd renderer</a> that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: <strong>one live skeleton per animal state per type.</strong> For 30 types with 5 states each, that's 150 <code>Skeleton3D</code> nodes — each with an <code>AnimationPlayer</code>, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work.</p>
<div class="post__body">
<p>Today we ripped out every live skeleton. The CPU now does <strong>zero per-frame animation work.</strong> 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how.</p>
<h2>The problem: lockstep costs CPU</h2>
<p>The original <code>agent_skinned</code> module worked by <strong>sharing a live skeleton.</strong> One driver <code>Skeleton3D</code> animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton.</p>
<pre><code>30 animal types × 5 states = 150 live skeletons on the CPU</code></pre>
<p>Each skeleton had to compute <code>global_pose</code> for every bone, run <code>AnimationPlayer.process()</code>, push its matrices into the data plane, and upload the dirty texture region, every frame. The cost tracked <strong>herd count</strong>, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles.</p>
<p>The fix sounds obvious in retrospect: <strong>the GPU should compute the poses, not the CPU.</strong> Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample.</p>
<h2>The bake: one texture per character type, done once</h2>
<p>At load time, the <code>skinned_herd.gd</code> backend plays every animation clip on a temporary <code>Skeleton3D</code> and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture:</p>
<pre><code>Goat: 53 bones × 496 frames = 26,288 bone matrices
Texture: 212 × 496 pixels, RGBA32F
VRAM: 212 × 496 × 16 bytes = 1.6 MB</code></pre>
<p>That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again.</p>
<p>For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = <strong>14.2 MB per type, 426 MB total.</strong> Bone-matrix is 9× smaller because bones ≪ vertices.</p>
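<p>The size arithmetic is easy to check by hand. The sketch below is a plain Python calculation, not engine code: the function names are made up for illustration, and it assumes four RGBA32F texels per bone matrix (which is what a 212-pixel row for 53 bones implies) and a position-only VAT at 12 bytes per vertex. The "MB" figures in this post line up with binary megabytes (MiB).</p>
<pre><code class="language-python">def bone_texture_bytes(bones, frames, bytes_per_texel=16):
    """Bytes for a bone-matrix texture: 4 RGBA texels per bone, one row per frame."""
    width = bones * 4  # one mat4 = four RGBA texels side by side
    return width * frames * bytes_per_texel

def vat_bytes(vertices, frames, bytes_per_vertex=12):
    """Bytes for a position-only vertex animation texture (3 floats per vertex)."""
    return vertices * frames * bytes_per_vertex

MIB = 1024 * 1024
goat_bones = bone_texture_bytes(bones=53, frames=496)
goat_vat = vat_bytes(vertices=2500, frames=496)

print(f"bone-matrix: {goat_bones / MIB:.1f} MiB per type")  # 1.6
print(f"VAT:         {goat_vat / MIB:.1f} MiB per type")    # 14.2
print(f"ratio:       {goat_vat / goat_bones:.1f}x")         # 8.8x
</code></pre>
<p>Calling <code>bone_texture_bytes</code> with <code>bytes_per_texel=8</code> gives the footprint the same palette would have in an RGBA16F texture.</p>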
<h2>The GPU: per-instance playback, zero CPU</h2>
<p>Each MultiMesh instance carries 4 numbers in <code>INSTANCE_CUSTOM</code>:</p>
<table>
<thead>
<tr><th>Channel</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>.x</code></td><td>Which clip (start row in the palette)</td></tr>
<tr><td><code>.y</code></td><td>How many frames in this clip</td></tr>
<tr><td><code>.z</code></td><td>Playback rate (baked-fps × ground speed)</td></tr>
<tr><td><code>.w</code></td><td>Phase offset (0..1, golden-ratio spread)</td></tr>
</tbody>
</table>
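<p>How those four values might be filled in on the CPU at spawn time can be sketched in Python. This is an illustration, not the <code>skinned_herd.gd</code> code: the function names and the clip lengths are hypothetical, and the phase uses one common reading of "golden-ratio spread", namely the fractional part of the instance index times 0.618.</p>
<pre><code class="language-python">import math

GOLDEN_FRAC = (math.sqrt(5.0) - 1.0) / 2.0  # 0.618..., fractional part of the golden ratio

def build_clip_table(frame_counts):
    """Map clip name to (start_row, frame_count) by stacking clips top to bottom."""
    table, row = {}, 0
    for name, count in frame_counts.items():
        table[name] = (row, count)
        row += count
    return table

def instance_custom(table, clip, index, baked_fps=30.0, speed=1.0):
    """Return the (x, y, z, w) tuple for one instance, in the channel order above."""
    start, count = table[clip]
    phase = (index * GOLDEN_FRAC) % 1.0  # spreads phases evenly over [0, 1)
    return (float(start), float(count), baked_fps * speed, phase)

# Hypothetical clip lengths in frames; real values would come from the bake.
clips = build_clip_table({"idle": 60, "walk": 30, "run": 20})
for i in range(3):
    print(instance_custom(clips, "walk", i))
</code></pre>
<p>Because the multiplier is irrational, consecutive instances never land on the same phase, so neighbours in a herd do not step in sync.</p>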
<p>The vertex shader derives each instance's current frame from TIME:</p>
<pre><code class="language-glsl">float fcount = max(INSTANCE_CUSTOM.y, 1.0);
int start = int(INSTANCE_CUSTOM.x + 0.5);
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount);
int f0 = int(fpos);
int f1 = int(mod(float(f0) + 1.0, fcount));
float fr = fpos - float(f0);
// Blend between two adjacent baked frames for smooth playback at low bake fps.
// px is the first texel column of the current bone's matrix (four texels per matrix).
int r0 = start + f0;
int r1 = start + f1;
mat4 m0 = mat4(
texelFetch(bone_matrices_tex, ivec2(px+0, r0), 0),
texelFetch(bone_matrices_tex, ivec2(px+1, r0), 0),
texelFetch(bone_matrices_tex, ivec2(px+2, r0), 0),
texelFetch(bone_matrices_tex, ivec2(px+3, r0), 0));
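mat4 m1 = mat4(
texelFetch(bone_matrices_tex, ivec2(px+0, r1), 0),
texelFetch(bone_matrices_tex, ivec2(px+1, r1), 0),
texelFetch(bone_matrices_tex, ivec2(px+2, r1), 0),
texelFetch(bone_matrices_tex, ivec2(px+3, r1), 0));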
skin += (m0 * (1.0 - fr) + m1 * fr) * weight;</code></pre>
<p>That's it. The CPU does nothing per frame. No skeletons. No <code>AnimationPlayer</code>. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows.</p>
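<p>For stepping through the frame selection off the GPU, here is a CPU reference of the same arithmetic in Python. The function name is an assumption for illustration; the maths mirrors the shader lines above, and Python's <code>%</code> on floats is floor-based like GLSL's <code>mod</code>.</p>
<pre><code class="language-python">def sample_rows(time, start, count, rate, phase):
    """Return (row0, row1, blend) for one instance at a given TIME value."""
    fcount = max(count, 1.0)
    base = int(start + 0.5)  # round the float clip start to a texture row
    fpos = (time * rate + phase * fcount) % fcount
    f0 = int(fpos)
    f1 = int((f0 + 1.0) % fcount)  # the last frame wraps back to the first
    return base + f0, base + f1, fpos - f0

# Two instances on the same 30-frame clip (rows 60..89), half a cycle apart.
print(sample_rows(1.0, 60.0, 30.0, 30.0, 0.0))  # (60, 61, 0.0)
print(sample_rows(1.0, 60.0, 30.0, 30.0, 0.5))  # (75, 76, 0.0)
</code></pre>
<p>Both instances read the same clock and the same texture; only the phase in <code>.w</code> differs, which is enough to put them fifteen rows apart.</p>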
<h2>What changed in the engine</h2>
<p>The shader needed one critical change: the bone-matrix texture went from being indexed by <code>INSTANCE_ID</code> (one row per instance) to being indexed by a <strong>pose slot</strong> computed from <code>INSTANCE_CUSTOM</code> (one row per baked frame). The old code:</p>
<pre><code class="language-glsl">int inst = INSTANCE_ID; // row = instance index → lockstep</code></pre>
<p>Became:</p>
<pre><code class="language-glsl">int r0 = start + f0; // row = palette row from clip + frame → per-instance variety</code></pre>
<p>This is a 40-line shader change in the engine's <code>multi_skinned_instance_3d.cpp</code>. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug).</p>
<p>Engine version bumped from 4.6.4 to <strong>4.6.5</strong>.</p>
<h2>The numbers (measured, not projected)</h2>
<p>On an M1 Pro MacBook Pro (integrated GPU):</p>
<table>
<thead>
<tr><th>Agent count</th><th>Old lockstep (4.6.4)</th><th>GPU-driven palette (4.6.5)</th></tr>
</thead>
<tbody>
<tr><td>100</td><td>~40 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>500</td><td>31–39 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>1,000</td><td>~25 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>10,000</td><td>untested</td><td>8 FPS (unoptimized)</td></tr>
</tbody>
</table>
<p>The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap.</p>
<p><strong>VRAM:</strong> 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory handles this comfortably. The VAT alternative would need 426 MB — nine times more.</p>
<p><strong>Draw calls:</strong> Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer.</p>
<h2>The bug that made everything invisible</h2>
<p>The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing.</p>
<p>Root cause: a <code>renderer.refresh()</code> call during setup raced the renderer's own <code>NOTIFICATION_READY</code> handler, which re-bound the shader's <code>bone_matrices_tex</code> uniform and overwrote our baked texture with an unbound (default white) one. The shader sampled white → every bone matrix came out with all sixteen entries equal to 1, a degenerate matrix rather than a pose → the mesh collapsed to zero area → invisible.</p>
<p>Fix: bind the texture once on the <strong>first <code>_process</code> frame</strong>, after all nodes have had their <code>_ready</code> called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot <code>_ready</code> sequencing gotcha.</p>
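<p>Why a white fallback texture makes the mesh vanish can be checked with a few lines of Python. This is a sketch under one assumption, that the fallback texel is (1, 1, 1, 1) as described above; the helper name is made up. A matrix assembled from four such texels maps every vertex onto the line x = y = z, so no triangle has any area left to rasterize.</p>
<pre><code class="language-python">def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

WHITE = [[1.0] * 4 for _ in range(4)]  # four white texels: every entry is 1.0

corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
collapsed = [mat_vec(WHITE, [x, y, z, 1.0])[:3] for x, y, z in corners]

for src, dst in zip(corners, collapsed):
    print(src, "maps to", tuple(dst))  # x, y and z always come out equal
</code></pre>
<p>The all-ones matrix is symmetric, so the result is the same whether the texels are read as rows or as columns.</p>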
<h2>Where this puts us vs AAA</h2>
<p>The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM.</p>
<p>What AAA has that we don't (yet):</p>
<ul>
<li><strong>LOD tiers</strong> — far agents become 2D impostors (billboard quads with a sprite atlas). Same <code>(clip, frame, speed, phase)</code> packet drives all tiers.</li>
<li><strong>Hero rigs</strong> — the nearest few agents get real <code>Skeleton3D</code> + <code>AnimationTree</code> + IK + ragdoll. Smooth gait blends, foot-lock, look-at.</li>
<li><strong>Offline bake pipeline</strong> — precompute palettes in the asset build, not at load time.</li>
<li><strong>GPU compute culling</strong> — frustum + distance + LOD classification on the GPU, no CPU cull loop.</li>
</ul>
<p>These are planned and designed (the platform doc is at <code>ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md</code>), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible.</p>
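<p>Since none of this is built yet, the following Python is only an illustration of how one animation packet could feed every tier. The hero count, the impostor distance, the tier names and the function name are all hypothetical; nothing here comes from the platform doc.</p>
<pre><code class="language-python">import bisect

# Hypothetical values for illustration; the post gives no real thresholds.
HERO_COUNT = 2              # nearest agents that would get a full rig
IMPOSTOR_DISTANCE = 120.0   # metres at which an agent would become a billboard

def assign_tiers(agents):
    """agents: list of (distance, packet). Returns (tier, packet) pairs, nearest first."""
    ordered = sorted(agents, key=lambda a: a[0])
    distances = [d for d, _ in ordered]
    first_impostor = bisect.bisect_left(distances, IMPOSTOR_DISTANCE)
    hero_end = min(HERO_COUNT, first_impostor)
    names = (["hero_rig"] * hero_end
             + ["gpu_skinned"] * (first_impostor - hero_end)
             + ["impostor"] * (len(ordered) - first_impostor))
    # The packet is passed through untouched, whatever tier the agent lands in.
    return [(name, packet) for name, (_, packet) in zip(names, ordered)]

herd = [(200.0, "far"), (3.0, "a"), (50.0, "mid"), (1.0, "b"), (10.0, "c")]
for tier, packet in assign_tiers(herd):
    print(tier, packet)
</code></pre>
<p>The design point is that tier selection only decides who renders an agent; the clip, rate and phase it plays stay the same across tiers, so an agent crossing a boundary would not restart its animation.</p>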
<h2>The fork question</h2>
<p>Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives:</p>
<ul>
<li><strong>VAT (vertex animation textures) with a Godot plugin:</strong> Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT also can't blend frames (hard cuts between baked frames, no smooth playback) and can't skin normals/tangents (incorrect lighting).</li>
<li><strong>Phase-offset drivers only:</strong> Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists.</li>
<li><strong>Don't do crowds:</strong> The simplest answer. But Ariki needs animals and colonists. The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork.</li>
</ul>
<h2>What's next</h2>
<p>The 4-item immediate roadmap:</p>
<ol>
<li><strong>One herd per type:</strong> collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era)</li>
<li><strong>Distance LOD:</strong> CPU-side cull plus a cheaper shader for far instances</li>
<li><strong>RGBA16F + offline bake:</strong> half the VRAM, zero load-time hitch</li>
<li><strong>Hero rigs:</strong> real <code>AnimationTree</code> + IK + ragdoll for the nearest few animals</li>
</ol>
<p>The far horizon: animated 2D impostors and GPU compute-cull are designed and parked. We will bring them forward when the load demands them.</p>
<p>The engine source lives in <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);"><code>tinqs/engine</code></a> (private). Pre-built editor binaries at <a href="https://tinqs.com/tinqs/builds" style="color: var(--c-lime);"><code>tinqs/builds</code></a>. The Ariki game is at <a href="https://www.arikigame.com" style="color: var(--c-lime);">arikigame.com</a>.</p>
<hr>
<p><strong>Related:</strong> <a href="gpu-skinned-herds" style="color: var(--c-lime);">GPU-Skinned Herds</a> — the original herd renderer (yesterday's post). <a href="fork-dont-build" style="color: var(--c-lime);">Fork, Don't Build</a> — why we modify existing platforms instead of building new ones. <a href="godot-optimisation" style="color: var(--c-lime);">Streaming a 12km Archipelago in Godot 4</a> — the terrain and vegetation layers that work alongside this.</p>
</div>
<div class="post__author">
<div class="post__author-avatar">OB</div>
<div class="post__author-info">
<span class="post__author-name">Ozan Bozkurt</span><br>
CTO &amp; Developer, Tinqs
</div>
</div>
</article>
</body>
</html>