post: GPU-driven crowd animation — 1000 agents at 60 FPS, zero CPU
@@ -0,0 +1,392 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton — Tinqs Blog</title>
<meta name="description" content="Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase.">
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://www.tinqs.com/blog/gpu-driven-crowd-animation">

<meta property="og:type" content="article">
<meta property="og:url" content="https://www.tinqs.com/blog/gpu-driven-crowd-animation">
<meta property="og:title" content="Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton">
<meta property="og:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork.">
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton">
<meta name="twitter:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork.">
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton",
  "datePublished": "2026-06-15",
  "author": {
    "@type": "Person",
    "name": "Ozan Bozkurt"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Tinqs Limited",
    "url": "https://www.tinqs.com"
  },
  "description": "Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase."
}
</script>

<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&family=Inter:wght@400;500;600;700&display=swap" rel="stylesheet">

<style>
/* ── Tinqs Studio brand — post styles ── */

:root {
  /* Studio near-black base */
  --c-bg: #0B0C0E;
  --c-bg-raised: #15171A;
  /* Foreground */
  --c-fg: #ECEEF1;
  --c-muted: #8A95A3;
  /* Family accents */
  --c-lime: #B6FF3C;
  --c-violet: #7C5CFF;
  /* Borders */
  --c-border: rgba(255,255,255,.07);
  --c-border-strong: rgba(255,255,255,.12);
}

*, *::before, *::after { box-sizing: border-box; }

html { background: var(--c-bg); }

body {
  margin: 0;
  padding: 0;
  background: var(--c-bg);
  color: var(--c-fg);
  font-family: 'Inter', system-ui, -apple-system, sans-serif;
  font-size: 16px;
  line-height: 1.6;
  -webkit-font-smoothing: antialiased;
}

/* ── Post container ── */
.post {
  background: var(--c-bg);
  max-width: 720px;
  margin: 0 auto;
  padding: 48px 24px 60px;
}

/* ── Back link ── */
.post__back {
  color: var(--c-muted);
  text-decoration: none;
  font-size: 0.875rem;
  display: inline-block;
  margin-bottom: 24px;
  transition: color 0.15s;
}
.post__back:hover { color: var(--c-lime); }

/* ── Gradient title — lime → violet ── */
.post__title {
  font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
  background: linear-gradient(90deg, var(--c-lime), var(--c-violet));
  -webkit-background-clip: text;
  background-clip: text;
  color: transparent;
  font-weight: 700;
  font-size: 2.2rem;
  line-height: 1.2;
  margin: 0 0 16px;
}

/* ── Date pill ── */
.post__date {
  display: inline-block;
  font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
  font-size: 0.72rem;
  letter-spacing: 0.18em;
  text-transform: uppercase;
  color: var(--c-muted);
  border: 1px solid var(--c-border);
  border-radius: 999px;
  padding: 4px 14px;
  margin-bottom: 16px;
}

/* ── Lead ── */
.post__lead {
  color: var(--c-muted);
  font-size: 1.08rem;
  line-height: 1.7;
}

/* ── Body ── */
.post__body { font-size: 1rem; line-height: 1.7; }

.post__body p { margin: 14px 0; }

.post__body h2 {
  font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
  font-weight: 600;
  font-size: 1.6rem;
  margin: 54px 0 6px;
  padding-left: 16px;
  border-left: 4px solid var(--c-lime);
  line-height: 1.3;
}

.post__body h3 {
  font-family: 'Space Grotesk', system-ui, -apple-system, sans-serif;
  font-weight: 500;
  color: var(--c-violet);
  font-size: 1.15rem;
  margin: 30px 0 4px;
}

.post__body h4, .post__body h5, .post__body h6 {
  margin: 20px 0 4px;
}

/* ── Inline code ── */
.post__body code {
  font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
  font-size: 0.84em;
  background: var(--c-bg-raised);
  color: var(--c-lime);
  padding: 2px 6px;
  border-radius: 4px;
  border: 1px solid var(--c-border);
}

/* ── Code blocks ── */
.post__body pre {
  background: var(--c-bg);
  border: 1px solid var(--c-border);
  border-radius: 8px;
  padding: 16px 18px;
  overflow-x: auto;
  margin: 14px 0;
  font-family: 'JetBrains Mono', ui-monospace, 'SF Mono', Consolas, monospace;
  font-size: 0.83rem;
  line-height: 1.55;
  color: var(--c-fg);
}

.post__body pre code {
  background: transparent;
  padding: 0;
  border: none;
  font-size: inherit;
  color: inherit;
  border-radius: 0;
}

/* ── Blockquote ── */
.post__body blockquote {
  background: rgba(124, 92, 255, 0.06);
  border: 1px solid rgba(124, 92, 255, 0.15);
  border-left: 4px solid var(--c-violet);
  border-radius: 0 8px 8px 0;
  padding: 16px 18px;
  margin: 18px 0;
  color: var(--c-fg);
  font-size: 0.94rem;
}

/* ── Links ── */
.post__body a { color: var(--c-lime); text-decoration: underline; text-underline-offset: 3px; }
.post__body a:hover { color: var(--c-violet); }

/* ── Strong ── */
.post__body strong { color: var(--c-lime); font-weight: 600; }

/* ── HR ── */
.post__body hr {
  border: none;
  border-top: 1px solid var(--c-border);
  margin: 32px 0;
}

/* ── Figures ── */
.post__body figure { margin: 20px 0; }
.post__body figure img {
  max-width: 100%;
  border-radius: 12px;
  border: 1px solid var(--c-border);
}

.post__body figcaption {
  color: var(--c-muted);
  font-size: 0.85rem;
  margin-top: 6px;
}

/* ── Lists ── */
.post__body ul, .post__body ol { padding-left: 1.5em; margin: 10px 0; }
.post__body li { margin: 4px 0; }

/* ── Author ── */
.post__author {
  display: flex;
  align-items: center;
  gap: 14px;
  margin-top: 48px;
  padding-top: 24px;
  border-top: 1px solid var(--c-border);
}

.post__author-avatar {
  width: 48px;
  height: 48px;
  border-radius: 50%;
  background: var(--c-violet);
  color: #fff;
  display: flex;
  align-items: center;
  justify-content: center;
  font-weight: 700;
  font-size: 0.85rem;
  flex-shrink: 0;
}

.post__author-info {
  font-size: 0.85rem;
  color: var(--c-muted);
  line-height: 1.4;
}

.post__author-name {
  color: var(--c-fg);
  font-weight: 600;
}
</style>
</head>
<body>

<!-- POST -->
<article class="post">
<a href="/blog/" class="post__back">← All Posts</a>
<span class="post__date">15 June 2026</span>
<h1 class="post__title">Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton</h1>
<p class="post__lead">Yesterday we <a href="gpu-skinned-herds" style="color: var(--c-lime);">shipped a GPU herd renderer</a> that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: <strong>one live skeleton per animal state per type.</strong> For 30 types with 5 states each, that's 150 <code>Skeleton3D</code> nodes — each with an <code>AnimationPlayer</code>, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work.</p>

<div class="post__body">
<p>Today we ripped out every live skeleton. The CPU now does <strong>zero per-frame animation work.</strong> 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how.</p>
<h2>The problem: lockstep costs CPU</h2>
<p>The original <code>agent_skinned</code> module worked by <strong>sharing a live skeleton.</strong> One driver <code>Skeleton3D</code> animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton.</p>
<pre><code>30 animal types × 5 states = 150 live skeletons on the CPU</code></pre>
<p>Each skeleton: compute <code>global_pose</code> for every bone, run an <code>AnimationPlayer.process()</code>, push matrices into the data plane, upload the dirty texture region. The cost tracked <strong>herd count</strong>, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles.</p>
<p>The fix sounds obvious in retrospect: <strong>the GPU should compute the poses, not the CPU.</strong> Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample.</p>
<h2>The bake: one texture per character type, done once</h2>
<p>At load time, the <code>skinned_herd.gd</code> backend plays every animation clip on a temporary <code>Skeleton3D</code> and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture:</p>
<pre><code>Goat: 53 bones × 496 frames = 26,288 bone matrices
Texture: 212 × 496 texels (4 texels per bone), RGBA32F
VRAM: 212 × 496 × 16 bytes ≈ 1.6 MB</code></pre>
<p>That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again.</p>
<p>For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = <strong>14.2 MB per type, 426 MB total.</strong> Bone-matrix is 9× smaller because bones ≪ vertices.</p>
<h2>The GPU: per-instance playback, zero CPU</h2>
<p>Each MultiMesh instance carries 4 numbers in <code>INSTANCE_CUSTOM</code>:</p>
<table>
<thead>
<tr><th>Channel</th><th>Meaning</th></tr>
</thead>
<tbody>
<tr><td><code>.x</code></td><td>Which clip (start row in the palette)</td></tr>
<tr><td><code>.y</code></td><td>How many frames in this clip</td></tr>
<tr><td><code>.z</code></td><td>Playback rate in frames/s (baked fps × ground speed)</td></tr>
<tr><td><code>.w</code></td><td>Phase offset (0..1, golden-ratio spread)</td></tr>
</tbody>
</table>
<p>The vertex shader derives each instance's current frame from TIME:</p>
<pre><code class="language-glsl">// Each bone takes 4 RGBA32F texels in a row (one per matrix column), so
// px = bone_index * 4. Each baked frame is one row of the texture.
mat4 fetch_bone(int px, int row) {
    return mat4(
        texelFetch(bone_matrices_tex, ivec2(px + 0, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 1, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 2, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 3, row), 0));
}

// In vertex(): pick this instance's frame from TIME + INSTANCE_CUSTOM
float fcount = max(INSTANCE_CUSTOM.y, 1.0);  // frames in this clip
int start = int(INSTANCE_CUSTOM.x + 0.5);    // clip's first palette row
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount);

int f0 = int(fpos);
int f1 = int(mod(float(f0) + 1.0, fcount));  // last frame wraps to frame 0
float fr = fpos - float(f0);                 // blend weight toward f1

// Blend between two adjacent baked frames for smooth playback at low bake fps
int r0 = start + f0;
int r1 = start + f1;

// Per bone influence (px = that bone's index * 4, weight = its skin weight):
mat4 m0 = fetch_bone(px, r0);
mat4 m1 = fetch_bone(px, r1);
skin += (m0 * (1.0 - fr) + m1 * fr) * weight;</code></pre>
<p>That's it. The CPU does nothing per frame. No skeletons. No <code>AnimationPlayer</code>. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows.</p>
<h2>What changed in the engine</h2>
<p>The shader needed one critical change: the bone-matrix texture went from being indexed by <code>INSTANCE_ID</code> (one row per instance) to being indexed by a <strong>pose slot</strong> computed from <code>INSTANCE_CUSTOM</code> (one row per baked frame). The old code:</p>
<pre><code class="language-glsl">int inst = INSTANCE_ID; // row = instance index → lockstep</code></pre>
<p>Became:</p>
<pre><code class="language-glsl">int r0 = start + f0; // row = palette row from clip + frame → per-instance variety</code></pre>
<p>This is a 40-line shader change in the engine's <code>multi_skinned_instance_3d.cpp</code>. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug).</p>
<p>Engine version bumped from 4.6.4 to <strong>4.6.5</strong>.</p>
<h2>The numbers (measured, not projected)</h2>
<p>On an M1 Pro MacBook Pro (integrated GPU):</p>
<table>
<thead>
<tr><th>Agent count</th><th>Old lockstep (4.6.4)</th><th>GPU-driven palette (4.6.5)</th></tr>
</thead>
<tbody>
<tr><td>100</td><td>~40 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>500</td><td>31–39 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>1,000</td><td>~25 FPS</td><td><strong>60 FPS</strong></td></tr>
<tr><td>10,000</td><td>untested</td><td>8 FPS (unoptimized)</td></tr>
</tbody>
</table>
<p>The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap.</p>
<p><strong>VRAM:</strong> 1.6 MB per animal type. 30 types = 48 MB total. That fits comfortably in a Steam Deck's default 1 GB GPU allocation, which is carved out of its 16 GB of shared memory. The VAT alternative would need 426 MB, nine times more.</p>
<p><strong>Draw calls:</strong> Currently ~158 (one per type × state, the lockstep holdover). After collapsing to one herd per type: ~30. After sharing palettes for rig-reuse animals: even fewer.</p>
<h2>The bug that made everything invisible</h2>
<p>The first build rendered nothing. Animals were "visible" (instance count correct), custom data correct, shader compiled, texture valid — but the screen was empty. FPS was 60 because it was drawing nothing.</p>
<p>Root cause: a <code>renderer.refresh()</code> call during setup raced the renderer's own <code>NOTIFICATION_READY</code> handler, which re-bound the shader's <code>bone_matrices_tex</code> uniform, overwriting our baked texture with an unbound (default white) one. The shader sampled white, so every texel read (1, 1, 1, 1) and every bone matrix came out as a matrix of all ones rather than a valid transform. An all-ones matrix maps every vertex onto the same diagonal line (x = y = z), so every triangle collapsed to zero area and nothing was drawn.</p>
<p>Fix: bind the texture once on the <strong>first <code>_process</code> frame</strong>, after all nodes have had their <code>_ready</code> called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot <code>_ready</code> sequencing gotcha.</p>
<h2>Where this puts us vs AAA</h2>
<p>The technique — baking bone matrices into a texture and letting the GPU drive per-instance animation — is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, targeting a fraction of the VRAM.</p>
<p>What AAA has that we don't (yet):</p>
<ul>
<li><strong>LOD tiers</strong> — far agents become 2D impostors (billboard quads with a sprite atlas). Same <code>(clip, frame, speed, phase)</code> packet drives all tiers.</li>
<li><strong>Hero rigs</strong> — the nearest few agents get real <code>Skeleton3D</code> + <code>AnimationTree</code> + IK + ragdoll. Smooth gait blends, foot-lock, look-at.</li>
<li><strong>Offline bake pipeline</strong> — precompute palettes in the asset build, not at load time.</li>
<li><strong>GPU compute culling</strong> — frustum + distance + LOD classification on the GPU, no CPU cull loop.</li>
</ul>
<p>These are planned and designed (the platform doc is at <code>ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md</code>), but not built yet. The foundation — the GPU-driven baked palette — is what makes all of them possible.</p>
<h2>The fork question</h2>
<p>Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives:</p>
<ul>
<li><strong>VAT (vertex animation textures) with a Godot plugin:</strong> Works in stock Godot, but VRAM is 9× larger. For 30 animal types: 426 MB vs our 48 MB. For 5 colonist looks: 620 MB — doesn't fit on a Steam Deck. VAT can interpolate between baked frames in the shader, but the 12-bytes-per-vertex budget above stores positions only, so normals/tangents need a second texture (more VRAM again) or lighting is wrong on deforming parts.</li>
<li><strong>Phase-offset drivers only:</strong> Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists.</li>
<li><strong>Don't do crowds:</strong> The simplest answer. But Ariki needs animals and colonists. The architecture decision was made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork.</li>
</ul>
<h2>What's next</h2>
<p>The 4-item immediate roadmap:</p>
<ol>
<li><strong>One herd per type</strong> — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era)</li>
<li><strong>Distance LOD</strong> — CPU-side cull + cheaper-far shader for far instances</li>
<li><strong>RGBA16F + offline bake</strong> — half the VRAM, zero load-time hitch</li>
<li><strong>Hero rigs</strong> — real <code>AnimationTree</code> + IK + ragdoll for the nearest few animals</li>
</ol>
<p>The far horizon: animated 2D impostors and GPU compute-cull, designed and parked. Brought forward when the load demands them.</p>
<p>The engine source lives in <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);"><code>tinqs/engine</code></a> (private). Pre-built editor binaries at <a href="https://tinqs.com/tinqs/builds" style="color: var(--c-lime);"><code>tinqs/builds</code></a>. The Ariki game is at <a href="https://www.arikigame.com" style="color: var(--c-lime);">arikigame.com</a>.</p>
<hr>
<p><strong>Related:</strong> <a href="gpu-skinned-herds" style="color: var(--c-lime);">GPU-Skinned Herds</a> — the original herd renderer (yesterday's post). <a href="fork-dont-build" style="color: var(--c-lime);">Fork, Don't Build</a> — why we modify existing platforms instead of building new ones. <a href="godot-optimisation" style="color: var(--c-lime);">Streaming a 12km Archipelago in Godot 4</a> — the terrain and vegetation layers that work alongside this.</p>

</div>

<div class="post__author">
<div class="post__author-avatar">OB</div>
<div class="post__author-info">
<span class="post__author-name">Ozan Bozkurt</span><br>
CTO & Developer, Tinqs
</div>
</div>
</article>

</body>
</html>
@@ -187,6 +187,13 @@
<span class="blog-card__read">Read →</span>
</a>

<a href="gpu-driven-crowd-animation" class="blog-card">
<span class="blog-card__date">15 June 2026</span>
<h2 class="blog-card__title">Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton</h2>
<p class="blog-card__excerpt">We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork.</p>
<span class="blog-card__read">Read →</span>
</a>

<a href="gpu-skinned-herds" class="blog-card">
<span class="blog-card__date">14 June 2026</span>
<h2 class="blog-card__title">GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot</h2>
@@ -0,0 +1,160 @@
---
title: "Zero-CPU Crowd Animation: How We Made 1,000 Animals Animate Without a Single Skeleton"
slug: gpu-driven-crowd-animation
date: "2026-06-15"
description: "Yesterday we shipped a GPU herd renderer that used one live skeleton per animal state. Today we ripped out every live skeleton and made the GPU drive all animation itself — 1,000 agents at 60 FPS, zero per-frame CPU cost, each with its own clip, speed, and phase."
og_description: "1,000 animated agents, zero live skeletons, zero per-frame CPU. Our GPU-driven crowd animation platform in the Tinqs Engine fork."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "We rebuilt our crowd renderer to be fully GPU-driven — bake every animation frame into a bone-matrix palette once, then let each instance compute its own pose in the vertex shader. 1,000 animals: 60 FPS. CPU: idle. This is how AAA does crowds, and now it runs in our Godot fork."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---

Yesterday we [shipped a GPU herd renderer](gpu-skinned-herds) that draws 1,000 skinned animals in a handful of draw calls. It worked — 25 crocodiles confirmed, 1,000 animals projected. But it had a quiet cost: **one live skeleton per animal state per type.** For 30 types with 5 states each, that's 150 `Skeleton3D` nodes — each with an `AnimationPlayer`, each pushing bone matrices to the GPU every frame. The GPU was fast, but the CPU was doing real work.

Today we ripped out every live skeleton. The CPU now does **zero per-frame animation work.** 1,000 animals at 60 FPS. Each plays its own clip at its own speed and phase — no lockstep, no copy-paste poses. Here's how.

## The problem: lockstep costs CPU

The original `agent_skinned` module worked by **sharing a live skeleton.** One driver `Skeleton3D` animated, and its pose was pushed to every instance in the herd. For variation across states (walking vs idle vs attacking), you needed one herd per state — each with its own driver skeleton.

```
30 animal types × 5 states = 150 live skeletons on the CPU
```

Each skeleton: compute `global_pose` for every bone, run an `AnimationPlayer.process()`, push matrices into the data plane, upload the dirty texture region. The cost tracked **herd count**, not instance count. At 1,000 animals: ~25 FPS. At 10,000: the system crumbles.

The fix sounds obvious in retrospect: **the GPU should compute the poses, not the CPU.** Bake every animation frame into a texture once, and let each instance's vertex shader figure out which frame to sample.

## The bake: one texture per character type, done once

At load time, the `skinned_herd.gd` backend plays every animation clip on a temporary `Skeleton3D` and records the bone matrices for every frame into the data plane. A Goat with 9 clips at 30 fps produces 496 frames. Each frame is one row in the bone-matrix texture:

```
Goat: 53 bones × 496 frames = 26,288 bone matrices
Texture: 212 × 496 texels (4 texels per bone), RGBA32F
VRAM: 212 × 496 × 16 bytes ≈ 1.6 MB
```

That's the ENTIRE animation data for a Goat — walk, run, idle, attack, death, eat, sleep — every frame of every clip, in 1.6 MB. The bake takes a few milliseconds. After that, the skeleton is destroyed. It never runs again.
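
A minimal Python sketch of the row layout described above, under stated assumptions: it is not the `skinned_herd.gd` code, the clip names and durations are hypothetical, each clip is assumed to bake `round(seconds * fps)` rows, and each bone is assumed to occupy 4 consecutive RGBA texels (one per matrix column), which is what makes the Goat's texture 53 × 4 = 212 texels wide. It shows where a clip's start row and frame count, the values later stored in `INSTANCE_CUSTOM.x` and `.y`, come from.

```python
from dataclasses import dataclass

TEXELS_PER_BONE = 4  # assumption: one RGBA texel per column of a 4x4 bone matrix


@dataclass(frozen=True)
class ClipRange:
    """Where one baked clip lives in the palette texture (hypothetical layout)."""
    start_row: int    # first texture row of the clip (INSTANCE_CUSTOM.x)
    frame_count: int  # rows the clip occupies (INSTANCE_CUSTOM.y)


def layout_palette(clip_seconds: dict[str, float], fps: float) -> tuple[dict[str, ClipRange], int]:
    """Stack every clip's frames into consecutive rows; return ranges and total rows."""
    ranges: dict[str, ClipRange] = {}
    row = 0
    for name, seconds in clip_seconds.items():
        frames = max(1, round(seconds * fps))  # assumption: one row per sampled frame
        ranges[name] = ClipRange(row, frames)
        row += frames
    return ranges, row


def texel_for(bone: int, column: int, clip: ClipRange, frame: int) -> tuple[int, int]:
    """(x, y) texel holding one column of one bone's matrix for a frame of a clip."""
    assert 0 <= frame < clip.frame_count
    return bone * TEXELS_PER_BONE + column, clip.start_row + frame


# Hypothetical clip durations, not the real Goat's.
clips = {"idle": 2.0, "walk": 1.0, "run": 0.6}
ranges, total_rows = layout_palette(clips, fps=30)
width = 53 * TEXELS_PER_BONE  # 53 bones -> 212 texels per row
print(ranges["walk"], total_rows, width)  # ClipRange(start_row=60, frame_count=30) 108 212
```

Because clips are stacked end to end, each clip is fully described by two integers, which is what lets every instance of a type share one texture.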

For 30 animal types: ~48 MB total. Compare this to vertex animation textures (VAT): the same Goat would need 2,500 vertices × 496 frames × 12 bytes = **14.2 MB per type, 426 MB total.** Bone-matrix is 9× smaller because bones ≪ vertices.
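
The memory figures can be reproduced with a few lines of arithmetic. The sketch assumes what the numbers above imply: 4 texels per bone per frame at 16 bytes each for RGBA32F (8 for RGBA16F), and a position-only VAT at 12 bytes per vertex per frame. The post's "MB" values work out as MiB (2^20 bytes), and the 30-type totals assume every type is Goat-sized.

```python
MIB = 1024 * 1024


def palette_bytes(bones: int, frames: int, bytes_per_texel: int = 16) -> int:
    """Bone-matrix palette: 4 texels per bone, one row per baked frame."""
    return bones * 4 * frames * bytes_per_texel


def vat_bytes(vertices: int, frames: int, bytes_per_vertex: int = 12) -> int:
    """Position-only vertex animation texture: one xyz float triple per vertex per frame."""
    return vertices * frames * bytes_per_vertex


goat_palette = palette_bytes(bones=53, frames=496)                 # RGBA32F
goat_palette_half = palette_bytes(53, 496, bytes_per_texel=8)      # RGBA16F
goat_vat = vat_bytes(vertices=2500, frames=496)

print(f"palette {goat_palette / MIB:.1f} MiB, RGBA16F {goat_palette_half / MIB:.1f} MiB")
# palette 1.6 MiB, RGBA16F 0.8 MiB
print(f"VAT {goat_vat / MIB:.1f} MiB, ratio {goat_vat / goat_palette:.1f}x")
# VAT 14.2 MiB, ratio 8.8x
print(f"30 types: {30 * goat_palette / MIB:.0f} vs {30 * goat_vat / MIB:.0f} MiB")
# 30 types: 48 vs 426 MiB
```

The exact ratio is about 8.8×, which the text rounds to 9×. Storing the palette as RGBA16F halves it again.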

## The GPU: per-instance playback, zero CPU

Each MultiMesh instance carries 4 numbers in `INSTANCE_CUSTOM`:

| Channel | Meaning |
|---------|---------|
| `.x` | Which clip (start row in the palette) |
| `.y` | How many frames in this clip |
| `.z` | Playback rate in frames/s (baked fps × ground speed) |
| `.w` | Phase offset (0..1, golden-ratio spread) |
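
Filling those four channels is CPU work done once per instance (and again when an animal changes clip or speed), not every frame. Below is a hypothetical Python model of that packing; in the engine the values would end up in a `Color` passed to Godot's `MultiMesh.set_instance_custom_data()`, with `use_custom_data` enabled. The golden-ratio spread is assumed to be phase = frac(index × 0.618…), and `speed_scale` is an illustrative name for the ground-speed factor.

```python
import math

GOLDEN_RATIO_CONJUGATE = (math.sqrt(5.0) - 1.0) / 2.0  # 0.618..., i.e. phi - 1


def golden_phase(index: int) -> float:
    """Phase in [0, 1) for instance `index`: fractional part of index * 0.618..."""
    return (index * GOLDEN_RATIO_CONJUGATE) % 1.0


def pack_custom(start_row: int, frame_count: int, baked_fps: float,
                speed_scale: float, index: int) -> tuple[float, float, float, float]:
    """The (x, y, z, w) values one instance would store in INSTANCE_CUSTOM.

    x: first palette row of the clip, y: frames in the clip,
    z: playback rate in frames per second, w: phase offset in [0, 1).
    """
    return (float(start_row), float(frame_count), baked_fps * speed_scale,
            golden_phase(index))


# Hypothetical: 8 goats all playing a 30-frame walk clip that starts at row 60.
packets = [pack_custom(60, 30, 30.0, 1.0, i) for i in range(8)]
phases = sorted(p[3] for p in packets)
print([round(p, 2) for p in phases])  # eight distinct phases spread over [0, 1)
```

By the three-gap theorem, the gaps between sorted phases take at most three distinct sizes for any instance count, so the herd never needs a coordinated assignment to avoid stepping in sync.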

The vertex shader derives each instance's current frame from TIME:

```glsl
// Each bone takes 4 RGBA32F texels in a row (one per matrix column), so
// px = bone_index * 4. Each baked frame is one row of the texture.
mat4 fetch_bone(int px, int row) {
    return mat4(
        texelFetch(bone_matrices_tex, ivec2(px + 0, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 1, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 2, row), 0),
        texelFetch(bone_matrices_tex, ivec2(px + 3, row), 0));
}

// In vertex(): pick this instance's frame from TIME + INSTANCE_CUSTOM
float fcount = max(INSTANCE_CUSTOM.y, 1.0);  // frames in this clip
int start = int(INSTANCE_CUSTOM.x + 0.5);    // clip's first palette row
float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * fcount, fcount);

int f0 = int(fpos);
int f1 = int(mod(float(f0) + 1.0, fcount));  // last frame wraps to frame 0
float fr = fpos - float(f0);                 // blend weight toward f1

// Blend between two adjacent baked frames for smooth playback at low bake fps
int r0 = start + f0;
int r1 = start + f1;

// Per bone influence (px = that bone's index * 4, weight = its skin weight):
mat4 m0 = fetch_bone(px, r0);
mat4 m1 = fetch_bone(px, r1);
skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
```

That's it. The CPU does nothing per frame. No skeletons. No `AnimationPlayer`. No per-instance push. Every instance computes its own frame from TIME + its custom data. A walking Boar, a running Boar, and an idle Boar all share the same baked palette — they just point at different rows.
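
A CPU mirror of the frame pick is useful for unit-testing the shader math, for example the wrap from a clip's last frame back to its first. This Python port is a sketch, not engine code; GLSL's `mod` is reimplemented as the GLSL spec defines it, `x - y * floor(x / y)`.

```python
import math


def glsl_mod(x: float, y: float) -> float:
    """GLSL mod(): x - y * floor(x / y)."""
    return x - y * math.floor(x / y)


def frame_rows(time: float, custom: tuple[float, float, float, float]) -> tuple[int, int, float]:
    """Mirror of the vertex-shader frame pick: returns (r0, r1, blend toward r1)."""
    start_f, count_f, rate, phase = custom
    fcount = max(count_f, 1.0)
    start = int(start_f + 0.5)
    fpos = glsl_mod(time * rate + phase * fcount, fcount)
    f0 = int(fpos)
    f1 = int(glsl_mod(float(f0) + 1.0, fcount))  # last frame wraps to frame 0
    fr = fpos - f0
    return start + f0, start + f1, fr


# A 30-frame clip at row 60, played at 30 frames/s with no phase offset.
walk = (60.0, 30.0, 30.0, 0.0)
print(frame_rows(0.5, walk))   # halfway through the clip: rows 75 and 76
print(frame_rows(0.99, walk))  # last frame (row 89) blending into the first (row 60)
```

`r1` wraps to the clip's own start row, never into the next clip, because `f1` is reduced modulo the frame count before `start` is added.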

## What changed in the engine

The shader needed one critical change: the bone-matrix texture went from being indexed by `INSTANCE_ID` (one row per instance) to being indexed by a **pose slot** computed from `INSTANCE_CUSTOM` (one row per baked frame). The old code:

```glsl
int inst = INSTANCE_ID; // row = instance index → lockstep
```

Became:

```glsl
int r0 = start + f0; // row = palette row from clip + frame → per-instance variety
```

This is a 40-line shader change in the engine's `multi_skinned_instance_3d.cpp`. It's backward-compatible — slot 0 still works for the old lockstep path (which airborne bird flocks use intentionally — synchronized flapping is a feature, not a bug).

Engine version bumped from 4.6.4 to **4.6.5**.

## The numbers (measured, not projected)

On an M1 Pro MacBook Pro (integrated GPU):

| Agent count | Old lockstep (4.6.4) | GPU-driven palette (4.6.5) |
|------------|----------------------|----------------------------|
| 100 | ~40 FPS | **60 FPS** |
| 500 | 31–39 FPS | **60 FPS** |
| 1,000 | ~25 FPS | **60 FPS** |
| 10,000 | untested | 8 FPS (unoptimized) |

The 10,000 number is low because we haven't done the one-herd-per-type optimization yet — 292 herds vs the planned ~30. And our distance culling still runs on the CPU (MultiMesh has no built-in culling). Both are in the roadmap.
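
Godot's MultiMesh draws only its first `visible_instance_count` instances, so one common CPU-side approach is to move in-range instances to the front of the buffers and lower that count. This is a generic Python sketch of that idea, not the project's current culling code; the flat position list, the camera tuple and `max_distance` are illustrative assumptions.

```python
def compact_visible(positions: list[tuple[float, float, float]],
                    camera: tuple[float, float, float],
                    max_distance: float) -> tuple[list[int], int]:
    """Order instance indices so in-range ones come first.

    Returns (draw_order, visible_count). Writing each instance's transform and
    custom data in draw_order, then setting visible_instance_count to
    visible_count, leaves only the in-range instances to be drawn.
    """
    max_d2 = max_distance * max_distance  # compare squared distances, no sqrt
    near: list[int] = []
    far: list[int] = []
    for i, p in enumerate(positions):
        d2 = sum((a - b) ** 2 for a, b in zip(p, camera))
        (near if d2 <= max_d2 else far).append(i)
    return near + far, len(near)


# Hypothetical herd strung along the x axis, camera at the origin.
herd = [(float(x), 0.0, 0.0) for x in range(0, 500, 50)]  # x = 0, 50, ..., 450
order, visible = compact_visible(herd, (0.0, 0.0, 0.0), max_distance=200.0)
print(visible, order[:visible])  # 5 [0, 1, 2, 3, 4]
```

Every change in the visible set means rewriting instance data on the CPU, which is exactly the cull loop that GPU compute culling would remove.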
|
||||||
|
|
||||||
|
**VRAM:** 1.6 MB per animal type, so 30 types come to 48 MB total. A Steam Deck, with its default 1 GB VRAM allocation, handles this comfortably. The VAT alternative would need 426 MB, roughly nine times as much.

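The size gap with VAT follows from what each texture stores per frame: a bone palette holds a few texels per *bone*, while a VAT holds a texel per *vertex*. Here is a rough Python estimator. It assumes RGBA32F texels (16 bytes), three texels per bone (a 3x4 affine matrix), and position-only VAT. The bone, vertex, and frame counts are hypothetical, not Ariki's real assets.

```python
# Bytes per texel for the two formats discussed in this post.
BYTES_PER_TEXEL = {"RGBA32F": 16, "RGBA16F": 8}


def palette_bytes(frames: int, bones: int, fmt: str = "RGBA32F",
                  texels_per_bone: int = 3) -> int:
    """Bone-palette size: one row per baked frame, a few texels per bone.

    Assumes each bone is a 3x4 affine matrix stored as 3 RGBA texels; the
    engine's real layout may differ.
    """
    return frames * bones * texels_per_bone * BYTES_PER_TEXEL[fmt]


def vat_bytes(frames: int, vertices: int, fmt: str = "RGBA32F") -> int:
    """Position-only VAT size: one texel per vertex per baked frame.

    Baking normals as well (for correct lighting) would add more on top.
    """
    return frames * vertices * BYTES_PER_TEXEL[fmt]


# Hypothetical animal: 40 bones, 5,000 vertices, 400 baked frames in total.
pal = palette_bytes(frames=400, bones=40)    # 768,000 bytes
vat = vat_bytes(frames=400, vertices=5_000)  # 32,000,000 bytes
```

Under these assumptions the VAT-to-palette ratio reduces to vertices / (3 × bones), so the ~9× above depends on the particular meshes and rigs rather than being a constant. Switching either format to RGBA16F halves the estimate.
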
**Draw calls:** Currently ~158 (one per type × state, a holdover from lockstep). After collapsing to one herd per type: ~30. After sharing palettes between animals that reuse a rig: even fewer.

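The draw-call reduction follows from what the herd key has to contain. In the lockstep era every instance in a herd played the same pose, so animal state had to be part of the key. Once state lives in per-instance custom data, the key can drop it. Here is a small Python sketch that assumes one draw call per herd (MultiMesh) and uses a hypothetical agent list:

```python
from collections.abc import Callable, Hashable, Iterable

Agent = tuple[str, str]  # (animal type, animation state)


def herd_count(agents: Iterable[Agent], herd_key: Callable[[Agent], Hashable]) -> int:
    """Number of herds, assuming one MultiMesh (one draw call) per distinct key."""
    return len({herd_key(agent) for agent in agents})


def lockstep_key(agent: Agent) -> Hashable:
    # Lockstep era: a herd shared one pose, so state had to be in the key.
    return agent


def per_type_key(agent: Agent) -> Hashable:
    # Palette era: state lives in per-instance custom data, so type alone suffices.
    return agent[0]


# Hypothetical population: two animal types, five (type, state) combinations.
agents = [("Boar", "walk"), ("Boar", "run"), ("Boar", "idle"),
          ("Deer", "walk"), ("Deer", "idle")] * 200
```
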
## The bug that made everything invisible

The first build rendered nothing. The animals were "visible" (correct instance count), the custom data was correct, the shader compiled, and the texture was valid, but the screen was empty. FPS was 60 because nothing was actually being drawn.

Root cause: a `renderer.refresh()` call during setup was undone by the renderer's own `NOTIFICATION_READY` handler, which re-bound the shader's `bone_matrices_tex` uniform and replaced our baked texture with an unbound one (Godot's default white). The shader sampled white, so every bone matrix came back with every component at 1.0. That is not identity (identity would have left the mesh in its bind pose). It is a degenerate matrix that squashes every vertex onto a single line → every triangle had zero area → invisible.

Fix: bind the texture once on the **first `_process` frame**, after all nodes have had their `_ready` called. Then never touch it again. One deferred bind, zero per-frame cost. This is a classic Godot `_ready` sequencing gotcha.

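The deferred bind is a one-shot flag on the process callback. Below is a language-neutral Python simulation of the ordering, not Godot code. `FakeMaterial` stands in for a shader material (its method name mirrors Godot 4's `ShaderMaterial.set_shader_parameter`), `DeferredPaletteBinder` and `simulate` are hypothetical, and the strings stand in for textures.

```python
class FakeMaterial:
    """Stand-in for a shader material: records every uniform bind (not Godot code)."""

    def __init__(self) -> None:
        self.uniforms: dict[str, str] = {}
        self.bind_calls = 0

    def set_shader_parameter(self, name: str, value: str) -> None:
        self.uniforms[name] = value
        self.bind_calls += 1


class DeferredPaletteBinder:
    """Binds the baked palette on the first process tick, then never again."""

    def __init__(self, material: FakeMaterial, palette: str) -> None:
        self.material = material
        self.palette = palette
        self.bound = False

    def process(self, delta: float) -> None:
        if self.bound:
            return  # steady state: nothing to do
        # By the first process tick every ready handler has already run,
        # so nothing later in setup can overwrite this bind.
        self.material.set_shader_parameter("bone_matrices_tex", self.palette)
        self.bound = True


def simulate(frames: int) -> FakeMaterial:
    material = FakeMaterial()
    binder = DeferredPaletteBinder(material, palette="BAKED_PALETTE")
    # Ready phase: the renderer's own handler rebinds its default texture.
    material.set_shader_parameter("bone_matrices_tex", "DEFAULT_WHITE")
    # Process phase: the binder wins because it runs afterwards, exactly once.
    for _ in range(frames):
        binder.process(1.0 / 60.0)
    return material
```

In Godot the node could also call `set_process(false)` right after the bind, so even the flag check disappears from later frames.
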
## Where this puts us vs AAA

The technique (baking bone matrices into a texture and letting the GPU drive per-instance animation) is, as far as we can tell from public material, the same architecture behind the crowd systems in Assassin's Creed Unity, Total War: Warhammer, and Hitman. We're using the same core idea in a Godot fork, on a much smaller VRAM budget.

What AAA has that we don't (yet):

- **LOD tiers** — far agents become 2D impostors (billboard quads with a sprite atlas). Same `(clip, frame, speed, phase)` packet drives all tiers.
- **Hero rigs** — the nearest few agents get real `Skeleton3D` + `AnimationTree` + IK + ragdoll. Smooth gait blends, foot-lock, look-at.
- **Offline bake pipeline** — precompute palettes in the asset build, not at load time.
- **GPU compute culling** — frustum + distance + LOD classification on the GPU, no CPU cull loop.

These are designed (the platform doc is at `ariki-sim/wiki/plans/crowd-animation-platform-2026-06-15.md`) but not built yet. The foundation, the GPU-driven baked palette, is what makes all of them possible.

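The LOD-tier design in the list above depends on every tier reading one per-agent packet. Switching tiers then never changes which frame an agent shows, so LOD transitions don't pop the animation. Here is a minimal Python sketch of a CPU-side classification pass under that design. The tier names, distance thresholds, helper functions, and packet layout are all hypothetical. The layout (clip start row, frame count, speed, phase) stands in for the post's `(clip, frame, speed, phase)`.

```python
from enum import Enum


class Tier(Enum):
    HERO = "hero"          # live skeleton + AnimationTree (planned)
    PALETTE = "palette"    # GPU-driven baked palette (today's path)
    IMPOSTOR = "impostor"  # animated billboard from a sprite atlas (planned)
    CULLED = "culled"      # not drawn at all


# Hypothetical distance thresholds, in metres.
HERO_MAX_M = 15.0
PALETTE_MAX_M = 150.0
DRAW_MAX_M = 600.0


def classify(distance_m: float) -> Tier:
    """Pick a render tier from camera distance (CPU-side, per agent)."""
    if distance_m <= HERO_MAX_M:
        return Tier.HERO
    if distance_m <= PALETTE_MAX_M:
        return Tier.PALETTE
    if distance_m <= DRAW_MAX_M:
        return Tier.IMPOSTOR
    return Tier.CULLED


def frame_for(packet: tuple[float, float, float, float], time_s: float,
              bake_fps: float = 30.0) -> int:
    """Frame within the clip, computed from the one per-agent packet.

    Every tier could read this same value (the palette shader as a row
    offset, an impostor as an atlas column), so a tier switch does not
    make the animation jump.
    """
    _start, frame_count, speed, phase = packet
    return int((time_s * speed * bake_fps + phase * frame_count) % frame_count)
```
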
## The fork question

Every time we change the engine, someone asks: "couldn't you do this without a fork?" For this feature, the answer is no — not without significant compromises. The alternatives:

- **VAT (vertex animation textures) with a Godot plugin:** Works in stock Godot, but needs roughly 9× the VRAM: 426 MB versus our 48 MB for 30 animal types, and 620 MB for 5 colonist looks, which doesn't fit on a Steam Deck. VAT bakes final vertex positions rather than bone poses, so smooth playback means interpolating every vertex between baked frames (otherwise you get hard cuts), and normals and tangents need their own baked data or the lighting is wrong.

- **Phase-offset drivers only:** Keep the live skeletons but stagger their phases. Gives some variety, but still has N live skeletons on the CPU. Doesn't scale to thousands of colonists.

- **Don't do crowds:** The simplest answer. But Ariki needs animals and colonists. The architecture decision was already made: we forked Godot to own the renderer, and this is exactly the kind of feature that justifies the fork.

## What's next

The 4-item immediate roadmap:

1. **One herd per type** — collapse ~158 herds to ~30 (remove the per-state batching from the lockstep era)
2. **Distance LOD** — CPU-side cull plus a cheaper shader for far instances
3. **RGBA16F + offline bake** — half the VRAM, no load-time bake hitch
4. **Hero rigs** — real `AnimationTree` + IK + ragdoll for the nearest few animals

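Moving the palette to RGBA16F halves its size but trades away precision. A binary16 value carries 11 significant bits, so the gap between representable values grows with magnitude. Here is a minimal Python sketch of a check an offline bake could run, using the standard `struct` module's `'e'` (half-float) format. The bone matrix values and helper names are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16, the per-channel format of RGBA16F."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_half_error(matrix_rows: list[list[float]]) -> float:
    """Worst absolute error from storing one bone matrix as half floats."""
    return max(abs(v - to_half(v)) for row in matrix_rows for v in row)


# Hypothetical 3x4 bone matrix: rotation terms near 1, translation in the last column.
bone = [
    [0.9998, -0.0175, 0.0, 0.42],
    [0.0175, 0.9998, 0.0, 1.37],
    [0.0, 0.0, 1.0, -0.08],
]
```

In [1, 2) the spacing between half values is 2^-10 (about 0.001), so a 1.37 translation moves by at most about 0.0005, roughly half a millimetre if units are metres. In [512, 1024) the spacing is 0.5. Bones with large translations, or units such as centimetres, are where half precision starts to show.
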
Further out: animated 2D impostors and GPU compute culling, both designed and parked until the load demands them.

The engine source lives in [`tinqs/engine`](https://tinqs.com/tinqs/engine) (private). Pre-built editor binaries at [`tinqs/builds`](https://tinqs.com/tinqs/builds). The Ariki game is at [arikigame.com](https://www.arikigame.com).

---

**Related:** [GPU-Skinned Herds](gpu-skinned-herds) — the original herd renderer (yesterday's post). [Fork, Don't Build](fork-dont-build) — why we modify existing platforms instead of building new ones. [Streaming a 12km Archipelago in Godot 4](godot-optimisation) — the terrain and vegetation layers that work alongside this.