<title>Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons — Tinqs Blog</title>
<meta name="description" content="We built a GPU-driven crowd animation platform into Tinqs Engine that renders 1,000 animated animals at 60 FPS with zero per-frame CPU cost. Each agent plays its own clip, speed, and phase. No live skeletons, no lockstep, no compromises.">
<meta property="og:title" content="Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons">
<meta property="og:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. A GPU-driven crowd animation platform in the Tinqs Engine fork of Godot.">
<meta name="twitter:title" content="Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons">
<meta name="twitter:description" content="1,000 animated agents, zero live skeletons, zero per-frame CPU. A GPU-driven crowd animation platform in the Tinqs Engine fork of Godot.">
"description":"We built a GPU-driven crowd animation platform into Tinqs Engine that renders 1,000 animated animals at 60 FPS with zero per-frame CPU cost. Each agent plays its own clip, speed, and phase — no live skeletons, no lockstep, no compromises."
<h1 class="post__title">Zero-CPU Crowd Animation: 1,000 Animals, One Draw Call, No Skeletons</h1>
<p class="post__lead">Godot gives you one <code>Skeleton3D</code> per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 <code>AnimationPlayer</code> ticks every frame. Want 1,000? You're measuring in seconds per frame.</p>
<p>We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at runtime at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost for animation. This is how AAA engines do crowds, and now it runs in our Godot fork.</p>
<h2>Why not skeletons?</h2>
<p>The standard approach (one skeleton per character, one <code>AnimationPlayer</code>, one draw call) breaks at crowd scale. Computing <code>global_pose</code> for 1,000 skeletons at 60 bones each is 60,000 matrix multiplications per frame on the main thread. Each character is its own draw call. Each <code>AnimationPlayer</code> ticks independently. The CPU cannot keep up at 60 FPS.</p>
<p>Vertex animation textures (VAT) can solve this: bake every vertex position into a texture and sample it in the shader. But that stores <strong>vertices × frames</strong>, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit comfortably on a Steam Deck. And in its basic position-only form, VAT doesn't blend frames for smooth playback, doesn't skin normals for correct lighting, and locks you into one animation per bake.</p>
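<p>The VAT figures can be roughly reproduced with a few lines of arithmetic. This is a sketch under one assumption not stated in the post: each texel stores a position as three 32-bit floats (12 bytes). The names <code>vat_bytes</code> and <code>BYTES_PER_TEXEL</code> are illustrative only.</p>
<pre><code>MIB = 1024 * 1024
BYTES_PER_TEXEL = 3 * 4  # assumption: one RGB float32 position per texel

def vat_bytes(vertices, frames, bytes_per_texel=BYTES_PER_TEXEL):
    """Bytes needed to store one position per vertex per frame."""
    return vertices * frames * bytes_per_texel

per_animal = vat_bytes(2500, 500)
roster = per_animal * 30

print(round(per_animal / MIB, 1), "MiB per animal")    # 14.3 MiB per animal
print(round(roster / MIB), "MiB for 30 animal types")  # 429 MiB for 30 animal types
</code></pre>
<p>Under that assumption the result lands within a few megabytes of the quoted 14 MB and 426 MB. The exact value depends on the real frame count and texture format.</p>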
<p>Our answer: <strong>bone-matrix palette.</strong> Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone.</p>
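<p>For readers unfamiliar with linear blend skinning, here is a minimal CPU-side sketch of what the vertex shader does per vertex. The 3x4 row-major matrix layout and the function names are assumptions for illustration. The engine's actual shader and texture layout are not shown in this post.</p>
<pre><code>def transform_point(m, p):
    """Apply a 3x4 row-major affine matrix (3 rows of 4 floats) to a point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)

def skin_vertex(position, bone_ids, weights, palette_row):
    """Linear blend skinning: weighted sum of the vertex moved by up to 4 bones."""
    out = [0.0, 0.0, 0.0]
    for bone, w in zip(bone_ids, weights):
        moved = transform_point(palette_row[bone], position)
        for axis in range(3):
            out[axis] += w * moved[axis]
    return tuple(out)

IDENTITY = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
SHIFT_X2 = ((1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))

row = [IDENTITY, SHIFT_X2]  # one baked frame holding two bone matrices
skinned = skin_vertex((1.0, 0.0, 0.0), (0, 1, 0, 0), (0.5, 0.5, 0.0, 0.0), row)
print(skinned)  # (2.0, 0.0, 0.0)
</code></pre>
<p>The vertex sits halfway between a bone that leaves it in place and a bone that shifts it by 2 on X, so it ends up shifted by 1. In the palette approach, <code>palette_row</code> comes from a texture fetch instead of a live skeleton.</p>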
<h2>How it works</h2>
<p>At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame into a single texture. A Goat with 9 clips at 30 fps produces 496 frames.</p>
<p>That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: 48 MB total. Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices.</p>
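<p>The comparison follows directly from the figures quoted in this post. The snippet below only redoes that arithmetic and adds no new measurements.</p>
<pre><code>PALETTE_MB_PER_TYPE = 1.6  # quoted above
VAT_MB_ROSTER = 426        # quoted above
ANIMAL_TYPES = 30

palette_roster_mb = PALETTE_MB_PER_TYPE * ANIMAL_TYPES
ratio = VAT_MB_ROSTER / palette_roster_mb

print(round(palette_roster_mb), "MB for the palette roster")  # 48 MB for the palette roster
print(round(ratio, 1), "times smaller than VAT")              # 8.9 times smaller than VAT
</code></pre>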
<p>After the bake, the skeleton is destroyed. It never runs again.</p>
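<p>The bake step can be pictured as a loop over clips and frames. The sketch below is an assumption-laden illustration, not the module's code: <code>bake_palette</code>, the clip dictionary shape, and the <code>sample_pose</code> callback (standing in for the temporary skeleton) are all hypothetical.</p>
<pre><code>def bake_palette(clips, fps, sample_pose):
    """Sample every clip at a fixed rate and stack the poses into one table.

    clips: dict of clip name -> length in seconds (hypothetical input shape).
    sample_pose: callable (clip_name, time) -> list of bone matrices at that
                 instant. It stands in for the temporary skeleton.
    Returns (rows, ranges), where ranges maps clip name -> (start_row, count).
    """
    rows = []
    ranges = {}
    for name, length in clips.items():
        count = max(1, round(length * fps))
        ranges[name] = (len(rows), count)
        for frame in range(count):
            rows.append(sample_pose(name, frame / fps))
    return rows, ranges

def fake_pose(clip_name, time):
    # Placeholder pose: a single "bone" tagged with its clip and time.
    return [(clip_name, round(time, 4))]

rows, ranges = bake_palette({"idle": 2.0, "walk": 1.0}, 30, fake_pose)
print(len(rows))       # 90
print(ranges["walk"])  # (60, 30)
</code></pre>
<p>Once <code>rows</code> is uploaded as a texture, nothing in the loop is needed again, which is why the skeleton can be thrown away.</p>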
<p>Each MultiMesh instance gets 4 numbers packed into <code>INSTANCE_CUSTOM</code>:</p>
<pre><code>.x = which clip (start row in the palette)</code></pre>
<p>Blending between two adjacent frames means we can bake at a low fps and stay smooth, because the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase, all on the GPU.</p>
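<p>Here is a minimal sketch of the per-instance lookup, written in Python so it can be run outside the engine. The parameter names, the meaning of <code>phase</code> as a fraction of the clip, and the looping behavior are assumptions. The real shader may pack and interpret the four values differently.</p>
<pre><code>import math

def palette_lookup(clip_start, clip_frames, speed, phase, time_s, bake_fps):
    """Return (row_a, row_b, blend) for one instance at one moment.

    phase is a fraction of the clip in [0, 1); speed scales the playback rate.
    The clip loops, so row_b wraps back to the clip's first row.
    """
    cursor = (time_s * bake_fps * speed + phase * clip_frames) % clip_frames
    frame_a = math.floor(cursor)
    frame_b = (frame_a + 1) % clip_frames
    blend = cursor - frame_a
    return clip_start + frame_a, clip_start + frame_b, blend

GOLDEN = 0.6180339887498949  # fractional part of the golden ratio

def herd_phase(index):
    """Spread phases so consecutive indices land far apart in the clip."""
    return (index * GOLDEN) % 1.0

print(palette_lookup(60, 30, 1.0, 0.0, 0.25, 30))  # (67, 68, 0.5)
for i in range(1, 4):
    print(i, palette_lookup(60, 30, 1.0, herd_phase(i), 0.25, 30))
</code></pre>
<p>The shader would fetch the bone matrices at <code>row_a</code> and <code>row_b</code> and mix them by <code>blend</code>. Because multiples of the golden ratio never repeat modulo 1, animals with consecutive indices start at clearly different points in the clip.</p>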
<h2>The numbers</h2>
<p>Measured on an M1 Pro MacBook Pro (integrated GPU), not a desktop gaming rig.</p>
<p><strong>VRAM:</strong> 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with its default 1 GB GPU memory allocation fits the entire roster with room for colonists, terrain, vegetation, and UI.</p>
<p><strong>Draw calls:</strong> One per animal type. 30 types = 30 draw calls for every animated animal on screen. Add colonists, same deal — one draw call per colonist look.</p>
<h2>The engine change</h2>
<p>The module lives in <code>modules/agent_skinned/</code> inside Tinqs Engine — our fork of Godot 4.6. The core is two classes:</p>
<p><strong><code>MultiSkinnedMeshInstance3D</code></strong> — the data plane. Holds the bone-matrix palette. API: <code>set_max_bones()</code>, <code>set_max_instances()</code>, <code>set_instance_pose_bones()</code>. At bake time, we fill one row per animation frame. At render time, it sits idle — the texture is static.</p>
<p><strong><code>MultiSkinnedInstance3D</code></strong> — the renderer. A <code>MultiMeshInstance3D</code> subclass. Points its multimesh at the skinned mesh and its <code>data_source_path</code> at the data plane. <code>refresh()</code> uploads the bone texture into the shader's uniform once. The MultiMesh handles instance transforms. The shader handles the rest.</p>
<p>The shader uses <code>INSTANCE_CUSTOM</code> to pick the palette row — not <code>INSTANCE_ID</code>. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors.</p>
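<p>The difference between the two behaviors comes down to what each instance writes into its custom data. The sketch below uses a hypothetical clip block and made-up offsets purely to show the indexing idea.</p>
<pre><code>CLIP_START, CLIP_FRAMES = 60, 30  # hypothetical clip block inside the palette

# Synchronized flock: every bird carries the same frame offset.
flock_rows = [CLIP_START + 12 for _ in range(5)]

# Varied herd: each animal carries its own offset into the same clip block.
herd_rows = [CLIP_START + (i * 7) % CLIP_FRAMES for i in range(5)]

print(sorted(set(flock_rows)))  # [72]
print(herd_rows)                # [60, 67, 74, 81, 88]
</code></pre>
<p>In both cases the palette keeps the same number of rows no matter how many instances read from it, which would not hold if rows were indexed by <code>INSTANCE_ID</code>.</p>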
<p>The engine change is 40 lines of shader code in <code>multi_skinned_instance_3d.cpp</code>. Engine version: <strong>4.6.5.</strong></p>
<h2>The production pipeline</h2>
<p>In Ariki, <code>AnimalHerdRenderer.cs</code> groups the sim's <code>ViewerState.animals</code> by type and feeds world positions and yaw rotations to <code>skinned_herd.gd</code>, the reusable per-type herd backend. The herd bakes the palette once at setup, then <code>set_positions()</code> updates transforms each sim tick. <code>set_clip_for_state()</code> switches the active clip block in the custom data when the sim FSM changes state (idle → walk → flee → attack). <code>set_speed_scale()</code> adjusts the per-instance playback rate to match ground speed, so feet stay planted.</p>
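<p>One plausible way to derive the clip and playback rate from sim state is sketched below. The state table, the authored speeds, and <code>playback_for</code> are hypothetical. They illustrate the idea behind <code>set_clip_for_state()</code> and <code>set_speed_scale()</code> without reproducing their implementation.</p>
<pre><code># Hypothetical table: FSM state -> (clip name, ground speed in m/s the clip was authored for).
STATE_CLIPS = {
    "idle": ("idle", 0.0),
    "walk": ("walk", 1.2),
    "flee": ("run", 4.0),
}

def playback_for(state, ground_speed):
    """Pick the clip for a sim state and scale playback to the actual speed."""
    clip, authored_speed = STATE_CLIPS[state]
    if authored_speed == 0.0:
        return clip, 1.0  # stationary clips play at their baked rate
    return clip, ground_speed / authored_speed

print(playback_for("walk", 2.4))  # ('walk', 2.0)
print(playback_for("flee", 4.0))  # ('run', 1.0)
print(playback_for("idle", 0.0))  # ('idle', 1.0)
</code></pre>
<p>An animal walking at twice the authored speed plays the walk clip twice as fast, which keeps stride length and ground travel in step.</p>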
<p>Bird flocks use the same system. <code>BirdFlock.cs</code> runs boid flocking on top of <code>skinned_herd</code>, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species migrated from the Low Poly Bird Ultimate Pack, each a single draw call.</p>
<p>The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states. The client just renders. The same system will drive thousands of colonists at launch.</p>
<h2>Where we stand vs the industry</h2>
<p>The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with a smaller VRAM footprint (our low-poly animals keep textures tiny).</p>
<p>The platform supports three tiers by distance:</p>
<ul>
<li><strong>Crowd tier (palette)</strong>: baked poses, GPU-driven, zero CPU. Thousands of agents.</li>
<li><strong>Hero tier (real rigs)</strong>: <code>AnimationTree</code> + <code>SkeletonIK3D</code> + <code>PhysicalBone3D</code> for the nearest few. Smooth gait blends, foot-lock, look-at, ragdoll.</li>
<li><strong>Impostor tier (2D billboards)</strong>: sprite atlas indexed by view-angle and animation-frame, driven by the same <code>(clip, frame, speed, phase)</code> packet. For very far agents.</li>
</ul>
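<p>Tier selection by distance could be as simple as two thresholds. The values and the <code>pick_tier</code> helper below are hypothetical. The post does not state the actual cutoffs, and a real system would likely also cap how many agents may use the hero tier at once.</p>
<pre><code># Hypothetical distance thresholds in metres; the post does not state any.
HERO_MAX = 15.0
CROWD_MAX = 150.0

def pick_tier(distance):
    """Choose a rendering tier from camera distance."""
    if distance > CROWD_MAX:
        return "impostor"
    if distance > HERO_MAX:
        return "crowd"
    return "hero"

print([pick_tier(d) for d in (5.0, 40.0, 400.0)])  # ['hero', 'crowd', 'impostor']
</code></pre>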
<p>All builds are at <a href="https://tinqs.com/tinqs/builds" style="color: var(--c-lime);"><code>tinqs/builds</code></a>. Engine source is at <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);"><code>tinqs/engine</code></a> (private).</p>
<p>The game's <code>animal_perf_test.tscn</code> spawns 10/100/1,000/10,000 animals and reports live FPS. The <code>animal_viewer.tscn</code> lets you inspect any animal type, toggle clips, and switch between single and herd mode.</p>
<p><strong>Related:</strong> <a href="gpu-skinned-herds" style="color: var(--c-lime);">GPU-Skinned Herds</a>: the original <code>agent_skinned</code> module design. <a href="fork-dont-build" style="color: var(--c-lime);">Fork, Don't Build</a>: why we modify existing platforms. <a href="godot-optimisation" style="color: var(--c-lime);">Streaming a 12km Archipelago in Godot 4</a>: the terrain and vegetation layers.</p>