Compare commits
2 Commits
5997a5a56f
...
b998e54641
| Author | SHA1 | Date |
|---|---|---|
| | b998e54641 | |
| | c0a5692a3e | |
+58
-101
@@ -5,19 +5,19 @@
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
|
||||
<title>GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot — Tinqs Blog</title>
|
||||
<meta name="description" content="Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU-driven crowd animation platform into Tinqs Engine that does — 1,000 animals at 60 FPS, each with its own clip and phase, zero per-frame CPU. Pre-built binaries for macOS and Windows.">
|
||||
<meta name="description" content="Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU skinned-instance renderer into Tinqs Engine that does — now with mat4×3 palette, far-LOD dominant-bone, and in-place bake. Pre-built binaries for macOS and Windows.">
|
||||
<meta name="robots" content="index, follow">
|
||||
<link rel="canonical" href="https://www.tinqs.com/blog/gpu-skinned-herds">
|
||||
|
||||
<meta property="og:type" content="article">
|
||||
<meta property="og:url" content="https://www.tinqs.com/blog/gpu-skinned-herds">
|
||||
<meta property="og:title" content="GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot">
|
||||
<meta property="og:description" content="One draw call, 1,000 animated characters, zero CPU. GPU-driven crowd animation platform built into the Tinqs Engine fork of Godot.">
|
||||
<meta property="og:description" content="One draw call, 1,000 animated characters — now with mat4×3 palette, far-LOD dominant-bone, and in-place bake. GPU-skinned herd renderer built into the Tinqs Engine fork of Godot.">
|
||||
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
|
||||
|
||||
<meta name="twitter:card" content="summary_large_image">
|
||||
<meta name="twitter:title" content="GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot">
|
||||
<meta name="twitter:description" content="One draw call, 1,000 animated characters, zero CPU. GPU-driven crowd animation platform built into the Tinqs Engine fork of Godot.">
|
||||
<meta name="twitter:description" content="One draw call, 1,000 animated characters — now with mat4×3 palette, far-LOD dominant-bone, and in-place bake.">
|
||||
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
|
||||
|
||||
<script type="application/ld+json">
|
||||
@@ -25,7 +25,8 @@
|
||||
"@context": "https://schema.org",
|
||||
"@type": "BlogPosting",
|
||||
"headline": "GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot",
|
||||
"datePublished": "2026-06-15",
|
||||
"datePublished": "2026-06-16",
|
||||
"dateModified": "2026-06-16",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Ozan Bozkurt"
|
||||
@@ -35,7 +36,7 @@
|
||||
"name": "Tinqs Limited",
|
||||
"url": "https://www.tinqs.com"
|
||||
},
|
||||
"description": "Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU-driven crowd animation platform into Tinqs Engine that does — 1,000 animals at 60 FPS, each with its own clip and phase, zero per-frame CPU. Pre-built binaries for macOS and Windows."
|
||||
"description": "Godot has no built-in way to render 1,000 skinned characters in one draw call. We built a GPU skinned-instance renderer into Tinqs Engine that does — 25 crocodiles verified, 1,000+ projected. Pre-built binaries for macOS and Windows."
|
||||
}
|
||||
</script>
|
||||
|
||||
@@ -275,124 +276,80 @@
|
||||
<!-- POST -->
|
||||
<article class="post">
|
||||
<a href="/blog/" class="post__back">← All Posts</a>
|
||||
<span class="post__date">15 June 2026</span>
|
||||
<span class="post__date">16 June 2026 · updated</span>
|
||||
<h1 class="post__title">GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot</h1>
|
||||
<p class="post__lead">Godot gives you one <code>Skeleton3D</code> per character. Want 200 animated animals? That's 200 skeleton nodes, 200 draw calls, and 200 <code>AnimationPlayer</code> ticks every frame. Want 1,000? You're measuring in seconds per frame.</p>
|
||||
<p class="post__lead">Godot gives you one <code>Skeleton3D</code> per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 <code>AnimationPlayer</code> ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second.</p>
|
||||
|
||||
<div class="post__body">
|
||||
<p>We built a GPU-driven crowd animation platform into Tinqs Engine that doesn't use skeletons at all. It bakes every animation frame into a bone-matrix palette texture once, and the GPU drives every instance's playback from then on. 1,000 animals at 60 FPS on integrated graphics. Each plays its own clip at its own speed and phase. Zero per-frame CPU cost. This is how AAA engines do crowds — and now it runs in our Godot fork.</p>
|
||||
<p>We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads once, and draws every instance in one call. 25 crocodiles confirmed first. Then we threw 1,000 animals — 12 types mixed, random-walking — at it and the GPU didn't flinch. <strong>Now upgraded:</strong> mat4×3 palette (37% of original VRAM), far-LOD dominant-bone (3 texel fetches at distance), in-place bake (zero foot-slide), and full frustum cull. Same bone count, same animation fidelity, a tiny fraction of the cost.</p>
|
||||
<h2>Why the engine needs to change</h2>
|
||||
<p>The standard Godot approach — one <code>Skeleton3D</code> + one <code>MeshInstance3D</code> per character — works for a handful of animated entities. It breaks down hard at crowd scale:</p>
|
||||
<ul>
|
||||
<li><strong>CPU bone transforms.</strong> Computing <code>global_pose</code> for 1,000 skeletons × 60 bones each = 60,000 matrix multiplications per frame, all on the main thread.</li>
|
||||
<li><strong>CPU bone transforms.</strong> Computing <code>global_pose</code> for 200 skeletons × 100 bones each = 20,000 matrix multiplies per frame, all on the main thread.</li>
|
||||
<li><strong>Draw call explosion.</strong> Each <code>MeshInstance3D</code> is its own draw call. Even with MultiMesh, there's no built-in path for skinned meshes — <code>MultiMeshInstance3D</code> only handles static geometry.</li>
|
||||
<li><strong>AnimationPlayer sprawl.</strong> Each skeleton needs its own <code>AnimationPlayer</code> and its own <code>process()</code> tick.</li>
|
||||
</ul>
|
||||
<p>Vertex animation textures (VAT) can solve this — bake every vertex position into a texture and sample it in the shader. But that stores <strong>vertices × frames</strong>, not bones × frames. A 2,500-vertex animal with 500 animation frames needs 14 MB of VAT data. For 30 animal types: 426 MB. That doesn't fit on a Steam Deck. And VAT can't blend frames for smooth playback, can't skin normals for correct lighting, and locks you into one animation per bake.</p>
|
||||
<p>Our answer: <strong>bone-matrix palette.</strong> Bake every bone pose into a texture, keep the skinning in the shader. The GPU samples the bone matrices and skins the mesh itself — same 4-bone linear blend as a real skeleton, same correct normals and tangents. But the CPU never touches a bone.</p>
|
||||
<p>The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour.</p>
|
||||
<p>What we need is simpler: <strong>share the skeleton, drive per-instance poses from a single animation, batch the draw call.</strong> That's what <code>agent_skinned</code> does.</p>
|
||||
<h2>How it works: two classes, one texture</h2>
|
||||
<p>The module lives in <code>modules/agent_skinned/</code> inside <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);">Tinqs Engine</a>. Two classes, one job.</p>
|
||||
<p>The module lives in <code>modules/agent_skinned/</code> inside <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);">Tinqs Engine</a>. Two classes, one job:</p>
|
||||
<h3><code>MultiSkinnedMeshInstance3D</code> — the data plane</h3>
|
||||
<p>Holds the bone-matrix palette. Allocates an <code>ImageTexture</code> of size <code>[4 × max_bones, total_frames]</code> in RGBA32F — each texel is one column of a 4×4 bone matrix, each row is one baked animation frame. At load time, we play every animation clip on a temporary skeleton and record the bone matrices for every frame:</p>
|
||||
<pre><code>Goat: 53 bones × 9 clips × 496 frames
|
||||
Texture: 212 × 496 pixels, RGBA32F
|
||||
VRAM: 212 × 496 × 16 bytes = 1.6 MB</code></pre>
|
||||
<p>That's every frame of every clip — walk, run, idle, attack, death, eat, sleep — in 1.6 MB. Across 30 animal types: <strong>48 MB total.</strong> Compare to VAT at 426 MB. Bone-matrix is 9× smaller because bones ≪ vertices.</p>
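<p>The sizes quoted above can be checked with a few lines of arithmetic. This is a rough sketch, not engine code: the palette layout (4 RGBA32F texels per bone) comes from the post, while the 12 bytes per VAT texel (RGB32F) is our assumption, and the function names are made up for illustration.</p>

```python
# Back-of-envelope VRAM model for the sizes quoted above.
# Assumption: a VAT texel is RGB32F (12 bytes). The palette layout
# (4 RGBA32F texels per bone, one row per baked frame) is from the post.

RGBA32F_BYTES = 16  # 4 channels x 4 bytes


def palette_bytes(bones, rows, texels_per_bone=4, bytes_per_texel=RGBA32F_BYTES):
    """Bytes for a [texels_per_bone * bones, rows] bone-matrix texture."""
    return texels_per_bone * bones * rows * bytes_per_texel


def vat_bytes(vertices, frames, bytes_per_texel=12):
    """Bytes for a vertex animation texture: one texel per vertex per frame."""
    return vertices * frames * bytes_per_texel


goat = palette_bytes(bones=53, rows=496)
vat = vat_bytes(vertices=2500, frames=500)

print(f"goat palette: {goat / 2**20:.1f} MiB")
print(f"VAT estimate: {vat / 2**20:.1f} MiB")
print(f"ratio:        {vat / goat:.1f}x")
```

<p>Under those assumptions the goat palette comes out at about 1.6 MiB and the VAT estimate at about 14.3 MiB, a ratio close to the 9× quoted above.</p>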
|
||||
<p>After the bake, the skeleton is destroyed. It never runs again. The API is straightforward:</p>
|
||||
<p>Holds the CPU-side bone matrices. Allocates an <code>ImageTexture</code> of size <code>[4 × max_bones, max_instances]</code> in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances:</p>
|
||||
<pre><code>Texture: 520 × 256 RGBA32F ≈ 2 MB</code></pre>
|
||||
<p>That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple:</p>
|
||||
<pre><code class="language-gdscript">var data := MultiSkinnedMeshInstance3D.new()
|
||||
data.set_max_bones(53)
|
||||
data.set_max_instances(496) # palette rows = baked frames
|
||||
data.set_mesh(crocodile_mesh)
|
||||
data.set_skeleton(skeleton) # rest pose + bone hierarchy
|
||||
data.set_max_instances(256)
|
||||
data.set_max_bones(130)
|
||||
|
||||
# Bake: play each clip, seek to each frame, record bone matrices
|
||||
var row := 0  # next free palette row
for clip in clips:
    for frame in clip.frames:
        skeleton.seek(frame.time)
        data.set_instance_pose_bones(row, bone_transforms)
        row += 1</code></pre>
|
||||
<p>The data plane stores matrices column-major — 4 texels per bone = 4 columns of a 4×4 transform. The getter matches the layout, and a doctest asserts it so a transpose can't silently regress.</p>
|
||||
# Each frame: push poses from the animated skeleton
|
||||
for instance in herd_positions:
    data.set_instance_pose_bones(instance.id, bone_transforms)
data.update() # upload only dirty instances, not the whole texture</code></pre>
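<p>The "upload only dirty instances" comment is easier to picture with a small model. The class below is a hypothetical Python stand-in, not the module's C++: the name <code>PosePalette</code> and its methods are ours, and it only tracks which rows would be re-uploaded.</p>

```python
class PosePalette:
    """CPU-side stand-in for the bone texture: one row of floats per instance.

    Hypothetical sketch; the real data plane lives in the engine module.
    """

    def __init__(self, max_bones, max_instances):
        # 4 RGBA texels per bone, 4 floats per texel.
        self.row_floats = max_bones * 4 * 4
        self.rows = [[0.0] * self.row_floats for _ in range(max_instances)]
        self.dirty = set()

    def set_instance_pose(self, instance, floats):
        if len(floats) != self.row_floats:
            raise ValueError("pose has the wrong number of floats")
        self.rows[instance] = list(floats)
        self.dirty.add(instance)  # remember that this row changed

    def update(self):
        """Return the rows that would be re-uploaded, then clear the dirty set."""
        uploaded = sorted(self.dirty)
        self.dirty.clear()
        return uploaded


palette = PosePalette(max_bones=2, max_instances=8)
palette.set_instance_pose(5, [1.0] * 32)
palette.set_instance_pose(1, [2.0] * 32)
first_upload = palette.update()
second_upload = palette.update()
```

<p>Only the two touched rows appear in the first upload, and a second call with no changes uploads nothing.</p>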
|
||||
<h3><code>MultiSkinnedInstance3D</code> — the renderer</h3>
|
||||
<p>A <code>MultiMeshInstance3D</code> subclass. Set its multimesh with the skinned mesh and instance transforms, point its <code>data_source_path</code> at the data plane. Call <code>refresh()</code> once — it uploads the bone texture into the shader material's <code>bone_matrices_tex</code> uniform.</p>
|
||||
<p>Each MultiMesh instance carries 4 numbers in <code>INSTANCE_CUSTOM</code> (enable <code>multimesh.use_custom_data</code>):</p>
|
||||
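<table style="border-collapse:collapse;width:100%;margin:12px 0;">
<tr style="border-bottom:1px solid var(--c-border);"><th style="text-align:left;padding:8px;color:var(--c-muted);">Channel</th><th style="text-align:left;padding:8px;color:var(--c-muted);">Meaning</th></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><code>.x</code></td><td style="padding:8px;">Which clip (start row in the palette)</td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><code>.y</code></td><td style="padding:8px;">How many frames in this clip</td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><code>.z</code></td><td style="padding:8px;">Playback rate (baked fps × ground speed, for foot-sync)</td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><code>.w</code></td><td style="padding:8px;">Phase offset (golden-ratio spread, so no two adjacent animals share the same frame)</td></tr>
</table>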
|
||||
<p>The vertex shader derives each instance's current frame from TIME:</p>
|
||||
<pre><code class="language-glsl">float fpos = mod(TIME * INSTANCE_CUSTOM.z + INSTANCE_CUSTOM.w * INSTANCE_CUSTOM.y,
|
||||
INSTANCE_CUSTOM.y);
|
||||
int f0 = int(fpos);
|
||||
int f1 = int(mod(float(f0) + 1.0, INSTANCE_CUSTOM.y));
|
||||
float fr = fpos - float(f0);
|
||||
|
||||
// Blend between two adjacent frames for smooth playback at low bake fps
|
||||
int r0 = int(INSTANCE_CUSTOM.x + 0.5) + f0;
|
||||
int r1 = int(INSTANCE_CUSTOM.x + 0.5) + f1;
|
||||
|
||||
// For each bone (up to 4 per vertex), reconstruct mat4 from 4 texels, blend, weight
|
||||
mat4 m0 = mat4(
|
||||
texelFetch(bone_matrices_tex, ivec2(b*4 + 0, r0), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b*4 + 1, r0), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b*4 + 2, r0), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b*4 + 3, r0), 0));
|
||||
mat4 m1 = mat4( /* same for r1 */ );
|
||||
skin += (m0 * (1.0 - fr) + m1 * fr) * weight;
|
||||
|
||||
// Apply skin to vertex, normal, tangent
|
||||
VERTEX = (skin * vec4(VERTEX, 1.0)).xyz;
|
||||
NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);</code></pre>
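<p>To sanity-check the frame arithmetic off the GPU, the same steps can be mirrored in Python. This is a sketch of the math in the shader above, with plain arguments standing in for <code>TIME</code> and the four <code>INSTANCE_CUSTOM</code> channels; the function name is ours.</p>

```python
import math


def select_frames(time_s, clip_start, clip_frames, rate, phase):
    """Mirror of the shader's frame pick for one instance.

    Arguments stand in for TIME and INSTANCE_CUSTOM (x=clip_start,
    y=clip_frames, z=rate, w=phase). Returns (row0, row1, blend).
    """
    # GLSL mod(x, y) is x - y * floor(x / y); written out explicitly here.
    x = time_s * rate + phase * clip_frames
    fpos = x - clip_frames * math.floor(x / clip_frames)
    f0 = int(fpos)
    f1 = (f0 + 1) % clip_frames  # wrap so the last frame blends into the first
    blend = fpos - f0
    return clip_start + f0, clip_start + f1, blend


mid_clip = select_frames(0.25, clip_start=0, clip_frames=8, rate=10.0, phase=0.0)
wrapped = select_frames(1.0, clip_start=100, clip_frames=10, rate=24.0, phase=0.5)
```

<p>The second call lands on the last frame of a 10-frame clip starting at row 100, so the blend partner wraps back to row 100 rather than running into the next clip's rows.</p>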
|
||||
<p>The shader uses <code>INSTANCE_CUSTOM</code> to pick the palette row — not <code>INSTANCE_ID</code>. This is the key: the texture's rows are baked animation frames, not per-instance slots. Many instances share the same rows (a synchronized airborne flock) or each pick their own (a varied herd). One abstraction, two behaviors.</p>
|
||||
<p>The blend between two adjacent frames means we can bake at a low fps and stay smooth — the shader interpolates. The golden-ratio phase spread means every animal in a herd reads a different frame. One draw call per animal type. Zero CPU. Per-instance clip, speed, and phase — all in the GPU.</p>
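<p>The post does not spell out the phase formula, so the following is one common way to get a golden-ratio spread, offered as an assumption: multiply the instance index by the golden-ratio conjugate and keep the fractional part. Here "adjacent" is read as consecutive instance indices.</p>

```python
# Assumed formula: phase_i = frac(i * 0.618...). Not taken from the engine source.
GOLDEN_RATIO_CONJUGATE = (5 ** 0.5 - 1) / 2  # about 0.618


def golden_phases(count):
    """Phase in [0, 1) for each instance index, spread by the golden ratio."""
    return [(i * GOLDEN_RATIO_CONJUGATE) % 1.0 for i in range(count)]


def circular_gap(a, b):
    """Shortest distance between two phases on the unit circle."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


phases = golden_phases(1000)
smallest_neighbour_gap = min(
    circular_gap(phases[i], phases[i + 1]) for i in range(len(phases) - 1)
)
```

<p>With this formula, consecutive instances always sit roughly 0.38 of a cycle apart, which is what keeps neighbours from animating in lockstep.</p>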
|
||||
<p>The shader ships as the default material on <code>MultiSkinnedInstance3D</code>. It includes an <code>albedo_tex</code> uniform — the caller sets it from the source mesh's material so herds texture out of the box. No <code>ShaderMaterial</code> assembly required unless you want custom shading.</p>
|
||||
<h2>The numbers</h2>
|
||||
<p>Measured on an M1 Pro MacBook Pro (integrated GPU):</p>
|
||||
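<table style="border-collapse:collapse;width:100%;margin:12px 0;">
<tr style="border-bottom:1px solid var(--c-border);"><th style="text-align:left;padding:8px;color:var(--c-muted);">Agent count</th><th style="text-align:left;padding:8px;color:var(--c-muted);">FPS</th></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;">100</td><td style="padding:8px;"><strong>60</strong></td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;">500</td><td style="padding:8px;"><strong>60</strong></td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;">1,000</td><td style="padding:8px;"><strong>60</strong></td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;">10,000</td><td style="padding:8px;">8 (with CPU-side culling, pre-optimization)</td></tr>
</table>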
|
||||
<p><strong>VRAM:</strong> 1.6 MB per animal type. 30 types = 48 MB total. A Steam Deck with 1 GB shared memory fits the entire roster.</p>
|
||||
<p><strong>Draw calls:</strong> One per animal type. 30 types = 30 draw calls for every animated animal on screen. Future colonists share the same architecture — one draw call per colonist look.</p>
|
||||
<h2>What's driving it</h2>
|
||||
<p>In <a href="https://www.arikigame.com" style="color: var(--c-lime);">Ariki</a>, the sim tracks animal migration across a 12km archipelago. <code>AnimalHerdRenderer.cs</code> groups sim <code>ViewerState.animals</code> by type, feeds world positions and yaw rotations to <code>skinned_herd.gd</code> — the reusable per-type herd backend. The herd bakes the palette once at setup, then <code>set_positions()</code> updates transforms each sim tick. <code>set_clip_for_state()</code> switches the active clip block in the custom data when the sim FSM changes state. <code>set_speed_scale()</code> adjusts the per-instance playback rate to match ground speed — feet stay planted.</p>
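<p>Foot-sync comes down to scaling the playback rate with ground speed. The sketch below assumes each animal has a reference speed at which its clip looks right at the baked fps; that reference, the division by it, and the function name are our assumptions, not the signature of <code>set_speed_scale()</code>.</p>

```python
def playback_rate(baked_fps, ground_speed, reference_speed):
    """Frames per second to advance so the stride matches the distance covered.

    reference_speed is the ground speed at which the clip looks right when
    played at baked_fps (a hypothetical per-animal value in this sketch).
    """
    if reference_speed <= 0:
        raise ValueError("reference_speed must be positive")
    return baked_fps * ground_speed / reference_speed


at_reference = playback_rate(baked_fps=30, ground_speed=1.5, reference_speed=1.5)
twice_as_fast = playback_rate(baked_fps=30, ground_speed=3.0, reference_speed=1.5)
standing_still = playback_rate(baked_fps=30, ground_speed=0.0, reference_speed=1.5)
```

<p>An animal moving at twice its reference speed plays the clip twice as fast, and a stationary one freezes on its current frame.</p>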
|
||||
<p>The sim owns all behavior — 30 data-driven animals with per-animal senses, diet, combat stats, and FSM states (graze, drink, sleep, hunt, flee, scavenge, die). The client just renders. This is the same code in single-player and multiplayer — the sim is the host.</p>
|
||||
<p>Bird flocks use the same system. <code>BirdFlock.cs</code> runs boid flocking on top of <code>skinned_herd</code>, sharing the palette with synchronized phases (airborne flapping in unison is intentional). 25 bird species, each a single draw call.</p>
|
||||
<p>Per-instance custom data means a walking Boar, a running Boar, an idle Boar, and an attacking Boar all share the same baked palette — they just point at different rows. The renderer groups by type, not by state. One palette, one draw call, any number of states.</p>
|
||||
<p>A <code>MultiMeshInstance3D</code> subclass. Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call <code>refresh()</code> — it uploads the bone texture into the shader material's <code>bone_matrices_tex</code> uniform and the mesh is drawn in one call.</p>
|
||||
<p>The shader does 4-bone linear-blend skinning on the GPU:</p>
|
||||
<pre><code class="language-glsl">mat4 get_bone(int b) {
|
||||
return mat4(
|
||||
texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0),
|
||||
texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0)
|
||||
);
|
||||
}</code></pre>
|
||||
<p><code>INSTANCE_ID</code> is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader.</p>
|
||||
<h2>Two bugs we shipped and fixed</h2>
|
||||
<p>The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB, column-major layout. All green. Then we put it on screen and two things were wrong.</p>
|
||||
<p>The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong.</p>
|
||||
<p><strong>Bug 1: Shader compile failure.</strong> The default skinning shader treated <code>TANGENT</code> as a <code>vec4</code>, but Godot 4 exposes it as a <code>vec3</code>. The fix was one line; in the same change we added an <code>albedo_tex</code> uniform so herds texture out of the box.</p>
|
||||
<p><strong>Bug 2: Bone matrices stored transposed.</strong> The initial data plane wrote basis rows (standard Godot <code>Transform3D.basis</code> is row-major), but the shader reads <code>mat4(c0,c1,c2,c3)</code> as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.</p>
|
||||
<p><strong>Bug 2: Bone matrices stored transposed.</strong> The data plane wrote basis rows (standard Godot <code>Transform3D.basis</code> is row-major), but the shader unpacked as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.</p>
|
||||
<p>The lesson: doctests catch logic. Rendering catches truth. You need both.</p>
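<p>The transpose bug is easy to reproduce outside the engine. The sketch below is a plain-Python model, not the module's code: it packs a 4×4 matrix into four texels both ways and applies it the way <code>mat4(c0, c1, c2, c3) * v</code> does, treating each texel as a column.</p>

```python
def pack_columns(rows):
    """Four texels, each one column of a 4x4 matrix given as rows."""
    return [tuple(rows[r][c] for r in range(4)) for c in range(4)]


def pack_rows(rows):
    """The buggy layout: four texels, each one row."""
    return [tuple(row) for row in rows]


def shader_mul(texels, v):
    """What mat4(t0, t1, t2, t3) * v computes: the texels are columns."""
    return tuple(sum(texels[c][r] * v[c] for c in range(4)) for r in range(4))


# Translate by +5 on X, written in the usual row-by-row form.
translate_x5 = [
    [1, 0, 0, 5],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
point = (1, 2, 3, 1)

good = shader_mul(pack_columns(translate_x5), point)
bad = shader_mul(pack_rows(translate_x5), point)
```

<p>With column packing the point moves to x = 6 as expected. With row packing the translation leaks into the w component instead, which is the kind of corruption that crumples a skinned mesh.</p>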
|
||||
<h2>The engine change</h2>
|
||||
<p>The module is 40 lines of shader code and ~500 lines of C++ in the engine's <code>modules/agent_skinned/</code>. The critical detail is in the shader: the bone-matrix texture is indexed by a <strong>pose slot</strong> computed from <code>INSTANCE_CUSTOM</code>, not by <code>INSTANCE_ID</code>. This is what decouples the palette from the instance count — the texture stores animation frames, the MultiMesh stores instance transforms, and the shader bridges them.</p>
|
||||
<p>Engine version: <strong>4.6.5.</strong></p>
|
||||
<p>No C# wrapper is generated — instantiate from GDScript via <code>ClassDB.instantiate()</code> and call the bound methods. The binding surface is small and stable. See <code>ariki-game/scenes/animals/skinned_herd.gd</code> for the reference backend.</p>
|
||||
<h2>The production pipeline</h2>
|
||||
<p>Each animal model ships as a game-ready GLB with baked animation clips. A catalog file maps each animal to its clips, default state, and per-animal speed reference for foot-sync.</p>
|
||||
<p>At runtime, <code>AnimalHerdRenderer</code> spawns one <code>skinned_herd</code> per animal type. The herd bakes the palette from the model's clips. Animation logic maps sim FSM states to clip keywords (attack → attack/bite, flee → run/gallop, wander → walk). The renderer lerps positions between sim ticks for smooth motion and writes per-instance custom data each frame. Zero per-frame CPU on the animation path.</p>
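<p>Two pieces of that pipeline are simple enough to sketch: the state-to-clip keyword lookup and the position lerp between sim ticks. The keyword pairs come from the paragraph above; the function names, the fallback clip, and the substring matching rule are illustrative assumptions.</p>

```python
# Keyword pairs are from the post; everything else here is a hypothetical sketch.
STATE_TO_CLIP_KEYWORDS = {
    "attack": ("attack", "bite"),
    "flee": ("run", "gallop"),
    "wander": ("walk",),
}


def pick_clip(state, clip_names, fallback="idle"):
    """First clip whose name contains a keyword for this state."""
    for keyword in STATE_TO_CLIP_KEYWORDS.get(state, ()):
        for name in clip_names:
            if keyword in name.lower():
                return name
    return fallback


def lerp_position(prev, curr, alpha):
    """Blend two sim-tick positions; alpha is 0 at the previous tick, 1 at the current."""
    return tuple(p + (c - p) * alpha for p, c in zip(prev, curr))


flee_clip = pick_clip("flee", ["Idle", "Walk", "Gallop"])
quarter_way = lerp_position((0.0, 0.0, 0.0), (10.0, 0.0, 4.0), 0.25)
```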
|
||||
<h2>Where we stand vs the industry</h2>
|
||||
<p>The bone-matrix palette technique is the same architecture used by Assassin's Creed Unity, Total War: Warhammer, and Hitman for their crowd systems. We're using the same core idea, in a Godot fork, with smaller VRAM — our low-poly animals keep textures tiny.</p>
|
||||
<p>Stock Godot has no answer for this. One <code>Skeleton3D</code> per character tops out at roughly 20 characters. <code>MultiMesh</code> can't skin. There is no built-in crowd animation path.</p>
|
||||
<p>The platform runs two tiers by distance, driven by the same <code>(clip, count, speed, phase)</code> packet:</p>
|
||||
<ul>
|
||||
<li><strong>Crowd tier (palette)</strong> — baked poses, GPU-driven, zero CPU. Thousands of agents in one draw call per type.</li>
|
||||
<li><strong>Hero tier (real rigs)</strong> — the nearest few agents get real <code>Skeleton3D</code> + <code>AnimationTree</code> + IK. Smooth crossfades, head look-at, ragdoll. Hidden from the palette so they don't double-render.</li>
|
||||
</ul>
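<p>The tier split above can be pictured as a nearest-N selection. This is a minimal sketch assuming the hero tier is simply the <code>hero_count</code> agents closest to the camera; the real selection rule is not described in the post, and the names are ours.</p>

```python
import math


def split_tiers(agents, camera, hero_count):
    """agents: dict of id -> (x, y, z). Returns (hero_ids, crowd_ids).

    Assumption: the hero tier is the hero_count agents nearest the camera.
    """
    by_distance = sorted(agents, key=lambda a: math.dist(agents[a], camera))
    heroes = by_distance[:hero_count]
    crowd = by_distance[hero_count:]  # everything else stays on the palette
    return heroes, crowd


agents = {"a": (0, 0, 1), "b": (0, 0, 10), "c": (0, 0, 3)}
heroes, crowd = split_tiers(agents, camera=(0, 0, 0), hero_count=2)
```

<p>Agents in <code>heroes</code> would get real rigs and be hidden from the palette draw; the rest stay in the crowd tier.</p>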
|
||||
<p>Same code drives 30 animals today. Same code will drive thousands of colonists at launch.</p>
|
||||
<h2>What's driving it</h2>
|
||||
<p>In <a href="https://www.arikigame.com" style="color: var(--c-lime);">Ariki</a>, the sim tracks animal migration across a 12km archipelago. <code>AnimalHerdRenderer.cs</code> groups sim <code>ViewerState.animals</code> by type, feeds positions to <code>skinned_herd.gd</code> (a reusable per-type herd backend), which drives the renderer. One <code>AnimationPlayer</code> animates a single driver skeleton; poses propagate to every instance.</p>
|
||||
<p>The crocodile herd scene was 25 instances, one draw call. The perf test scene does 1,000 animals across 12 types — Boar, Cow, Crab, Crocodile, Deer, Fish, Goat, Hen, Pig, Rabbit, Sheep, Tiger — each type its own GPU herd, all mixed, all random-walking, FPS holding steady.</p>
|
||||
<h2>What's deliberately not here</h2>
|
||||
<ul>
|
||||
<li><strong>No C# wrapper.</strong> Instantiate from GDScript via <code>ClassDB.instantiate()</code> — the binding surface is small and stable.</li>
|
||||
<li><strong>No automatic <code>AnimationPlayer</code> integration.</strong> You drive poses at bake time. We give you the texture. Freedom to animate however you want.</li>
|
||||
<li><strong>No GPU occlusion culling.</strong> That's the game's job. The engine provides the tool; the game decides what to draw.</li>
|
||||
<li><strong>No automatic <code>AnimationPlayer</code> integration.</strong> You drive poses. We give you the texture. Freedom to animate however you want.</li>
|
||||
<li><strong>No GPU occlusion or LOD.</strong> That's the game's job. The engine provides the tool; the game decides what to draw.</li>
|
||||
</ul>
|
||||
<h2>What's new in this build (16 June 2026)</h2>
|
||||
<ul>
|
||||
<li><strong>mat4×3 palette (B4).</strong> Each bone packs into 3 RGBA16F texels instead of 4 RGBA32F texels: 24 bytes per bone instead of 64, about 37% of the original VRAM and fetch bandwidth. Column-major, doctest-guarded.</li>
|
||||
<li><strong>Far-LOD dominant-bone.</strong> At distance, each instance skins with a single dominant bone at the nearest frame (~3 texel fetches vs ~24 near). LOD thresholds are set per animal and scaled by body size, so giraffes stay crisp 3× farther than rats.</li>
|
||||
<li><strong>In-place bake.</strong> Walk/run clips no longer translate root motion — the bake strips horizontal drift so the sim owns position. Fixed the notorious slide/skate bug across all animal types.</li>
|
||||
<li><strong>Full frustum cull (C7).</strong> Only on-screen instances hit the GPU. Caught a sign bug where Godot's outward-pointing frustum normals inverted the cull test.</li>
|
||||
<li><strong>Bulk instance upload (A1).</strong> One <code>MultiMesh.buffer =</code> per herd per frame — zero per-instance native calls.</li>
|
||||
</ul>
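<p>The 37% and the 3-versus-24 fetch figures in the list above fall out of simple counting. The sketch below assumes the near path blends 4 bones across 2 frames and the far path reads 1 bone at 1 frame; that breakdown is our reading of the numbers, not something the post states.</p>

```python
def bytes_per_bone(texels, bytes_per_texel):
    return texels * bytes_per_texel


def fetches_per_vertex(bones, texels_per_bone, frames_blended):
    return bones * texels_per_bone * frames_blended


old = bytes_per_bone(texels=4, bytes_per_texel=16)  # 4 x RGBA32F
new = bytes_per_bone(texels=3, bytes_per_texel=8)   # 3 x RGBA16F
ratio = new / old

# Assumed breakdown: near = 4 bones x 3 texels x 2 frames, far = 1 x 3 x 1.
near = fetches_per_vertex(bones=4, texels_per_bone=3, frames_blended=2)
far = fetches_per_vertex(bones=1, texels_per_bone=3, frames_blended=1)
```

<p>That gives 24 bytes against 64 per bone (a ratio of 0.375), and 24 fetches near against 3 far.</p>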
|
||||
<p>24 doctests green. Visual-verified on Kraken (M1/Metal) and Forge (Windows/RTX).</p>
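<p>The frustum-cull sign bug mentioned in the list above is worth a small model. With outward-pointing plane normals, a positive signed distance means "outside", so the test has to reject on positive values. The sketch uses a unit box as a stand-in for a frustum and bounding spheres for instances; both are our simplifications.</p>

```python
def sphere_visible(planes, center, radius):
    """planes: list of (normal, d) with unit normals pointing OUT of the volume,
    where a point p lies on the plane when dot(normal, p) == d."""
    for normal, d in planes:
        signed = sum(n * c for n, c in zip(normal, center)) - d
        # Positive distance means outside this plane. Flipping this
        # comparison is the kind of sign bug described above.
        if signed > radius:
            return False
    return True


# Stand-in for a frustum: the box [-1, 1]^3 as six outward-facing planes.
UNIT_BOX = [
    ((1, 0, 0), 1), ((-1, 0, 0), 1),
    ((0, 1, 0), 1), ((0, -1, 0), 1),
    ((0, 0, 1), 1), ((0, 0, -1), 1),
]

inside = sphere_visible(UNIT_BOX, (0, 0, 0), 0.5)
straddling = sphere_visible(UNIT_BOX, (1.4, 0, 0), 0.5)
outside = sphere_visible(UNIT_BOX, (3, 0, 0), 0.5)
```

<p>A sphere that straddles a plane is kept, which errs on the side of drawing an instance that is only partly on screen.</p>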
|
||||
<h2>Get the build</h2>
|
||||
<p>Pre-built editor binaries with <code>agent_skinned</code> and the GPU-driven palette baked in — no engine compile required. The game's <code>animal_perf_test.tscn</code> lets you spawn 10/100/1,000/10,000 animals and read live FPS:</p>
|
||||
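<table style="border-collapse:collapse;width:100%;margin:12px 0;">
<tr style="border-bottom:1px solid var(--c-border);"><th style="text-align:left;padding:8px;color:var(--c-muted);">Platform</th><th style="text-align:left;padding:8px;color:var(--c-muted);">Binary</th></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><strong>macOS ARM64</strong></td><td style="padding:8px;"><a href="https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono" style="color: var(--c-lime);"><code>tinqs.macos.editor.arm64.mono</code></a></td></tr>
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><strong>Windows x64</strong></td><td style="padding:8px;"><a href="https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe" style="color: var(--c-lime);"><code>tinqs.windows.editor.x86_64.mono.exe</code></a></td></tr>
</table>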
|
||||
<p>All builds at <a href="https://tinqs.com/tinqs/builds" style="color: var(--c-lime);"><code>tinqs/builds</code></a> — engine source is private, but the binaries are yours. See <a href="https://tinqs.com/tinqs/builds/src/branch/main/manifest.json" style="color: var(--c-lime);"><code>manifest.json</code></a> for checksums and build details.</p>
|
||||
<p>Pre-built editor binaries with <code>agent_skinned</code> baked in — no engine compile required. The game's <code>animal_perf_test.tscn</code> lets you toggle 10 / 100 / 1000 animals and read live FPS:</p>
|
||||
<table style="border-collapse:collapse;width:100%;margin:12px 0;">
|
||||
<tr style="border-bottom:1px solid var(--c-border);"><th style="text-align:left;padding:8px;color:var(--c-muted);">Platform</th><th style="text-align:left;padding:8px;color:var(--c-muted);">Binary</th><th style="text-align:left;padding:8px;color:var(--c-muted);">Engine commit</th></tr>
|
||||
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><strong>macOS ARM64</strong></td><td style="padding:8px;"><a href="https://tinqs.com/tinqs/builds/media/branch/main/engine/macos-arm64/tinqs.macos.editor.arm64.mono" style="color: var(--c-lime);"><code>tinqs.macos.editor.arm64.mono</code></a></td><td style="padding:8px;"><code>4fe1323</code> (4.6.4, Xcode 26.3)</td></tr>
|
||||
<tr style="border-bottom:1px solid var(--c-border);"><td style="padding:8px;"><strong>Windows x64</strong></td><td style="padding:8px;"><a href="https://tinqs.com/tinqs/builds/media/branch/main/engine/windows-x64/tinqs.windows.editor.x86_64.mono.exe" style="color: var(--c-lime);"><code>tinqs.windows.editor.x86_64.mono.exe</code></a></td><td style="padding:8px;"><code>420e74bf</code> (4.6.5, MSVC 2022) 🆕</td></tr>
|
||||
</table>
|
||||
<p>All builds live in the public <a href="https://tinqs.com/tinqs/builds" style="color: var(--c-lime);"><code>tinqs/builds</code></a> repo — engine source is private, but the binaries are yours. See <a href="https://tinqs.com/tinqs/builds/src/branch/main/manifest.json" style="color: var(--c-lime);"><code>manifest.json</code></a> for checksums and build details.</p>
|
||||
<p>The engine source lives in <a href="https://tinqs.com/tinqs/engine" style="color: var(--c-lime);"><code>tinqs/engine</code></a> (private). Module docs: <code>modules/agent_skinned/README.md</code> and <code>.agents/wiki/agent-skinned-gpu-herd.md</code>.</p>
|
||||
<hr>
|
||||
<p><strong>Related:</strong> <a href="fork-dont-build" style="color: var(--c-lime);">Fork, Don't Build</a> — why we modify existing platforms instead of building new ones. <a href="godot-optimisation" style="color: var(--c-lime);">Streaming a 12km Archipelago in Godot 4</a> — the terrain and vegetation streaming layers that work alongside this.</p>
|
||||
|
||||
|
|
+2
-2
@@ -188,9 +188,9 @@
|
||||
</a>
|
||||
|
||||
<a href="gpu-skinned-herds" class="blog-card">
|
||||
<span class="blog-card__date">15 June 2026</span>
|
||||
<span class="blog-card__date">16 June 2026 · updated</span>
|
||||
<h2 class="blog-card__title">GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot</h2>
|
||||
<p class="blog-card__excerpt">Godot can't batch-render 1,000 animated characters. We built a GPU-driven crowd renderer into the engine itself — bake every animation frame into a texture once, let the GPU drive every instance. 1,000 animals, 60 FPS, zero skeletons. Pre-built editor binaries.</p>
|
||||
<p class="blog-card__excerpt">Godot can't batch-render 1,000 animated characters. We built a GPU skinned-instance herd renderer into the engine — now with mat4x3 palette, far-LOD, in-place bake. Pre-built binaries for macOS and Windows.</p>
|
||||
<span class="blog-card__read">Read →</span>
|
||||
</a>
|
||||
|
||||
|
||||
|