GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot
Godot gives you one Skeleton3D per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second.
We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads once, and draws every instance in one call. 25 crocodiles confirmed first. Then we threw 1,000 animals — 12 types mixed, random-walking — at it and the GPU didn't flinch. Now upgraded: mat4×3 palette (37% of original VRAM), far-LOD dominant-bone (3 texel fetches at distance), in-place bake (zero foot-slide), and full frustum cull. Same bone count, same animation fidelity, a tiny fraction of the cost.
Why the engine needs to change
The standard Godot approach — one Skeleton3D + one MeshInstance3D per character — works for a handful of animated entities. It breaks down hard at crowd scale:
- CPU bone transforms. Computing global_pose for 200 skeletons × 100 bones each = 20,000 matrix multiplies per frame, all on the main thread.
- Draw call explosion. Each MeshInstance3D is its own draw call. Even with MultiMesh, there's no built-in path for skinned meshes: MultiMeshInstance3D only handles static geometry.
- AnimationPlayer sprawl. Each skeleton needs its own AnimationPlayer and its own process() tick.
The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour.
What we need is simpler: share the skeleton, drive per-instance poses from a single animation, batch the draw call. That's what agent_skinned does.
How it works: two classes, one texture
The module lives in modules/agent_skinned/ inside Tinqs Engine. Two classes, one job:
MultiSkinnedMeshInstance3D — the data plane
Holds the CPU-side bone matrices. Allocates an ImageTexture of size [4 × max_bones, max_instances] in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances:
Texture: 520 × 256 RGBA32F ≈ 2 MB
That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple:
var data := MultiSkinnedMeshInstance3D.new()
data.set_mesh(crocodile_mesh)
data.set_skeleton(skeleton) # rest pose + bone hierarchy
data.set_max_instances(256)
data.set_max_bones(130)
# Each frame: push poses from the animated skeleton
for instance in herd_positions:
    data.set_instance_pose_bones(instance.id, bone_transforms)
data.update() # upload only dirty instances, not the whole texture
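To make the sizing and the dirty-upload idea concrete, here is a minimal Python sketch of a CPU-side palette. It is not the module's code: the class PosePalette and its methods are hypothetical names, and the only facts taken from the text are the [4 × max_bones, max_instances] RGBA32F layout and the "upload only dirty instances" behaviour.

```python
from dataclasses import dataclass, field

BYTES_PER_RGBA32F_TEXEL = 16  # 4 channels x 4-byte float
TEXELS_PER_BONE = 4           # one texel per column of a 4x4 matrix


@dataclass
class PosePalette:
    """Hypothetical stand-in for the CPU side of the data plane."""
    max_bones: int
    max_instances: int
    rows: dict = field(default_factory=dict)  # instance id -> flat float list
    dirty: set = field(default_factory=set)   # ids changed since last upload

    @property
    def texture_size(self):
        # Width covers every bone column; height is one row per instance.
        return (TEXELS_PER_BONE * self.max_bones, self.max_instances)

    @property
    def texture_bytes(self):
        width, height = self.texture_size
        return width * height * BYTES_PER_RGBA32F_TEXEL

    def set_instance_pose(self, instance_id, floats):
        if not 0 <= instance_id < self.max_instances:
            raise IndexError("instance id out of range")
        expected = self.max_bones * TEXELS_PER_BONE * 4
        if len(floats) != expected:
            raise ValueError(f"expected {expected} floats, got {len(floats)}")
        self.rows[instance_id] = list(floats)
        self.dirty.add(instance_id)

    def update(self):
        """Return the instance rows that would be uploaded, then clear them."""
        uploaded = sorted(self.dirty)
        self.dirty.clear()
        return uploaded


palette = PosePalette(max_bones=130, max_instances=256)
print(palette.texture_size)          # (520, 256)
print(palette.texture_bytes / 2**20) # 2.03125 (MiB)
```

The 130-bone, 256-instance case works out to 2,129,920 bytes, which matches the "≈ 2 MB" figure quoted above.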
MultiSkinnedInstance3D — the renderer
A MultiMeshInstance3D subclass. Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call refresh() — it uploads the bone texture into the shader material's bone_matrices_tex uniform and the mesh is drawn in one call.
The shader does 4-bone linear-blend skinning on the GPU:
mat4 get_bone(int b) {
    return mat4(
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0)
    );
}
INSTANCE_ID is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader.
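The addressing in get_bone and the 4-bone blend can be mimicked on the CPU to check the layout. The following Python sketch is an illustration only: the texture is modelled as a list of rows (one per instance) of RGBA tuples, and every helper name here is made up for the example.

```python
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def pack_columns(matrix):
    """Split a 4x4 matrix (list of rows) into 4 texels, one per column."""
    return [tuple(matrix[r][c] for r in range(4)) for c in range(4)]


def build_row(bone_matrices):
    """One texture row: the texels of every bone for a single instance."""
    row = []
    for matrix in bone_matrices:
        row.extend(pack_columns(matrix))
    return row


def get_bone(texture, bone, instance):
    """Same addressing as the shader: texel x = bone * 4 + column, y = instance."""
    cols = [texture[instance][bone * 4 + c] for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def transform(matrix, v):
    return [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_vertex(texture, instance, vertex, bone_ids, weights):
    """Linear-blend skinning: weighted sum of the vertex moved by each bone."""
    out = [0.0, 0.0, 0.0, 0.0]
    for bone, weight in zip(bone_ids, weights):
        moved = transform(get_bone(texture, bone, instance), vertex)
        out = [o + weight * m for o, m in zip(out, moved)]
    return out


# One instance, two bones: identity and a 2-unit shift along X.
texture = [build_row([IDENTITY, translation(2.0, 0.0, 0.0)])]
skinned = skin_vertex(texture, 0, [1.0, 0.0, 0.0, 1.0],
                      [0, 1, 0, 0], [0.5, 0.5, 0.0, 0.0])
print(skinned)  # [2.0, 0.0, 0.0, 1.0]
```

A vertex weighted half to each bone lands halfway between the two bone results, which is the behaviour linear-blend skinning is meant to produce.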
Two bugs we shipped and fixed
The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong.
Bug 1: Shader compile failure. The default skinning shader compared TANGENT as vec4. Godot 4 exposes it as vec3. Fixed in one line, added albedo_tex uniform so herds texture out of the box.
Bug 2: Bone matrices stored transposed. The data plane wrote basis rows (standard Godot Transform3D.basis is row-major), but the shader unpacked as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.
The lesson: doctests catch logic. Rendering catches truth. You need both.
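Bug 2 is easy to reproduce outside the engine. The Python sketch below assumes nothing about the module beyond what is stated above: one side writes rows, the other side reads the same texels as columns. The helper names are illustrative.

```python
def pack_rows(matrix):
    """The buggy writer: one texel per matrix row."""
    return [tuple(row) for row in matrix]


def pack_columns(matrix):
    """The fixed writer: one texel per matrix column."""
    return [tuple(matrix[r][c] for r in range(4)) for c in range(4)]


def unpack_as_columns(texels):
    """What the shader does: treat each texel as a column."""
    return [[texels[c][r] for c in range(4)] for r in range(4)]


def transform(matrix, v):
    return [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]


# A bone that shifts points 5 units along X.
bone = [[1.0, 0.0, 0.0, 5.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
point = [1.0, 0.0, 0.0, 1.0]

good = transform(unpack_as_columns(pack_columns(bone)), point)
bad = transform(unpack_as_columns(pack_rows(bone)), point)
print(good)  # [6.0, 0.0, 0.0, 1.0]
print(bad)   # [1.0, 0.0, 0.0, 6.0]
```

With the transposed matrix the translation leaks into the w component instead of moving the point, which is consistent with a mesh that crumples rather than one that is merely scaled or rotated wrongly.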
What's driving it
In Ariki, the sim tracks animal migration across a 12km archipelago. AnimalHerdRenderer.cs groups sim ViewerState.animals by type, feeds positions to skinned_herd.gd (a reusable per-type herd backend), which drives the renderer. One AnimationPlayer animates a single driver skeleton; poses propagate to every instance.
The crocodile herd scene was 25 instances, one draw call. The perf test scene does 1,000 animals across 12 types — Boar, Cow, Crab, Crocodile, Deer, Fish, Goat, Hen, Pig, Rabbit, Sheep, Tiger — each type its own GPU herd, all mixed, all random-walking, FPS holding steady.
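The per-type grouping step described above can be sketched in a few lines. This is a Python illustration, not the contents of AnimalHerdRenderer.cs or skinned_herd.gd; the record shape (a dict with "type" and "position" keys) is an assumption made for the example.

```python
from collections import defaultdict


def group_by_type(animals):
    """Bucket animal positions by type so each type can feed its own herd."""
    herds = defaultdict(list)
    for animal in animals:
        herds[animal["type"]].append(animal["position"])
    return dict(herds)


# Hypothetical sim snapshot; positions are (x, y, z) tuples.
animals = [
    {"type": "Crocodile", "position": (1.0, 0.0, 2.0)},
    {"type": "Deer", "position": (4.0, 0.0, 1.0)},
    {"type": "Crocodile", "position": (3.0, 0.0, 5.0)},
]
herds = group_by_type(animals)
print(len(herds))               # 2 herds, so 2 draw calls in this model
print(len(herds["Crocodile"]))  # 2
```

In this model the draw-call count tracks the number of animal types rather than the number of animals, which is why 1,000 animals across 12 types stays cheap.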
What's deliberately not here
- No C# wrapper. Instantiate from GDScript via ClassDB.instantiate(); the binding surface is small and stable.
- No automatic AnimationPlayer integration. You drive poses. We give you the texture. Freedom to animate however you want.
- No GPU occlusion or LOD. That's the game's job. The engine provides the tool; the game decides what to draw.
What's new in this build (16 June 2026)
- mat4x3 palette (B4). Each bone packs into 3 RGBA16F texels instead of 4 RGBA32F texels: 24 bytes per bone instead of 64, or 37.5% of the original VRAM and fetch bandwidth. Column-major, doctest-guarded.
- Far-LOD dominant-bone. At distance, each instance uses a single nearest-frame bone (~3 texel fetches vs ~24 near). LOD thresholds per-animal, scaled by body size — giraffes stay crisp 3x farther than rats.
- In-place bake. Walk/run clips no longer translate root motion — the bake strips horizontal drift so the sim owns position. Fixed the notorious slide/skate bug across all animal types.
- Full frustum cull (C7). Only on-screen instances hit the GPU. Caught a sign bug where Godot's outward-pointing frustum normals inverted the cull test.
- Bulk instance upload (A1). One MultiMesh.buffer = assignment per herd per frame, with zero per-instance native calls.
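The frustum sign bug mentioned in the list above is worth spelling out. With outward-pointing plane normals, a positive signed distance means "outside", so a bounding sphere is visible only if its distance to every plane is at most its radius. The Python sketch below uses the plane convention normal · p − d for signed distance and an axis-aligned box as a stand-in frustum; the function names and the box are assumptions for illustration, not the module's cull code.

```python
def signed_distance(plane, point):
    """Distance from point to plane; positive on the side the normal faces."""
    normal, d = plane
    return sum(n * p for n, p in zip(normal, point)) - d


def sphere_visible(planes, center, radius):
    """Correct test for outward normals: inside means distance <= radius."""
    return all(signed_distance(plane, center) <= radius for plane in planes)


def sphere_visible_inverted(planes, center, radius):
    """The sign bug: treats the normals as if they pointed inward."""
    return all(signed_distance(plane, center) >= -radius for plane in planes)


# Stand-in frustum: a 20-unit box around the origin, normals pointing out.
BOX = [((1, 0, 0), 10), ((-1, 0, 0), 10),
       ((0, 1, 0), 10), ((0, -1, 0), 10),
       ((0, 0, 1), 10), ((0, 0, -1), 10)]

print(sphere_visible(BOX, (0, 0, 0), 1.0))           # True
print(sphere_visible(BOX, (15, 0, 0), 1.0))          # False
print(sphere_visible(BOX, (10.5, 0, 0), 1.0))        # True (straddles a plane)
print(sphere_visible_inverted(BOX, (0, 0, 0), 1.0))  # False: culls the centre
```

The inverted version rejects an instance sitting in the middle of the view, which is the kind of failure that only shows up once the herd is on screen.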
24 doctests green. Visual-verified on Kraken (M1/Metal) and Forge (Windows/RTX).
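The palette and far-LOD figures in the list above can be checked with simple arithmetic. In the Python sketch below, the texel sizes follow from the formats named in the text (RGBA32F before, RGBA16F after). The breakdown of the "~24 near" fetches as 4 bones × 3 texels × 2 frames is an assumption inferred from the phrase "nearest-frame", not something the source states.

```python
def bytes_per_bone(texels, channels, bytes_per_channel):
    return texels * channels * bytes_per_channel


def fetches_per_vertex(bones, texels_per_bone, frames):
    return bones * texels_per_bone * frames


old_bone = bytes_per_bone(texels=4, channels=4, bytes_per_channel=4)  # RGBA32F
new_bone = bytes_per_bone(texels=3, channels=4, bytes_per_channel=2)  # RGBA16F
print(old_bone, new_bone, new_bone / old_bone)  # 64 24 0.375

# ASSUMPTION: the near path blends 4 bones across 2 frames.
near = fetches_per_vertex(bones=4, texels_per_bone=3, frames=2)
# Far LOD: one dominant bone, one (nearest) frame.
far = fetches_per_vertex(bones=1, texels_per_bone=3, frames=1)
print(near, far)  # 24 3
```

Under those assumptions the far path does one eighth of the texture reads of the near path, on top of the smaller per-bone footprint.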
Get the build
Pre-built editor binaries with agent_skinned baked in — no engine compile required. The game's animal_perf_test.tscn lets you toggle 10 / 100 / 1000 animals and read live FPS:
| Platform | Binary | Engine commit |
|---|---|---|
| macOS ARM64 | tinqs.macos.editor.arm64.mono | 4fe1323 (4.6.4, Xcode 26.3) |
| Windows x64 | tinqs.windows.editor.x86_64.mono.exe | 420e74bf (4.6.5, MSVC 2022) 🆕 |
All builds live in the public tinqs/builds repo — engine source is private, but the binaries are yours. See manifest.json for checksums and build details.
The engine source lives in tinqs/engine (private). Module docs: modules/agent_skinned/README.md and .agents/wiki/agent-skinned-gpu-herd.md.
Related:
- Fork, Don't Build: why we modify existing platforms instead of building new ones.
- Streaming a 12km Archipelago in Godot 4: the terrain and vegetation streaming layers that work alongside this.