16 June 2026 · updated

GPU-Skinned Herds: One Draw Call for 1,000 Animated Characters in Godot

Godot gives you one Skeleton3D per character. Want 200 animals in a herd? That's 200 skeleton nodes, 200 draw calls, and 200 AnimationPlayer ticks every frame. Want 1,000? Now you're measuring in seconds per frame, not frames per second.

We built a GPU skinned-instance renderer into Tinqs Engine that packs every pose into a single texture, uploads it once, and draws every instance in one call. We first confirmed it with 25 crocodiles. Then we threw 1,000 animals (12 types, mixed, random-walking) at it and the GPU didn't flinch. The current build adds four upgrades: a mat4×3 palette (37% of the original VRAM), far-LOD dominant-bone skinning (3 texel fetches at distance), in-place bake (zero foot-slide), and full frustum culling. Same bone count, same animation fidelity, a tiny fraction of the cost.

Why the engine needs to change

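The standard Godot approach (one Skeleton3D plus one MeshInstance3D per character) works for a handful of animated entities. It breaks down hard at crowd scale, because every character adds its own skeleton node, its own draw call, and its own AnimationPlayer tick each frame.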

The alternative — baking animations to vertex textures — works for static crowds but locks you out of per-instance variation. No blending, no phase offsets, no reactive behaviour.

What we need is simpler: share the skeleton, drive per-instance poses from a single animation, batch the draw call. That's what agent_skinned does.

How it works: two classes, one texture

The module lives in modules/agent_skinned/ inside Tinqs Engine. Two classes, one job:

MultiSkinnedMeshInstance3D — the data plane

Holds the CPU-side bone matrices. Allocates an ImageTexture of size [4 × max_bones, max_instances] in RGBA32F — each texel is one column of a 4×4 bone matrix. For a 130-bone crocodile with 256 instances:

Texture: 520 × 256 RGBA32F ≈ 2 MB
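The figure above can be checked with a few lines of arithmetic. This is a sketch in Python rather than engine code; the function names are hypothetical, and it assumes exactly what the text states: four RGBA32F texels per bone and one texture row per instance.

```python
# Sketch only: reproduces the size arithmetic for the bone texture.
BYTES_PER_RGBA32F_TEXEL = 4 * 4  # four channels, 32-bit float each


def bone_texture_size(max_bones, max_instances, texels_per_bone=4):
    """Return (width, height) in texels: one row per instance."""
    return texels_per_bone * max_bones, max_instances


def bone_texture_bytes(max_bones, max_instances, texels_per_bone=4,
                       bytes_per_texel=BYTES_PER_RGBA32F_TEXEL):
    """Total bytes for the whole pose texture."""
    width, height = bone_texture_size(max_bones, max_instances, texels_per_bone)
    return width * height * bytes_per_texel


width, height = bone_texture_size(130, 256)
size_bytes = bone_texture_bytes(130, 256)
print(width, height, size_bytes, round(size_bytes / 2**20, 2))
# prints: 520 256 2129920 2.03
```

The texture grows linearly in both bone count and instance count, so doubling either one doubles the memory.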

That's the entire pose state for 256 animated crocodiles in a single GPU texture. The API is simple:

var data := MultiSkinnedMeshInstance3D.new()
data.set_mesh(crocodile_mesh)
data.set_skeleton(skeleton)       # rest pose + bone hierarchy
data.set_max_instances(256)
data.set_max_bones(130)

# Each frame: push poses from the animated skeleton.
# bone_transforms holds the driver skeleton's current per-bone pose.
for instance in herd_positions:
    data.set_instance_pose_bones(instance.id, bone_transforms)
data.update()   # upload only dirty instances, not the whole texture
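The "upload only dirty instances" idea can be sketched as follows. This is not the module's implementation: PoseStore, set_instance_pose, and flush are hypothetical names, and the sketch assumes one texture row per instance, so a dirty instance maps to exactly one row to re-upload.

```python
class PoseStore:
    """Hypothetical CPU-side pose store: one row of floats per instance."""

    def __init__(self, max_instances, max_bones):
        self.row_floats = max_bones * 16  # 4 texels x 4 channels per bone
        self.rows = [[0.0] * self.row_floats for _ in range(max_instances)]
        self.dirty = set()

    def set_instance_pose(self, instance_id, floats):
        """Overwrite one instance's row and remember that it changed."""
        if len(floats) != self.row_floats:
            raise ValueError("pose must fill exactly one texture row")
        self.rows[instance_id] = list(floats)
        self.dirty.add(instance_id)

    def flush(self, upload_row):
        """Call upload_row(instance_id, row) per dirty row, then clear the set."""
        uploaded = sorted(self.dirty)
        for instance_id in uploaded:
            upload_row(instance_id, self.rows[instance_id])
        self.dirty.clear()
        return uploaded
```

With this shape, a frame in which only a few instances change touches only those rows, and a frame with no changes uploads nothing.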

MultiSkinnedInstance3D — the renderer

A MultiMeshInstance3D subclass. Set its multimesh with the skinned mesh and instance transforms, point it at the data plane, call refresh() — it uploads the bone texture into the shader material's bone_matrices_tex uniform and the mesh is drawn in one call.

The shader does 4-bone linear-blend skinning on the GPU:

mat4 get_bone(int b) {
    return mat4(
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 0, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 1, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 2, INSTANCE_ID), 0),
        texelFetch(bone_matrices_tex, ivec2(b * 4 + 3, INSTANCE_ID), 0)
    );
}

INSTANCE_ID is a Godot built-in — the GPU already knows which instance it's rendering. We just use it to index into the bone texture. No uniform arrays, no SSBOs, no compute shaders. Just a 2D texture and a custom vertex shader.
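For readers who want the maths without the shader, here is a CPU-side sketch of the same idea: rebuild each bone from four column texels, then blend the transformed positions by weight. It is written in Python for illustration only; skin_vertex, mat_mul_vec, and translation are hypothetical helpers, and the row layout (four consecutive texels per bone) is taken from the description above.

```python
def get_bone(row, b):
    """Rebuild bone b as four columns, mirroring the shader's get_bone.

    `row` is one instance's texture row: a list of RGBA texels (4-tuples),
    four consecutive texels per bone, one texel per matrix column.
    """
    return [row[b * 4 + c] for c in range(4)]


def mat_mul_vec(columns, v):
    """Column-major 4x4 matrix times a 4-component vector."""
    return tuple(sum(columns[c][r] * v[c] for c in range(4)) for r in range(4))


def skin_vertex(row, position, bone_ids, weights):
    """Linear-blend skinning: weighted sum of the bone-transformed position."""
    v = (position[0], position[1], position[2], 1.0)
    out = [0.0, 0.0, 0.0]
    for b, w in zip(bone_ids, weights):
        p = mat_mul_vec(get_bone(row, b), v)
        for i in range(3):
            out[i] += w * p[i]
    return tuple(out)


def translation(tx, ty, tz):
    """Four column texels of a pure translation matrix."""
    return [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0), (tx, ty, tz, 1.0)]


# Two bones: identity and a +2 shift on X, blended 50/50.
row = translation(0.0, 0.0, 0.0) + translation(2.0, 0.0, 0.0)
skinned = skin_vertex(row, (1.0, 1.0, 1.0), (0, 1, 0, 0), (0.5, 0.5, 0.0, 0.0))
```

A vertex weighted equally between the two bones lands halfway between their results, which is the defining behaviour of linear-blend skinning.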

Two bugs we shipped and fixed

The module had data-plane doctests from day one — round-trip pose get/set, dirty tracking, size clamping, AABB. All green. Then we put it on screen for the first time and the crocodiles looked... wrong.

Bug 1: Shader compile failure. The default skinning shader treated TANGENT as a vec4. Godot 4 exposes it as a vec3. We fixed it in one line and added an albedo_tex uniform so herds are textured out of the box.

Bug 2: Bone matrices stored transposed. The data plane wrote basis rows (standard Godot Transform3D.basis is row-major), but the shader unpacked as columns. Every bone matrix was transposed — the mesh crumpled. Not a scale bug, not an orientation bug — a layout mismatch. Fixed by storing column-major, with a doctest to prevent regression.
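The layout mismatch is easy to reproduce outside the engine. The sketch below (Python, with hypothetical function names) packs a translation matrix one row per texel, unpacks it the way the shader does (one column per texel), and shows that the result is the transpose, while column packing round-trips cleanly.

```python
def pack_rows(m):
    """Buggy layout: one texel per matrix row (m is indexed m[row][col])."""
    return [tuple(m[r]) for r in range(4)]


def pack_columns(m):
    """Fixed layout: one texel per matrix column."""
    return [tuple(m[r][c] for r in range(4)) for c in range(4)]


def unpack_as_columns(texels):
    """What the shader does: treat each texel as a column, return m[row][col]."""
    return [[texels[c][r] for c in range(4)] for r in range(4)]


# A bone that translates by (5, 6, 7); the translation sits in the last column.
bone = [[1.0, 0.0, 0.0, 5.0],
        [0.0, 1.0, 0.0, 6.0],
        [0.0, 0.0, 1.0, 7.0],
        [0.0, 0.0, 0.0, 1.0]]

transposed = [list(col) for col in zip(*bone)]
assert unpack_as_columns(pack_columns(bone)) == bone      # round-trips
assert unpack_as_columns(pack_rows(bone)) == transposed   # the bug
```

In the buggy case the translation ends up in the bottom row instead of the last column, so it no longer moves the vertex and instead corrupts the w component. A round-trip check of this kind only passes if the test unpacks with the same convention the shader uses.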

The lesson: doctests catch logic. Rendering catches truth. You need both.

What's driving it

In Ariki, the sim tracks animal migration across a 12 km archipelago. AnimalHerdRenderer.cs groups the sim's ViewerState.animals by type and feeds positions to skinned_herd.gd (a reusable per-type herd backend), which drives the renderer. One AnimationPlayer animates a single driver skeleton; poses propagate to every instance.
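The grouping step amounts to bucketing animals by type so that each type can feed its own herd. A minimal sketch, assuming each animal arrives as a (type_name, position) pair; that shape is an illustration, not the actual ViewerState layout:

```python
from collections import defaultdict


def group_by_type(animals):
    """Group (type_name, position) pairs into one position list per type."""
    herds = defaultdict(list)
    for type_name, position in animals:
        herds[type_name].append(position)
    return dict(herds)


animals = [("Crocodile", (0.0, 0.0, 1.0)),
           ("Deer", (4.0, 0.0, 2.0)),
           ("Crocodile", (1.5, 0.0, 3.0))]
herds = group_by_type(animals)
```

Each resulting list would then supply the instance transforms for one herd, so the number of draw calls tracks the number of animal types rather than the number of animals.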

The crocodile herd scene was 25 instances, one draw call. The perf test scene does 1,000 animals across 12 types — Boar, Cow, Crab, Crocodile, Deer, Fish, Goat, Hen, Pig, Rabbit, Sheep, Tiger — each type its own GPU herd, all mixed, all random-walking, FPS holding steady.

What's deliberately not here

What's new in this build (16 June 2026)

24 doctests green. Visual-verified on Kraken (M1/Metal) and Forge (Windows/RTX).

Get the build

Pre-built editor binaries with agent_skinned baked in — no engine compile required. The game's animal_perf_test.tscn lets you toggle 10 / 100 / 1000 animals and read live FPS:

Platform | Binary | Engine commit
macOS ARM64 | tinqs.macos.editor.arm64.mono | 4fe1323 (4.6.4, Xcode 26.3)
Windows x64 | tinqs.windows.editor.x86_64.mono.exe | 420e74bf (4.6.5, MSVC 2022) 🆕

All builds live in the public tinqs/builds repo — engine source is private, but the binaries are yours. See manifest.json for checksums and build details.

The engine source lives in tinqs/engine (private). Module docs: modules/agent_skinned/README.md and .agents/wiki/agent-skinned-gpu-herd.md.


Related: Fork, Don't Build — why we modify existing platforms instead of building new ones. Streaming a 12km Archipelago in Godot 4 — the terrain and vegetation streaming layers that work alongside this.