post: remove internal roadmap — public-facing only

2026-06-15 22:50:19 +01:00
parent 85a6db41c5
commit 6524ac3597
2 changed files with 10 additions and 49 deletions
+6 -17
@@ -380,23 +380,12 @@ NORMAL = normalize((skin * vec4(NORMAL, 0.0)).xyz);</code></pre>
<li><strong>Impostor tier (2D billboards)</strong> — sprite atlas indexed by view-angle and animation-frame. For very far agents.</li>
</ul>
<p>One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.</p>
-<h2>The engine roadmap — where we push next</h2>
-<p>The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance <code>set_instance_transform</code> / <code>set_instance_custom_data</code>. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.</p>
-<h3>Tier A — kill the per-frame CPU path</h3>
-<p><strong>Bulk instance-upload API.</strong> Add <code>set_instance_data_bulk()</code> that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.</p>
-<p><strong>GPU-driven cull + indirect multi-draw.</strong> A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.</p>
-<p><strong>GPU dead-reckoning of position.</strong> Store per-instance velocity in custom data. Advance transforms from <code>TIME</code> in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.</p>
-<h3>Tier B — skinning core upgrades</h3>
-<p><strong>Dual-quaternion skinning.</strong> 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.</p>
-<p><strong>Mat4x3 storage.</strong> The 4th column of a bone matrix is always <code>(0,0,0,1)</code> — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.</p>
-<p><strong>Reduced-bone far LOD.</strong> Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.</p>
-<h3>Tier C — visibility & render passes</h3>
-<p><strong>Frustum-cull integration.</strong> MultiMesh draws everything in <code>visible_instance_count</code> — wire per-instance frustum culling into the engine's visibility system.</p>
-<p><strong>Shadow-pass LOD.</strong> The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.</p>
-<h3>Tier D — quality & pipeline</h3>
-<p><strong>In-shader clip cross-fade.</strong> Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.</p>
-<p><strong>Threaded bake.</strong> Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.</p>
-<p>The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.</p>
+<h2>What's deliberately not here</h2>
+<ul>
+<li><strong>No C# wrapper.</strong> Instantiate from GDScript via <code>ClassDB.instantiate()</code> — the binding surface is small and stable.</li>
+<li><strong>No automatic <code>AnimationPlayer</code> integration.</strong> You drive poses at bake time. We give you the texture. Freedom to animate however you want.</li>
+<li><strong>No GPU occlusion culling.</strong> That's the game's job. The engine provides the tool; the game decides what to draw.</li>
+</ul>
<h2>Get the build</h2>
<p>Pre-built editor binaries with <code>agent_skinned</code> and the GPU-driven palette baked in — no engine compile required. The game's <code>animal_perf_test.tscn</code> lets you spawn 10/100/1,000/10,000 animals and read live FPS:</p>
<p>| Platform | Binary |</p>
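The "Bulk instance-upload API" item removed in this hunk replaces N scripted per-instance calls with one contiguous write per herd. A minimal Python sketch of the packing side, assuming each instance occupies 16 floats: a 3x4 row-major transform (each basis row followed by one origin component) and then four custom-data floats. That layout is our assumption about a MultiMesh with custom data enabled and colors disabled; `pack_instances` is a hypothetical helper, and `set_instance_data_bulk()` is only the proposal named in the post, not an existing engine call.

```python
from array import array
from typing import Iterable, Sequence, Tuple

# Assumed per-instance layout (not confirmed by the post):
#   12 floats: 3x4 row-major transform, each basis row followed by one origin component
#    4 floats: custom data (e.g. clip index, frame offset, ...)
TRANSFORM_FLOATS = 12
CUSTOM_FLOATS = 4
STRIDE = TRANSFORM_FLOATS + CUSTOM_FLOATS

Instance = Tuple[Sequence[Sequence[float]], Sequence[float], Sequence[float]]


def pack_instances(instances: Iterable[Instance]) -> array:
    """Pack (basis_rows, origin, custom) tuples into one contiguous float32 buffer.

    basis_rows: three 3-element rows; origin: 3 floats; custom: 4 floats.
    """
    buf = array("f")
    for basis_rows, origin, custom in instances:
        if len(custom) != CUSTOM_FLOATS:
            # A short or long custom block would misalign every later instance.
            raise ValueError(f"custom data must have {CUSTOM_FLOATS} floats")
        for r in range(3):
            buf.extend(basis_rows[r])  # basis row r
            buf.append(origin[r])      # translation component for row r
        buf.extend(custom)
    return buf


if __name__ == "__main__":
    identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    herd = [(identity, (float(i), 0.0, 0.0), (2.0, i * 0.5, 0.0, 0.0)) for i in range(3)]
    buf = pack_instances(herd)
    # One contiguous block: this is what a single bulk call would hand over.
    print(len(buf) // STRIDE, "instances,", len(buf.tobytes()), "bytes")
```

The design point is that the per-instance loop happens once, in native code, over a flat buffer; the scripting layer crosses the boundary once per herd rather than twice per instance.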
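The "GPU dead-reckoning of position" item removed above has the vertex shader extrapolate each instance from its last sim snapshot, so the CPU writes an instance only when a snapshot arrives. The per-vertex math is a linear extrapolation; the Python below mirrors it on the CPU purely for illustration. `dead_reckon` is a hypothetical name, and clamping the elapsed time to one snapshot interval is our own assumed safeguard, not something the post specifies.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]

SNAPSHOT_INTERVAL = 0.4  # seconds; the ~0.4 s snapshot cadence quoted in the post


def dead_reckon(p0: Vec3, velocity: Vec3, t0: float, now: float) -> Vec3:
    """Extrapolate a position from the last snapshot (p0 at time t0).

    In the shader, p0 and velocity would come from per-instance data and
    `now` from TIME; here they are plain arguments.
    """
    # Assumed safeguard: never extrapolate backwards, and never more than one
    # snapshot interval ahead if the next snapshot is late.
    dt = min(max(now - t0, 0.0), SNAPSHOT_INTERVAL)
    return (
        p0[0] + velocity[0] * dt,
        p0[1] + velocity[1] * dt,
        p0[2] + velocity[2] * dt,
    )


if __name__ == "__main__":
    # An agent snapshotted at t=10.0 s, moving +2 m/s on x and -1 m/s on z.
    print(dead_reckon((1.0, 0.0, 2.0), (2.0, 0.0, -1.0), 10.0, 10.25))  # (1.5, 0.0, 1.75)
```

Storing velocity next to the snapshot position means that between snapshots the only per-frame input that changes is the time value, which the shader already has.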
+4 -32
@@ -164,39 +164,11 @@ The platform supports three tiers by distance, all driven by the same `(clip, co
One abstraction, three detail levels. The same code that drives 30 animals today will drive thousands of colonists at launch.
-## The engine roadmap — where we push next
-The 10,000-agent load test pointed at exactly where the engine can still win. The bottleneck at extreme scale isn't the GPU skinning — it's the CPU path feeding it: C# builds arrays, marshals Variants into GDScript, GDScript loops per-instance `set_instance_transform` / `set_instance_custom_data`. Three layers of per-instance overhead, all on the main thread. The fixes are engine-deep.
-### Tier A — kill the per-frame CPU path
-**Bulk instance-upload API.** Add `set_instance_data_bulk()` that does a single memcpy into the MultiMesh buffer instead of N scripted per-instance calls. One marshalled call + one copy per herd per frame instead of thousands.
-**GPU-driven cull + indirect multi-draw.** A compute pass classifies frustum, distance, and LOD-tier on the GPU and writes an indirect draw buffer per tier — the CPU stops iterating instances entirely. Pairs with bulk upload: together the main thread does ~zero per-instance work.
-**GPU dead-reckoning of position.** Store per-instance velocity in custom data. Advance transforms from `TIME` in the vertex shader. The CPU only touches an instance on a sim snapshot (~every 0.4s), not every frame.
-### Tier B — skinning core upgrades
-**Dual-quaternion skinning.** 2 texels per bone instead of 4. Halves palette VRAM, halves per-vertex texel fetches, and fixes the "candy-wrapper" collapse on twisting joints that linear blend skinning has. A real engine-grade upgrade.
-**Mat4x3 storage.** The 4th column of a bone matrix is always `(0,0,0,1)` — dropping it saves 25% VRAM with zero quality loss. A quick win if dual-quat is too big a step.
-**Reduced-bone far LOD.** Drop fingers, tail, and face bones for the far tier — fewer fetches where detail isn't visible.
-### Tier C — visibility & render passes
-**Frustum-cull integration.** MultiMesh draws everything in `visible_instance_count` — wire per-instance frustum culling into the engine's visibility system.
-**Shadow-pass LOD.** The skinning shader runs again in the depth/shadow pass. Skip skinning or drop shadow casters beyond a distance — often a hidden ~2× vertex cost.
-### Tier D — quality & pipeline
-**In-shader clip cross-fade.** Blend two clip blocks per instance (second custom slot + blend factor) instead of hard state cuts — brings hero-rig smoothness to the whole crowd without a real skeleton.
-**Threaded bake.** Move the palette bake to a worker thread so first-encounter of a new animal type never hitches the main thread.
-The recommended order: bulk upload (directly fixes the measured bottleneck, small, low-risk) → mat4x3 storage (immediate VRAM win) → GPU-driven cull + indirect draw (removes CPU from the loop entirely, unlocks tens of thousands) → dual-quaternion skinning (the skinning-quality leap). The first two are a day each and compounding; the latter two are the deep engine investments that make this a genuinely AAA crowd platform.
+## What's deliberately not here
+- **No C# wrapper.** Instantiate from GDScript via `ClassDB.instantiate()` — the binding surface is small and stable.
+- **No automatic `AnimationPlayer` integration.** You drive poses at bake time. We give you the texture. Freedom to animate however you want.
+- **No GPU occlusion culling.** That's the game's job. The engine provides the tool; the game decides what to draw.
## Get the build
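The Tier B items removed in this hunk make two size claims: mat4x3 storage saves 25% of palette VRAM, and dual-quaternion skinning needs 2 texels per bone instead of 4. Both follow from counting floats per bone against 4-float texels. The sketch below assumes an RGBA32F palette with each bone padded to whole texels; the rig size is hypothetical. Whether the constant `(0,0,0,1)` sits in the last row or the last column depends on the matrix convention; with column vectors (M * v) it is the bottom row.

```python
# Assumed palette format: RGBA32F, so one texel = 4 floats = 16 bytes,
# and each bone's data is padded up to whole texels.
FLOATS_PER_TEXEL = 4
BYTES_PER_TEXEL = 16

FLOATS_PER_BONE = {
    "mat4x4": 16,    # full 4x4 bone matrix
    "mat4x3": 12,    # constant (0,0,0,1) row/column dropped
    "dual_quat": 8,  # real quaternion + dual quaternion
}


def texels_per_bone(layout: str) -> int:
    floats = FLOATS_PER_BONE[layout]
    return -(-floats // FLOATS_PER_TEXEL)  # ceiling division


def palette_bytes(layout: str, bones: int, frames: int) -> int:
    """Size of a palette storing every bone for every baked frame."""
    return texels_per_bone(layout) * bones * frames * BYTES_PER_TEXEL


if __name__ == "__main__":
    bones, frames = 40, 300  # hypothetical rig and baked frame count
    base = palette_bytes("mat4x4", bones, frames)
    for layout in FLOATS_PER_BONE:
        size = palette_bytes(layout, bones, frames)
        print(f"{layout:9s} {texels_per_bone(layout)} texels/bone "
              f"{size:>8d} B ({size / base:.0%} of mat4x4)")
```

With the hypothetical 40-bone, 300-frame palette that comes to 768,000, 576,000 and 384,000 bytes. Assuming four bone influences per vertex, per-vertex palette fetches fall in the same ratio: 16, 12 and 8.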