Sanitize all posts for public repo

- Remove classified agent names and internal codenames
- Remove Tailscale references
- Generalise internal details (machine names, team specifics)
- Frame everything around Tinqs Studio as the platform
- fal.ai post references the image-generation skill
- README updated with Studio positioning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+43
-51
@@ -2,10 +2,10 @@
 title: "Streaming a 12km Archipelago in Godot 4"
 slug: godot-optimisation
 date: "2026-05-22"
-description: "How we built four streaming layers, async resource loading, and memory-safe caches to run a 12km open world in Godot 4 with C# --- without a single memory leak."
-og_description: "Four streaming layers, async loading, and zero memory leaks --- how we optimise Godot for a survival colony sim."
+description: "How we built four streaming layers, async resource loading, and memory-safe caches to run a 12km open world in Godot 4 with C#."
+og_description: "Four streaming layers, async loading, and zero memory leaks --- optimising Godot for a large open world."
 og_image: "https://www.tinqs.com/img/og-cover.jpg"
-excerpt: "Four streaming layers, async resource loading, memory-safe caches, and zero leaks. How we optimise Godot to run a 12km open world with C# and Terrain3D."
+excerpt: "Four streaming layers, async resource loading, memory-safe caches, and zero leaks. How we built a 12km open world in Godot 4 with C#."
 author: "Ozan Bozkurt"
 author_initials: "OB"
 author_role: "CTO & Developer, Tinqs"
@@ -14,106 +14,98 @@ Godot has no built-in asset streaming. Our game is a 12km x 12km archipelago wit

 ## The Problem

-Ariki is a survival colony sim set across 9 islands in a Polynesian-inspired archipelago. The total world is roughly 12km x 12km. Each island is 4km across with its own terrain heightmap, biome textures, vegetation prototypes, and building grids. The player can travel between islands by canoe.
+We're building a survival colony sim set across 9 islands. The total world is roughly 12km x 12km. Each island is 4km across with its own terrain heightmap, biome textures, vegetation prototypes, and building grids. The player can travel between islands by canoe.

 Godot 4 is a fantastic engine, but it wasn't designed for this scale. There's no terrain streaming, no asset LOD pipeline, no distance-based loading. If you load everything at startup, you run out of VRAM before the player sees the main menu. So we built four streaming layers on top of Godot, all in C#.

-## Layer 1: Terrain3D Regions
+## Layer 1: Terrain Regions

-We use **Terrain3D** for our heightmaps --- a GDExtension that gives us a clipmap renderer with 7 LOD levels. Internally, Terrain3D divides each island into 512m x 512m regions. A 4km island has 64 internal regions. Across 9 islands, that's 576 regions total.
+We use **Terrain3D** for heightmaps --- a GDExtension that gives us a clipmap renderer with 7 LOD levels. Internally, Terrain3D divides each island into 512m x 512m regions. A 4km island has 64 regions. Across 9 islands, that's 576 regions total.

-The key insight: **don't create all 9 Terrain3D nodes at startup.** Each node allocates a clipmap mesh, collision structures, and materials even when hidden. Our original code created all 9 in `_Ready()` and just toggled visibility. This wasted hundreds of megabytes on islands the player hadn't visited yet.
+The key insight: **don't create all 9 terrain nodes at startup.** Each node allocates a clipmap mesh, collision structures, and materials even when hidden. Our original code created all 9 in `_Ready()` and just toggled visibility. This wasted hundreds of megabytes on islands the player hadn't visited yet.

-The fix was lazy instantiation. We create the current island's terrain on startup and defer the rest to `TravelToIsland()`. When the player gets in a canoe and sails to a new island, we create that island's Terrain3D node on demand, import the heightmap, and start async texture loading --- all while a loading screen covers the transition.
+The fix was lazy instantiation. We create the current island's terrain on startup and defer the rest. When the player gets in a canoe and sails to a new island, we create that island's terrain node on demand, import the heightmap, and start async texture loading --- all while a loading screen covers the transition.

 ## Layer 2: Vegetation Chunks (128m Grid)

-This is the main prop streaming system and where most of the complexity lives. Every island's vegetation --- trees, rocks, grasses, shrubs --- is divided into a spatial grid of 128m x 128m chunks.
+This is the main prop streaming system. Every island's vegetation --- trees, rocks, grasses, shrubs --- is divided into a spatial grid of 128m x 128m chunks.

 The camera position is checked every 0.5 seconds. When it crosses a chunk boundary, we calculate which chunks should be active within a 400m radius (roughly 39 chunks in a circle), `QueueFree` chunks that fell out of range, and build new chunks that entered range.

-Each chunk groups vegetation instances by prototype, creates a **MultiMesh** per group, and places instances using Terrain3D height queries. This means a chunk with 50 palm trees and 30 rocks becomes 2 MultiMesh draw calls, not 80 individual nodes.
+Each chunk groups vegetation instances by prototype, creates a **MultiMesh** per group, and places instances using height queries. A chunk with 50 palm trees and 30 rocks becomes 2 MultiMesh draw calls, not 80 individual nodes.
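
The chunk selection described above can be sketched in plain C#, with no engine types. This is a minimal illustration, not the project's code: the `ChunkGrid` class, its method names, and the "chunk centre within the radius" inclusion test are all assumptions. A different test (for example, any overlap between chunk and circle) yields a different count, so the number of chunks this produces need not match the figure quoted in the post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: names and the inclusion rule are illustrative assumptions.
public static class ChunkGrid
{
    public const float ChunkSize = 128f;    // metres per chunk side (from the post)
    public const float ActiveRadius = 400f; // streaming radius (from the post)

    // Map a world-space XZ position to integer chunk coordinates.
    // Floor (not truncation) keeps negative coordinates in the right chunk.
    public static (int X, int Z) ChunkOf(float worldX, float worldZ) =>
        ((int)MathF.Floor(worldX / ChunkSize), (int)MathF.Floor(worldZ / ChunkSize));

    // Chunks whose centre lies within ActiveRadius of the camera (assumed rule).
    public static HashSet<(int X, int Z)> ActiveChunks(float camX, float camZ)
    {
        var result = new HashSet<(int X, int Z)>();
        int reach = (int)MathF.Ceiling(ActiveRadius / ChunkSize);
        var (cx, cz) = ChunkOf(camX, camZ);

        for (int dx = -reach; dx <= reach; dx++)
        {
            for (int dz = -reach; dz <= reach; dz++)
            {
                float centreX = (cx + dx + 0.5f) * ChunkSize;
                float centreZ = (cz + dz + 0.5f) * ChunkSize;
                float offX = centreX - camX;
                float offZ = centreZ - camZ;
                if (offX * offX + offZ * offZ <= ActiveRadius * ActiveRadius)
                    result.Add((cx + dx, cz + dz));
            }
        }
        return result;
    }

    // Compare the wanted set with what is loaded: what to build, what to free.
    public static (List<(int X, int Z)> ToLoad, List<(int X, int Z)> ToUnload) Diff(
        HashSet<(int X, int Z)> wanted, HashSet<(int X, int Z)> loaded)
    {
        var toLoad = new List<(int X, int Z)>();
        var toUnload = new List<(int X, int Z)>();
        foreach (var key in wanted)
            if (!loaded.Contains(key)) toLoad.Add(key);
        foreach (var key in loaded)
            if (!wanted.Contains(key)) toUnload.Add(key);
        return (toLoad, toUnload);
    }
}
```

A caller would run `ChunkOf` on the throttled timer, return early if the camera is still in the same chunk, and otherwise pass the result of `ActiveChunks` and the currently loaded keys to `Diff`.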

 ### The cache problem

-Vegetation meshes and materials are cached in dictionaries keyed by prototype name or texture path. The problem: these caches are **append-only**. Visit all 9 islands and you accumulate every mesh and material variant permanently. With 155 unique prototypes across the archipelago, that's a lot of GPU memory that never gets freed.
+Vegetation meshes and materials are cached in dictionaries keyed by prototype name. The problem: these caches are **append-only**. Visit all 9 islands and you accumulate every mesh and material variant permanently. With 155 unique prototypes across the archipelago, that's a lot of GPU memory that never gets freed.

-The fix is island-scoped eviction. When the player leaves an island via `TravelToIsland()`, we call `ClearCaches()` on the vegetation grid. Meshes and materials for the departed island are released. If the player returns, they reload from disk (a cache miss, not a crash). The loading screen covers this cost.
+The fix is island-scoped eviction. When the player leaves an island, we clear the vegetation caches. Meshes and materials for the departed island are released. If the player returns, they reload from disk. The loading screen covers this cost.
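
As a rough sketch of island-scoped eviction, the cache below is keyed by prototype name and can be emptied in one call when the player leaves an island. `PrototypeCache<T>` and its members are hypothetical names, and the loader delegate stands in for whatever actually reads the mesh or material from disk. Note that clearing the dictionary only drops the cache's own references; memory is reclaimed only once nothing else still holds the resource.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache type: illustrates the eviction idea, not the project's class.
public sealed class PrototypeCache<T> where T : class
{
    private readonly Dictionary<string, T> _entries = new();

    public int Count => _entries.Count;

    // Return the cached entry, or load it once and remember it.
    public T GetOrLoad(string prototypeName, Func<string, T> load)
    {
        if (!_entries.TryGetValue(prototypeName, out var cached))
        {
            cached = load(prototypeName);
            _entries[prototypeName] = cached;
        }
        return cached;
    }

    // Island-scoped eviction: drop every entry when the player departs.
    // A later GetOrLoad for the same name is a cache miss and reloads.
    public void Clear() => _entries.Clear();
}
```

The trade-off is deliberate: a return visit pays the load cost again, which is acceptable when a loading screen already covers the transition.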

 ## Layer 3: Async Resource Loading

-Godot's `GD.Load()` is synchronous. It blocks the main thread. When you call it during gameplay, the frame freezes. We audited the entire codebase and found **26 resource load calls across 13 files**, and only 1 was async.
+Godot's `GD.Load()` is synchronous. It blocks the main thread. During gameplay, the frame freezes. We audited the entire codebase and found **26 resource load calls across 13 files**, and only 1 was async.

-The worst offender was `VegetationGrid.GetMeshForProto()`. As the player walks across an island for the first time, every new vegetation prototype triggers a synchronous `ResourceLoader.Load()` call. With 155 prototypes, the first traversal stutters visibly.
+The worst offender was `GetMeshForProto()` in the vegetation grid. As the player walks across an island for the first time, every new vegetation prototype triggers a synchronous load. With 155 prototypes, the first traversal stutters visibly.

-We addressed this in two ways:
+We fixed this in two ways:

 - **Pre-warm during loading screens.** When an island is imported, we kick off background loads for all known prototypes. By the time the player gains control, most meshes are already cached.
-- **Async loading for biome textures.** Terrain3D textures use `ResourceLoader.LoadThreadedRequest()` with `_Process()` polling. The terrain renders immediately with autoshader colours, and biome textures pop in when ready. The player never notices.
+- **Async loading for biome textures.** Terrain textures use `ResourceLoader.LoadThreadedRequest()` with `_Process()` polling. The terrain renders immediately with autoshader colours, and biome textures pop in when ready. The player never notices.
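
The request-then-poll pattern named in the second bullet might look roughly like the node below. It only runs inside a Godot 4 C# project. The class name `AsyncTextureLoader`, the `Request` method, and the `OnLoaded` hook are assumptions for illustration; the `ResourceLoader` calls are the engine's threaded-loading API.

```csharp
using System.Collections.Generic;
using Godot;

// Hypothetical node: a sketch of LoadThreadedRequest + _Process polling.
public partial class AsyncTextureLoader : Node
{
    // Paths that have been requested but not yet resolved.
    private readonly List<string> _pending = new();

    // Start a background load; returns immediately.
    public void Request(string path)
    {
        if (ResourceLoader.LoadThreadedRequest(path) == Error.Ok)
            _pending.Add(path);
        else
            GD.PushWarning($"Could not start threaded load for {path}");
    }

    public override void _Process(double delta)
    {
        // Iterate backwards so RemoveAt does not skip entries.
        for (int i = _pending.Count - 1; i >= 0; i--)
        {
            string path = _pending[i];
            var status = ResourceLoader.LoadThreadedGetStatus(path);
            if (status == ResourceLoader.ThreadLoadStatus.InProgress)
                continue;

            _pending.RemoveAt(i);
            if (status == ResourceLoader.ThreadLoadStatus.Loaded)
                OnLoaded(path, ResourceLoader.LoadThreadedGet(path));
            else
                GD.PushWarning($"Threaded load failed for {path}: {status}");
        }
    }

    // Override to apply the finished resource, e.g. assign a biome texture.
    protected virtual void OnLoaded(string path, Resource resource) { }
}
```

When nothing is pending, the loop body never executes, so the per-frame cost of the poll is a single count check.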

-### The Godot ResourceLoader cache trap
+### The ResourceLoader cache trap

 On top of our own caches, Godot maintains an internal resource cache. Every `GD.Load()` call caches the result globally. There's no API to query the cache size or evict entries.

-This means if you load an FBX as a `PackedScene`, instantiate it to extract a mesh, then free the instance --- the PackedScene **stays cached**. The mesh you extracted is fine (it's a Resource, not a Node), but the discarded scene wastes memory forever.
+If you load an FBX as a `PackedScene`, instantiate it to extract a mesh, then free the instance --- the PackedScene **stays cached**. The mesh you extracted is fine (it's a Resource, not a Node), but the discarded scene wastes memory forever.

 The rule: use `ResourceLoader.Load(path, "", CacheMode.Ignore)` for one-shot loads where you extract data and discard the container. Use `GD.Load()` only for things that should persist (shaders, shared textures).
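
A one-shot load following that rule could be sketched as below, for use inside a Godot 4 C# project. `OneShotMesh` and `FindFirstMesh` are hypothetical helpers, and taking the first `MeshInstance3D` found is an assumption about how the imported scene is laid out.

```csharp
using Godot;

// Hypothetical helper: load a scene without caching it, keep only the mesh.
public static class OneShotMesh
{
    public static Mesh Extract(string scenePath)
    {
        // CacheMode.Ignore keeps the PackedScene out of Godot's resource cache.
        var scene = ResourceLoader.Load<PackedScene>(
            scenePath, null, ResourceLoader.CacheMode.Ignore);
        if (scene == null)
            return null;

        Node root = scene.Instantiate();
        Mesh mesh = FindFirstMesh(root);

        // The temporary instance is a Node and must be freed explicitly;
        // the Mesh is a Resource and stays alive while it is referenced.
        root.QueueFree();
        return mesh;
    }

    // Depth-first search for the first MeshInstance3D that has a mesh assigned.
    private static Mesh FindFirstMesh(Node node)
    {
        if (node is MeshInstance3D instance && instance.Mesh != null)
            return instance.Mesh;

        foreach (Node child in node.GetChildren())
        {
            Mesh found = FindFirstMesh(child);
            if (found != null)
                return found;
        }
        return null;
    }
}
```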

 ## Layer 4: Entity Rendering

-Dynamic entities --- colonists, animals, buildings, VFX --- are event-driven, not streamed. They update when the sim pushes new state, not per frame.
+Dynamic entities --- colonists, animals, buildings, VFX --- are event-driven, not streamed. They update when the simulation pushes new state, not per frame.

-- **Crowd rendering:** Single MultiMesh for up to 2000 colonists. Positions lerped per frame from pre-allocated arrays. Labels distance-culled, capped at 20. This is how you do crowds in Godot --- no individual nodes, no per-frame allocation.
-- **Animals:** One MultiMesh per type (boar, deer, bird, fish). Max 500 per type. Updates only on state change, not per frame.
-- **Buildings:** Tracked by ID from sim state. `QueueFree` when the sim says they're gone. Self-cleaning.
-- **VFX:** Capped at 50 active particle systems. Worst case: 10,000 GPU particles. Trivial for modern hardware.
+- **Crowd rendering:** Single MultiMesh for up to 2000 colonists. Positions lerped per frame from pre-allocated arrays. Labels distance-culled, capped at 20. No individual nodes, no per-frame allocation.
+- **Animals:** One MultiMesh per type. Max 500 per type. Updates only on state change, not per frame.
+- **Buildings:** Tracked by ID from sim state. `QueueFree` when removed. Self-cleaning.
+- **VFX:** Capped at 50 active particle systems. Worst case: 10,000 GPU particles.

 ## Memory Safety: Zero Leaks

-We audited every `QueueFree()` call in the codebase --- 47 calls across 17 files. **Zero `RemoveChild()` calls without a corresponding `QueueFree()`.** The codebase is clean.
+We audited every `QueueFree()` call in the codebase --- 47 calls across 17 files. **Zero `RemoveChild()` calls without a corresponding `QueueFree()`.** Three patterns we follow everywhere:

-Three patterns we follow everywhere:
+**Pattern 1: Chunk streaming** --- Deactivate out-of-range chunks by iterating the active dict, calling `QueueFree()`, collecting keys to remove, then removing them after iteration. Never modify a dictionary while iterating it.

-**Pattern 1: Chunk streaming with spatial grid**
+**Pattern 2: Extract data from PackedScene** --- Instantiate a scene, extract the mesh, `QueueFree()` the temporary instance. The mesh survives because it's a Resource, not a Node.

-Deactivate out-of-range chunks by iterating the active dict, calling `QueueFree()`, collecting keys to remove, then removing them after iteration. Never modify a dictionary while iterating it.
-
-**Pattern 2: Extract data from PackedScene**
-
-Instantiate a scene, extract the mesh or data you need, `QueueFree()` the temporary instance. The mesh survives because it's a Resource, not a Node. Used by VegetationGrid, TreeTypeRegistry, TreeRenderer, PlayerController.
-
-**Pattern 3: UI rebuild**
-
-`QueueFree()` all children, then build new content. Safe because `QueueFree` is deferred --- new children are added in the same frame before old ones are freed.
+**Pattern 3: UI rebuild** --- `QueueFree()` all children, then build new content. Safe because `QueueFree` is deferred --- new children are added in the same frame before old ones are freed.
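
Pattern 1 can be written without engine types by passing the "free" step in as a delegate. In this sketch `ChunkEviction` and `DeactivateOutOfRange` are hypothetical names, and the `free` callback stands in for calling `QueueFree()` on the chunk's node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating "collect keys, then remove after iteration".
public static class ChunkEviction
{
    // Frees every active chunk that is not in the wanted set and
    // returns how many were removed.
    public static int DeactivateOutOfRange<TChunk>(
        Dictionary<(int X, int Z), TChunk> active,
        HashSet<(int X, int Z)> wanted,
        Action<TChunk> free)
    {
        var toRemove = new List<(int X, int Z)>();

        // Pass 1: free out-of-range chunks, but only record their keys.
        foreach (var pair in active)
        {
            if (wanted.Contains(pair.Key))
                continue;
            free(pair.Value);
            toRemove.Add(pair.Key);
        }

        // Pass 2: mutate the dictionary once enumeration has finished.
        foreach (var key in toRemove)
            active.Remove(key);

        return toRemove.Count;
    }
}
```

In a streaming system the temporary key list would typically be a reused field rather than a fresh allocation, so that the chunk check stays allocation-free.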

 ## What Runs Every Frame

-We're strict about what goes in `_Process()`. Here's the complete list:
+We're strict about what goes in `_Process()`:

-- **VegetationGrid:** Camera chunk check (0.5s throttle, early-exits if same chunk)
-- **Terrain3DManager:** Poll async texture loads (loop pending list, check status)
-- **CrowdRenderer:** Lerp 2000 colonist positions (math-only, pre-allocated arrays)
-- **DayNightController:** Rotate sun, adjust light energy
-- **ThirdPersonCamera:** Follow + zoom smoothing
-- **SimBridge:** Drain WebSocket message queue
+- **Vegetation grid:** Camera chunk check (0.5s throttle, early-exits if same chunk)
+- **Terrain manager:** Poll async texture loads (loop pending list, check status)
+- **Crowd renderer:** Lerp 2000 colonist positions (math-only, pre-allocated arrays)
+- **Day/night:** Rotate sun, adjust light energy
+- **Camera:** Follow + zoom smoothing
+- **Sim bridge:** Drain WebSocket message queue

-Total per-frame overhead is dominated by the crowd lerp and the message queue drain. No heap allocation in any of these.
+No heap allocation in any of these. Total per-frame overhead is dominated by the crowd lerp and the message queue drain.

 ## Shaders We Watch

-Two of our 6 custom shaders are flagged as performance-sensitive:
+Two custom shaders are performance-sensitive:

-**Ocean shader** --- 4 Gerstner wave calculations in the vertex stage, applied to a 12,000m plane with 16,641 vertices. Fragment stage does depth reconstruction, caustics (4x sin ops), foam masking, and two normal map lookups. It looks beautiful but it's the heaviest thing in the render pipeline. We pre-warm it during the loading screen to avoid shader compilation stutter on first frame.
+**Ocean shader** --- 4 Gerstner wave calculations in the vertex stage, applied to a 12,000m plane. Fragment stage does depth reconstruction, caustics, foam masking, and two normal map lookups. It's the heaviest thing in the render pipeline. We pre-warm it during the loading screen to avoid shader compilation stutter.

-**Wind sway shader** --- 6 trig ops per vertex on every vegetation mesh within 400m. The sway is invisible beyond 100m but the shader runs at full cost regardless. Future optimisation: disable sway on distant chunks or switch to a single-axis approximation.
+**Wind sway shader** --- 6 trig ops per vertex on every vegetation mesh within 400m. The sway is invisible beyond 100m but the shader runs at full cost regardless. Future optimisation: disable sway on distant chunks.

 ## The Target: RTX 3060

-Our early access target is an RTX 3060 with 8GB VRAM. The rule is simple:
+Our early access target is an RTX 3060 with 8GB VRAM:

-- If main island + full vegetation < 4GB VRAM --- ship it, we have 4GB headroom
-- If approaching 6--8GB --- implement lazy terrain nodes + cache eviction
-- If exceeding 8GB --- implement everything through vegetation LOD and region-level streaming
+- Main island + full vegetation < 4GB VRAM --- ship it, we have headroom
+- Approaching 6--8GB --- implement lazy terrain nodes + cache eviction
+- Exceeding 8GB --- implement vegetation LOD and region-level streaming

 **Always measure before optimising.** We added VRAM logging before writing a single line of optimisation code. Half the "problems" we expected turned out to be non-issues. The other half were worse than expected. Profiling isn't optional.

@@ -121,4 +113,4 @@ Our early access target is an RTX 3060 with 8GB VRAM. The rule is simple:

 Godot 4 can handle open worlds at this scale, but it won't do it for you. You need to build streaming, manage your own caches, audit your resource loading, and be disciplined about what runs per frame. The engine gives you the primitives --- MultiMesh, `LoadThreadedRequest`, `QueueFree` --- and it's up to you to wire them into a system that scales.

-We're building Ariki with these systems and shipping to early access. If you're building something large-scale in Godot, we hope this is useful.
+We're building with these systems and developing the game using [Tinqs Studio](https://tinqs.com). If you're building something large-scale in Godot, we hope this is useful.