| title | slug | date | description | og_description | og_image | excerpt | author | author_initials | author_role |
|---|---|---|---|---|---|---|---|---|---|
| Streaming a 12km Archipelago in Godot 4 | godot-optimisation | 2026-05-22 | How we built four streaming layers, async resource loading, and memory-safe caches to run a 12km open world in Godot 4 with C# --- without a single memory leak. | Four streaming layers, async loading, and zero memory leaks --- how we optimise Godot for a survival colony sim. | https://www.tinqs.com/img/og-cover.jpg | Four streaming layers, async resource loading, memory-safe caches, and zero leaks. How we optimise Godot to run a 12km open world with C# and Terrain3D. | Ozan Bozkurt | OB | CTO & Developer, Tinqs |
Godot has no built-in asset streaming. Our game is a 12km x 12km archipelago with 9 islands, thousands of trees, hundreds of buildings, and an ocean that never ends. Here's how we made it run.
The Problem
Ariki is a survival colony sim set across 9 islands in a Polynesian-inspired archipelago. The total world is roughly 12km x 12km. Each island is 4km across with its own terrain heightmap, biome textures, vegetation prototypes, and building grids. The player can travel between islands by canoe.
Godot 4 is a fantastic engine, but it wasn't designed for this scale. There's no terrain streaming, no asset LOD pipeline, no distance-based loading. If you load everything at startup, you run out of VRAM before the player sees the main menu. So we built four streaming layers on top of Godot, all in C#.
Layer 1: Terrain3D Regions
We use Terrain3D for our heightmaps --- a GDExtension that gives us a clipmap renderer with 7 LOD levels. Internally, Terrain3D divides each island into 512m x 512m regions. A 4km island has 64 internal regions. Across 9 islands, that's 576 regions total.
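The region counts follow from simple arithmetic. Here is a small sketch, assuming regions tile each island in a square grid and a partial region at the edge rounds up to a whole one (4,000 / 512 is not a whole number). The class and method names are illustrative, not taken from the Ariki codebase.

```csharp
using System;

public static class RegionMath
{
    // Regions per side: a partially covered region still needs a whole region.
    public static int RegionsPerSide(double islandSizeMeters, double regionSizeMeters) =>
        (int)Math.Ceiling(islandSizeMeters / regionSizeMeters);

    // Regions per island, assuming a square island footprint.
    public static int RegionsPerIsland(double islandSizeMeters, double regionSizeMeters)
    {
        int perSide = RegionsPerSide(islandSizeMeters, regionSizeMeters);
        return perSide * perSide;
    }
}
```

With these assumptions, `RegionMath.RegionsPerIsland(4000, 512)` gives 8 x 8 = 64, and 9 islands give 576.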
The key insight: don't create all 9 Terrain3D nodes at startup. Each node allocates a clipmap mesh, collision structures, and materials even when hidden. Our original code created all 9 in _Ready() and just toggled visibility. This wasted hundreds of megabytes on islands the player hadn't visited yet.
The fix was lazy instantiation. We create the current island's terrain on startup and defer the rest to TravelToIsland(). When the player gets in a canoe and sails to a new island, we create that island's Terrain3D node on demand, import the heightmap, and start async texture loading --- all while a loading screen covers the transition.
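The shape of lazy instantiation can be sketched without any engine code. The version below is a simplified illustration, not the actual Ariki implementation: `LazyIslandRegistry` and its factory delegate are hypothetical names. In the game, the factory would be the place that creates the Terrain3D node and imports the heightmap.

```csharp
using System;
using System.Collections.Generic;

// Creates per-island objects on first request instead of all at startup.
public sealed class LazyIslandRegistry<TTerrain>
{
    private readonly Dictionary<int, TTerrain> _created = new();
    private readonly Func<int, TTerrain> _factory;

    public LazyIslandRegistry(Func<int, TTerrain> factory) => _factory = factory;

    // How many islands have actually been instantiated so far.
    public int CreatedCount => _created.Count;

    public TTerrain GetOrCreate(int islandId)
    {
        if (!_created.TryGetValue(islandId, out var terrain))
        {
            terrain = _factory(islandId); // the expensive work happens here, once
            _created[islandId] = terrain;
        }
        return terrain;
    }
}
```

A travel routine would call `GetOrCreate(targetIslandId)` behind the loading screen, so islands the player never visits cost nothing.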
Layer 2: Vegetation Chunks (128m Grid)
This is the main prop streaming system and where most of the complexity lives. Every island's vegetation --- trees, rocks, grasses, shrubs --- is divided into a spatial grid of 128m x 128m chunks.
The camera position is checked every 0.5 seconds. When it crosses a chunk boundary, we calculate which chunks should be active within a 400m radius (roughly 39 chunks in a circle), QueueFree chunks that fell out of range, and build new chunks that entered range.
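The chunk selection and diff step can be sketched as plain grid math. This is an illustrative version with hypothetical names (`ChunkMath`, `ChunksInRadius`, `Diff`), and it assumes a chunk counts as in range when its centre lies within the radius. A different inclusion rule, such as any overlap with the circle, selects more chunks, so this sketch will not necessarily reproduce the chunk count quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkMath
{
    public const float ChunkSize = 128f;

    // Grid coordinate of the chunk containing a world-space XZ position.
    public static (int X, int Z) ChunkOf(float worldX, float worldZ) =>
        ((int)MathF.Floor(worldX / ChunkSize), (int)MathF.Floor(worldZ / ChunkSize));

    // Chunks whose centre lies within `radius` metres of the camera.
    public static HashSet<(int X, int Z)> ChunksInRadius(float camX, float camZ, float radius)
    {
        var result = new HashSet<(int X, int Z)>();
        var (cx, cz) = ChunkOf(camX, camZ);
        int reach = (int)MathF.Ceiling(radius / ChunkSize) + 1;
        for (int x = cx - reach; x <= cx + reach; x++)
        for (int z = cz - reach; z <= cz + reach; z++)
        {
            float dx = (x + 0.5f) * ChunkSize - camX;
            float dz = (z + 0.5f) * ChunkSize - camZ;
            if (dx * dx + dz * dz <= radius * radius) result.Add((x, z));
        }
        return result;
    }

    // Chunks to free (active but no longer wanted) and to build (wanted but not active).
    public static (List<(int X, int Z)> ToFree, List<(int X, int Z)> ToBuild) Diff(
        HashSet<(int X, int Z)> active, HashSet<(int X, int Z)> wanted) =>
        (active.Where(c => !wanted.Contains(c)).ToList(),
         wanted.Where(c => !active.Contains(c)).ToList());
}
```

Because the diff only runs when the camera crosses a chunk boundary, the allocations here stay out of the per-frame path.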
Each chunk groups vegetation instances by prototype, creates a MultiMesh per group, and places instances using Terrain3D height queries. This means a chunk with 50 palm trees and 30 rocks becomes 2 MultiMesh draw calls, not 80 individual nodes.
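The grouping step is the part that turns many instances into a few draw calls. Here is a minimal, engine-free sketch of it. `VegetationInstance` and `ChunkBatcher` are hypothetical names, and the Godot side (creating a MultiMesh per group and writing instance transforms from Terrain3D height queries) is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// One placed piece of vegetation; Prototype identifies the mesh it uses.
public sealed record VegetationInstance(string Prototype, float X, float Z);

public static class ChunkBatcher
{
    // One batch per prototype: each batch would back a single MultiMesh.
    public static Dictionary<string, List<VegetationInstance>> GroupByPrototype(
        IEnumerable<VegetationInstance> instances) =>
        instances.GroupBy(i => i.Prototype)
                 .ToDictionary(g => g.Key, g => g.ToList());
}
```

For the example above, 50 palm instances and 30 rock instances produce two batches, one per prototype.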
The cache problem
Vegetation meshes and materials are cached in dictionaries keyed by prototype name or texture path. The problem: these caches are append-only. Visit all 9 islands and you accumulate every mesh and material variant permanently. With 155 unique prototypes across the archipelago, that's a lot of GPU memory that never gets freed.
The fix is island-scoped eviction. When the player leaves an island via TravelToIsland(), we call ClearCaches() on the vegetation grid. Meshes and materials for the departed island are released. If the player returns, they reload from disk (a cache miss, not a crash). The loading screen covers this cost.
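In outline, island-scoped eviction is a cache that only ever holds the current island's entries and is emptied on departure. The sketch below is illustrative rather than the real vegetation grid: `IslandScopedCache` and the `load` delegate are hypothetical, and only the `ClearCaches()` name comes from the description above.

```csharp
using System;
using System.Collections.Generic;

public sealed class IslandScopedCache<TValue>
{
    private readonly Dictionary<string, TValue> _entries = new();

    public int Count => _entries.Count;

    // Returns the cached value, loading it on a miss.
    public TValue GetOrLoad(string key, Func<string, TValue> load)
    {
        if (!_entries.TryGetValue(key, out var value))
        {
            value = load(key); // cache miss: go to disk
            _entries[key] = value;
        }
        return value;
    }

    // Called when the player leaves the island. Dropping the references
    // is what allows the underlying meshes and materials to be released.
    public void ClearCaches() => _entries.Clear();
}
```

Returning to an island simply repopulates the cache through the same `GetOrLoad` path.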
Layer 3: Async Resource Loading
Godot's GD.Load() is synchronous. It blocks the main thread. When you call it during gameplay, the frame freezes. We audited the entire codebase and found 26 resource load calls across 13 files, and only 1 was async.
The worst offender was VegetationGrid.GetMeshForProto(). As the player walks across an island for the first time, every new vegetation prototype triggers a synchronous ResourceLoader.Load() call. With 155 prototypes, the first traversal stutters visibly.
We addressed this in two ways:
- Pre-warm during loading screens. When an island is imported, we kick off background loads for all known prototypes. By the time the player gains control, most meshes are already cached.
- Async loading for biome textures. Terrain3D textures use ResourceLoader.LoadThreadedRequest() with _Process() polling. The terrain renders immediately with autoshader colours, and biome textures pop in when ready. The player never notices.
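The request-then-poll pattern can be sketched independently of the engine. In the version below the status lookup is injected as a delegate so the logic can be read and tested on its own. `PendingLoadPoller` and `LoadStatus` are hypothetical names. In Godot, `Request` would also call ResourceLoader.LoadThreadedRequest(path), the status would come from ResourceLoader.LoadThreadedGetStatus(path), and the finished resource from ResourceLoader.LoadThreadedGet(path).

```csharp
using System;
using System.Collections.Generic;

public enum LoadStatus { InProgress, Loaded, Failed }

public sealed class PendingLoadPoller
{
    private readonly List<string> _pending = new();
    private readonly Func<string, LoadStatus> _getStatus;
    private readonly Action<string> _onLoaded;

    public PendingLoadPoller(Func<string, LoadStatus> getStatus, Action<string> onLoaded)
    {
        _getStatus = getStatus;
        _onLoaded = onLoaded;
    }

    public int PendingCount => _pending.Count;

    // Registers a path whose background load has been started.
    public void Request(string path) => _pending.Add(path);

    // Call once per frame, e.g. from _Process().
    public void Poll()
    {
        // Iterate backwards so RemoveAt does not skip entries.
        for (int i = _pending.Count - 1; i >= 0; i--)
        {
            LoadStatus status = _getStatus(_pending[i]);
            if (status == LoadStatus.InProgress) continue;
            if (status == LoadStatus.Loaded) _onLoaded(_pending[i]);
            _pending.RemoveAt(i); // finished or failed: stop polling it
        }
    }
}
```

Pre-warming during a loading screen uses the same mechanism: request every known prototype up front, then poll until the pending list is empty or the screen times out.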
The Godot ResourceLoader cache trap
On top of our own caches, Godot maintains an internal resource cache. Every GD.Load() call caches the result globally. There's no API to query the cache size or evict entries.
This means if you load an FBX as a PackedScene, instantiate it to extract a mesh, then free the instance --- the PackedScene stays cached. The mesh you extracted is fine (it's a Resource, not a Node), but the discarded scene wastes memory forever.
The rule: use ResourceLoader.Load(path, "", ResourceLoader.CacheMode.Ignore) for one-shot loads where you extract data and discard the container. Use GD.Load() only for things that should persist (shaders, shared textures).
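A one-shot load that follows this rule might look like the sketch below. It runs inside a Godot 4 C# project rather than standalone, and `MeshExtractor` with its helper is a hypothetical name, not the code used in VegetationGrid. It assumes the first MeshInstance3D found in the scene holds the mesh you want.

```csharp
using Godot;

public static class MeshExtractor
{
    // Loads a scene without adding it to Godot's resource cache,
    // takes the first mesh found, and frees the temporary instance.
    public static Mesh ExtractFirstMesh(string scenePath)
    {
        var scene = ResourceLoader.Load<PackedScene>(
            scenePath, null, ResourceLoader.CacheMode.Ignore);
        if (scene == null) return null;

        Node root = scene.Instantiate();
        Mesh mesh = FindFirstMesh(root);
        root.QueueFree(); // the Mesh is a Resource, so it outlives the node
        return mesh;
    }

    // Depth-first search for the first MeshInstance3D that has a mesh assigned.
    private static Mesh FindFirstMesh(Node node)
    {
        if (node is MeshInstance3D instance && instance.Mesh != null)
            return instance.Mesh;
        foreach (Node child in node.GetChildren())
        {
            Mesh found = FindFirstMesh(child);
            if (found != null) return found;
        }
        return null;
    }
}
```

The caller keeps the returned Mesh in its own cache, which is then the only place that decides how long it lives.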
Layer 4: Entity Rendering
Dynamic entities --- colonists, animals, buildings, VFX --- are event-driven, not streamed. They update when the sim pushes new state, not per frame.
- Crowd rendering: Single MultiMesh for up to 2000 colonists. Positions lerped per frame from pre-allocated arrays. Labels distance-culled, capped at 20. This is how you do crowds in Godot --- no individual nodes, no per-frame allocation.
- Animals: One MultiMesh per type (boar, deer, bird, fish). Max 500 per type. Updates only on state change, not per frame.
- Buildings: Tracked by ID from sim state. QueueFree when the sim says they're gone. Self-cleaning.
- VFX: Capped at 50 active particle systems. Worst case: 10,000 GPU particles. Trivial for modern hardware.
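The crowd update in the first bullet comes down to lerping between two pre-allocated arrays. The sketch below shows that idea in isolation. `CrowdPositions` is a hypothetical name, System.Numerics.Vector3 stands in for Godot's Vector3 so the example stays engine-free, and writing the results into the MultiMesh is omitted.

```csharp
using System;
using System.Numerics;

public sealed class CrowdPositions
{
    // Both arrays are allocated once, up front, and reused every frame.
    private readonly Vector3[] _current;
    private readonly Vector3[] _target;

    public CrowdPositions(int capacity)
    {
        _current = new Vector3[capacity];
        _target = new Vector3[capacity];
    }

    // Called when the sim pushes a new position for a colonist.
    public void SetTarget(int index, Vector3 position) => _target[index] = position;

    public Vector3 GetCurrent(int index) => _current[index];

    // Per-frame update: pure math, no heap allocation.
    public void Step(int activeCount, float delta, float speed)
    {
        float t = Math.Clamp(delta * speed, 0f, 1f);
        for (int i = 0; i < activeCount; i++)
            _current[i] = Vector3.Lerp(_current[i], _target[i], t);
    }
}
```

Sizing the arrays for the 2000-colonist cap means the per-frame loop never has to grow or reallocate anything.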
Memory Safety: Zero Leaks
We audited every QueueFree() call in the codebase --- 47 calls across 17 files. Zero RemoveChild() calls without a corresponding QueueFree(). The codebase is clean.
Three patterns we follow everywhere:
Pattern 1: Chunk streaming with spatial grid
Deactivate out-of-range chunks by iterating the active dict, calling QueueFree(), collecting keys to remove, then removing them after iteration. Never modify a dictionary while iterating it.
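Here is a compact sketch of that two-pass removal. `ChunkCleanup` and the `free` callback are hypothetical names. In Godot the callback would call QueueFree() on the chunk node.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkCleanup
{
    // Frees and removes every active chunk that is not in `wanted`.
    // Returns how many chunks were deactivated.
    public static int DeactivateOutOfRange<TChunk>(
        Dictionary<(int X, int Z), TChunk> active,
        HashSet<(int X, int Z)> wanted,
        Action<TChunk> free)
    {
        var toRemove = new List<(int X, int Z)>();

        // Pass 1: free chunks and remember their keys; the dictionary is not touched.
        foreach (var (coord, chunk) in active)
        {
            if (wanted.Contains(coord)) continue;
            free(chunk);
            toRemove.Add(coord);
        }

        // Pass 2: mutate the dictionary only after iteration has finished.
        foreach (var coord in toRemove) active.Remove(coord);
        return toRemove.Count;
    }
}
```

Removing entries inside the first loop would instead risk an InvalidOperationException from the dictionary enumerator.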
Pattern 2: Extract data from PackedScene
Instantiate a scene, extract the mesh or data you need, QueueFree() the temporary instance. The mesh survives because it's a Resource, not a Node. Used by VegetationGrid, TreeTypeRegistry, TreeRenderer, PlayerController.
Pattern 3: UI rebuild
QueueFree() all children, then build new content. Safe because QueueFree is deferred --- new children are added in the same frame before old ones are freed.
What Runs Every Frame
We're strict about what goes in _Process(). Here's the complete list:
- VegetationGrid: Camera chunk check (0.5s throttle, early-exits if same chunk)
- Terrain3DManager: Poll async texture loads (loop pending list, check status)
- CrowdRenderer: Lerp 2000 colonist positions (math-only, pre-allocated arrays)
- DayNightController: Rotate sun, adjust light energy
- ThirdPersonCamera: Follow + zoom smoothing
- SimBridge: Drain WebSocket message queue
Total per-frame overhead is dominated by the crowd lerp and the message queue drain. No heap allocation in any of these.
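The throttled chunk check at the top of that list is a good example of keeping per-frame work cheap. Here is a minimal sketch, assuming a fixed interval and a chunk coordinate computed elsewhere. `ChunkCheckThrottle` is a hypothetical name.

```csharp
public sealed class ChunkCheckThrottle
{
    private readonly float _interval;
    private float _elapsed;
    private bool _hasLastChunk;
    private (int X, int Z) _lastChunk;

    public ChunkCheckThrottle(float intervalSeconds) => _interval = intervalSeconds;

    // Returns true only when the interval has elapsed AND the camera's chunk changed.
    public bool ShouldRebuild(float delta, (int X, int Z) cameraChunk)
    {
        _elapsed += delta;
        if (_elapsed < _interval) return false; // throttled: most frames end here
        _elapsed = 0f;

        if (_hasLastChunk && _lastChunk == cameraChunk) return false; // same chunk: early exit
        _hasLastChunk = true;
        _lastChunk = cameraChunk;
        return true;
    }
}
```

Most frames pay for one float addition and one comparison, and the streaming work runs only when the method returns true.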
Shaders We Watch
Two of our 6 custom shaders are flagged as performance-sensitive:
Ocean shader --- 4 Gerstner wave calculations in the vertex stage, applied to a 12,000m plane with 16,641 vertices. Fragment stage does depth reconstruction, caustics (4x sin ops), foam masking, and two normal map lookups. It looks beautiful but it's the heaviest thing in the render pipeline. We pre-warm it during the loading screen to avoid shader compilation stutter on first frame.
Wind sway shader --- 6 trig ops per vertex on every vegetation mesh within 400m. The sway is invisible beyond 100m but the shader runs at full cost regardless. Future optimisation: disable sway on distant chunks or switch to a single-axis approximation.
The Target: RTX 3060
Our early access target is an RTX 3060 with 8GB VRAM. The rule is simple:
- If main island + full vegetation < 4GB VRAM --- ship it, we have 4GB headroom
- If approaching 6--8GB --- implement lazy terrain nodes + cache eviction
- If exceeding 8GB --- implement everything, up to and including vegetation LOD and region-level streaming
Always measure before optimising. We added VRAM logging before writing a single line of optimisation code. Half the "problems" we expected turned out to be non-issues. The other half were worse than expected. Profiling isn't optional.
Godot 4 can handle open worlds at this scale, but it won't do it for you. You need to build streaming, manage your own caches, audit your resource loading, and be disciplined about what runs per frame. The engine gives you the primitives --- MultiMesh, LoadThreadedRequest, QueueFree --- and it's up to you to wire them into a system that scales.
We're building Ariki with these systems and shipping to early access. If you're building something large-scale in Godot, we hope this is useful.