
---
title: "AI Art at Scale: Using fal.ai Flux for Game Asset Generation"
slug: "fal-image-generation"
date: 2026-05-25
description: "How we use fal.ai Flux models to generate concept art, trailer frames, and UI assets for our game --- with a 4-layer prompt pattern that actually works."
og_description: "fal.ai Flux for game art: 4-layer prompts, $0.01/image, and a pipeline that replaced our concept art bottleneck."
og_image: "https://www.tinqs.com/img/og-cover.jpg"
excerpt: "We generate concept art, trailer frames, and UI icons with fal.ai Flux models at $0.01 per image. Here's the prompt engineering pattern that makes it work for game dev."
author: "Ozan Bozkurt"
author_initials: "OB"
author_role: "CTO & Developer, Tinqs"
---

We're a 4-person indie studio building a survival colony sim. We don't have a concept artist on staff. Every piece of character art, trailer frame, and UI icon in our game was generated with fal.ai Flux models --- at roughly a penny per image.

The Problem with AI Art for Games

Most AI image generators produce beautiful images that are completely useless for game development. They look great on Twitter but fall apart when you need consistency: the same character from four angles, a UI icon that reads at 64x64, a trailer frame that matches your game's art style rather than whatever Midjourney thinks looks cool today.

The issue isn't the models --- Flux is genuinely good. The issue is prompting. When you write "Polynesian warrior on a beach," you get a different art style every time. Different skin tones, different proportions, different lighting. You can't build a game from that.

We spent three months iterating on prompt patterns before we found something that works consistently. The result is a 4-layer system that anchors the model to your art direction and produces images you can actually ship.

Why fal.ai

We evaluated Midjourney, DALL-E 3, Stable Diffusion (self-hosted), and fal.ai. The decision came down to:

API-first. Midjourney is Discord-only. DALL-E's API works, but the model makes everything look like a stock photo. Stable Diffusion self-hosted means maintaining GPU infrastructure. fal.ai gives you Flux models behind a simple REST API: POST a prompt, get an image URL back.
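As a rough illustration of that request/response flow, here is a stdlib-only Python sketch. It assumes fal.ai's synchronous `https://fal.run/<model-id>` endpoint, a `Key <FAL_KEY>` authorization header, and a JSON response containing an `images` list; check the model's page on fal.ai for its exact input and output schema. The `build_request` and `generate` names are our own, not part of any fal.ai SDK.

```python
import json
import os
import urllib.request

# Assumption: fal.ai's synchronous endpoint host; the model ID is appended to it.
FAL_RUN_BASE = "https://fal.run"


def build_request(model_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the prompt as JSON."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{FAL_RUN_BASE}/{model_id}",
        data=body,
        headers={
            "Authorization": f"Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(model_id: str, prompt: str) -> str:
    """Send the request and return the URL of the first generated image."""
    req = build_request(model_id, prompt, os.environ["FAL_KEY"])
    with urllib.request.urlopen(req, timeout=120) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"images": [{"url": ...}, ...], ...}
    return payload["images"][0]["url"]
```

Calling `generate("fal-ai/flux/schnell", "a carved wooden canoe on a beach")` would return a single image URL. In practice the official `fal_client` package wraps this for you; the sketch only shows how little is going on underneath.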

Cost. $0.01 per image with flux-2-pro. $0.004 with schnell for rapid iteration. A full character design session --- 12 variants across 3 rounds of refinement --- costs $0.12. A 20-frame trailer storyboard costs $0.20. At these prices, the bottleneck is creative direction, not budget.
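The session arithmetic is simple enough to keep in a helper. This Python sketch uses the per-image prices quoted in this post (not live fal.ai pricing, which may differ) and `Decimal` to avoid floating-point rounding; `PRICE_PER_IMAGE` and `batch_cost` are illustrative names.

```python
from decimal import Decimal

# Per-image prices in USD as quoted in this post; check fal.ai for current pricing.
PRICE_PER_IMAGE = {
    "flux-2-pro": Decimal("0.01"),
    "flux/schnell": Decimal("0.004"),
    "ideogram/v2": Decimal("0.008"),
    "flux-pro/v1.1-ultra": Decimal("0.015"),
}


def batch_cost(model: str, images: int) -> Decimal:
    """Cost of generating `images` images with a single model."""
    return PRICE_PER_IMAGE[model] * images
```

`batch_cost("flux-2-pro", 12)` gives `Decimal('0.12')`, the character-session figure, and `batch_cost("flux-2-pro", 20)` gives the $0.20 storyboard.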

Speed. flux/schnell returns an image in 4 seconds. flux-2-pro in 15 seconds. Fast enough that the AI agent can generate, display, get feedback, and regenerate in a single conversation turn.

No subscription. Pay per image. No monthly fee, no credit packs that expire, no tier-gated features.

The 4-Layer Prompt Pattern

This is the pattern that made AI art actually usable for our game. Each layer adds specificity, and the combination anchors the model to a consistent output.

Layer 1: Design Context

This is the most important layer and the one most people skip. It sets the overall art direction for everything that follows:

```text
Art direction: stylized 3D render for a survival colony sim set in a 
Polynesian archipelago. Warm earthy palette — browns, tans, dark reds, 
cream, ocean blues. Carved wood textures, koru spirals, woven pandanus 
patterns. Moana-meets-Valheim aesthetic. Game engine quality, not 
photorealistic.
```

This paragraph appears at the start of every prompt. It's the same paragraph whether we're generating a character, a landscape, or an icon. It anchors the model to our art style.

The key insight: write this once, paste it everywhere. It's your art bible compressed into under 50 words. Every time we skipped it for "just a quick test", the output drifted into generic fantasy art.
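Since the context block has to appear in every prompt, it's worth making it impossible to forget. A minimal Python sketch, assuming prompts are assembled in code rather than typed by hand; `DESIGN_CONTEXT` and `with_context` are hypothetical names, and the text is the art-direction block above with its dash swapped for a colon.

```python
# Layer 1: the shared art-direction paragraph, written once and reused everywhere.
DESIGN_CONTEXT = (
    "Art direction: stylized 3D render for a survival colony sim set in a "
    "Polynesian archipelago. Warm earthy palette: browns, tans, dark reds, "
    "cream, ocean blues. Carved wood textures, koru spirals, woven pandanus "
    "patterns. Moana-meets-Valheim aesthetic. Game engine quality, not "
    "photorealistic."
)


def with_context(subject_prompt: str) -> str:
    """Prepend the shared design context so no prompt can skip it."""
    return f"{DESIGN_CONTEXT}\n\n{subject_prompt.strip()}"
```

Routing every generation through one function like this means the "just a quick test" shortcut simply isn't available.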

Layer 2: Scene Description

Describe exactly what should appear, element by element:

```text
Full body character in T-pose, front view. Young Polynesian woman, 
mid-20s. Wearing a woven pandanus wrap skirt (mid-thigh length) and 
a fitted tapa cloth top. Cowrie shell necklace with a carved bone 
pendant. Single bone bracelet on left wrist. Hair swept back over 
right shoulder, decorated with a red hibiscus. Bare feet. 
Matte skin, warm brown tones. Neutral confident expression — 
not smiling, not angry. Dark grey background.
```

Notice the specificity. Not "tribal clothing" but "woven pandanus wrap skirt." Not "jewelry" but "cowrie shell necklace with a carved bone pendant." Not "looks determined" but "neutral confident expression --- not smiling, not angry."

Vague prompts produce vague results. Specific prompts produce usable assets.

Layer 3: Negative Prompt

Always include what you don't want:

```text
Do not include: cartoon style, anime style, photorealistic render, 
extra text or taglines, watermark, deformed elements, modern or 
sci-fi, European crown or castle motifs. No extra fingers, no 
merged limbs, no floating accessories.
```

We extend this per subject. For characters: "no grass skirts, no feather headdresses, no Disney-adjacent designs." For environments: "no modern buildings, no metal structures." The negative prompt is as important as the positive one.
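Because these negatives are written as plain prompt text ("Do not include: ..."), one way to keep the base list and the per-subject extensions in sync is to generate that clause from data. A Python sketch under that assumption; `BASE_NEGATIVES`, `SUBJECT_NEGATIVES`, and `negative_clause` are hypothetical names, and the entries are the examples from this post.

```python
# Base negatives shared by every prompt.
BASE_NEGATIVES = [
    "cartoon style", "anime style", "photorealistic render",
    "extra text or taglines", "watermark", "deformed elements",
    "modern or sci-fi", "European crown or castle motifs",
    "extra fingers", "merged limbs", "floating accessories",
]

# Per-subject extensions, appended only for that subject type.
SUBJECT_NEGATIVES = {
    "character": ["grass skirts", "feather headdresses", "Disney-adjacent designs"],
    "environment": ["modern buildings", "metal structures"],
}


def negative_clause(subject: str) -> str:
    """Render the base list plus any subject-specific items as one clause."""
    items = BASE_NEGATIVES + SUBJECT_NEGATIVES.get(subject, [])
    return "Do not include: " + ", ".join(items) + "."
```

Unknown subject types fall back to the base list, so adding a new category never silently drops the shared guardrails.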

Layer 4: Reference Images

When you need consistency across multiple images --- the same character from different angles, or a new character matching an existing one --- pass a reference image:

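```python
import fal_client  # reads your API key from the FAL_KEY environment variable

# Pass the approved front view as the reference (check the endpoint schema for the field name).
result = fal_client.subscribe("fal-ai/flux-2-pro", arguments={
    "prompt": "Same character, side view, same clothing and accessories...",
    "image_url": "https://your-approved-front-view.png",  # placeholder URL
    "image_size": "square_hd",
})
print(result["images"][0]["url"])  # URL of the generated side view
```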

This is how we maintain consistency. The first approved image becomes the reference for all subsequent views. Without it, you get a different person every time.
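Generating the rest of a reference sheet then becomes a loop over views that all point at the same approved image. A Python sketch: `SHEET_VIEWS` and `reference_sheet_requests` are hypothetical, the view list mirrors the reference-sheet step of the pipeline described in this post, and the `image_url` field follows the call above, so confirm it against the schema of the endpoint you actually use.

```python
# Views to generate once the front view is approved.
SHEET_VIEWS = ["side view", "three-quarter view", "head closeup"]


def reference_sheet_requests(approved_url: str, description: str) -> list[dict]:
    """Build one argument payload per view, all anchored to the approved image."""
    return [
        {
            "prompt": f"Same character, {view}, same clothing and accessories. {description}",
            "image_url": approved_url,  # assumed field name; check the endpoint schema
            "image_size": "square_hd",
        }
        for view in SHEET_VIEWS
    ]
```

Each payload can be passed as `arguments=` to `fal_client.subscribe`. Building them from a single approved URL is what stops one view from accidentally referencing an earlier, rejected variant.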

The Model Lineup

We use four models for different purposes:

| Model | Cost per image | Speed | When to use |
|---|---|---|---|
| `flux-2-pro` | $0.01 | ~15s | Final art. Our default for anything we'll ship. |
| `flux/schnell` | $0.004 | ~4s | Exploration and iteration. Generate 5 variants fast. |
| `ideogram/v2` | $0.008 | ~5s | Anything with readable text --- logos, UI, posters. |
| `flux-pro/v1.1-ultra` | $0.015 | ~8s | Highest quality, but can hang. We mostly avoid it. |

The workflow: explore with schnell, refine with flux-2-pro, add text with ideogram/v2.
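Encoding that rule in tooling keeps the final model out of exploratory loops. A minimal Python sketch; the `Stage` enum and `pick_model` are hypothetical, and the endpoint IDs assume the `fal-ai/` prefix used in the `flux-2-pro` example earlier in this post.

```python
from enum import Enum


class Stage(Enum):
    EXPLORE = "explore"  # rapid "what if we tried..." variants
    FINAL = "final"      # art we intend to ship
    TEXT = "text"        # readable text: logos, UI, posters


# Assumed endpoint IDs built from the model names in the table.
MODEL_FOR_STAGE = {
    Stage.EXPLORE: "fal-ai/flux/schnell",
    Stage.FINAL: "fal-ai/flux-2-pro",
    Stage.TEXT: "fal-ai/ideogram/v2",
}


def pick_model(stage: Stage) -> str:
    """Return the endpoint ID to use for a given stage of the workflow."""
    return MODEL_FOR_STAGE[stage]
```

Leaving `flux-pro/v1.1-ultra` out of the mapping entirely is a cheap way to honour "we mostly avoid it".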

How This Fits Our Pipeline

We don't use fal.ai in isolation. It's the first step in a pipeline that goes from idea to in-game asset:

Brief → fal.ai (2D concept art) → Tripo Studio (3D model) → Blender (decimate) → Godot (in-game)

1. Brief. The designer describes the character: "Young woman, navigator role, practical clothing, distinctive hair."
2. 2D generation. We generate 3 variants with flux-2-pro, score each on a rubric (style match, cultural accuracy, silhouette, expression, technical animatability), and pick the best.
3. Reference sheet. We generate front, side, three-quarter, and head closeup views using the winner as a reference image.
4. 3D model. The approved front-view concept art goes into Tripo Studio for image-to-3D generation. Tripo outputs a mesh of roughly 1.5M faces with full PBR textures.
5. Decimation. Blender CLI decimates the mesh to 25,000 faces for LOD0.
6. Rigging. Mixamo auto-rigs the body (large hair is separated first).
7. In-game. Import into Godot, set up materials, done.
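The decimation step mostly comes down to choosing a Decimate ratio: keeping 25,000 of roughly 1.5M faces means a ratio of about 0.017. A Python sketch, assuming a collapse-style Decimate modifier applied through Blender's Python API to a mesh that is already imported and active; the helper names are hypothetical, this is not our actual script, and the resulting face count is approximate.

```python
def decimate_ratio(source_faces: int, target_faces: int) -> float:
    """Fraction of faces to keep, capped at 1.0 (never upsample)."""
    if source_faces <= 0 or target_faces <= 0:
        raise ValueError("face counts must be positive")
    return min(1.0, target_faces / source_faces)


def decimate_active_object(target_faces: int = 25_000) -> None:
    """Run inside Blender, e.g. `blender -b model.blend --python decimate.py`."""
    import bpy  # only available inside Blender's bundled Python

    obj = bpy.context.active_object  # assumes the imported mesh is active
    faces = len(obj.data.polygons)
    mod = obj.modifiers.new(name="Decimate", type="DECIMATE")
    mod.ratio = decimate_ratio(faces, target_faces)
    bpy.ops.object.modifier_apply(modifier=mod.name)
```

Keeping the ratio calculation separate from the `bpy` call makes it testable outside Blender.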

End to end, going from "I want a character" to "character walking around in the game" takes about two and a half hours: roughly 30 minutes to an approved reference sheet, then about 2 hours through Tripo, Blender, Mixamo, and Godot. No concept artist required. No 3D modeller required. The quality isn't AAA, but for an indie game with a stylised art style, it's more than good enough.

What We Learned

The design context layer is everything. Without it, every image is a one-off. With it, every image belongs to the same game. We tried generating without the context block "just to see what happens." The result was beautiful art that looked nothing like our game. The 50-word context block is worth more than the rest of the prompt combined.

Negative prompts prevent drift. AI models have strong defaults --- they want to make things shiny, symmetrical, and photorealistic. If your game isn't those things, you need to say so explicitly. Our "no metallic sheen, no Disney-adjacent, no photorealistic" negatives are load-bearing.

Score and iterate, don't accept the first output. We generate 3 variants, score each on 5 criteria (style, culture, expression, silhouette, technical), and only approve scores of 8+. The first generation is rarely the best. Three attempts at $0.01 each is $0.03 --- cheaper than the time spent working around a mediocre image.
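The post doesn't spell out how the five scores combine, so this Python sketch assumes each criterion is scored 1 to 10 and a variant is approved when its average reaches 8. `CRITERIA`, `overall`, and `best_approved` are hypothetical names; a per-criterion minimum would be an equally valid reading of "8+".

```python
from statistics import mean
from typing import Optional

CRITERIA = ("style", "culture", "expression", "silhouette", "technical")
APPROVAL_THRESHOLD = 8  # assumed: average score on a 1-10 scale


def overall(scores: dict[str, int]) -> float:
    """Average of the five criteria; every criterion must be scored."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return mean(scores[c] for c in CRITERIA)


def best_approved(variants: dict[str, dict[str, int]]) -> Optional[str]:
    """Return the highest-scoring variant if it clears the threshold, else None."""
    if not variants:
        return None
    best = max(variants, key=lambda name: overall(variants[name]))
    return best if overall(variants[best]) >= APPROVAL_THRESHOLD else None
```

A `None` result is the signal to adjust the brief and generate another round rather than settle for the best of a weak batch.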

Reference images are the consistency mechanism. Without them, every generation is independent. With them, every generation builds on the last approved output. This is how you get a roster of 10 characters that look like they belong in the same game.

Fast models for exploration, quality models for output. schnell at $0.004 and 4 seconds is perfect for "what if we tried..." iterations. flux-2-pro at $0.01 and 15 seconds is for "yes, this is the one." Never use your final model for exploratory work.

The AI agent is the art director. We don't manually craft prompts. Our AI agent (running in Cursor) has a skill file that encodes the entire 4-layer pattern, our art style guide, and our cultural guardrails. We tell the agent "design a navigator character" and it writes the full prompt, generates the images, displays them inline, and asks for scores. The human's job is creative direction: "more asymmetric accessories, less jewelry, hair over the other shoulder." The agent handles the prompt engineering.

The Numbers

Characters designed: 10 (full roster for early access)
Total images generated: ~400 across all iterations
Total cost: ~$6 in fal.ai credits
Time per character: ~30 minutes from brief to approved reference sheet
Pipeline time: ~2 hours from approved concept art to in-game model
Models used: flux-2-pro (80%), schnell (15%), ideogram/v2 (5%)

Publishing Our Skills

We've open-sourced the skill files that power this workflow. A skill is a markdown document that teaches an AI agent a specific procedure --- like a runbook, but the reader is an LLM.

You can find them in our blog repo:

Image Generation --- the fal.ai integration with the 4-layer prompt pattern
Concept Art Pipeline --- the full 2D-to-3D workflow
Tripo 3D --- text-to-3D and image-to-3D model generation
Sora 2 Video --- trailer clip generation

Drop any of these into your .cursor/skills/ directory and your AI agent can follow them. Adapt the design context block to your game's art style and you're good to go.

AI image generation isn't magic and it isn't free. But at a penny per image, with the right prompt structure, it replaces the most expensive bottleneck in indie game development: the gap between "I know what this should look like" and "I have an image I can actually use." For a team of four with no dedicated artist, that gap used to be weeks. Now it's minutes.
