Compare commits
2 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 074fa08bb7 | |
| | 69d74be6ba | |
+8
-1
@@ -163,7 +163,14 @@
|
||||
<a href="pi-flow-native-brain" class="blog-card">
|
||||
<span class="blog-card__date">4 June 2026</span>
|
||||
<h2 class="blog-card__title">How Pi Agents Build, Test, and Ship Game Code with Oracle-Backed Flows</h2>
|
||||
<p class="blog-card__excerpt">We type a slash command, agents fan out through five oracle gates, the game-builder fixes 19 red tests while vision judges check the live game — and it all runs as one autonomous flow.</p>
|
||||
<p class="blog-card__excerpt">A flow spawns, agents fan out through five oracle gates, the game-builder fixes 19 red tests while vision judges check the live game — and it all runs as one autonomous flow.</p>
|
||||
<span class="blog-card__read">Read →</span>
|
||||
</a>
|
||||
|
||||
<a href="voice-missing-input-game-dev" class="blog-card">
|
||||
<span class="blog-card__date">10 June 2026</span>
|
||||
<h2 class="blog-card__title">Why Voice Is the Missing Input for Game Development</h2>
|
||||
<p class="blog-card__excerpt">Speaking a bug while looking at the screen beats typing it from memory ten minutes later. Here's how voice-to-agent pipelines work, why game dev is the ideal use case, and what changes when you stop typing bug reports.</p>
|
||||
<span class="blog-card__read">Read →</span>
|
||||
</a>
|
||||
|
||||
|
||||
|
|
@@ -457,7 +457,7 @@
|
||||
</div>
|
||||
<div class="kitchen-col">
|
||||
<span class="kitchen-col__title kitchen-col__title--reality">In the Flow</span>
|
||||
<p><span class="gate gate--visual">G5 · Visual</span> Captures 8 frames at 100ms intervals, grids them, feeds to <code>gemini-2.5-flash</code>. Checks: T-pose? Foot-slide? Frozen animation? Wrong clip? Missing transitions?</p>
|
||||
<p><span class="gate gate--visual">G5 · Visual</span> Captures 8 frames at 100ms intervals, grids them, feeds to <code>minimax-latest</code>. Checks: T-pose? Foot-slide? Frozen animation? Wrong clip? Missing transitions?</p>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -570,7 +570,7 @@ steps:
|
||||
# G0: Pre-flight — validate vision CAN run before any build work
|
||||
- id: preflight
|
||||
agent: vision-preflight
|
||||
task: Check GEMINI_API_KEY is set AND game_frames reaches a live instance.
|
||||
task: Check MINIMAX_API_KEY is set AND game_frames reaches a live instance.
|
||||
If EITHER fails, STOP — vision is not optional.
|
||||
|
||||
# Context + plan
|
||||
@@ -596,7 +596,7 @@ steps:
|
||||
- id: tests → agent: test-runner
|
||||
- id: behavior → agent: behavioral-prober (drives LIVE game via drive_game)
|
||||
- id: feel → agent: feel-judge (apex, airtime, latency, rise/fall)
|
||||
- id: visual → agent: animation-vision-judge (multimodal gemini-2.5-flash)
|
||||
- id: visual → agent: animation-vision-judge (multimodal minimax-latest)
|
||||
|
||||
# Self-recurring fix-loop: bounded loop back to implement with evidence
|
||||
- id: fix-loop
|
||||
@@ -612,13 +612,13 @@ steps:
|
||||
|
||||
<p>Eighteen steps, seven cooks, five inspection points, one head chef. Triggered by a single order ticket.</p>
|
||||
|
||||
<p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>GEMINI_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p>
|
||||
<p>Here's how the brigade actually worked. The <strong>vision-preflight</strong> agent — the chef who checks the gas is on before anyone starts cooking — verified <code>MINIMAX_API_KEY</code> was set and <code>game_frames</code> could reach the live game. Both green in under a second. Without this, the whole kitchen would prep for an hour only to discover the oven doesn't work.</p>
|
||||
|
||||
<p>The <strong>project-context-reader</strong> — the commis who reads the entire recipe book — ingested <code>PlayerController.cs</code>, <code>PlayerAnimController.cs</code>, <code>PlayerAnimationLogic.cs</code>, the test files, the manifest. The <strong>feature-planner</strong> — the sous-chef who breaks down the order into station tasks — decomposed 19 failures into four fix groups: vegetation manifest (146 broken <code>prefabPath</code> items), animation controller (crouch parameter not plumbed), jump physics (coyote time, variable height, air control — all missing), and animation tree (entire state machine absent).</p>
|
||||
|
||||
<p>Then the <strong>game-builder</strong> — the line cook at the hot station — read each test failure like a dish ticket, traced it to the source, and started cooking. Coyote time: 100ms grace period after feet leave the ground. Variable jump height: velocity scaled by hold duration, tap gives 3.5, full hold gives 6.5. Air control: horizontal speed cut 40% while airborne. Jump phases: minimum 0.15s on jump_start before transitioning up. Landing timer: wait the full animation length, not length-minus-blend. Animation tree: <code>jump_start → jump → jump_land</code> states with 0.1s blends.</p>
|
||||
|
||||
<p>Then the inspection line: <strong>build-verifier</strong> compiled. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump":true}</code> to the live game and sampled the player body. <strong>Feel-judge</strong> measured apex, airtime, liftoff latency. <strong>Animation-vision-judge</strong> captured 8 frames, gridded them, had <code>gemini-2.5-flash</code> scan for T-poses and foot-slide.</p>
|
||||
<p>Then the inspection line: <strong>build-verifier</strong> compiled. <strong>Test-runner</strong> ran the suite. <strong>Behavioral-prober</strong> sent <code>{"jump":true}</code> to the live game and sampled the player body. <strong>Feel-judge</strong> measured apex, airtime, liftoff latency. <strong>Animation-vision-judge</strong> captured 8 frames, gridded them, had <code>minimax-latest</code> scan for T-poses and foot-slide.</p>
|
||||
|
||||
<p>Anything red → ticket back to the cook with the specific failure → fix → re-enter the line. Bounded to 3 returns. Anything green → falls through. All green → <strong>game-judge</strong> gives the final verdict.</p>
|
||||
|
||||
@@ -751,14 +751,14 @@ Context: ${{input.context}}</code></pre>
|
||||
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@planning</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Boning knife</strong> — precision decomposition. Breaks tasks into steps, designs DAGs. Flow architect, feature planner.</td></tr>
|
||||
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@fast</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Paring knife</strong> — quick, decisive cuts. Gate pass/fail, fork choices, loop exits. No overthinking.</td></tr>
|
||||
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@research</code></td><td style="padding:7px 12px;color:#f59e0b;">DeepSeek V4</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Fillet knife</strong> — flexible, follows contours. Reads codebase, traces patterns, finds what matters.</td></tr>
|
||||
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#a855f7;">Gemini 2.5 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>The inspector's eyes</strong> — the only knife that sees. Multimodal frame judging: T-poses, foot-slide, frozen anims.</td></tr>
|
||||
<tr style="border-bottom:1px solid #1c2230;"><td style="padding:7px 12px;color:#e6edf3;"><code>@vision</code></td><td style="padding:7px 12px;color:#a855f7;">MiniMax latest</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>The inspector's eyes</strong> — the only knife that sees. Multimodal frame judging: T-poses, foot-slide, frozen anims.</td></tr>
|
||||
<tr><td style="padding:7px 12px;color:#e6edf3;"><code>@compact</code></td><td style="padding:7px 12px;color:#38bdf8;">DeepSeek V4 Flash</td><td style="padding:7px 12px;color:#cdd7e2;"><strong>Kitchen shears</strong> — lightweight, versatile. Summaries, verdicts, post-processing. Fast and cheap.</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<div class="callout callout--amber">
|
||||
<span class="callout__kicker">Why DeepSeek?</span>
|
||||
<p>Two reasons. <strong>It's free</strong> — no usage limits, which matters when your game-builder reads 800-line files and writes 200-line diffs ten times a session. <strong>It's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before. DeepSeek can't do multimodal, so vision goes to Gemini — but for everything else, it's the chef's knife you reach for 90% of the time.</p>
|
||||
<p>Two reasons. <strong>It's free</strong> — no usage limits, which matters when your game-builder reads 800-line files and writes 200-line diffs ten times a session. <strong>It's genuinely good at C# and Godot</strong> — I've had it write a full lighting module for our Godot fork by reading Unity API docs and adapting patterns. No agent had pulled that off before. DeepSeek V4 now has multimodal, but for vision we use MiniMax latest — it's sharper at frame-by-frame animation judging and costs less per image. For everything else, DeepSeek is the chef's knife you reach for 90% of the time.</p>
|
||||
</div>
|
||||
|
||||
<p>The point of the knife rack: you configure this <strong>once</strong>. Every agent declares <code>model: @coding</code> and gets DeepSeek V4 automatically. Swap models globally without touching any flow or agent file. The right blade, every time, no thinking required.</p>
|
||||
|
||||
|
|
@@ -0,0 +1,111 @@
|
||||
---
|
||||
title: "Why Voice Is the Missing Input for Game Development"
|
||||
slug: voice-missing-input-game-dev
|
||||
date: "2026-06-10"
|
||||
description: "Speaking a bug while you're looking at the screen beats typing it from memory ten minutes later. Voice-to-agent pipelines collapse the gap between noticing a problem and tracking it — and game dev is the perfect use case."
|
||||
og_description: "Voice is the missing input for game dev — speak bugs while you play, let agents file them."
|
||||
og_image: "https://www.tinqs.com/img/og-cover.jpg"
|
||||
excerpt: "Speaking a bug while looking at the screen beats typing it from memory ten minutes later. Here's how voice-to-agent pipelines work, why game dev is the ideal use case, and what changes when you stop typing bug reports."
|
||||
author: "Ozan Bozkurt"
|
||||
author_initials: "OB"
|
||||
author_role: "CTO & Developer, Tinqs"
|
||||
---
|
||||
Every game developer knows this moment. You're playtesting, running through the world, and you see something wrong — a tree floating two meters above the terrain, a UI element clipping, an animation that stutters on frame 14. You make a mental note. Ten minutes later, back at the editor, you try to file it. The coordinates are fuzzy. The exact reproduction steps are gone. You type something vague like "tree floating on west beach maybe" and hope you remember more tomorrow.
|
||||
|
||||
Voice changes this entirely. Speak the bug while you're looking at it, and an agent turns your words into a structured issue — with a screenshot, a vision-model description, coordinates, and a severity estimate. No keyboard. No context switch. No memory loss.
|
||||
|
||||
## The latency that kills bug reports
|
||||
|
||||
The delay between seeing a bug and filing it follows a memory decay curve. With every second that passes, your recollection loses precision:
|
||||
|
||||
| Elapsed time | What you remember |
|
||||
|---|---|
|
||||
| 0 seconds | Exact position, camera angle, what you were doing, what's on screen |
|
||||
| 30 seconds | "There was a tree... somewhere west... maybe floating?" |
|
||||
| 5 minutes | "I think there was a rendering issue? Or was it yesterday?" |
|
||||
|
||||
Typed bug reports are reconstructions from decaying memory. Voice bug reports are real-time captures. The difference in quality isn't marginal: it's the difference between a report you can act on immediately and a ticket that sits in the backlog for three months while someone tries to reproduce it.
|
||||
|
||||
## The pipeline: voice → text → structured issue
|
||||
|
||||
Here's what actually happens when you speak a bug during playtesting:
|
||||
|
||||
```
|
||||
1. You speak: "There's a tree floating two meters above the terrain
|
||||
on the west beach, near the big rock formation. Happens after
|
||||
the vegetation culling pass kicks in around sunset."
|
||||
|
||||
2. Microphone → transcription (Whisper, local or API, ~500ms)
|
||||
|
||||
3. Transcription → agent context window (~100ms)
|
||||
|
||||
4. Agent parses the raw text and extracts:
|
||||
- What: tree floating above terrain
|
||||
- Where: west beach, near rock formation (camera coordinates auto-captured)
|
||||
- When: after vegetation culling, sunset
|
||||
- Severity: medium (visual, not blocking)
|
||||
- Screenshot: captured from the running game engine
|
||||
|
||||
5. Agent files a structured issue with all of the above,
|
||||
tags the rendering engineer, and posts the digest to team chat.
|
||||
|
||||
Total latency: under 2 seconds. You keep playing.
|
||||
```
|
||||
|
||||
This isn't theoretical. The pipeline runs on our own game project, and it's caught bugs that would have slipped through playtesting entirely — the ones you see, make a mental note about, and forget by the time you alt-tab.
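
The five steps above can be sketched as one function with injected stages. This is a minimal sketch under stated assumptions, not our production code: `transcribe`, `capture_state`, and `file_issue` are hypothetical callables standing in for Whisper, the engine's HTTP API, and the issue tracker, and the `BugReport` fields are illustrative. Capturing engine state before transcription is also an assumption of the sketch, made so the screenshot matches what the speaker was looking at.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class BugReport:
    """Structured issue assembled from one spoken observation."""
    transcript: str
    position: tuple
    screenshot_path: str
    latency_s: float = 0.0


def report_bug(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    capture_state: Callable[[], dict],
    file_issue: Callable[[BugReport], None],
) -> BugReport:
    """Run one voice report through capture, transcription, and filing."""
    start = time.perf_counter()
    # Grab position and screenshot first, before the player moves on.
    state = capture_state()
    transcript = transcribe(audio).strip()
    report = BugReport(
        transcript=transcript,
        position=tuple(state["position"]),
        screenshot_path=state["screenshot"],
    )
    # Record how long the pipeline took so the latency claim can be checked.
    report.latency_s = time.perf_counter() - start
    file_issue(report)
    return report
```

Because every stage is passed in, the same function works with local Whisper or a cloud API, and each stage can be swapped for a stub when testing.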
|
||||
|
||||
## Why game dev is the perfect voice use case
|
||||
|
||||
**You're already looking at the screen.** Voice input doesn't require switching windows or breaking flow. You're playtesting — your hands are on the controller or WASD, your eyes are on the game. Speaking is the only input channel that doesn't interrupt the thing you're actually doing.
|
||||
|
||||
**Game bugs are spatial and visual.** "The crafting UI text overflows on items with names longer than 20 characters" is something you see, not something you calculate. Describing it verbally while looking at it produces a far richer bug report than typing from memory.
|
||||
|
||||
**Reproduction is half the battle.** When you speak the bug at the moment of occurrence, you naturally include the context: what you were doing, what just happened, what the game state was. You don't have to reconstruct it later.
|
||||
|
||||
**Voice scales to the whole team.** Artists see visual bugs. Designers see balance issues. Producers see UX friction. Not everyone on a game team is a fast typist or comfortable with issue trackers. Everyone can speak.
|
||||
|
||||
## What the agent adds beyond transcription
|
||||
|
||||
Raw transcription is useful — it's a notepad you don't have to type. But the agent layer is what makes voice input a pipeline rather than a dictation tool:
|
||||
|
||||
**Screenshot coordination.** The agent calls the game engine's HTTP API, captures the current frame, and attaches it to the issue. You don't take screenshots. The agent does.
|
||||
|
||||
**Vision model description.** The screenshot goes through a vision model that writes a text description of what's on screen. Future-you searching the issue tracker for "floating tree" finds it even if the transcription was garbled.
|
||||
|
||||
**Coordinates and context.** The game engine provides the player's world position, camera angle, and current game state. The agent bakes these into the issue. A developer can teleport directly to the bug location.
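
As a rough illustration of baking engine state into an issue, the sketch below formats a Markdown body from a state dictionary. The dictionary keys (`position`, `camera`, `game_state`) and the `teleport` console command are hypothetical; use whatever your engine actually exposes.

```python
def format_issue_body(transcript: str, state: dict) -> str:
    """Build a Markdown issue body from the spoken text and engine state."""
    x, y, z = state["position"]
    yaw, pitch = state["camera"]
    lines = [
        transcript,
        "",
        f"- Position: ({x:.1f}, {y:.1f}, {z:.1f})",
        f"- Camera: yaw {yaw:.0f}, pitch {pitch:.0f}",
        f"- Game state: {state['game_state']}",
        # Hypothetical console command a developer could paste to jump there.
        f"- Teleport: `teleport {x:.1f} {y:.1f} {z:.1f}`",
    ]
    return "\n".join(lines)
```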
|
||||
|
||||
**Severity and routing.** The agent estimates severity from context ("floating" is visual, "crash" is critical) and tags the right team member. An artist doesn't get pinged for a shader bug. A rendering engineer doesn't get pinged for a UI text overflow.
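
To make the routing idea concrete, here is a deliberately simple keyword-based stand-in. In the pipeline described above the agent makes these calls from context; the keyword tables, team names, and the `triage` fallback below are all assumptions made for illustration.

```python
import re

# Hypothetical keyword tables; a real agent would judge from full context.
SEVERITY_KEYWORDS = {
    "critical": {"crash", "freeze", "corrupt"},
    "medium": {"floating", "clipping", "stutter", "overflow"},
}
TEAM_KEYWORDS = {
    "rendering": {"shader", "culling", "floating", "lighting"},
    "ui": {"ui", "text", "overflow", "menu"},
    "animation": {"animation", "stutter", "t-pose"},
}


def tokenize(transcript: str) -> set:
    """Lowercase the transcript and split it into whole words."""
    return set(re.findall(r"[a-z0-9-]+", transcript.lower()))


def estimate_severity(transcript: str) -> str:
    """Return the highest severity whose keywords appear, else 'low'."""
    words = tokenize(transcript)
    for severity in ("critical", "medium"):
        if words & SEVERITY_KEYWORDS[severity]:
            return severity
    return "low"


def route(transcript: str) -> str:
    """Pick the team with the most keyword hits, or 'triage' if none match."""
    words = tokenize(transcript)
    scores = {team: len(words & keys) for team, keys in TEAM_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "triage"
```

Matching whole words rather than substrings keeps a short keyword like `ui` from firing on words such as "build".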
|
||||
|
||||
## The numbers
|
||||
|
||||
| Method | Time from observation to filed issue | Information loss |
|
||||
|---|---|---|
|
||||
| Mental note → type later | 5-30 minutes | High (positions, steps, context) |
|
||||
| Alt-tab → type immediately | 30-60 seconds | Medium (screenshots missed, flow broken) |
|
||||
| Voice → agent pipeline | 2 seconds | Low (screenshot + position captured automatically) |
|
||||
|
||||
The throughput difference compounds. A 30-minute playtest session with keyboard-only bug filing might yield 3-4 issues, half of them vague. The same session with voice-to-agent produces 10-15 issues, all with screenshots, positions, and reproduction context.
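
Taking the session figures above at face value, the gap can be worked out directly. The short calculation below normalizes both ranges to issues per hour and compares the most and least favorable pairings; the function name is ours, the inputs are the numbers from the paragraph above.

```python
def issues_per_hour(issues_low: int, issues_high: int, session_minutes: int = 30):
    """Scale a per-session issue range to a per-hour range."""
    scale = 60 / session_minutes
    return issues_low * scale, issues_high * scale


keyboard = issues_per_hour(3, 4)    # keyboard-only filing
voice = issues_per_hour(10, 15)     # voice-to-agent filing

# Least favorable pairing: slowest voice session vs. best keyboard session.
worst_case_ratio = voice[0] / keyboard[1]
# Most favorable pairing: best voice session vs. slowest keyboard session.
best_case_ratio = voice[1] / keyboard[0]
```

That puts voice at roughly 2.5 to 5 times the keyboard rate on count alone, before accounting for the vague reports in the keyboard column.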
|
||||
|
||||
## Setup is simpler than you think
|
||||
|
||||
You need three things, all of which you probably already have:
|
||||
|
||||
1. **A microphone.** The one in your headset is fine. Transcription models handle suboptimal audio surprisingly well.
|
||||
2. **Transcription.** Whisper runs locally and is free. Cloud APIs are sub-cent per minute. Both work.
|
||||
3. **An agent that speaks your game engine's API.** If your engine has an HTTP interface for screenshots and game state, the agent can wire the rest together. If it doesn't — add one. It's a weekend project.
|
||||
|
||||
The agent itself doesn't need to be custom-built. Any coding agent with tool access can be told "watch the game, transcribe voice input, file issues in the tracker." It's a skill file, not a product.
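
If your engine has no HTTP interface yet, the shape of one is small. The sketch below uses Python's standard `http.server` purely as a stand-in for whatever language your engine is written in. The `/state` and `/screenshot` paths, and the `get_state` and `get_frame` callbacks, are hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_request(path: str, state: dict, frame_png: bytes):
    """Map a request path to (status, content_type, body)."""
    if path == "/state":
        return 200, "application/json", json.dumps(state).encode("utf-8")
    if path == "/screenshot":
        return 200, "image/png", frame_png
    return 404, "text/plain", b"not found"


def make_handler(get_state, get_frame):
    """Build a request handler bound to the engine's state and frame getters."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, content_type, body = handle_request(
                self.path, get_state(), get_frame()
            )
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return Handler


def serve(get_state, get_frame, port: int = 8080):
    """Serve until interrupted. Binds to localhost only: this is a debug hook."""
    server = HTTPServer(("127.0.0.1", port), make_handler(get_state, get_frame))
    server.serve_forever()
```

Keeping the routing in a plain function means it can be tested without opening a socket, and the agent only ever needs two GET requests to assemble an issue.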
|
||||
|
||||
## What changes when you stop typing bugs
|
||||
|
||||
The most surprising effect isn't the speed. It's the coverage. When filing a bug costs two seconds of speaking, you file bugs you would have previously ignored. The minor visual glitch. The slight animation hitch. The UI element that's two pixels misaligned.
|
||||
|
||||
Individually these are low-priority. Collectively they're the difference between a game that feels polished and one that feels rough. And they only get caught when the cost of reporting approaches zero.
|
||||
|
||||
The second effect is that playtesting becomes a primary input channel. Instead of structured QA sessions with checklists and forms, you just play the game. The agent captures everything. When you're done, you have a list of filed issues with screenshots and context — generated from your spoken observations in real time.
|
||||
|
||||
Voice isn't a gimmick for game development. It's the input channel that matches the way we actually work: looking at the screen, noticing things, and talking about them. The tools exist. The latency is under two seconds. The cost is negligible. The only thing missing is the habit.
|
||||
|
||||
---
|
||||
|
||||
*We build [Tinqs Studio](https://tinqs.com) — a game dev platform with built-in AI agents, git hosting, and creative pipelines. [Ariki](https://arikigame.com) is the survival colony sim we're building with every tool described here.*
|
||||
@@ -0,0 +1,343 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
|
||||
<title>Why Voice Is the Missing Input for Game Development — Tinqs Blog</title>
|
||||
<meta name="description" content="Speaking a bug while you're looking at the screen beats typing it from memory ten minutes later. Voice-to-agent pipelines collapse the gap between noticing a problem and tracking it — and game dev is the perfect use case.">
|
||||
<meta name="robots" content="index, follow">
|
||||
<link rel="canonical" href="https://www.tinqs.com/blog/voice-missing-input-game-dev">
|
||||
|
||||
<meta property="og:type" content="article">
|
||||
<meta property="og:url" content="https://www.tinqs.com/blog/voice-missing-input-game-dev">
|
||||
<meta property="og:title" content="Why Voice Is the Missing Input for Game Development">
|
||||
<meta property="og:description" content="Voice is the missing input for game dev — speak bugs while you play, let agents file them.">
|
||||
<meta property="og:image" content="https://www.tinqs.com/img/og-cover.jpg">
|
||||
|
||||
<meta name="twitter:card" content="summary_large_image">
|
||||
<meta name="twitter:title" content="Why Voice Is the Missing Input for Game Development">
|
||||
<meta name="twitter:description" content="Voice is the missing input for game dev — speak bugs while you play, let agents file them.">
|
||||
<meta name="twitter:image" content="https://www.tinqs.com/img/og-cover.jpg">
|
||||
|
||||
<script type="application/ld+json">
|
||||
{
|
||||
"@context": "https://schema.org",
|
||||
"@type": "BlogPosting",
|
||||
"headline": "Why Voice Is the Missing Input for Game Development",
|
||||
"datePublished": "2026-06-10",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Ozan Bozkurt"
|
||||
},
|
||||
"publisher": {
|
||||
"@type": "Organization",
|
||||
"name": "Tinqs Limited",
|
||||
"url": "https://www.tinqs.com"
|
||||
},
|
||||
"description": "Speaking a bug while you're looking at the screen beats typing it from memory ten minutes later. Voice-to-agent pipelines collapse the gap between noticing a problem and tracking it — and game dev is the perfect use case."
|
||||
}
|
||||
</script>
|
||||
|
||||
<style>
|
||||
/* ── Self-contained post styles (Studio provides site chrome) ── */
|
||||
|
||||
:root {
|
||||
--c-accent: #c9935a;
|
||||
--c-accent-l: #d4a87c;
|
||||
--c-bg: #0d1117;
|
||||
--c-text: #e6edf3;
|
||||
--c-muted: #9aa7b4;
|
||||
--c-border: #2a3340;
|
||||
--c-blue: #38bdf8;
|
||||
--c-purple: #a855f7;
|
||||
--c-gold: #f59e0b;
|
||||
--c-code-bg: #1c2230;
|
||||
--c-pre-bg: #0a0e14;
|
||||
}
|
||||
|
||||
*, *::before, *::after { box-sizing: border-box; }
|
||||
|
||||
body {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
background: transparent;
|
||||
color: var(--c-text);
|
||||
font-family: system-ui, -apple-system, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif;
|
||||
line-height: 1.6;
|
||||
-webkit-font-smoothing: antialiased;
|
||||
}
|
||||
|
||||
/* ── Post container ── */
|
||||
.post {
|
||||
max-width: 720px;
|
||||
margin: 0 auto;
|
||||
padding: 40px 24px 48px;
|
||||
}
|
||||
|
||||
/* ── Back link ── */
|
||||
.post__back {
|
||||
color: var(--c-blue);
|
||||
text-decoration: none;
|
||||
font-size: 0.9rem;
|
||||
display: inline-block;
|
||||
margin-bottom: 24px;
|
||||
}
|
||||
.post__back:hover { color: var(--c-purple); }
|
||||
|
||||
/* ── Gradient title ── */
|
||||
.post__title {
|
||||
background: linear-gradient(90deg, #c9935a, #f59e0b 40%, #38bdf8);
|
||||
-webkit-background-clip: text;
|
||||
background-clip: text;
|
||||
color: transparent;
|
||||
font-weight: 800;
|
||||
font-size: 2.2rem;
|
||||
line-height: 1.25;
|
||||
margin: 0 0 16px;
|
||||
}
|
||||
|
||||
/* ── Date pill ── */
|
||||
.post__date {
|
||||
display: inline-block;
|
||||
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
|
||||
font-size: 0.72rem;
|
||||
letter-spacing: 0.22em;
|
||||
text-transform: uppercase;
|
||||
color: var(--c-blue);
|
||||
border: 1px solid rgba(147, 140, 129, 0.25);
|
||||
border-radius: 999px;
|
||||
padding: 4px 14px;
|
||||
margin-bottom: 16px;
|
||||
}
|
||||
|
||||
/* ── Lead ── */
|
||||
.post__lead {
|
||||
color: var(--c-muted);
|
||||
font-size: 1.08rem;
|
||||
line-height: 1.7;
|
||||
}
|
||||
|
||||
/* ── Body ── */
|
||||
.post__body { font-size: 1rem; line-height: 1.7; }
|
||||
|
||||
.post__body p { margin: 14px 0; }
|
||||
|
||||
.post__body h2 {
|
||||
font-size: 1.7rem;
|
||||
margin: 54px 0 6px;
|
||||
padding-left: 16px;
|
||||
border-left: 4px solid var(--c-accent);
|
||||
line-height: 1.3;
|
||||
}
|
||||
|
||||
.post__body h3 {
|
||||
color: var(--c-purple);
|
||||
font-size: 1.18rem;
|
||||
margin: 30px 0 4px;
|
||||
}
|
||||
|
||||
.post__body h4, .post__body h5, .post__body h6 {
|
||||
margin: 20px 0 4px;
|
||||
}
|
||||
|
||||
/* ── Inline code ── */
|
||||
.post__body code {
|
||||
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
|
||||
font-size: 0.86em;
|
||||
background: var(--c-code-bg);
|
||||
color: #9fe6c0;
|
||||
padding: 2px 6px;
|
||||
border-radius: 5px;
|
||||
border: 1px solid var(--c-border);
|
||||
}
|
||||
|
||||
/* ── Code blocks ── */
|
||||
.post__body pre {
|
||||
background: var(--c-pre-bg);
|
||||
border: 1px solid var(--c-border);
|
||||
border-radius: 10px;
|
||||
padding: 16px 18px;
|
||||
overflow-x: auto;
|
||||
margin: 14px 0;
|
||||
font-family: ui-monospace, 'SF Mono', 'Cascadia Code', Consolas, monospace;
|
||||
font-size: 0.85rem;
|
||||
line-height: 1.55;
|
||||
color: var(--c-text);
|
||||
}
|
||||
|
||||
.post__body pre code {
|
||||
background: transparent;
|
||||
padding: 0;
|
||||
border: none;
|
||||
font-size: inherit;
|
||||
color: inherit;
|
||||
border-radius: 0;
|
||||
}
|
||||
|
||||
/* ── Blockquote ── */
|
||||
.post__body blockquote {
|
||||
background: rgba(245, 158, 11, 0.08);
|
||||
border: 1px solid rgba(245, 158, 11, 0.25);
|
||||
border-left: 4px solid var(--c-gold);
|
||||
border-radius: 0 12px 12px 0;
|
||||
padding: 16px 18px;
|
||||
margin: 18px 0;
|
||||
color: #f4e3c4;
|
||||
font-size: 0.94rem;
|
||||
}
|
||||
|
||||
/* ── Links ── */
|
||||
.post__body a { color: var(--c-blue); }
|
||||
.post__body a:hover { color: var(--c-purple); }
|
||||
|
||||
/* ── Strong ── */
|
||||
.post__body strong { color: var(--c-gold); }
|
||||
|
||||
/* ── HR ── */
|
||||
.post__body hr {
|
||||
border: none;
|
||||
border-top: 1px solid var(--c-border);
|
||||
margin: 32px 0;
|
||||
}
|
||||
|
||||
/* ── Figures ── */
|
||||
.post__body figure { margin: 20px 0; }
|
||||
.post__body figure img {
|
||||
max-width: 100%;
|
||||
border-radius: 12px;
|
||||
border: 1px solid var(--c-border);
|
||||
}
|
||||
|
||||
.post__body figcaption {
|
||||
color: var(--c-muted);
|
||||
font-size: 0.85rem;
|
||||
margin-top: 6px;
|
||||
}
|
||||
|
||||
/* ── Lists ── */
|
||||
.post__body ul, .post__body ol { padding-left: 1.5em; margin: 10px 0; }
|
||||
.post__body li { margin: 4px 0; }
|
||||
|
||||
/* ── Author ── */
|
||||
.post__author {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 14px;
|
||||
margin-top: 48px;
|
||||
padding-top: 24px;
|
||||
border-top: 1px solid var(--c-border);
|
||||
}
|
||||
|
||||
.post__author-avatar {
|
||||
width: 48px;
|
||||
height: 48px;
|
||||
border-radius: 50%;
|
||||
background: var(--c-accent);
|
||||
color: var(--c-bg);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
font-weight: 700;
|
||||
font-size: 0.85rem;
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.post__author-info {
|
||||
font-size: 0.85rem;
|
||||
color: var(--c-muted);
|
||||
line-height: 1.4;
|
||||
}
|
||||
|
||||
.post__author-name {
|
||||
color: var(--c-text);
|
||||
font-weight: 600;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<!-- POST -->
|
||||
<article class="post">
|
||||
<a href="/blog/" class="post__back">← All Posts</a>
|
||||
<span class="post__date">10 June 2026</span>
|
||||
<h1 class="post__title">Why Voice Is the Missing Input for Game Development</h1>
|
||||
<p class="post__lead">Every game developer knows this moment. You're playtesting, running through the world, and you see something wrong — a tree floating two meters above the terrain, a UI element clipping, an animation that stutters on frame 14. You make a mental note. Ten minutes later, back at the editor, you try to file it. The coordinates are fuzzy. The exact reproduction steps are gone. You type something vague like "tree floating on west beach maybe" and hope you remember more tomorrow.</p>
|
||||
|
||||
<div class="post__body">
|
||||
<p>Voice changes this entirely. Speak the bug while you're looking at it, and an agent turns your words into a structured issue — with a screenshot, a vision-model description, coordinates, and a severity estimate. No keyboard. No context switch. No memory loss.</p>
|
||||
<h2>The latency that kills bug reports</h2>
|
||||
<p>The delay between seeing a bug and filing it follows a memory decay curve. With every second that passes, your recollection loses precision:</p>
|
||||
<table>
<thead>
<tr><th>Elapsed time</th><th>What you remember</th></tr>
</thead>
<tbody>
<tr><td>0 seconds</td><td>Exact position, camera angle, what you were doing, what's on screen</td></tr>
<tr><td>30 seconds</td><td>"There was a tree... somewhere west... maybe floating?"</td></tr>
<tr><td>5 minutes</td><td>"I think there was a rendering issue? Or was it yesterday?"</td></tr>
</tbody>
</table>
|
||||
<p>Typed bug reports are reconstructions from decaying memory. Voice bug reports are real-time captures. The difference in quality isn't marginal: it's the difference between a report you can act on immediately and a ticket that sits in the backlog for three months while someone tries to reproduce it.</p>
|
||||
<h2>The pipeline: voice → text → structured issue</h2>
|
||||
<p>Here's what actually happens when you speak a bug during playtesting:</p>
|
||||
<pre><code>1. You speak: "There's a tree floating two meters above the terrain
|
||||
on the west beach, near the big rock formation. Happens after
|
||||
the vegetation culling pass kicks in around sunset."
|
||||
|
||||
2. Microphone → transcription (Whisper, local or API, ~500ms)
|
||||
|
||||
3. Transcription → agent context window (~100ms)
|
||||
|
||||
4. Agent parses the raw text and extracts:
|
||||
- What: tree floating above terrain
|
||||
- Where: west beach, near rock formation (camera coordinates auto-captured)
|
||||
- When: after vegetation culling, sunset
|
||||
- Severity: medium (visual, not blocking)
|
||||
- Screenshot: captured from the running game engine
|
||||
|
||||
5. Agent files a structured issue with all of the above,
|
||||
tags the rendering engineer, and posts the digest to team chat.
|
||||
|
||||
Total latency: under 2 seconds. You keep playing.</code></pre>
|
||||
<p>This isn't theoretical. The pipeline runs on our own game project, and it's caught bugs that would have slipped through playtesting entirely — the ones you see, make a mental note about, and forget by the time you alt-tab.</p>
|
||||
<h2>Why game dev is the perfect voice use case</h2>
|
||||
<p><strong>You're already looking at the screen.</strong> Voice input doesn't require switching windows or breaking flow. You're playtesting — your hands are on the controller or WASD, your eyes are on the game. Speaking is the only input channel that doesn't interrupt the thing you're actually doing.</p>
|
||||
<p><strong>Game bugs are spatial and visual.</strong> "The crafting UI text overflows on items with names longer than 20 characters" is something you see, not something you calculate. Describing it verbally while looking at it produces a far richer bug report than typing from memory.</p>
|
||||
<p><strong>Reproduction is half the battle.</strong> When you speak the bug at the moment of occurrence, you naturally include the context: what you were doing, what just happened, what the game state was. You don't have to reconstruct it later.</p>
|
||||
<p><strong>Voice scales to the whole team.</strong> Artists see visual bugs. Designers see balance issues. Producers see UX friction. Not everyone on a game team is a fast typist or comfortable with issue trackers. Everyone can speak.</p>
|
||||
<h2>What the agent adds beyond transcription</h2>
|
||||
<p>Raw transcription is useful — it's a notepad you don't have to type. But the agent layer is what makes voice input a pipeline rather than a dictation tool:</p>
|
||||
<p><strong>Screenshot coordination.</strong> The agent calls the game engine's HTTP API, captures the current frame, and attaches it to the issue. You don't take screenshots. The agent does.</p>
|
||||
<p><strong>Vision model description.</strong> The screenshot goes through a vision model that writes a text description of what's on screen. Future-you searching the issue tracker for "floating tree" finds it even if the transcription was garbled.</p>
|
||||
<p><strong>Coordinates and context.</strong> The game engine provides the player's world position, camera angle, and current game state. The agent bakes these into the issue. A developer can teleport directly to the bug location.</p>
|
||||
<p><strong>Severity and routing.</strong> The agent estimates severity from context ("floating" is visual, "crash" is critical) and tags the right team member. An artist doesn't get pinged for a shader bug. A rendering engineer doesn't get pinged for a UI text overflow.</p>
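<p>The four steps above can be sketched as one small function that turns a transcript, a screenshot path, a vision description, and the engine's game state into a routed issue. This is a minimal sketch, not our actual agent: the keyword rules beyond "floating" and "crash", the team names, and the <code>GameState</code> and <code>Issue</code> types are all assumptions made for illustration.</p>

```python
from dataclasses import dataclass

# Hypothetical keyword rules. The post only states that "floating" reads as
# visual and "crash" as critical; the other entries and all team names are
# assumptions for illustration. Order matters: earlier rules win.
RULES = [
    ("crash", "critical", "engineering"),
    ("shader", "visual", "rendering"),
    ("overflow", "visual", "ui"),
    ("floating", "visual", "art"),
]


@dataclass
class GameState:
    position: tuple  # player world position (x, y, z)
    camera_yaw_deg: float
    scene: str


@dataclass
class Issue:
    title: str
    body: str
    severity: str
    team: str
    screenshot_path: str


def classify(transcript):
    """Return (severity, team) from the first matching keyword rule."""
    text = transcript.lower()
    for keyword, severity, team in RULES:
        if keyword in text:
            return severity, team
    return "unknown", "triage"  # nothing matched: leave it for a human


def build_issue(transcript, state, screenshot_path, vision_description):
    """Combine the spoken report with the captured context into one issue."""
    severity, team = classify(transcript)
    x, y, z = state.position
    body = "\n".join([
        f"Spoken report: {transcript}",
        f"On screen: {vision_description}",
        f"Scene: {state.scene}",
        f"Position: ({x}, {y}, {z}), camera yaw {state.camera_yaw_deg} deg",
    ])
    return Issue(transcript[:60], body, severity, team, screenshot_path)
```

<p>Keeping the vision description in the issue body is what makes the issue searchable even when the transcript is garbled. A real agent would likely let a language model estimate severity instead of matching keywords; the table here only illustrates the routing idea.</p>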
|
||||
<h2>The numbers</h2>
|
||||
<table>
<thead>
<tr><th>Method</th><th>Time from observation to filed issue</th><th>Information loss</th></tr>
</thead>
<tbody>
<tr><td>Mental note → type later</td><td>5-30 minutes</td><td>High (positions, steps, context)</td></tr>
<tr><td>Alt-tab → type immediately</td><td>30-60 seconds</td><td>Medium (screenshots missed, flow broken)</td></tr>
<tr><td>Voice → agent pipeline</td><td>2 seconds</td><td>Low (screenshot + position captured automatically)</td></tr>
</tbody>
</table>
|
||||
<p>The throughput difference compounds. A 30-minute playtest session with keyboard-only bug filing might yield 3-4 issues, half of them vague. The same session with voice-to-agent produces 10-15 issues, all with screenshots, positions, and reproduction context.</p>
|
||||
<h2>Setup is simpler than you think</h2>
|
||||
<p>You need three things, all of which you probably already have:</p>
|
||||
<p>1. <strong>A microphone.</strong> The one in your headset is fine. Transcription models handle suboptimal audio surprisingly well.</p>
|
||||
<p>2. <strong>Transcription.</strong> Whisper runs locally and is free. Cloud APIs cost less than a cent per minute. Both work.</p>
|
||||
<p>3. <strong>An agent that speaks your game engine's API.</strong> If your engine has an HTTP interface for screenshots and game state, the agent can wire the rest together. If it doesn't — add one. It's a weekend project.</p>
|
||||
<p>The agent itself doesn't need to be custom-built. Any coding agent with tool access can be told "watch the game, transcribe voice input, file issues in the tracker." It's a skill file, not a product.</p>
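<p>To show how little glue the three pieces need, here is a minimal sketch of the wiring. The function name <code>handle_utterance</code> and its three callables are hypothetical: <code>transcribe</code> could wrap a local Whisper model or a cloud API, <code>capture_context</code> would call your engine's HTTP interface, and <code>file_issue</code> would post to your tracker.</p>

```python
from typing import Callable, Optional


def handle_utterance(
    audio_path: str,
    transcribe: Callable[[str], str],
    capture_context: Callable[[], dict],
    file_issue: Callable[[dict], str],
) -> Optional[str]:
    """Turn one recorded utterance into a filed issue.

    Returns the tracker's issue id, or None when nothing was said.
    All three callables are placeholders for your own integrations.
    """
    text = transcribe(audio_path).strip()
    if not text:
        return None  # silence or noise: do not file an empty issue

    # Capture the screenshot and game state right after transcription so the
    # context stays close to the moment the bug was described.
    report = {"transcript": text}
    report.update(capture_context())
    return file_issue(report)
```

<p>Passing the integrations in as plain functions keeps the pipeline independent of any particular engine, transcription service, or tracker, and makes it easy to test with stubs.</p>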
|
||||
<h2>What changes when you stop typing bugs</h2>
|
||||
<p>The most surprising effect isn't the speed. It's the coverage. When filing a bug costs two seconds of speaking, you file bugs you would have previously ignored. The minor visual glitch. The slight animation hitch. The UI element that's two pixels misaligned.</p>
|
||||
<p>Individually these are low-priority. Collectively they're the difference between a game that feels polished and one that feels rough. And they only get caught when the cost of reporting approaches zero.</p>
|
||||
<p>The second effect is that playtesting becomes a primary input channel. Instead of structured QA sessions with checklists and forms, you just play the game. The agent captures everything. When you're done, you have a list of filed issues with screenshots and context — generated from your spoken observations in real time.</p>
|
||||
<p>Voice isn't a gimmick for game development. It's the input channel that matches the way we actually work — looking at the screen, noticing things, and talking about them. The tools exist. The latency is sub-second. The cost is negligible. The only thing missing is the habit.</p>
|
||||
<hr>
|
||||
<p><em>We build <a href="https://tinqs.com" style="color: var(--c-accent-l);">Tinqs Studio</a>, a game dev platform with built-in AI agents, git hosting, and creative pipelines. <a href="https://arikigame.com" style="color: var(--c-accent-l);">Ariki</a> is the survival colony sim we're building with every tool described here.</em></p>
|
||||
|
||||
</div>
|
||||
|
||||
<div class="post__author">
|
||||
<div class="post__author-avatar">OB</div>
|
||||
<div class="post__author-info">
|
||||
<span class="post__author-name">Ozan Bozkurt</span><br>
|
||||
CTO & Developer, Tinqs
|
||||
</div>
|
||||
</div>
|
||||
</article>
|
||||
|
||||
</body>
|
||||
</html>