ElevenLabs API Case Study: How an Indie Game Studio Generated 200+ NPC Dialogue Lines in 48 Hours
From Casting Calls to API Calls: Replacing Traditional Voice Acting Pipelines
For indie game studios, voice acting is one of the most expensive and time-consuming production bottlenecks. Casting agencies, recording sessions, retakes, and post-processing can consume weeks of calendar time and thousands of dollars — even for a modest RPG with a handful of NPCs. This case study documents how a fictional but representative indie studio, Ironpine Games, used the ElevenLabs API to generate over 200 fully voiced NPC dialogue lines in just 48 hours. The workflow leveraged three core ElevenLabs features: Projects API, Voice Design presets, and Pronunciation Dictionaries — replacing what would have traditionally required a 3-week casting and recording pipeline.
The Challenge
- 212 dialogue lines across 14 unique NPCs for a fantasy RPG vertical slice
- Budget constraint: under $500 total voice production cost
- Timeline: 48 hours before a publisher demo
- Each NPC needed a distinct, consistent voice with correct pronunciation of 30+ fictional proper nouns (place names, spells, lore terms)
Step 1: Environment Setup and Installation
Install the ElevenLabs Python SDK
pip install elevenlabs
Configure Your API Key
# Set your API key as an environment variable
export ELEVENLABS_API_KEY=YOUR_API_KEY

# Python initialization: read the key from the environment instead of hard-coding it
import os
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
Step 2: Design Unique NPC Voices with Voice Design
Instead of auditioning voice actors, Ironpine used the Voice Design API to generate distinct voice profiles for each NPC archetype — grizzled blacksmith, young apprentice, ancient oracle, and so on.
# Design a grizzled blacksmith voice.
# Note: the API may enforce a minimum length for the preview text;
# lengthen the sample line if the request is rejected.
blacksmith_preview = client.text_to_voice.create_previews(
    voice_description="A gruff, deep-voiced male blacksmith in his 50s with a slight rasp",
    text="Aye, that blade will cost ye three gold crowns. No less."
)

# Listen to the generated previews, then save the best one as a persistent voice
blacksmith_voice = client.text_to_voice.create_voice_from_preview(
    voice_name="Blacksmith_Gorath",
    voice_description="Gruff male blacksmith NPC",
    generated_voice_id=blacksmith_preview.previews[0].generated_voice_id
)
print(f"Created voice: {blacksmith_voice.voice_id}")
The team repeated this for all 14 NPCs, generating 2-3 preview variations per character and selecting the best fit — a process that took roughly 3 hours compared to weeks of casting calls.
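Once all 14 voices exist, their IDs need to live somewhere durable so later scripts can look them up by NPC name. The following is a minimal sketch of one way to do that with a JSON file; the file name `voice_map.json` and the helper names `save_voice_map` and `load_voice_map` are hypothetical, and the IDs in the usage note are placeholders rather than real ElevenLabs voice IDs.

```python
import json
from pathlib import Path


def save_voice_map(voice_map: dict[str, str], path: str = "voice_map.json") -> None:
    """Write the npc_name -> voice_id mapping as sorted, indented JSON."""
    Path(path).write_text(
        json.dumps(voice_map, indent=2, sort_keys=True, ensure_ascii=False),
        encoding="utf-8",
    )


def load_voice_map(path: str = "voice_map.json") -> dict[str, str]:
    """Read the mapping back; raises FileNotFoundError if it was never saved."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A call such as `save_voice_map({"Gorath": "voice_id_blacksmith", "Lyra": "voice_id_apprentice"})` after the design session, followed by `load_voice_map()` in the generation script, keeps both steps reading from the same file. Sorting the keys keeps diffs small when the file is committed to version control.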
Step 3: Create a Pronunciation Dictionary for Lore Terms
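Fantasy games are full of invented words. Without a pronunciation dictionary, the TTS engine guesses, often incorrectly. ElevenLabs Pronunciation Dictionaries let you pin each term to an explicit pronunciation. One caveat: ElevenLabs documents IPA phoneme rules as supported only on certain models, while alias rules (respelling the word phonetically) work more broadly, so confirm that your chosen model honors phoneme entries before relying on them.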
# Create a pronunciation dictionary from a lexicon file.
# pronunciation_lexicon.pls is a PLS (Pronunciation Lexicon Specification) XML file.
with open("pronunciation_lexicon.pls", "rb") as f:
    dictionary = client.pronunciation_dictionary.add_from_file(
        file=f,
        name="ironpine_rpg_lore",
        description="Pronunciation rules for all fantasy proper nouns"
    )
print(f"Dictionary ID: {dictionary.id}")
print(f"Version ID: {dictionary.version_id}")
Example PLS Lexicon File
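<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Valdrethar</grapheme>
    <phoneme>vɑːl.drɛ.θɑːr</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Kythira</grapheme>
    <phoneme>kɪ.θaɪ.rə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Aethermancy</grapheme>
    <phoneme>iː.θɜːr.mæn.si</phoneme>
  </lexeme>
</lexicon>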
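With 30+ lore terms, hand-editing XML gets error-prone. As a rough sketch, the lexicon can be generated from a plain Python mapping using only the standard library. The helper name `build_pls` and the default `en-US` language tag are assumptions for illustration; the element and attribute names follow the W3C PLS 1.0 format.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_pls(entries: dict[str, str], lang: str = "en-US") -> bytes:
    """Return a PLS document with one IPA <lexeme> per grapheme -> phoneme pair."""
    # Serialize PLS elements in the default namespace (no prefix).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="utf-8", xml_declaration=True)


lore_terms = {
    "Valdrethar": "vɑːl.drɛ.θɑːr",
    "Kythira": "kɪ.θaɪ.rə",
    "Aethermancy": "iː.θɜːr.mæn.si",
}
pls_bytes = build_pls(lore_terms)
```

Writing `pls_bytes` to `pronunciation_lexicon.pls` in binary mode produces a file that can be uploaded in the step above. Keeping the terms in a dictionary also makes it easy to review pronunciations in a pull request.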
Step 4: Batch Generate All Dialogue with the Projects API
The Projects API is where the entire pipeline comes together. It allows you to organize chapters, assign voices per character, attach pronunciation dictionaries, and batch-convert an entire script.
# Create a project for the RPG vertical slice
project = client.projects.add(
    name="Ironpine RPG - Vertical Slice",
    default_model_id="eleven_multilingual_v2",
    pronunciation_dictionary_versions_locators=[
        {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
    ],
    default_paragraph_voice_id=blacksmith_voice.voice_id
)
print(f"Project created: {project.project_id}")

# Add a chapter for each game area or quest
chapter = client.projects.add_chapter(
    project_id=project.project_id,
    name="Chapter 1 - Village of Valdrethar"
)
print(f"Chapter ID: {chapter.chapter_id}")
Bulk Upload Dialogue Lines via Script
import csv
import os
import time

# Map each NPC name to the voice_id saved in Step 2 (placeholders shown here)
npc_voices = {
    "Gorath": "voice_id_blacksmith",
    "Lyra": "voice_id_apprentice",
    "Elder Morvyn": "voice_id_oracle",
    # ... 11 more NPC voice mappings
}

os.makedirs("audio", exist_ok=True)  # the output directory must exist before writing

with open("dialogue_script.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)  # columns: npc_name, line_id, text
    for row in reader:
        voice_id = npc_voices.get(row["npc_name"])
        if not voice_id:
            continue  # no voice mapped for this NPC
        audio = client.text_to_speech.convert(
            voice_id=voice_id,
            text=row["text"],
            model_id="eleven_multilingual_v2",
            pronunciation_dictionary_locators=[
                {"pronunciation_dictionary_id": dictionary.id, "version_id": dictionary.version_id}
            ]
        )
        filename = f"audio/{row['line_id']}.mp3"
        # convert() yields the audio in chunks; stream them to disk
        with open(filename, "wb") as out:
            for chunk in audio:
                out.write(chunk)
        print(f"Generated: {filename}")
        time.sleep(0.5)  # respect rate limits
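The loop above skips any row whose NPC has no voice mapping, so a typo in `npc_name` can quietly drop lines from the batch. A pre-flight pass over the CSV can surface that before any API call is made. The sketch below is illustrative only: `preflight` is a hypothetical helper, and the character total is just a rough sizing figure, since how characters translate into credits depends on the plan and model.

```python
import csv
import io
from collections import Counter


def preflight(csv_file, npc_voices: dict[str, str]) -> dict:
    """Scan the dialogue CSV and report problems without calling the API."""
    unmapped = Counter()   # npc_name -> number of lines with no voice mapping
    seen_ids = set()
    duplicate_ids = []     # line_ids that would overwrite an earlier audio file
    total_chars = 0        # characters that would actually be sent for synthesis
    for row in csv.DictReader(csv_file):
        if row["npc_name"] not in npc_voices:
            unmapped[row["npc_name"]] += 1
            continue
        if row["line_id"] in seen_ids:
            duplicate_ids.append(row["line_id"])
        seen_ids.add(row["line_id"])
        total_chars += len(row["text"])
    return {
        "unmapped": dict(unmapped),
        "duplicate_ids": duplicate_ids,
        "total_chars": total_chars,
    }


# Small in-memory sample standing in for dialogue_script.csv
sample = io.StringIO(
    "npc_name,line_id,text\n"
    "Gorath,g_001,Aye.\n"
    "Unknown,u_001,Hello there.\n"
)
report = preflight(sample, {"Gorath": "voice_id_blacksmith"})
print(report)
```

In the real pipeline the same function can be given the open `dialogue_script.csv` file handle and the full `npc_voices` mapping; a non-empty `unmapped` or `duplicate_ids` entry is a reason to stop and fix the script before spending credits.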
Results Summary
| Metric | Traditional Pipeline | ElevenLabs API Pipeline |
|---|---|---|
| Casting & auditions | 5-7 days | 3 hours (Voice Design) |
| Recording sessions | 3-5 days | 0 (API batch generation) |
| Pronunciation retakes | 1-2 days | 0 (dictionary-driven) |
| Post-processing | 2-3 days | Minimal normalization |
| Total elapsed time | ~15-20 days | ~48 hours |
| Cost (212 lines) | $3,000-$8,000+ | ~$80-$150 API credits |
- Tune stability (lower = more expressive) and similarity_boost per line to convey anger, whispers, or excitement without needing separate voice profiles.
- **Parallelize with async requests.** Use asyncio and httpx to generate multiple lines concurrently. Respect the concurrency limits on your plan tier.
- **Export a voice map JSON.** Keep a single source-of-truth mapping npc_name → voice_id in version control so your entire team references the same voices.
- **Tag lines with SSML-style markers.** Insert `<break>` tags in dialogue text for natural pauses between sentences, which is especially useful for dramatic NPC monologues.
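The parallelization tip hinges on capping how many requests are in flight at once. Here is a minimal sketch of that pattern using `asyncio.Semaphore`. The names `run_bounded` and `fake_tts` are hypothetical; `fake_tts` merely stands in for a real asynchronous TTS request (for example one sent with httpx), and the right value for `max_concurrency` depends on your plan tier.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_bounded(
    jobs: Iterable[dict],
    worker: Callable[[dict], Awaitable[object]],
    max_concurrency: int = 3,
) -> list:
    """Run worker(job) for every job, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job: dict):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await worker(job)

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_tts(job: dict) -> str:
    """Placeholder for a real async TTS call; returns the would-be output path."""
    await asyncio.sleep(0.01)
    return f"audio/{job['line_id']}.mp3"


if __name__ == "__main__":
    lines = [{"line_id": f"g_{i:03d}"} for i in range(5)]
    print(asyncio.run(run_bounded(lines, fake_tts, max_concurrency=2)))
```

Passing the worker in as a parameter keeps the concurrency logic independent of the HTTP client, so the limit can be tested locally without spending credits.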
## Troubleshooting Common Issues
| Error / Symptom | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or expired API key | Regenerate your key at elevenlabs.io dashboard and update the environment variable |
| 429 Too Many Requests | Rate limit exceeded | Add exponential backoff or time.sleep(1) between calls; upgrade plan tier if persistent |
| Pronunciation dictionary not applied | Missing or incorrect version_id | Always pass both pronunciation_dictionary_id and version_id in the locator object |
| Voice sounds inconsistent between lines | Stability set too low | Increase stability to 0.6-0.75 for NPC dialogue; reserve low values for emotional peaks |
| Generated audio has clipping | Text contains unusual punctuation or symbols | Sanitize input text; remove stray unicode characters and excessive exclamation marks |
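The exponential backoff suggested for 429 responses can be sketched as a small wrapper around any callable. This is an illustration under assumptions, not the SDK's own retry mechanism: `with_backoff` and `looks_rate_limited` are hypothetical helpers, and the check assumes the raised exception exposes a `status_code` attribute, which should be verified against the exception type your SDK version raises.

```python
import random
import time


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call call() until it succeeds, waiting base_delay * 2**attempt (plus jitter) between tries."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise non-retryable errors, or the last error once attempts run out.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def looks_rate_limited(exc: Exception) -> bool:
    """Assumption: the exception carries an HTTP status in a status_code attribute."""
    return getattr(exc, "status_code", None) == 429
```

In the generation loop, the TTS request can be wrapped in a zero-argument function or lambda and passed as `call`, so only rate-limit errors trigger a retry while authentication errors such as 401 fail immediately.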
Can I use ElevenLabs-generated voices commercially in a shipped game?
Yes. ElevenLabs allows commercial usage of generated audio on paid plans. Voices created through Voice Design are synthetic rather than cloned from a real person, which sidesteps most likeness-rights concerns and makes them well suited to indie game distribution on Steam, itch.io, or console storefronts. Always review the current terms of service for your specific plan tier.
How do I maintain voice consistency when generating hundreds of lines over multiple sessions?
Once you save a designed voice via create_voice_from_preview, it receives a persistent voice_id. All subsequent TTS calls using that ID produce consistent output. Keep stability at 0.5 or higher and use the same model_id across all generations. Avoid regenerating the voice profile mid-production.
What happens if I need to add new dialogue lines after the initial batch?
Run the same script against a CSV that contains only the new rows, so existing lines are not regenerated and billed again. The voice IDs, pronunciation dictionary, and model settings remain unchanged, so new lines will sound consistent with previously generated audio. For large additions, consider using the Projects API to organize new content into separate chapters for easier management.
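One way to make reruns safe without maintaining a separate CSV is to filter out rows whose audio file already exists. The sketch below assumes the `audio/<line_id>.mp3` naming used by the generation script; `pending_rows` is a hypothetical helper.

```python
import csv
import io
from pathlib import Path


def pending_rows(csv_file, out_dir: str = "audio") -> list[dict]:
    """Return only the CSV rows whose <out_dir>/<line_id>.mp3 file does not exist yet."""
    out = Path(out_dir)
    return [
        row
        for row in csv.DictReader(csv_file)
        if not (out / f"{row['line_id']}.mp3").exists()
    ]
```

Feeding the result into the generation loop instead of the raw reader means only new lines reach the API. The check is by file name only, so a line whose text was edited under an existing `line_id` will not be regenerated unless its old file is deleted first.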