How to Create Multilingual Audiobooks with ElevenLabs API: Voice Cloning, Text Splitting & Chapter Automation in Python

Build a Complete Multilingual Audiobook Pipeline with ElevenLabs API

ElevenLabs provides one of the most advanced text-to-speech APIs available, offering voice cloning, multilingual synthesis, and high-fidelity audio generation. In this guide, you’ll build a complete Python workflow that takes a book manuscript, splits it into chapters, clones a voice, and generates professional-quality audiobook files in multiple languages — all automated.

Prerequisites and Installation

Before starting, ensure you have Python 3.9+ installed and an ElevenLabs account with API access. A Pro or Scale plan is recommended for voice cloning and higher character limits.

Step 1: Install Dependencies

pip install elevenlabs pydub requests python-dotenv

Step 2: Configure Your API Key

Create a .env file in your project root:

ELEVENLABS_API_KEY=YOUR_API_KEY

Then load it in your Python script:

import os
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("ELEVENLABS_API_KEY")

Voice Cloning Setup

Step 3: Clone a Voice from Audio Samples

Instant Voice Cloning (IVC) requires at least one clean audio sample. For best results, provide 3–5 minutes of clear speech with minimal background noise.

import requests

def clone_voice(name: str, sample_paths: list[str]) -> str:
    url = "https://api.elevenlabs.io/v1/voices/add"
    headers = {"xi-api-key": API_KEY}
    data = {
        "name": name,
        "description": f"Cloned voice for audiobook narration - {name}",
        "labels": '{"use_case": "audiobook", "language": "multilingual"}'
    }
    files = [
        ("files", (os.path.basename(p), open(p, "rb"), "audio/mpeg"))
        for p in sample_paths
    ]
    response = requests.post(url, headers=headers, data=data, files=files)
    response.raise_for_status()
    voice_id = response.json()["voice_id"]
    print(f"Voice cloned successfully. Voice ID: {voice_id}")
    return voice_id

# Usage
voice_id = clone_voice("Narrator_EN", [
    "samples/narrator_sample1.mp3",
    "samples/narrator_sample2.mp3"
])
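Cloning the same samples twice wastes quota and can yield a slightly different voice profile, so it is worth checking for an existing voice first. The sketch below assumes the GET /v1/voices listing endpoint; find_voice_id and get_or_clone_voice are helper names introduced here, and get_or_clone_voice relies on API_KEY and clone_voice from the steps above:

```python
import requests

def find_voice_id(voices_json: dict, name: str):
    # Pure lookup over a /v1/voices listing payload of the shape
    # {"voices": [{"voice_id": "...", "name": "..."}, ...]}
    for voice in voices_json.get("voices", []):
        if voice.get("name") == name:
            return voice["voice_id"]
    return None

def get_or_clone_voice(name: str, sample_paths: list[str]) -> str:
    # Reuse an existing voice with this name; clone only if it is absent.
    resp = requests.get("https://api.elevenlabs.io/v1/voices",
                        headers={"xi-api-key": API_KEY})
    resp.raise_for_status()
    existing = find_voice_id(resp.json(), name)
    return existing or clone_voice(name, sample_paths)
```

Calling get_or_clone_voice("Narrator_EN", sample_paths) in place of clone_voice makes the pipeline safe to re-run.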

Text Splitting and Chapter Detection

Step 4: Parse and Split Book Text into Chapters

ElevenLabs has a 5,000-character limit per API request. This function splits your manuscript by chapters and further chunks long chapters into API-safe segments.

import re

def split_into_chapters(text: str) -> list[dict]:
    pattern = r"(Chapter\s+\d+[^\n]*)"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    chapters = []
    for i in range(1, len(parts), 2):
        title = parts[i].strip()
        body = parts[i + 1].strip() if i + 1 < len(parts) else ""
        chapters.append({"title": title, "body": body})
    if not chapters:
        chapters.append({"title": "Full Text", "body": text.strip()})
    return chapters

def chunk_text(text: str, max_chars: int = 4500) -> list[str]:
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current.strip())
            current = sentence
        else:
            current += " " + sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks
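Before committing a full manuscript (and your character quota) to a run, it helps to sanity-check the chapter-heading regex on a toy input. This self-contained snippet re-declares the same pattern used by split_into_chapters:

```python
import re

# Toy manuscript to exercise the chapter-heading regex.
sample = (
    "Chapter 1 The Beginning\n"
    "It was a dark night.\n"
    "Chapter 2 The End\n"
    "All was well."
)

pattern = r"(Chapter\s+\d+[^\n]*)"
parts = re.split(pattern, sample, flags=re.IGNORECASE)

# Because the pattern is a capture group, re.split keeps the headings:
# odd indices hold the captured titles, even indices the bodies.
titles = [parts[i].strip() for i in range(1, len(parts), 2)]
print(titles)  # ['Chapter 1 The Beginning', 'Chapter 2 The End']
```

If your book uses different heading conventions ("PART I", "Prologue", numbered sections), adjust the pattern and re-test the same way.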

Multilingual Audio Generation

Step 5: Generate Audio for Each Chapter and Language

ElevenLabs' eleven_multilingual_v2 model supports 29 languages. Specify the target language naturally in the text or rely on the model's automatic language detection.

import time
from pathlib import Path

TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def generate_audio(text: str, voice_id: str, output_path: str,
                   model: str = "eleven_multilingual_v2",
                   stability: float = 0.5, similarity: float = 0.75) -> None:
    url = TTS_URL.format(voice_id=voice_id)
    headers = {
        "xi-api-key": API_KEY,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg"
    }
    payload = {
        "text": text,
        "model_id": model,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "wb") as f:
        f.write(response.content)
    print(f"Saved: {output_path}")
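A transient 429 or 5xx midway through a long book will otherwise abort the entire run. A small generic retry wrapper can absorb these; with_retries is a helper name introduced here, not part of the ElevenLabs API, and it only assumes the raised exception carries a .response attribute the way requests.HTTPError does:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=2.0,
                 retry_statuses=(429, 500, 502, 503)):
    # Call fn(); on an HTTP error with a retryable status code, sleep
    # with exponential backoff and try again. Re-raise anything else,
    # or the last error once attempts are exhausted.
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(getattr(exc, "response", None),
                             "status_code", None)
            if status not in retry_statuses or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In the pipeline loop, wrap each synthesis call as with_retries(lambda: generate_audio(chunk, voice_id, chunk_file)).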

Step 6: Orchestrate the Full Pipeline

from pydub import AudioSegment

def create_audiobook(manuscript_path: str, voice_id: str,
                     languages: list[str], output_dir: str = "output") -> None:
    with open(manuscript_path, "r", encoding="utf-8") as f:
        text = f.read()

    chapters = split_into_chapters(text)
    print(f"Found {len(chapters)} chapters")

    for lang in languages:
        lang_dir = os.path.join(output_dir, lang)
        for ch_idx, chapter in enumerate(chapters, 1):
            chunks = chunk_text(chapter["body"])
            audio_segments = []

            for chunk_idx, chunk in enumerate(chunks, 1):
                chunk_file = os.path.join(lang_dir, f"ch{ch_idx}_part{chunk_idx}.mp3")
                generate_audio(chunk, voice_id, chunk_file)
                audio_segments.append(AudioSegment.from_mp3(chunk_file))
                time.sleep(1)  # Rate limit buffer

            # Merge chunk audio files into a single chapter file
            if not audio_segments:
                continue  # skip chapters with no body text
            combined = sum(audio_segments[1:], audio_segments[0])
            chapter_file = os.path.join(lang_dir, f"chapter_{ch_idx:02d}.mp3")
            combined.export(chapter_file, format="mp3", bitrate="192k")
            print(f"[{lang}] {chapter['title']} -> {chapter_file}")

            # Clean up chunk files
            for chunk_idx in range(1, len(chunks) + 1):
                os.remove(os.path.join(lang_dir, f"ch{ch_idx}_part{chunk_idx}.mp3"))

# Run the pipeline
create_audiobook(
    manuscript_path="book.txt",
    voice_id=voice_id,
    languages=["en", "ko", "ja", "es"]
)

Note: For true multilingual output, you should provide translated text for each language. The eleven_multilingual_v2 model handles pronunciation natively per language but does not perform translation itself. Pair this pipeline with a translation API such as Google Translate or DeepL for end-to-end multilingual production.
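One minimal way to wire translation in is to keep the translator pluggable. translate_chapters below is a helper introduced for this article, not part of any SDK; the deepl_translate wrapper assumes DeepL's v2 REST endpoint and parameter names, which you should verify against the current DeepL API docs before relying on it:

```python
import requests

def translate_chapters(chapters: list[dict], translate_fn,
                       target_lang: str) -> list[dict]:
    # Return a translated copy of the chapter list; titles and bodies
    # are both run through translate_fn(text, target_lang).
    return [
        {"title": translate_fn(ch["title"], target_lang),
         "body": translate_fn(ch["body"], target_lang)}
        for ch in chapters
    ]

def deepl_translate(text: str, target_lang: str, api_key: str) -> str:
    # Endpoint and field names follow DeepL's v2 REST API (assumption --
    # check the current DeepL documentation).
    resp = requests.post(
        "https://api-free.deepl.com/v2/translate",
        data={"auth_key": api_key, "text": text,
              "target_lang": target_lang.upper()},
    )
    resp.raise_for_status()
    return resp.json()["translations"][0]["text"]
```

In create_audiobook you would translate once per language before the chapter loop, e.g. translate_chapters(chapters, lambda t, l: deepl_translate(t, l, DEEPL_KEY), lang), rather than re-translating per chunk.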

Pro Tips

  • Use voice settings strategically: Lower stability (0.3–0.4) adds expressiveness for fiction. Higher values (0.7–0.8) suit non-fiction and technical content.
  • Add SSML-like pauses: Insert "…" or "—" in text to create natural pauses between paragraphs and scene changes.
  • Batch with the Projects API: For books over 100,000 characters, use the ElevenLabs Projects API (/v1/projects), which handles chunking, stitching, and chapter metadata automatically.
  • Monitor usage: Call GET /v1/user/subscription to check remaining character quota before long runs.
  • Cache voice IDs: Store cloned voice IDs in a config file. Re-cloning the same samples wastes quota and may produce slightly different voice profiles.
  • Post-process with FFmpeg: Normalize loudness across chapters: ffmpeg -i chapter_01.mp3 -af loudnorm -ar 44100 chapter_01_normalized.mp3
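The quota check can be scripted as a pre-flight step. The field names character_count and character_limit follow the ElevenLabs subscription response, but treat them as assumptions to verify against the current API reference; quota_remaining and check_quota are names introduced here:

```python
import requests

def quota_remaining(subscription: dict) -> int:
    # Pure helper over the subscription payload: characters left
    # in the current billing cycle.
    return subscription["character_limit"] - subscription["character_count"]

def check_quota(api_key: str) -> int:
    # GET /v1/user/subscription and report the remaining characters.
    resp = requests.get(
        "https://api.elevenlabs.io/v1/user/subscription",
        headers={"xi-api-key": api_key},
    )
    resp.raise_for_status()
    return quota_remaining(resp.json())
```

Calling check_quota(API_KEY) before create_audiobook lets you abort early instead of failing mid-book.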

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing API key | Verify ELEVENLABS_API_KEY in your .env file. Regenerate the key from the ElevenLabs dashboard if needed. |
| 422 text_too_long | Text exceeds the 5,000-character limit | Ensure chunk_text() uses a max_chars value below 5,000. The default of 4,500 provides a safe buffer. |
| 429 Too Many Requests | Rate limit exceeded | Increase the time.sleep() delay between requests. Pro plans allow higher concurrency; check your plan limits. |
| Audio sounds robotic or distorted | Poor voice clone samples | Use clean, studio-quality recordings. Remove background noise. Provide at least 3 minutes of varied speech. |
| pydub import error | FFmpeg not installed | Install FFmpeg: sudo apt install ffmpeg (Linux), brew install ffmpeg (macOS), or download from ffmpeg.org (Windows). |
| Wrong language pronunciation | Model mismatch | Use eleven_multilingual_v2 for non-English text. The eleven_monolingual_v1 model only supports English. |
FAQ

How many languages does ElevenLabs multilingual model support?

The eleven_multilingual_v2 model supports 29 languages including English, Korean, Japanese, Spanish, French, German, Chinese, Arabic, Hindi, and more. The cloned voice adapts its pronunciation to each target language automatically, though the quality is highest for languages with Latin and CJK scripts.

Can I use a cloned voice commercially for audiobooks?

Yes, but only if you have legal rights to the voice. You must own the voice (i.e., it is your own) or have explicit written consent from the voice owner. ElevenLabs’ terms require that cloned voices are used ethically and legally. Commercial audiobook distribution with a cloned voice requires at minimum a Pro plan.

What is the maximum book length this pipeline can handle?

There is no hard limit on book length in the code itself. The practical limit comes from your ElevenLabs character quota. A Pro plan provides approximately 500,000 characters per month. A typical 80,000-word novel is roughly 450,000 characters. For longer works or multiple languages, consider the Scale or Enterprise plan, or spread generation across billing cycles. The Projects API is recommended for books exceeding 100,000 characters as it provides built-in chunking and reliability features.
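The back-of-envelope numbers above can be wrapped in a small planning helper. The 5.6 characters-per-word average (a typical English word plus its trailing space) is an assumption; measure your own manuscript with len(text) for a tighter figure:

```python
def estimate_characters(word_count: int, languages: int = 1,
                        chars_per_word: float = 5.6) -> int:
    # Rough quota estimate: words x avg chars per word x languages.
    return round(word_count * chars_per_word * languages)

print(estimate_characters(80_000))               # one language
print(estimate_characters(80_000, languages=4))  # four languages
```

An 80,000-word novel comes out near 448,000 characters per language under this assumption, which is why a four-language run comfortably exceeds a single Pro plan's monthly quota.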
