AI Text-to-Speech Voiceover
Turn any script into a natural AI voiceover audio file (MP3/WAV/Opus/AAC). Pick a voice, speed, and format. Good for faceless videos, IVR, and narration.
How it works
1. Open it on Apify. Hit Run on Apify; it opens the tool in the cloud, with nothing to install.
2. Set the inputs. Add your openaiApiKey and adjust text, texts, and voice (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify's cloud and generates the voiceover audio for you.
4. Export the results. Download the dataset as JSON, CSV, or Excel, or pipe it straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
text | The text to convert to speech. Long scripts are chunked and stitched automatically. | string |
texts | Array of strings OR objects (uses script/scriptText/text/narration). One audio file per item. | array |
voice | AI voice: alloy, echo, fable, onyx, nova, or shimmer. | string |
model | tts-1 (fast) or tts-1-hd (higher quality). | string |
format | Output audio format: mp3, wav, opus, or aac. | string |
speed | Playback speed 0.25–4.0 (1.0 = normal). | string |
openaiApiKey | Your OpenAI API key, used for the TTS call. Kept private. | string |
baseUrl | OpenAI-compatible base URL. Default https://api.openai.com/v1. | string |
What you get
A structured dataset in which each result includes fields like:
_demo_notice, audioKey, audioUrl, characters, chunks, durationSeconds, format, index, model, textPreview, voice.
Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
3 ready-to-run use cases
Faceless YouTube AI Voiceover from a Script
Paste a narration script and get a steady AI voiceover track for faceless YouTube videos, ready to drop under your B-roll. MP3 or WAV, adjustable pace.
IVR Phone Menu Voice Prompts - Text to Speech
Turn "press 1 for sales" menu text into clear IVR voice prompts for your phone system, exported as AAC. Telephony-ready greetings without studio time.
Batch Text to Speech: One Audio File per Line
Got a list of lines? Each one comes back as its own AI voiceover file, ideal for app prompts, UI sounds, and clip sets. Pick the voice, speed, and format.
Turns a block of text or a full script into a natural-sounding AI voiceover file. Pick a voice, set the speed, and choose MP3, WAV, Opus, or AAC. It's meant for the usual narration jobs: faceless videos, audiobooks, IVR prompts, explainer voiceovers.
How it works
The actor sends your text to an OpenAI-compatible TTS endpoint. Long scripts get split at sentence boundaries into chunks under ~3,500 characters, each chunk is synthesized separately, and the parts are stitched back into one file with ffmpeg (using stream copy, so there's no re-encode and no quality loss). Each finished audio file is saved to the run's key-value store and a row is pushed to the dataset.
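The chunking step can be sketched in Python as below. This is a minimal illustration, not the actor's source: the regex sentence splitter and the helper names split_sentences and chunk_script are hypothetical, while the 3,500 and 4,000 character figures come from this README. Sentences are packed greedily until the next one would push a chunk past the soft target; a single sentence longer than the hard cap is cut into fixed-size slices.

```python
import re
from typing import List

TARGET_CHARS = 3500  # soft target per chunk (from this README)
HARD_CAP = 4000      # per-request cap (from the Notes section)


def split_sentences(text: str) -> List[str]:
    """Naive sentence split: break on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_script(text: str, target: int = TARGET_CHARS, hard_cap: int = HARD_CAP) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `target` chars where possible."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        # An oversized sentence is flushed and sliced so no chunk exceeds hard_cap.
        while len(sentence) > hard_cap:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:hard_cap])
            sentence = sentence[hard_cap:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= target:
            current = candidate
        else:
            # Adding this sentence would overshoot the target: start a new chunk.
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing keeps chunk boundaries on sentence ends wherever possible, so each synthesized part ends at a natural break before the pieces are stitched together.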
Input
Nothing is strictly required by the schema, but in practice you need an openaiApiKey and at least one of text or texts. If neither text nor texts is provided, the run fails with an error.
| Field | Required | Notes |
|---|---|---|
text | one of text/texts | The script to voice, as a single string. |
texts | one of text/texts | Batch mode. Array of strings, or objects keyed by script / scriptText / text / narration. One audio file per item. |
voice | no | alloy, echo, fable, onyx, nova, shimmer. Default onyx (deep male). nova and shimmer are female. |
model | no | tts-1 (fast, default) or tts-1-hd (higher quality, costs more on the OpenAI side). |
format | no | mp3 (default), wav, opus, or aac. |
speed | no | Playback speed from 0.25 to 4.0. Default 1.0. Values outside that range are clamped. |
openaiApiKey | yes in practice | Your OpenAI key, used for the TTS call. Stored as a secret. Falls back to the OPENAI_API_KEY env var if set. |
baseUrl | no | Advanced. Point at any OpenAI-compatible /audio/speech endpoint. Defaults to https://api.openai.com/v1. |
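How these fields might be resolved into one job can be sketched as follows. It is an assumption-laden illustration: normalize_input and extract_text are hypothetical names, skipping empty items is a simplification (the actor may report them as failed records instead), and the rule that a non-empty texts array takes precedence over text is a guess, since the README does not say what happens when both are set. The defaults, the accepted object keys, the speed clamp, and the OPENAI_API_KEY fallback follow the table above.

```python
import os
from typing import Any, Mapping, Optional

# Keys checked, in order, when a texts item is an object.
TEXT_KEYS = ("script", "scriptText", "text", "narration")
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def extract_text(item: Any) -> str:
    """Return the script for one texts item, or "" if none is found."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict):
        for key in TEXT_KEYS:
            value = item.get(key)
            if isinstance(value, str) and value.strip():
                return value
    return ""


def normalize_input(raw: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical resolution of the actor input into one job description."""
    env = os.environ if env is None else env
    api_key = raw.get("openaiApiKey") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("missing OpenAI key: set openaiApiKey or OPENAI_API_KEY")

    # Assumption: a non-empty texts array wins over text; empty items are skipped.
    scripts = [s for s in map(extract_text, raw.get("texts") or []) if s.strip()]
    if not scripts and isinstance(raw.get("text"), str) and raw["text"].strip():
        scripts = [raw["text"]]
    if not scripts:
        raise ValueError("provide text or texts")

    # Speed defaults to 1.0 and is clamped into [0.25, 4.0].
    speed_raw = raw.get("speed")
    speed = 1.0 if speed_raw is None else float(speed_raw)
    speed = min(max(speed, 0.25), 4.0)

    return {
        "scripts": scripts,
        "voice": raw.get("voice") or "onyx",
        "model": raw.get("model") or "tts-1",
        "format": raw.get("format") or "mp3",
        "speed": speed,
        "api_key": api_key,
        "base_url": (raw.get("baseUrl") or DEFAULT_BASE_URL).rstrip("/"),
    }
```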
Output
Each input item produces one audio file in the key-value store and one dataset record. The record includes audioKey and audioUrl (where to fetch the file), durationSeconds, characters, chunks (how many pieces the script was split into), plus the voice, model, and resolved format. Failed items get a record with ok: false and the error message instead of stopping the whole run.
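The per-item isolation described above amounts to a try/except around each synthesis call. In this sketch, synthesize is a hypothetical callable standing in for the chunk, synthesize, and stitch pipeline; the ok: True flag on successful rows and the 80-character textPreview are assumptions, while the other field names come from this README.

```python
from typing import Callable, Dict, List


def process_items(
    scripts: List[str],
    synthesize: Callable[[str, int], Dict],
    voice: str,
    model: str,
    fmt: str,
) -> List[Dict]:
    """Build one dataset record per script; a failure marks only that record."""
    records = []
    for index, script in enumerate(scripts):
        base = {
            "index": index,
            "voice": voice,
            "model": model,
            "format": fmt,
            "characters": len(script),
            "textPreview": script[:80],  # assumed preview length
        }
        try:
            # Hypothetical: returns audioKey, audioUrl, durationSeconds, chunks.
            result = synthesize(script, index)
            records.append({**base, "ok": True, **result})  # ok: True is assumed
        except Exception as exc:
            # Keep going: record this item's error instead of aborting the run.
            records.append({**base, "ok": False, "error": str(exc)})
    return records
```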
Example
```json
{
  "text": "Welcome back to the channel. Today we're looking at one of the strangest mysteries of the deep ocean.",
  "voice": "onyx",
  "model": "tts-1",
  "format": "mp3",
  "speed": 1.0,
  "openaiApiKey": "sk-..."
}
```
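For orientation, the example input maps onto a request like the one below against an OpenAI-compatible /audio/speech endpoint. This sketch uses only Python's standard library; build_speech_request is a hypothetical helper, it covers a single chunk only (longer scripts would be split first), and the actor's own HTTP code may differ. The body fields model, input, voice, response_format, and speed are the ones OpenAI's speech API accepts.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def build_speech_request(run_input: dict) -> urllib.request.Request:
    """Build (but do not send) one POST to {baseUrl}/audio/speech for a single chunk."""
    base_url = (run_input.get("baseUrl") or DEFAULT_BASE_URL).rstrip("/")
    payload = {
        "model": run_input.get("model", "tts-1"),
        "input": run_input["text"],  # assumed short enough for one call
        "voice": run_input.get("voice", "onyx"),
        "response_format": run_input.get("format", "mp3"),
        "speed": run_input.get("speed", 1.0),
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {run_input['openaiApiKey']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to urllib.request.urlopen and writing the response body to a file such as voiceover.mp3 would save the audio, since the endpoint returns raw audio bytes rather than JSON.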
Pricing
$0.04 per voiceover, pay per result, no subscription. OpenAI TTS usage is billed separately, to your own OpenAI key.
Notes
This actor calls OpenAI for synthesis, so it needs your own OpenAI API key. Chunks target about 3,500 characters and each one is hard-capped at 4,000 characters before it's sent, which keeps every request under the OpenAI speech endpoint's limit of 4,096 input characters. There's no hard limit on total script length, since long inputs are chunked and concatenated.
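The stitching step can be sketched with ffmpeg's concat demuxer and stream copy (-c copy). The README confirms ffmpeg and stream copy but not this exact invocation, so treat the helpers concat_list and concat_command, and the file names, as hypothetical.

```python
from typing import List


def concat_list(parts: List[str]) -> str:
    """Text for ffmpeg's concat demuxer: one file line per part, in order."""
    lines = []
    for path in parts:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output_path: str) -> List[str]:
    """argv that joins the listed parts with stream copy (no re-encode)."""
    return [
        "ffmpeg",
        "-y",            # overwrite output_path if it exists
        "-f", "concat",  # read the inputs from the list file
        "-safe", "0",    # allow absolute or unusual paths in the list
        "-i", list_path,
        "-c", "copy",    # stream copy: packets are copied, not re-encoded
        output_path,
    ]
```

Writing concat_list(parts) to a file such as parts.txt and running subprocess.run(concat_command("parts.txt", "voiceover.mp3"), check=True) would produce a single file. Stream copy only works when every part shares the same codec and parameters, which should hold when all chunks come from the same voice, model, and format.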