AI Text-to-Speech Voiceover

Turn any script into a natural AI voiceover audio file (MP3/WAV/Opus/AAC). Pick a voice, speed, and format. Good for faceless videos, IVR, and narration.



How it works

1. Open it on Apify. Hit Run on Apify to open the tool in the cloud; there is nothing to install.
2. Set the inputs. Adjust text, texts, and voice (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify’s cloud and generates the audio files for you.
4. Export the results. Download the dataset as JSON, CSV or Excel, or pipe it straight into your app, Google Sheets, or an AI agent.

Inputs

Field	What it does	Type
`text`	The text to convert to speech. Long scripts are chunked and stitched automatically.	string
`texts`	Array of strings OR objects (uses script/scriptText/text/narration). One audio file per item.	array
`voice`	AI voice: alloy, echo, fable, onyx, nova, or shimmer.	string
`model`	tts-1 (fast) or tts-1-hd (higher quality).	string
`format`	Output audio format: mp3, wav, opus, or aac.	string
`speed`	Playback speed 0.25–4.0 (1.0 = normal).	string
`openaiApiKey`	Your OpenAI key (TTS). Kept private.	string
`baseUrl`	OpenAI-compatible base URL. Default https://api.openai.com/v1.	string

What you get

A structured dataset — each result includes fields like:

`_demo_notice`, `audioKey`, `audioUrl`, `characters`, `chunks`, `durationSeconds`, `format`, `index`, `model`, `textPreview`, `voice`

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

3 ready-to-run use cases

Faceless YouTube AI Voiceover from a Script

Paste a narration script and get a steady AI voiceover track for faceless YouTube videos, ready to drop under your B-roll. MP3 or WAV, adjustable pace.

IVR Phone Menu Voice Prompts - Text to Speech

Turn "press 1 for sales" menu text into clear IVR voice prompts for your phone system, exported as AAC. Telephony-ready greetings without studio time.

Batch Text to Speech: One Audio File per Line

Got a list of lines? Each one comes back as its own AI voiceover file, ideal for app prompts, UI sounds, and clip sets. Pick the voice, speed, and format.

AI Text-to-Speech Voiceover

Turns a block of text or a full script into a natural-sounding AI voiceover file. Pick a voice, set the speed, and choose MP3, WAV, Opus, or AAC. It's meant for the usual narration jobs: faceless videos, audiobooks, IVR prompts, explainer voiceovers.

How it works

The actor sends your text to an OpenAI-compatible TTS endpoint. Long scripts get split at sentence boundaries into chunks under ~3,500 characters, each chunk is synthesized separately, and the parts are stitched back into one file with ffmpeg (using stream copy, so there's no re-encode and no quality loss). Each finished audio file is saved to the run's key-value store and a row is pushed to the dataset.
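The split-and-stitch step described above can be sketched roughly as follows. This is an illustration, not the actor's actual code: the function names `split_into_chunks` and `ffmpeg_concat_command`, the sentence regex, and the use of ffmpeg's concat demuxer are all assumptions; only the ~3,500-character target and the stream-copy concatenation come from the description.

```python
import re


def split_into_chunks(text: str, max_chars: int = 3500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Assumed sentence boundary: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: a single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def ffmpeg_concat_command(list_file: str, output_file: str) -> list[str]:
    """Build an ffmpeg command that joins parts with stream copy (no re-encode).

    list_file is a text file with one line per part, e.g.: file 'part0.mp3'
    """
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_file]
```

Each chunk would be synthesized on its own, written to a part file, and the resulting command passed to something like `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.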

Input

Nothing is strictly required by the schema, but in practice you need an OpenAI key (the `openaiApiKey` field, or the `OPENAI_API_KEY` env var as a fallback) and at least one of `text` or `texts`. If neither `text` nor `texts` is provided, the run errors out.

Field	Required	Notes
`text`	one of `text`/`texts`	The script to voice, as a single string.
`texts`	one of `text`/`texts`	Batch mode. Array of strings, or objects keyed by `script` / `scriptText` / `text` / `narration`. One audio file per item.
`voice`	no	`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`. Default `onyx` (deep male). `nova` and `shimmer` are female.
`model`	no	`tts-1` (fast, default) or `tts-1-hd` (higher quality, costs more on the OpenAI side).
`format`	no	`mp3` (default), `wav`, `opus`, or `aac`.
`speed`	no	Playback speed from 0.25 to 4.0. Default `1.0`. Values outside that range are clamped.
`openaiApiKey`	yes in practice	Your OpenAI key, used for the TTS call. Stored as a secret. Falls back to the `OPENAI_API_KEY` env var if set.
`baseUrl`	no	Advanced. Point at any OpenAI-compatible `/audio/speech` endpoint. Defaults to `https://api.openai.com/v1`.
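To make the table's rules concrete, here is a minimal sketch of how an input could be normalized before synthesis. It is not taken from the actor's source: `clamp_speed` and `collect_scripts` are hypothetical helpers, and both the key precedence (`script`, then `scriptText`, `text`, `narration`) and the choice to combine `text` with `texts` are assumptions.

```python
TEXT_KEYS = ("script", "scriptText", "text", "narration")  # assumed precedence


def clamp_speed(speed) -> float:
    """Clamp playback speed into the documented 0.25 to 4.0 range."""
    return max(0.25, min(4.0, float(speed)))


def collect_scripts(run_input: dict) -> list[str]:
    """Gather every script to voice from `text` and `texts`."""
    scripts: list[str] = []
    text = run_input.get("text")
    if isinstance(text, str) and text.strip():
        scripts.append(text.strip())
    for item in run_input.get("texts") or []:
        if isinstance(item, str):
            value = item
        elif isinstance(item, dict):
            # Take the first recognized key that holds a non-empty string.
            value = next(
                (item[k] for k in TEXT_KEYS
                 if isinstance(item.get(k), str) and item[k].strip()),
                "",
            )
        else:
            value = ""
        if value.strip():
            scripts.append(value.strip())
    if not scripts:
        raise ValueError("Provide `text` or `texts`.")
    return scripts
```

For example, `collect_scripts({"texts": ["Line one", {"narration": "Line two"}]})` yields two scripts, so the run would produce two audio files.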

Output

Each input item produces one audio file in the key-value store and one dataset record. The record includes audioKey and audioUrl (where to fetch the file), durationSeconds, characters, chunks (how many pieces the script was split into), plus the voice, model, and resolved format. Failed items get a record with ok: false and the error message instead of stopping the whole run.
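Because failed items are reported as records rather than aborting the run, downstream code usually wants to separate the two cases. The sketch below shows one way to do that. The sample records are made up, the `error` field name is an assumption (the description only says the record carries the error message), and successful records are assumed to either omit `ok` or set it to true.

```python
def summarize_records(records: list[dict]) -> dict:
    """Split dataset records into successes and failures."""
    # Assumption: a record without an `ok` field counts as a success.
    succeeded = [r for r in records if r.get("ok", True)]
    failed = [r for r in records if not r.get("ok", True)]
    return {
        "audio_urls": [r["audioUrl"] for r in succeeded if "audioUrl" in r],
        "total_seconds": sum(r.get("durationSeconds", 0) for r in succeeded),
        # `error` is a hypothetical field name for the error message.
        "errors": [r.get("error", "unknown error") for r in failed],
    }


# Illustrative records only; real values come from the run's dataset.
sample = [
    {
        "audioKey": "voiceover-0.mp3",
        "audioUrl": "https://example.com/voiceover-0.mp3",
        "durationSeconds": 12.5,
        "chunks": 1,
    },
    {"ok": False, "error": "rate limited"},
]
summary = summarize_records(sample)
```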

Example

{
  "text": "Welcome back to the channel. Today we're looking at one of the strangest mysteries of the deep ocean.",
  "voice": "onyx",
  "model": "tts-1",
  "format": "mp3",
  "speed": 1.0,
  "openaiApiKey": "sk-..."
}
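If you would rather start a run from code than from the Apify console, an input like the one above can be posted to Apify's synchronous run endpoint. The following is a sketch using only the standard library. The actor ID is a placeholder you must replace with this actor's real `username~actor-name`, the helper names are hypothetical, and the endpoint path reflects the Apify API v2 as I understand it, so check it against the Apify docs before relying on it.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_voiceover(actor_id: str, token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the dataset records (performs a network call)."""
    request = build_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder actor ID and token, not executed here):
# records = run_voiceover("username~actor-name", "apify_api_...", {"text": "Hello", "openaiApiKey": "sk-..."})
```

The synchronous endpoint waits for the run to finish, so for very long scripts an asynchronous run followed by a dataset fetch may be the safer choice.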

Pricing

$0.04 per voiceover, pay per result, no subscription. The OpenAI TTS usage is billed separately on your own key.

Notes

This actor calls OpenAI for synthesis, so it needs your own OpenAI API key. Scripts are split into chunks of roughly 3,500 characters, and each chunk is capped at 4,000 characters before it is sent, which keeps every request within the model's per-call limit. There is no hard limit on total script length, since long inputs are chunked and concatenated.