Reddit Text Cleaner — TTS-Ready Narration
Turn raw Reddit posts into TTS-ready narration. Strips markdown, links and edit stamps, expands AITA/MIL/TIFU, splits into sentences. No AI, instant.
How it works
1. Open it on Apify
   Hit Run on Apify: it opens the tool in the cloud, no install.
2. Set the inputs
   Adjust text, texts, expandAbbreviations (sensible defaults are pre-filled).
3. Click Run
   The tool runs on Apify’s cloud and writes the cleaned results to a dataset.
4. Export the results
   Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
| text | A single block of text to clean (e.g. a Reddit post body). | string |
| texts | An array of strings OR post objects (uses scriptText/narration/selftext/body/text). Lets you pipe the Reddit Scraper's output straight in. | array |
| expandAbbreviations | Expand Reddit/internet abbreviations for TTS (AITA → Am I the asshole, MIL → mother-in-law, IMO → in my opinion…). | boolean |
| profanityMode | keep = leave as-is · soft = swap for mild words (great for monetization-safe TTS) · censor = f*** · remove = delete. | string |
| wpm | Words-per-minute used to estimate read time. | integer |
What you get
A structured dataset — each result includes fields like:
charCount, cleaned, hookScore, original, readTimeSeconds, sentenceCount, ttsSegments, wordCount

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
2 ready-to-run use cases
Clean a Reddit AITA Post for TTS Narration
Paste one r/AmItheAsshole post and get narration-ready text for faceless YouTube and TikTok voiceovers: markdown stripped, AITA spelled out, links removed.
Bulk Reddit Post Cleaner for Batch TTS Voiceover
Reddit story channels pipe in an array of scraped posts and get every body cleaned, then split into sentences for batch TTS voiceover videos.
Reddit Text Cleaner
Reddit and forum text is full of stuff that wrecks text-to-speech: markdown asterisks, link syntax, "Edit:" stamps, emoji, and abbreviations like AITA that a voice reads letter by letter. This actor cleans all of that out and hands back narration that's ready to feed into a TTS engine. It's built for people generating Reddit story videos or audio at scale, where the cleanup step needs to be cheap and predictable.
How it works
Pure rules, no model. It runs a fixed pipeline of regex passes (strip markdown and links, drop edit-stamps and emoji, expand abbreviations, then optionally rewrite profanity), splits the result into sentences, and returns it. Same input always gives the same output, and it returns instantly.
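To make the pipeline idea concrete, here is a minimal Python sketch of a fixed sequence of regex passes followed by a sentence split. It is an illustration under assumptions, not the actor's source: the function name `clean_for_tts`, the three-entry `ABBREVIATIONS` table, and every pattern are hypothetical, and the optional profanity pass is left out.

```python
import re

# Hypothetical three-entry table; the actor's curated list is longer
# and is not reproduced here.
ABBREVIATIONS = {
    "AITA": "Am I the asshole",
    "MIL": "mother-in-law",
    "IMO": "in my opinion",
}
ABBREVIATION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
)


def clean_for_tts(text: str, expand_abbreviations: bool = True) -> dict:
    # Markdown links: keep the visible label, drop the URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Bare URLs that were not wrapped in link syntax.
    text = re.sub(r"https?://\S+", "", text)
    # Edit stamps: this sketch drops the whole line that starts with
    # "Edit:" or "Update:" (an assumption about what counts as a stamp).
    text = re.sub(r"(?im)^\s*(?:edit|update)\s*\d*\s*:.*$", "", text)
    # Emphasis markers: asterisks, tildes, backticks, and underscores
    # that sit at the edge of a word (snake_case is left alone).
    text = re.sub(r"\*+|~~|`+|(?<!\w)_+|_+(?!\w)", "", text)
    # Emoji: a rough code-point range, not a complete emoji definition.
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)
    if expand_abbreviations:
        text = ABBREVIATION_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    # Collapse whitespace, then split after sentence-ending punctuation.
    cleaned = re.sub(r"\s+", " ", text).strip()
    segments = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]
    return {"cleaned": cleaned, "ttsSegments": segments}
```

Because every pass is a plain substitution applied in a fixed order, the same input always produces the same output, which is the property the section describes.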
Input
Nothing is strictly required, but you need to pass text one way or another. Use text for a single block, or texts for a batch. If both are present, every text from both is processed.
| Field | Required | Notes |
|---|---|---|
| text | no | One block of text to clean, e.g. a post body. |
| texts | no | Array of strings or post objects. For objects it reads scriptText, narration, selftext, body, or text, in that order. Lets you pipe the Reddit Scraper's output in directly. |
| expandAbbreviations | no | Expand internet shorthand for TTS: AITA to "Am I the asshole", MIL to "mother-in-law", IMO, TIFU, and so on. Default true. |
| profanityMode | no | keep leaves swears as-is, soft swaps in mild words (handy for ad-safe narration), censor masks them as f***, remove deletes them. Default keep. |
| wpm | no | Words per minute used to estimate read time. Default 150. |
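The way text and texts combine, and the field order used for post objects, can be sketched as follows. This is a hedged illustration: `extract_text` and `collect_inputs` are hypothetical helper names, and skipping empty strings or objects with no usable field is an assumption, not documented behavior.

```python
from typing import Any, List, Optional

# Field order taken from the table above.
FIELD_ORDER = ("scriptText", "narration", "selftext", "body", "text")


def extract_text(entry: Any) -> Optional[str]:
    """Return the text for one texts entry, or None if nothing usable is found."""
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict):
        for key in FIELD_ORDER:
            value = entry.get(key)
            # Assumption: blank or non-string fields fall through to the next key.
            if isinstance(value, str) and value.strip():
                return value
    return None


def collect_inputs(actor_input: dict) -> List[str]:
    """Gather the single text plus every resolvable entry in texts."""
    collected: List[str] = []
    single = actor_input.get("text")
    if isinstance(single, str) and single.strip():
        collected.append(single)
    for entry in actor_input.get("texts") or []:
        text = extract_text(entry)
        if text is not None:
            collected.append(text)
    return collected
```

With this ordering, a scraped post that carries both selftext and body is read from selftext, since it comes earlier in the list.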
Output
One dataset item per input text. The cleaned narration is in cleaned, and ttsSegments is that same text split into sentences if you want to render audio per line. You also get wordCount, sentenceCount, charCount, readTimeSeconds (based on your wpm), a hookScore for the opening line, and the truncated original.
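As a rough guide to how the count fields relate to each other, the sketch below derives them from a cleaned string. The formula for readTimeSeconds (words divided by wpm, times 60, rounded) is an assumption, as are the whitespace word split and the punctuation-based sentence split; hookScore is omitted because its scoring is not described here. `narration_stats` is a hypothetical name.

```python
import re


def narration_stats(cleaned: str, wpm: int = 150) -> dict:
    # Words are whitespace-separated tokens in this sketch.
    words = cleaned.split()
    # Sentences are split after ., ! or ? followed by whitespace.
    segments = [s for s in re.split(r"(?<=[.!?])\s+", cleaned.strip()) if s]
    return {
        "wordCount": len(words),
        "sentenceCount": len(segments),
        "charCount": len(cleaned),
        # Assumed formula: minutes of speech at the given pace, in seconds.
        "readTimeSeconds": round(len(words) / wpm * 60),
    }
```

Under these assumptions, 150 words at the default wpm of 150 come out to 60 seconds.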
Example
{
"text": "AITA for leaving? **So** here's the _story_. Check [this](https://x.com).\n\nEdit: thanks for the awards! TL;DR: I left.",
"expandAbbreviations": true,
"profanityMode": "soft"
}
Pricing
$0.0002 per text cleaned. Pay per result, no subscription.
Notes
Everything here is rule-based, so no OpenAI key is needed and there are no model settings to configure. The trade-off is that abbreviation and profanity handling cover a curated list rather than every possible variant, so an obscure acronym may pass through untouched.