Generate AI voiceover with Text-To-Speech

Summary

Generate an AI-dubbed voiceover in the target language with alugha’s Text-To-Speech. This is Step 3 of the alugha AI pipeline (Speech-To-Text → Automated Translation → Text-To-Speech) — it turns your translated transcript into a ready-to-listen dubbed audio track.

Prerequisites

Before you begin:

A translated transcript on your target-language track — run Automated Translation first if you have not already.
Enough credits in your wallet — Text-To-Speech shows the exact cost before you submit.
One or more voices set up on the target-language track. By default, alugha assigns default voices for the target language — you can swap them before generating.

Step-by-Step Instructions

1. Switch to the target-language tab

Open the video in the dubbr and click the language tab for your target language — the language you translated into in Step 2 (for example FRA). Text-To-Speech always runs on the currently selected track.

Check that the target track already has translated segments. Without a translated transcript there is nothing for Text-To-Speech to voice.

2. Open Automation → Text To Speech

In the toolbar click Automation and choose Text To Speech. The tooltip reads: “Create an AI-generated audio dubbing from your subtitle text in the selected language.”

alugha dubbr Automation menu with Text To Speech option highlighted and tooltip to generate AI voiceover text to speech

3. Review the voice assignments

The TEXT TO SPEECH dialog opens with the current track indicator, for example Track: FRA. Each voice slot on the track is listed — typically voice_0, voice_1, and any custom voices you have added.

For each voice slot the dialog shows:

The voice name assigned to that slot (for example Alain (FR) or Jerome (FR)).
A playback preview so you can hear what the voice sounds like before committing.
A dropdown to swap the voice — pick a different voice if the default does not match the speaker.

If you have multiple speakers in the video, assign a different voice to each speaker for a more natural-sounding dub. Use the play button to preview each voice before you submit.

4. Review credits and click SUBMIT

The bottom of the dialog shows the credit cost breakdown:

Credits left in your wallet — your current balance.
Required credits for this action — what Text-To-Speech will cost for this video at this length.
Credits remaining afterwards — your balance once the job runs.

Click SUBMIT to start generation, or CANCEL to close the dialog without spending credits.

alugha Text-To-Speech dialog for FRA track with voice_0 Alain FR and voice_1 Jerome FR, playback previews, and credit cost to generate AI voiceover text to speech

What happens next

Text-To-Speech runs in the background. The job appears under Active jobs and moves to Completed when generation finishes, labeled Text-to-Speech with the target-language code (for example FRA).

Click Reload now to refresh the editor — your target-language tab now has generated audio for every translated segment.

alugha Active jobs panel showing Text-to-Speech completed for FRA target track with Reload now button to generate AI voiceover text to speech

Play the video with the target language selected to hear the AI dub. If a specific segment sounds off, re-generate just that segment from the segment menu or edit the translated text and re-run Text-To-Speech.

Good to know

Text-To-Speech is Step 3 of the AI pipeline — it requires a translated transcript. You cannot run Text-To-Speech on an empty track.
Each voice slot can be swapped before generation. Picking the right voice per speaker makes the dub sound far more natural.
Every Text-To-Speech run costs credits. Re-running on the same track regenerates and replaces the audio — you pay again.
You can always edit individual segments after generation — adjust the translated text, then re-run Text-To-Speech (on the whole track or just that segment).
Voices are language-specific. When you switch tracks, the dialog shows voices available for that language.

Troubleshooting

Text To Speech is greyed out or missing:

Switch to the target-language tab — Text-To-Speech runs on the current track, not on the PROJECT tab.
Confirm the track has translated segments. A completely empty track has nothing for Text-To-Speech to voice.

The generated voice sounds wrong:

Preview voices before clicking SUBMIT — use the play button on each slot.
Swap a voice via the dropdown and re-run. Different voices have different timbre, pace, and accent.
For multi-speaker videos, assign a different voice per speaker slot instead of using one voice for everyone.

A few segments sound off:

Open the segment and check the translated text — odd phrasing often reads better than it sounds. Edit the text, then re-run Text-To-Speech.
Segments with unusual punctuation, numbers, or abbreviations sometimes need manual tweaks for the TTS engine to pronounce them correctly.

Not enough credits:

The dialog shows Credits remaining afterwards — if it goes negative, top up before submitting.
Re-generating only the segments you changed costs less than regenerating the whole track. Use per-segment regeneration to iterate cheaply.

Was this article helpful?