Voice Generator

Overview

The Voice Generator lets you create professional AI voiceovers using OpenAI or ElevenLabs text-to-speech. Write or generate scripts with the AI assistant, select a voice, fine-tune settings, and produce audio files with synced captions — all from a visual timeline editor.

Getting Started

Navigate to Marketing & Content → Voice Generator in the sidebar.
Click + New Project to create a voice project.
Write your script in the Script panel, or use the AI Assistant chat on the right to generate one.
Select a voice from the Voice Settings panel and adjust speed.
Click Generate Voice to create the audio.

AI Script Assistant

The AI chat sidebar is a multi-turn conversation that remembers context. You can:

Ask it to write a full script: "Write a 30-second ad for our coffee blend"
Refine iteratively: "Make it more energetic" or "Shorten to 15 seconds"
Click Use as full script to replace the entire contents of the Script panel with the generated text
Click Add to timeline to append the snippet as a new caption segment at the end of the timeline

Voice Settings

Configure the text-to-speech provider and voice:

Voice selection — choose from OpenAI voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer) or ElevenLabs voices. Click Preview to hear a sample.
Speed — 0.25x to 4x (OpenAI only).
ElevenLabs options (shown when an ElevenLabs voice is selected): Stability, Similarity Boost, Style, Speaker Boost.
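
As a rough sketch of how the settings above could be represented and checked before a generation request, the Python below builds a settings dictionary per provider. Only the OpenAI voice names and the 0.25x to 4x speed range come from this page. The function name `build_voice_settings`, the dictionary keys, the default values, and the 0 to 1 range for the ElevenLabs sliders are assumptions made for illustration, not the product's actual internals.

```python
# Voice names and speed range are taken from the Voice Settings list above.
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
SPEED_MIN, SPEED_MAX = 0.25, 4.0


def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return min(max(value, low), high)


def build_voice_settings(provider, voice, speed=1.0, stability=0.5,
                         similarity_boost=0.75, style=0.0, speaker_boost=True):
    """Return a settings dict for one provider (hypothetical shape).

    Defaults and the 0..1 slider range are assumptions for this sketch.
    """
    if provider == "openai":
        name = voice.lower()
        if name not in OPENAI_VOICES:
            raise ValueError(f"Unknown OpenAI voice: {voice}")
        # Speed applies to OpenAI voices only.
        return {"provider": "openai", "voice": name,
                "speed": clamp(speed, SPEED_MIN, SPEED_MAX)}
    if provider == "elevenlabs":
        # ElevenLabs voices expose the four extra options instead of speed.
        return {"provider": "elevenlabs", "voice": voice,
                "stability": clamp(stability, 0.0, 1.0),
                "similarity_boost": clamp(similarity_boost, 0.0, 1.0),
                "style": clamp(style, 0.0, 1.0),
                "speaker_boost": bool(speaker_boost)}
    raise ValueError(f"Unknown provider: {provider}")
```

For example, `build_voice_settings("openai", "Nova", speed=10)` clamps the speed to 4.0, while an ElevenLabs result carries no speed key at all.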

Timeline Editor

The dark-themed timeline shows your audio project across multiple tracks:

VOICE track — shows generated audio blocks (purple). Click segments to play.
WAVE track — real waveform visualization colored by amplitude (green = quiet, yellow = medium, red = loud).
VOLUME track — volume keyframe envelope with draggable points.
CAPTIONS track — caption blocks that can be dragged, resized, split, and merged.
VU Meter — vertical level meter on the right with green/yellow/red zones showing real-time audio levels during playback.

Playback Controls

Play/Pause — click the play button or press Space
Scrub — click and drag on the ruler to move the playhead
Skip — J back 5s, L forward 5s
Nudge — ← / → arrows move 1 second
Jump — Home to start, End to end

Caption Editing

Captions with timestamps are generated automatically whenever voice is generated. Edit them on the timeline:

Drag caption blocks to reposition timing
Resize — drag left/right edges to adjust start/end
Double-click — edit text and timing in a modal
Right-click — context menu: Edit, Generate voice, Split at playhead, Merge with next, Delete
Split — press C to split selected caption at the playhead
Merge — press M to merge selected with the next caption
Navigate — [ / ] to select previous/next caption
Sync — click the Sync button to rebuild the script text from caption order
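
The split, merge, and sync operations above can be pictured with a small Python sketch. The `Caption` class and the three function names are hypothetical, and dividing the text by word count in proportion to the playhead position is an assumption; this page does not say how the editor divides caption text when splitting.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def split_caption(cap, playhead):
    """Split one caption into two at the playhead (must lie inside it)."""
    if not (cap.start < playhead < cap.end):
        raise ValueError("playhead must be inside the caption")
    words = cap.text.split()
    # Assumption: words are divided in proportion to elapsed time.
    fraction = (playhead - cap.start) / (cap.end - cap.start)
    cut = round(len(words) * fraction)
    if len(words) > 1:
        cut = min(max(cut, 1), len(words) - 1)  # keep both halves non-empty
    left = Caption(cap.start, playhead, " ".join(words[:cut]))
    right = Caption(playhead, cap.end, " ".join(words[cut:]))
    return left, right


def merge_captions(first, second):
    """Merge a caption with the next one into a single block."""
    return Caption(first.start, second.end, f"{first.text} {second.text}".strip())


def rebuild_script(captions):
    """Rebuild the script text from caption order, as Sync is described."""
    ordered = sorted(captions, key=lambda c: c.start)
    return " ".join(c.text for c in ordered if c.text)
```

In this sketch, splitting a caption and then merging the two halves returns the original caption, which is a convenient property to check when experimenting with the timing model.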

Natural pauses are added between sentences: 0.35s after a period, 0.3s after a question mark, and 0.15s in all other cases.
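
To make the pause rule concrete, here is a minimal Python sketch that looks up the pause after each sentence and totals the pauses for a script. The three durations come from the rule above. The function names, the regular expression used to find sentences, and the choice to add no pause after the final sentence are assumptions for illustration.

```python
import re

# Durations in seconds, taken from the pause rule above.
PAUSE_AFTER = {".": 0.35, "?": 0.30}
DEFAULT_PAUSE = 0.15  # any other ending, e.g. "!" or no punctuation


def pause_after(sentence):
    """Return the pause that follows one sentence."""
    stripped = sentence.rstrip()
    if not stripped:
        return 0.0
    return PAUSE_AFTER.get(stripped[-1], DEFAULT_PAUSE)


def total_pause(script):
    """Sum the pauses between sentences (none after the last one)."""
    # Assumption: a sentence ends at ".", "?" or "!".
    sentences = [s for s in re.findall(r"[^.?!]+[.?!]?", script) if s.strip()]
    return sum(pause_after(s) for s in sentences[:-1])
```

For the script "Hello there. How are you? Great!" this adds one period pause and one question pause, roughly 0.65s of silence on top of the spoken audio.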

Per-Segment Voice Generation

Instead of generating a single audio file from the entire script, you can build it piece by piece:

Each caption in the list shows a mic icon — click to generate voice for just that segment.
After generation, a green play icon appears to preview the segment.
Use Generate All Segments to process all captions that don't have audio yet.
From AI chat, click Add to timeline to add a snippet as a new segment.
Press G to generate voice for the selected segment, P to play it.

Volume Keyframes

The VOLUME track lets you control loudness across the timeline:

Click on the track to add a keyframe point
Drag keyframe dots horizontally (time) and vertically (volume: 0x at bottom, 2x at top, 1x at center)
Right-click a keyframe to delete it
Press V to add a keyframe at the current playhead position
The envelope line shows the interpolated volume curve with a color gradient
During playback, volume changes are applied smoothly in real-time
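
The keyframe envelope described above can be sketched in Python as follows. The 0x to 2x range with 1x at the center comes from this section. Linear interpolation between keyframes, holding the first and last values outside the keyframed range, and the function names are assumptions; the editor may use a different curve.

```python
from bisect import bisect_right


def y_to_gain(y_from_top, track_height):
    """Map a vertical position on the track to a gain (2x top, 0x bottom)."""
    return 2.0 * (1.0 - y_from_top / track_height)


def gain_at(keyframes, t):
    """Return the volume multiplier at time t (seconds).

    keyframes is a list of (time, gain) pairs sorted by time.
    Assumption: straight-line interpolation between neighbouring points.
    """
    if not keyframes:
        return 1.0  # no keyframes means normal volume
    times = [time for time, _ in keyframes]
    i = bisect_right(times, t)  # index of the first keyframe after t
    if i == 0:
        return keyframes[0][1]   # before the first keyframe
    if i == len(keyframes):
        return keyframes[-1][1]  # at or after the last keyframe
    (t0, g0), (t1, g1) = keyframes[i - 1], keyframes[i]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
```

With keyframes at 0s (1x), 2s (2x) and 4s (0x), `gain_at` gives 1.5 at 1s and 1.0 at 3s, so the audio ramps up and then fades out smoothly instead of jumping between levels.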

Auto-Normalize

Click the Normalize button (magic wand icon) in the playback bar to automatically analyze the audio and generate keyframes that even out the volume — boosting quiet parts and reducing loud parts.

Click the eraser icon to clear all keyframes and reset to normal volume.
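
One way a normalize step like this could work is sketched below in Python: measure the loudness of short windows and place one keyframe per window that moves it toward a target level. This is an illustration only. The window length, the RMS loudness measure, the target level, and the function name are all assumptions, and the actual Normalize button may analyze audio differently. The 0x to 2x clamp matches the range of the VOLUME track.

```python
import math


def normalize_keyframes(samples, sample_rate, window_s=0.5,
                        target_rms=0.1, min_gain=0.0, max_gain=2.0):
    """Return (time, gain) keyframes that even out loudness.

    samples are floats in -1.0..1.0. All parameter defaults are
    assumptions chosen for this sketch.
    """
    window = max(1, int(sample_rate * window_s))
    keyframes = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms == 0:
            gain = 1.0  # leave silence untouched
        else:
            # Boost quiet windows, reduce loud ones, within track limits.
            gain = min(max(target_rms / rms, min_gain), max_gain)
        center_time = (start + len(chunk) / 2) / sample_rate
        keyframes.append((center_time, gain))
    return keyframes
```

A quiet window produces a gain above 1x and a loud window a gain below 1x. Clearing the keyframes corresponds to returning to an empty list, which plays at normal volume.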

Export & Save

Audio button — download the original generated MP3
SRT button — download the captions as an SRT subtitle file
Save as New — export audio with volume keyframes baked in as a new file saved to the Gallery
SRT/VTT export — buttons in the captions panel to download subtitle files
All generated audio, SRT files, and exports appear in the Gallery under their respective categories (Voice Audio, Voice Captions, Voice Exports).
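
For readers who want to check or post-process a downloaded subtitle file, this Python sketch shows how timed captions map to the SRT format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, the text, and a blank line between blocks. The function names and the `(start, end, text)` tuple layout are assumptions for illustration; the exporter's own code is not shown on this page.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions):
    """Build SRT text from (start, end, text) tuples in timeline order."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by one blank line.
    return "\n".join(blocks)
```

VTT files follow the same idea but start with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.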

Video Generator Integration

Voice projects can be imported into the Video Generator:

In the Video Generator, look for the Voice Projects panel in the sidebar.
Click a completed voice project to import it.
The audio becomes the video's voiceover, and the script and captions are imported automatically.
Projects with captions show a +captions badge.

Keyboard Shortcuts

Key	Action
`Space`	Play / Pause
`J` / `L`	Skip back / forward 5s
`←` / `→`	Nudge playhead 1s
`Home` / `End`	Jump to start / end
`N`	New caption at playhead
`C`	Cut (split) selected caption
`E`	Edit selected caption
`M`	Merge selected with next
`G`	Generate voice for selected segment
`P`	Play selected segment
`V`	Add volume keyframe at playhead
`[` / `]`	Select prev / next caption
`Del`	Delete selected caption
`Ctrl+S`	Save script
`Esc`	Deselect

Token Usage

Voice generation: 2 tokens per character (minimum 100 tokens)
Segment generation: 2 tokens per character (minimum 50 tokens)
AI script writing: standard AI token billing based on input/output length
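
To estimate the cost of a generation before running it, the per-character rates above can be applied as in this Python sketch. The rates and minimums come from the list above. The function names are hypothetical, and counting every character including spaces and punctuation is an assumption, since this page does not say which characters are billed.

```python
TOKENS_PER_CHAR = 2
VOICE_MINIMUM = 100    # full-script voice generation
SEGMENT_MINIMUM = 50   # single-segment generation


def voice_tokens(script):
    """Estimated tokens for generating voice from the whole script."""
    # Assumption: every character counts, including spaces.
    return max(VOICE_MINIMUM, TOKENS_PER_CHAR * len(script))


def segment_tokens(text):
    """Estimated tokens for generating voice for one segment."""
    return max(SEGMENT_MINIMUM, TOKENS_PER_CHAR * len(text))
```

Under these assumptions an 80-character script costs 160 tokens, while a 10-character segment is charged the 50-token minimum rather than 20.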