Overview
The Voice Generator lets you create professional AI voiceovers using OpenAI or ElevenLabs text-to-speech. Write or generate scripts with the AI assistant, select a voice, fine-tune settings, and produce audio files with synced captions — all from a visual timeline editor.
Getting Started
- Navigate to Marketing & Content → Voice Generator in the sidebar.
- Click + New Project to create a voice project.
- Write your script in the Script panel, or use the AI Assistant chat on the right to generate one.
- Select a voice from the Voice Settings panel and adjust speed.
- Click Generate Voice to create the audio.
AI Script Assistant
The AI chat sidebar supports multi-turn conversations and remembers earlier messages as context. You can:
- Ask it to write a full script: "Write a 30-second ad for our coffee blend"
- Refine iteratively: "Make it more energetic" or "Shorten to 15 seconds"
- Click Use as full script to replace the entire contents of the Script panel with the assistant's text
- Click Add to timeline to add the snippet as a new caption segment at the end of the timeline
Voice Settings
Configure the text-to-speech provider and voice:
- Voice selection — choose from OpenAI voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer) or ElevenLabs voices. Click Preview to hear a sample.
- Speed — 0.25x to 4x (OpenAI only).
- ElevenLabs options (shown when an ElevenLabs voice is selected): Stability, Similarity Boost, Style, Speaker Boost.
Timeline Editor
The dark-themed timeline shows your audio project across multiple tracks:
- VOICE track — shows generated audio blocks (purple). Click segments to play.
- WAVE track — real waveform visualization colored by amplitude (green = quiet, yellow = medium, red = loud).
- VOLUME track — volume keyframe envelope with draggable points.
- CAPTIONS track — caption blocks that can be dragged, resized, split, and merged.
- VU Meter — vertical level meter on the right with green/yellow/red zones showing real-time audio levels during playback.
Playback Controls
- Play/Pause — click the play button or press Space
- Scrub — click and drag on the ruler to move the playhead
- Skip — J back 5s, L forward 5s
- Nudge — ← / → arrows move 1 second
- Jump — Home to start, End to end
Caption Editing
Captions with timestamps are created automatically when you generate the voice. Edit them on the timeline:
- Drag caption blocks to reposition timing
- Resize — drag left/right edges to adjust start/end
- Double-click — edit text and timing in a modal
- Right-click — context menu: Edit, Generate voice, Split at playhead, Merge with next, Delete
- Split — press C to split selected caption at the playhead
- Merge — press M to merge selected with the next caption
- Navigate — [ / ] to select previous/next caption
- Sync — click the Sync button to rebuild the script text from caption order
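The split, merge, and sync operations above can be pictured with a small data model. This is only a sketch: the `Caption` class and the three function names are hypothetical, and the rule for dividing text on a split (words shared out in proportion to elapsed time) is an assumption, since the guide does not say how the text is divided.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def split_caption(cap: Caption, playhead: float) -> tuple[Caption, Caption]:
    """Split one caption in two at the playhead position."""
    if not cap.start < playhead < cap.end:
        raise ValueError("playhead must fall inside the caption")
    words = cap.text.split()
    fraction = (playhead - cap.start) / (cap.end - cap.start)
    cut = round(len(words) * fraction)
    if len(words) > 1:
        # Keep at least one word on each side of the cut.
        cut = max(1, min(cut, len(words) - 1))
    else:
        cut = len(words)
    first = Caption(cap.start, playhead, " ".join(words[:cut]))
    second = Caption(playhead, cap.end, " ".join(words[cut:]))
    return first, second


def merge_captions(a: Caption, b: Caption) -> Caption:
    """Merge a caption with the one that follows it."""
    return Caption(a.start, b.end, f"{a.text} {b.text}".strip())


def rebuild_script(captions: list[Caption]) -> str:
    """Sync: rebuild the script text from captions in timeline order."""
    ordered = sorted(captions, key=lambda c: c.start)
    return " ".join(c.text for c in ordered)


left, right = split_caption(Caption(0.0, 4.0, "one two three four"), 1.0)
script = rebuild_script([right, left])
```

Merging `left` and `right` again returns a caption equal to the original, which is a handy sanity check for any split rule.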
Natural pauses are added between sentences: 0.35 s after a period, 0.3 s after a question mark, and 0.15 s in all other cases.
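As a rough illustration of the pause rule, the sketch below splits a script into sentences and looks up the pause that follows each one. The names `pause_after` and `split_sentences` and the sentence-splitting regex are assumptions made for this example, not the app's actual implementation.

```python
import re

# Pause lengths in seconds, taken from the rule described above.
PAUSE_PERIOD = 0.35
PAUSE_QUESTION = 0.30
PAUSE_OTHER = 0.15


def pause_after(sentence: str) -> float:
    """Return the pause (in seconds) that follows a sentence."""
    text = sentence.rstrip()
    if text.endswith("."):
        return PAUSE_PERIOD
    if text.endswith("?"):
        return PAUSE_QUESTION
    return PAUSE_OTHER


def split_sentences(script: str) -> list[str]:
    """Naive splitter (an assumption): break after ., ? or ! plus whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", script.strip())
    return [p for p in parts if p]


script = "Meet our new coffee blend. Ready for a better morning? Try it today!"
pauses = [(s, pause_after(s)) for s in split_sentences(script)]
```

Here `pauses` pairs each of the three sentences with 0.35, 0.3, and 0.15 seconds respectively.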
Per-Segment Voice Generation
Instead of generating a single audio file from the entire script, you can build it piece by piece:
- Each caption in the list shows a mic icon — click to generate voice for just that segment.
- After generation, a green play icon appears to preview the segment.
- Use Generate All Segments to process all captions that don't have audio yet.
- From AI chat, click Add to timeline to add a snippet as a new segment.
- Press G to generate voice for the selected segment, P to play it.
Volume Keyframes
The VOLUME track lets you control loudness across the timeline:
- Click on the track to add a keyframe point
- Drag keyframe dots horizontally (time) and vertically (volume: 0x at bottom, 2x at top, 1x at center)
- Right-click a keyframe to delete it
- Press V to add a keyframe at the current playhead position
- The envelope line shows the interpolated volume curve with a color gradient
- During playback, volume changes are applied smoothly in real-time
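To make the envelope concrete, here is a minimal sketch of how a gain value could be read off a set of keyframes at any point in time. It assumes straight-line interpolation between neighboring keyframes and a flat 1x gain when no keyframes exist; the guide only says the curve is "interpolated", so the actual curve shape may differ. `y_to_gain` and `gain_at` are hypothetical names.

```python
from bisect import bisect_right


def y_to_gain(y_fraction: float) -> float:
    """Map a vertical track position (0.0 = bottom, 1.0 = top) to a gain of 0x to 2x."""
    clamped = min(1.0, max(0.0, y_fraction))
    return 2.0 * clamped


def gain_at(keyframes: list[tuple[float, float]], t: float) -> float:
    """Return the gain at time t (seconds) from (time, gain) keyframes."""
    if not keyframes:
        return 1.0  # assumption: no keyframes means normal volume
    points = sorted(keyframes)
    times = [time for time, _ in points]
    if t <= times[0]:
        return points[0][1]   # hold the first value before the first keyframe
    if t >= times[-1]:
        return points[-1][1]  # hold the last value after the last keyframe
    i = bisect_right(times, t)  # index of the first keyframe later than t
    (t0, g0), (t1, g1) = points[i - 1], points[i]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)


envelope = [(0.0, 1.0), (2.0, 2.0), (4.0, 0.0)]
halfway_up = gain_at(envelope, 1.0)    # between 1x and 2x
halfway_down = gain_at(envelope, 3.0)  # between 2x and 0x
```

With this envelope the gain rises from 1x to 2x over the first two seconds, then fades to silence by the four-second mark.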
Auto-Normalize
Click the Normalize button (magic wand icon) in the playback bar to automatically analyze the audio and generate keyframes that even out the volume — boosting quiet parts and reducing loud parts.
Click the eraser icon to clear all keyframes and reset to normal volume.
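One plausible way to generate such keyframes is sketched below: measure the loudness (RMS) of each short window and place a keyframe whose gain moves that window toward a target level, capped at the 2x maximum of the VOLUME track. The window length, the target level, the silence threshold, and the function name `normalize_keyframes` are all assumptions for illustration; the guide does not describe the actual analysis.

```python
import math


def normalize_keyframes(
    samples: list[float],
    sample_rate: int,
    window_s: float = 0.5,
    target_rms: float = 0.2,
    max_gain: float = 2.0,
) -> list[tuple[float, float]]:
    """Return (time_seconds, gain) keyframes that even out loudness."""
    window = max(1, int(window_s * sample_rate))
    keyframes = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms < 1e-4:
            gain = 1.0  # leave near-silence alone instead of amplifying noise
        else:
            gain = min(max_gain, target_rms / rms)
        center_time = (start + len(chunk) / 2) / sample_rate
        keyframes.append((center_time, gain))
    return keyframes


# One quiet second followed by one loud second, at 100 samples per second.
audio = [0.1] * 100 + [0.4] * 100
keys = normalize_keyframes(audio, sample_rate=100, window_s=1.0)
```

In this example the quiet first second gets a gain near 2x and the loud second gets a gain near 0.5x, so both end up close to the target level.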
Export & Save
- Audio button — download the original generated MP3
- SRT button — download the captions as an SRT subtitle file
- Save as New — export audio with volume keyframes baked in as a new file saved to the Gallery
- SRT/VTT export — buttons in the captions panel to download subtitle files
- All generated audio, SRT files, and exports appear in the Gallery under their respective categories (Voice Audio, Voice Captions, Voice Exports)
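For reference, an SRT file is a sequence of numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the caption text and a blank line. The sketch below shows how timed captions map onto that layout; the function names and the `(start, end, text)` tuple shape are assumptions, not the exporter the app uses.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) captions as SRT, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(captions, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


srt_text = to_srt([
    (0.0, 1.5, "Meet our new coffee blend."),
    (1.85, 3.0, "Ready for a better morning?"),
])
```

WebVTT files are similar but start with a `WEBVTT` header line and use a period instead of a comma before the milliseconds.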
Video Generator Integration
Voice projects can be imported into the Video Generator:
- In the Video Generator, look for the Voice Projects panel in the sidebar.
- Click a completed voice project to import it.
- The audio becomes the video's voiceover, and the script + captions are imported automatically.
- Projects with captions show a +captions badge.
Keyboard Shortcuts
| Key | Action |
|---|---|
| Space | Play / Pause |
| J / L | Skip back / forward 5s |
| ← / → | Nudge playhead 1s |
| Home / End | Jump to start / end |
| N | New caption at playhead |
| C | Cut (split) selected caption |
| E | Edit selected caption |
| M | Merge selected with next |
| G | Generate voice for selected segment |
| P | Play selected segment |
| V | Add volume keyframe at playhead |
| [ / ] | Select prev / next caption |
| Del | Delete selected caption |
| Ctrl+S | Save script |
| Esc | Deselect |
Token Usage
- Voice generation: 2 tokens per character (minimum 100 tokens)
- Segment generation: 2 tokens per character (minimum 50 tokens)
- AI script writing: standard AI token billing based on input/output length
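The voice-generation rates above can be estimated before you generate. The sketch below applies the stated rate and minimums; it assumes every character counts, including spaces and punctuation, which the guide does not specify, and the function names are hypothetical.

```python
TOKENS_PER_CHARACTER = 2
FULL_MINIMUM = 100     # minimum charge for generating the whole script
SEGMENT_MINIMUM = 50   # minimum charge for generating one segment


def full_generation_tokens(script: str) -> int:
    """Estimate tokens for generating voice from the entire script."""
    return max(TOKENS_PER_CHARACTER * len(script), FULL_MINIMUM)


def segment_tokens(segment_text: str) -> int:
    """Estimate tokens for generating voice for a single segment."""
    return max(TOKENS_PER_CHARACTER * len(segment_text), SEGMENT_MINIMUM)


short_script_cost = full_generation_tokens("Hi")      # minimum applies
long_script_cost = full_generation_tokens("x" * 200)  # 2 tokens per character
short_segment_cost = segment_tokens("Hello")          # minimum applies
```

Under these assumptions, very short segments each cost at least the segment minimum, so a script of many tiny segments can cost more piece by piece than in one pass.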