Audio
Updated: 2026-05
1. About This Page
This guide covers Runway’s audio generation features: text-to-speech, sound effects, and lip-syncing. Adding audio makes a video prototype noticeably more compelling.
2. The Three Types of Sound
Video prototyping uses three types of sound:
| Type | Runway Features | Use |
|---|---|---|
| Dialogue & Narration | Text-to-Speech, Lip-Sync | Character dialogue, commentary |
| Sound Effects (SFX) | Generative Audio | Ambient sounds, specific sounds |
| Background Music (BGM) | (Runway alone is limited) | Atmosphere, mood |
Runway excels at the first two tasks, but it’s often better to create the background music using a separate tool (such as Suno, AIVA, or Adobe Firefly Audio).
3. Text-to-Speech
Text-to-Speech converts typed text into spoken audio.
3.1 How to Use
- Left navigation bar → Generative Audio → Text to Speech
- Enter text (up to 600 characters per speaker)
- Select Voice — Choose from a wide range of preset voices
- Generate → Runway produces the audio file
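A narration script is often longer than the 600-character limit mentioned above. The following is a minimal sketch, not part of Runway, of how you might pre-split a script into chunks before pasting each one into the text box. The function name `split_script` and the sentence-boundary rule are assumptions made for illustration.

```python
import re

MAX_CHARS = 600  # per-speaker limit stated in this guide


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking at sentence ends."""
    # Split after ., ! or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence) > limit:
            raise ValueError("A single sentence exceeds the limit; shorten it by hand.")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be generated as a separate audio clip and placed in sequence on the timeline.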
3.2 Types of Voices
- Preset Voices — Runway’s standard presets (free)
- Custom Voice — You can also clone your own voice through the ElevenLabs integration (subject to plan restrictions)
3.3 Examples of Narration Use
- A narration at the beginning of the work, such as “This is a story set in a city of the future”
- A character’s inner thoughts (monologue)
- A voiceover reading explanatory subtitles
4. Lip-syncing
Lip Sync makes a person in a video move their lips in sync with any audio track you choose.
4.1 Basic Workflow
- Left navigation → Generative Audio → Lip Sync
- Video Source: Upload a video (showing a person’s face)
- Audio Source: Specify the audio in one of three ways
  - Text input (Text-to-Speech runs in the background)
  - Upload an audio file
  - Record on the spot
- Generate → A video with lip movements synchronized to the audio
4.2 Constraints
- Up to 600 characters per dialogue
- Up to 10 dialogues per video (supports multiple speakers)
- Videos where faces are shown from the front to a three-quarter front view have a higher success rate
- Accuracy decreases for side profiles and poorly lit faces
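The two numeric limits above can be checked before you open Runway. The sketch below is a hypothetical helper (the name `check_dialogues` is an assumption, not a Runway API) that takes the planned dialogue lines for one video and reports anything that exceeds the limits stated in this section.

```python
MAX_CHARS_PER_DIALOGUE = 600   # limit stated in this guide
MAX_DIALOGUES_PER_VIDEO = 10   # limit stated in this guide


def check_dialogues(dialogues: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the limits."""
    problems: list[str] = []
    if len(dialogues) > MAX_DIALOGUES_PER_VIDEO:
        problems.append(
            f"{len(dialogues)} dialogues; the limit is {MAX_DIALOGUES_PER_VIDEO}"
        )
    for i, line in enumerate(dialogues, start=1):
        if len(line) > MAX_CHARS_PER_DIALOGUE:
            problems.append(
                f"dialogue {i} has {len(line)} characters; "
                f"the limit is {MAX_CHARS_PER_DIALOGUE}"
            )
    return problems


plan = ["Where are we?", "Somewhere the snow never stops."]
print(check_dialogues(plan) or "Plan fits the limits")
```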
4.3 Relationship with Act-Two
Act-Two transfers the whole performance (facial expressions and body movement) onto a character, while Lipsync only synchronizes lip movements with audio.
| Purpose | Recommendation |
|---|---|
| Want to include facial expressions and body language | Act-Two |
| Want to add dialogue to existing video | Lipsync |
| Only need the mouth to move | Lipsync (Light) |
5. Sound Effects (SFX)
You can also generate sound effects using Runway’s Generative Audio.
5.1 How to Use
- Generative Audio → Sound Effects (or Generate Sound)
- Describe what kind of sound you want in text
Examples:
- “Footsteps on wet pavement, slow pace”
- “Distant thunder, low rumble”
- “Birds chirping on a forest morning”
5.2 Combining with Existing Libraries
You don’t have to generate everything with AI. Make use of free sound effect libraries as well:
- Freesound.org — A large library of Creative Commons audio files
- Pixabay Audio — Available for commercial use
- BBC Sound Effects — Over 50,000 high-quality sound effects
For class projects, a hybrid approach combining AI-generated content and existing libraries is a practical option.
6. Background Music
Runway alone isn’t very good at generating background music. The standard approach is to create or source it with one of the following:
- Suno — Generates music from text (free plan available)
- AIVA — Focuses on film music
- Adobe Firefly Audio — By Adobe
- YouTube Audio Library — Royalty-free, for commercial use
Add the generated or selected background music to the Runway Editor timeline.
7. Volume Balance
After adding the clips to the timeline, adjust their levels:
- Background music: -20 to -25 dB (low)
- Sound effects: -15 to -20 dB
- Dialogue: -5 to -10 dB (highest)
The basic principle is to keep the dialogue the loudest element at all times; that matters more than hitting specific dB values.
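To get a feel for what these dB figures mean, it can help to convert them to linear amplitude multipliers with the standard relation gain = 10^(dB/20). The sketch below assumes the values are relative clip gains; the example levels are picked from inside the ranges above, and the function names are illustrative only.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def dialogue_is_loudest(levels_db: dict[str, float]) -> bool:
    """True if the 'dialogue' track is louder than every other track."""
    dialogue = levels_db["dialogue"]
    return all(dialogue > db for name, db in levels_db.items() if name != "dialogue")


# Example levels chosen from the ranges listed above (assumed values).
levels_db = {"dialogue": -6.0, "sfx": -18.0, "bgm": -22.0}

for name, db in levels_db.items():
    print(f"{name}: {db} dB -> amplitude x{db_to_gain(db):.3f}")
print("Dialogue loudest:", dialogue_is_loudest(levels_db))
```

As a rule of thumb, -6 dB is roughly half the amplitude and -20 dB is one tenth, so background music in the suggested range sits far below the dialogue.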
8. Fade In and Fade Out
Fade the audio at both ends of the clip:
- At a minimum, fade the first and last second of the BGM
- Sounds that start or end abruptly sound very amateurish
- Use the Editor to drag the edges of each audio clip to set the fade
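What a fade does to the signal can be sketched as a volume multiplier that ramps from 0 to 1 at the start of the clip and back to 0 at the end. The function below is an illustration with a linear ramp and a default fade of one second; the Editor's actual fade curve is not documented here and may differ.

```python
def fade_gain(t: float, clip_len: float, fade: float = 1.0) -> float:
    """Linear fade multiplier (0.0 to 1.0) at time t seconds into a clip."""
    if clip_len <= 0 or not 0 <= t <= clip_len:
        return 0.0  # outside the clip: silence
    fade = min(fade, clip_len / 2)  # keep the two ramps from overlapping
    if fade <= 0:
        return 1.0  # no fade requested
    ramp_in = t / fade                # rises from 0 at the clip start
    ramp_out = (clip_len - t) / fade  # falls to 0 at the clip end
    return min(1.0, ramp_in, ramp_out)


# Sample a 30-second BGM clip around both ends.
for t in (0.0, 0.5, 1.0, 15.0, 29.5, 30.0):
    print(f"t={t:>4}s gain={fade_gain(t, 30.0):.2f}")
```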
9. Copyright Notice
The following rules must be strictly observed even for class projects:
- ✗ Use of commercial music (e.g., songs from Apple Music)
- ✗ Extracting audio from YouTube videos
- ✓ AI-generated voice and sound effects
- ✓ Creative Commons materials (check licenses such as CC BY)
- ✓ Materials declared to be copyright-free
Since the video may be made public outside the university during the final presentation, we will use only royalty-free materials from the start.
10. Priorities in Sound Design
Priorities for video prototyping when time is limited, highest first:
- Voice lines, if any (Lipsync or Text-to-Speech)
- 1–2 main sound effects (distinctive sounds such as footsteps or ambient noise)
- Background music (fade in softly at the end)
- Other minor sound effects
Chasing perfection consumes unlimited time, so decide in advance to stop once the project is about 80% complete.
11. Example Workflow for the Practical Training Session
Sound Design for a 30-Second Short Film:
| Time | Sound |
|---|---|
| 0:00-0:05 | BGM fades in, snow sound effects |
| 0:05-0:15 | Protagonist’s narration (lip-sync), BGM continues |
| 0:15-0:25 | Ambient sounds, footsteps, BGM emphasized |
| 0:25-0:30 | BGM fades out |
A 30-second film needs only 3 to 4 sound clips. Expect the work to take one group 30 to 60 minutes.
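Before building the timeline, a group can write the cue sheet down as data and check that it has no gaps or overlaps. The sketch below encodes the table above; the `Cue` class and `check_cue_sheet` function are hypothetical planning aids, not part of Runway.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: int          # seconds from the start of the film
    end: int            # seconds from the start of the film
    sounds: list[str]   # what should be audible in this span


CUES = [
    Cue(0, 5, ["BGM fade-in", "snow SFX"]),
    Cue(5, 15, ["narration (lip-sync)", "BGM"]),
    Cue(15, 25, ["ambient", "footsteps", "BGM emphasized"]),
    Cue(25, 30, ["BGM fade-out"]),
]


def check_cue_sheet(cues: list[Cue], total: int) -> list[str]:
    """Return problems found; an empty list means the cues tile 0..total exactly."""
    problems: list[str] = []
    expected_start = 0
    for cue in cues:
        if cue.start != expected_start:
            problems.append(
                f"gap or overlap at {expected_start}s (next cue starts at {cue.start}s)"
            )
        if cue.end <= cue.start:
            problems.append(f"cue starting at {cue.start}s has no duration")
        expected_start = cue.end
    if expected_start != total:
        problems.append(f"cue sheet ends at {expected_start}s, expected {total}s")
    return problems


print(check_cue_sheet(CUES, total=30) or "Cue sheet covers the full 30 seconds")
```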
12. What’s Next
- Video Prototyping Mindset — A mindset for defining your goals
- Limits and Next — Runway’s limitations and other models
