
Audio

Updated: 2026-05

1. About This Page

This guide covers Runway’s audio generation features (text-to-speech, sound effects, and lip-syncing). Adding audio to your videos instantly makes your prototypes much more compelling.

2. The Three Types of Sound

Video prototypes use three broad types of sound:

Type | Runway features | Use
Dialogue & narration | Text to Speech, Lip Sync | Character dialogue, commentary
Sound effects (SFX) | Generative Audio | Ambient sounds, specific sounds
Background music (BGM) | Limited in Runway alone | Atmosphere, mood

Runway handles the first two types well, but background music is usually better made with a separate tool (such as Suno, AIVA, or Adobe Firefly Audio).

3. Text-to-Speech

Text to Speech converts typed text into spoken audio.

3.1 How to Use

  1. Left navigation bar → Generative Audio → Text to Speech
  2. Enter text (up to 600 characters per speaker)
  3. Select Voice: choose from a wide range of preset voices
  4. Click Generate to create the audio file
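If a narration script runs past the 600-character limit, it has to be split across several Text to Speech generations. The sketch below, assuming that limit still applies, breaks a script at sentence boundaries so every chunk stays within it. `split_script` and `MAX_CHARS` are hypothetical names, not part of any Runway API; the script only prepares text for pasting into the UI.

```python
import re

# Assumption: the 600-character limit quoted in this guide; confirm it in
# Runway's current UI, since limits can change between releases.
MAX_CHARS = 600


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a narration script into chunks of at most `limit` characters,
    breaking at sentence ends (., !, ?) where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into hard slices.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "This is a story set in a city of the future. " * 30
    for i, chunk in enumerate(split_script(script), start=1):
        print(f"Chunk {i}: {len(chunk)} characters")
```

Splitting at sentence ends rather than at exactly 600 characters keeps each generated clip ending on a natural pause, which makes the clips easier to place back to back on the timeline.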

3.2 Types of Voices

  • Preset Voices — Runway’s standard presets (free)
  • Custom Voice — You can also clone your own voice through the ElevenLabs integration (subject to plan restrictions)

3.3 Examples of Narration Use

  • A narration at the beginning of the work, such as “This is a story set in a city of the future”
  • A character’s inner thoughts (monologue)
  • A voiceover reading explanatory subtitles

4. Lip-syncing

The standout feature: Lip Sync makes the people in a video move their lips in time with any audio track you choose.

4.1 Basic Workflow

  1. Left navigation → Generative Audio → Lip Sync
  2. Video Source: Upload a video (showing a person’s face)
  3. Audio Source: Specify the audio in one of three ways
    • Text input (Text-to-Speech runs in the background)
    • Upload an audio file
    • Record on the spot
  4. Click Generate to get a video whose lip movements are synchronized to the audio

4.2 Constraints

  • Up to 600 characters per line of dialogue
  • Up to 10 lines of dialogue per video (multiple speakers are supported)
  • Faces shown from the front or in three-quarter view have a higher success rate
  • Accuracy drops for side profiles and for faces in shadow or poor lighting
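Before uploading, it can help to check a dialogue plan against the text limits listed above. This is a minimal sketch assuming the 600-character and 10-line limits quoted here still apply; `DialogueLine` and `check_dialogue` are hypothetical planning helpers, not Runway code. Face angle and lighting still have to be checked by eye.

```python
from dataclasses import dataclass

# Limits quoted in this guide; assumed current, so confirm them in Runway.
MAX_CHARS_PER_LINE = 600
MAX_LINES_PER_VIDEO = 10


@dataclass
class DialogueLine:
    speaker: str
    text: str


def check_dialogue(lines: list[DialogueLine]) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    problems: list[str] = []
    if len(lines) > MAX_LINES_PER_VIDEO:
        problems.append(
            f"{len(lines)} lines of dialogue; the limit is {MAX_LINES_PER_VIDEO} per video"
        )
    for i, line in enumerate(lines, start=1):
        if len(line.text) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {i} ({line.speaker}) has {len(line.text)} characters; "
                f"the limit is {MAX_CHARS_PER_LINE}"
            )
    return problems


if __name__ == "__main__":
    plan = [
        DialogueLine("Narrator", "This is a story set in a city of the future."),
        DialogueLine("Protagonist", "It's snowing again. " * 40),  # too long
    ]
    for problem in check_dialogue(plan):
        print(problem)
```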

4.3 Relationship with Act-Two

Act-Two transfers a whole acting performance (facial expressions and body language), while Lip Sync specializes in synchronizing lip movements with audio.

Purpose | Recommendation
Include facial expressions and body language | Act-Two
Add dialogue to an existing video | Lip Sync
Only need the mouth to move | Lip Sync (Light)

5. Sound Effects (SFX)

You can also generate sound effects using Runway’s Generative Audio.

5.1 How to Use

  1. Generative Audio → Sound Effects (or Generate Sound)
  2. Describe what kind of sound you want in text

Examples:

  • “Footsteps on wet pavement, slow pace”
  • “Distant thunder, low rumble”
  • “Birds chirping on a forest morning”
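The example prompts share a pattern: what makes the sound, where it happens, and how it sounds (pace, distance, texture). When writing many prompts, a small helper can keep that pattern consistent. The three-part split and the `sfx_prompt` function are a hypothetical convention for this guide, not something Runway requires; Runway only ever sees the final text.

```python
def sfx_prompt(source: str, setting: str = "", quality: str = "") -> str:
    """Build a sound-effect prompt from up to three parts.

    The source / setting / quality split is a hypothetical convention
    for organizing prompts, not a Runway requirement.
    """
    parts = (source, setting, quality)
    # Drop empty parts and join the rest with commas.
    return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    print(sfx_prompt("Footsteps on wet pavement", quality="slow pace"))
    print(sfx_prompt("Distant thunder", quality="low rumble"))
    print(sfx_prompt("Footsteps crunching", "fresh snow", "slow pace, close up"))
```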

5.2 Using Multiple Libraries

You don’t have to generate everything with AI; free sound effect libraries are worth using as well.

For class projects, a hybrid approach combining AI-generated content and existing libraries is a practical option.

6. Background Music

Runway alone is not well suited to generating background music. The usual approach is to create or source it with one of these tools:

  • Suno — Generates music from text (free plan available)
  • AIVA — Focuses on film music
  • Adobe Firefly Audio — By Adobe
  • YouTube Audio Library — Royalty-free, for commercial use

Add the generated or selected background music to the Runway Editor timeline.

7. Volume Balance

Once everything is on the timeline, set the track levels roughly as follows:

  • Background music: -20 to -25 dB (low)
  • Sound effects: -15 to -20 dB
  • Dialogue: -5 to -10 dB (highest)

Treat these values as a starting point rather than fixed targets. The underlying principle is that dialogue must always be the loudest element in the mix.
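Decibels are logarithmic: under the standard amplitude formula, gain = 10^(dB/20), so -20 dB scales a track to one tenth of its amplitude and about -6 dB roughly halves it. The sketch below, assuming the suggested values are per-track gains set in the editor, converts them to amplitude factors and checks the one rule that matters, that dialogue stays loudest. `LEVELS_DB`, `db_to_gain`, and `dialogue_is_loudest` are hypothetical names.

```python
# Suggested starting ranges from the list above, as (quietest, loudest) in dB.
LEVELS_DB = {
    "dialogue": (-10.0, -5.0),
    "sfx": (-20.0, -15.0),
    "bgm": (-25.0, -20.0),
}


def db_to_gain(db: float) -> float:
    """Convert a dB gain to a linear amplitude factor (0 dB -> 1.0)."""
    return 10 ** (db / 20)


def dialogue_is_loudest(levels: dict[str, float]) -> bool:
    """True if the dialogue track is set louder than every other track."""
    return all(
        db < levels["dialogue"]
        for name, db in levels.items()
        if name != "dialogue"
    )


if __name__ == "__main__":
    # Use the midpoint of each suggested range as a first mix.
    mix = {name: (lo + hi) / 2 for name, (lo, hi) in LEVELS_DB.items()}
    for name, db in mix.items():
        print(f"{name}: {db:.1f} dB -> x{db_to_gain(db):.3f} amplitude")
    print("Dialogue is loudest:", dialogue_is_loudest(mix))
```

Because source files differ in their own loudness, equal gain settings do not guarantee equal perceived volume; the check only confirms the settings, so listen to the result as well.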

8. Fade In and Fade Out

Fade the audio at both ends of the clip:

  • At a minimum, fade the first and last second of the BGM
  • Sounds that start or end abruptly sound very amateurish
  • Use the Editor to drag the edges of each audio clip to set the fade
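A fade is a gain that ramps from 0 to 1 at the start of a clip and back down to 0 at the end. The sketch below applies a one-second linear ramp to a list of mono samples. It is an illustration only, assuming a linear curve (editors may use other curves, such as equal-power), and `apply_fades` is a hypothetical name; in Runway you set fades by dragging clip edges in the Editor, not by writing code.

```python
def apply_fades(samples: list[float], sample_rate: int,
                fade_seconds: float = 1.0) -> list[float]:
    """Return a copy of mono `samples` with a linear fade-in and fade-out.

    The fade is shortened if the clip is too short for two full fades.
    """
    n = len(samples)
    fade_len = min(int(sample_rate * fade_seconds), n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len  # rises from 0.0 toward 1.0
        out[i] *= gain          # fade-in at the start
        out[n - 1 - i] *= gain  # mirrored fade-out at the end
    return out


if __name__ == "__main__":
    # Toy example: 10 samples at a deliberately tiny 4 Hz sample rate.
    print(apply_fades([1.0] * 10, sample_rate=4))
    # -> [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```

Capping the fade at half the clip length keeps the fade-in and fade-out from overlapping on very short clips.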

9. Copyright Notice

The following rules must be strictly observed, even for class projects (✗ = not allowed, ✓ = allowed):

  • ✗ Commercial music (e.g., songs from Apple Music)
  • ✗ Audio extracted from YouTube videos
  • ✓ AI-generated voices and sound effects
  • ✓ Creative Commons materials (check the license terms; CC BY, for example, requires crediting the creator)
  • ✓ Materials explicitly declared copyright-free

Because the video may be shown publicly outside the university at the final presentation, use only royalty-free materials from the start.

10. Priorities in Sound Design

Priorities for video prototyping when time is limited:

  1. Voice lines, if any (Lipsync or Text-to-Speech)
  2. 1–2 main sound effects (distinctive sounds such as footsteps or ambient noise)
  3. Background music (added last and faded in softly)
  4. Other minor sound effects

Chasing perfection can eat up endless time, so learn to recognize when the sound design is 80% done and call it complete.

11. Example Workflow for the Practical Training Session

Sound Design for a 30-Second Short Film:

Time | Sound
0:00-0:05 | BGM fades in, snow sound effects
0:05-0:15 | Protagonist’s narration (lip-sync), BGM continues
0:15-0:25 | Ambient sounds, footsteps, BGM emphasized
0:25-0:30 | BGM fades out

A 30-second film can be built from just 3 to 4 audio clips. Expect roughly 30 to 60 minutes of work per group.
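Writing the timing table as a cue sheet makes it easy to see which clips overlap at any moment and how few clips the whole film needs. The clip boundaries below are one hypothetical reading of the table (for example, snow effects only in the first five seconds), and `Cue`, `CUES`, and `active_at` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    name: str
    start: float  # seconds
    end: float    # seconds


# One hypothetical reading of the 30-second timing table.
CUES = [
    Cue("BGM", 0.0, 30.0),  # fades in over 0-5 s, out over 25-30 s
    Cue("Snow SFX", 0.0, 5.0),
    Cue("Narration (lip-sync)", 5.0, 15.0),
    Cue("Ambient + footsteps", 15.0, 25.0),
]


def active_at(t: float, cues: list[Cue]) -> list[str]:
    """Names of cues playing at time t (start inclusive, end exclusive)."""
    return [c.name for c in cues if c.start <= t < c.end]


if __name__ == "__main__":
    for t in (2, 10, 20, 27):
        print(f"{t:>2} s: {', '.join(active_at(t, CUES))}")
```

Under this reading, four cues cover the full 30 seconds, with at most two playing at once.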

12. What’s Next

  • Video Prototyping Mindset — A mindset for defining your goals
  • Limits and Next — Runway’s limitations and other models