What is audio ducking?

Audio Ducking Definition

Audio ducking is the automated process of temporarily lowering the volume of background music or ambient sound when dialogue or other primary audio occurs, ensuring speech remains clear and intelligible.

Why audio ducking matters for video production

Great video audio should feel invisible. Viewers should hear dialogue clearly without consciously noticing background music. When music competes with speech, viewers strain to understand words or miss critical information entirely. When music stops abruptly to make room for speech, the transition feels jarring. Audio ducking solves both problems by smoothly reducing background audio whenever primary audio occurs.

Manually keyframing music volume throughout a video is tedious and time-consuming. An editor must listen to the entire piece, identify every moment where speech occurs, and create volume automation curves that lower music appropriately. For a 10-minute video with frequent dialogue, this might mean hundreds of manual keyframes. Audio ducking automates this entirely.

How audio ducking works

Traditional audio ducking uses sidechain compression: a compressor on the music track takes its control signal from a second source (the dialogue track) and reduces the music's volume whenever the dialogue exceeds a threshold. The reduction amount, attack time, and release time are adjustable parameters. When dialogue stops, the music volume smoothly returns to its original level.
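The sidechain behavior described above can be sketched in plain Python. This is a minimal illustration, not how any particular editor or plugin implements it: the function names duck_gains and apply_ducking, the default parameter values, and the one-pole smoothing are all assumptions chosen for readability. Both tracks are assumed to be equal-length lists of float samples between -1.0 and 1.0.

```python
import math

def duck_gains(dialogue, sample_rate, threshold_db=-40.0, depth_db=-9.0,
               attack_ms=20.0, release_ms=300.0):
    """Return one linear gain per sample, to be applied to the music track."""
    # One-pole smoothing coefficients derived from the attack/release times.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)  # dBFS -> linear amplitude

    envelope = 0.0  # running peak level of the dialogue (the sidechain signal)
    gain_db = 0.0   # current music gain in dB (0 dB = unchanged)
    gains = []
    for sample in dialogue:
        # Peak detector: rises instantly, decays slowly, so zero crossings
        # inside a spoken word are not mistaken for silence.
        envelope = max(abs(sample), envelope * release)
        # Duck to the full depth while the dialogue is above the threshold.
        target_db = depth_db if envelope > threshold else 0.0
        # Moving down uses the attack time; recovering uses the release time.
        coeff = attack if target_db < gain_db else release
        gain_db = coeff * gain_db + (1.0 - coeff) * target_db
        gains.append(10.0 ** (gain_db / 20.0))
    return gains

def apply_ducking(music, dialogue, sample_rate, **params):
    """Scale each music sample by the gain computed from the dialogue track."""
    gains = duck_gains(dialogue, sample_rate, **params)
    return [m * g for m, g in zip(music, gains)]
```

Because the peak detector itself decays at the release rate, the music stays ducked somewhat longer than release_ms alone would suggest. Some processors expose a separate hold control for this behavior; the sketch folds it into the detector to stay short.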

AI-enhanced ducking goes further by understanding audio semantically. Rather than simply reacting to volume level, AI can distinguish speech from non-speech sounds (coughs, breaths, background noise), so that music ducks only for actual dialogue. It can also adapt ducking intensity to the importance of the speech: a casual aside might trigger less reduction than a critical piece of narration.
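As a rough sketch of that idea, suppose a speech classifier has already labeled the audio. The segment format below (start, end, label, importance) and the function name ducking_regions are hypothetical, invented for illustration; no particular model or product API is implied. Only segments labeled as speech produce a ducking region, and the depth scales with the importance score.

```python
def ducking_regions(segments, max_depth_db=-12.0):
    """Turn labeled segments into (start_s, end_s, depth_db) ducking regions.

    segments: iterable of (start_s, end_s, label, importance) tuples, where
    importance is a score from 0.0 to 1.0 (a hypothetical classifier output).
    """
    regions = []
    for start_s, end_s, label, importance in segments:
        if label != "speech":
            continue  # coughs, breaths, and noise never trigger ducking
        # More important speech gets a deeper reduction.
        regions.append((start_s, end_s, max_depth_db * importance))
    return regions

labeled = [
    (0.0, 1.5, "speech", 1.0),  # critical narration: full depth
    (1.5, 1.8, "cough", 0.0),   # ignored
    (2.0, 3.0, "speech", 0.5),  # casual aside: half depth
]
print(ducking_regions(labeled))
```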

Best practices for audio ducking

Set appropriate thresholds: Too low, and music ducks for every tiny sound including breaths and background noise. Too high, and quiet dialogue does not trigger ducking. Start conservative (higher threshold) and lower until all dialogue triggers ducking reliably.
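One way to pick a starting threshold is to measure the levels involved rather than guess. The sketch below is an assumption, not a standard procedure: it measures the RMS level of a noise-only excerpt and of the quietest dialogue excerpt, then places the threshold halfway between them in dB. The function names, the halfway heuristic, and the -96 dB floor are all illustrative choices, and the result still needs tuning by ear as described above.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def suggest_threshold_db(noise_samples, quiet_dialogue_samples, floor_db=-96.0):
    """Suggest a threshold midway (in dB) between noise and quiet dialogue."""
    # Clamp digital silence to a finite floor so the midpoint stays finite.
    noise_db = max(rms_dbfs(noise_samples), floor_db)
    dialogue_db = rms_dbfs(quiet_dialogue_samples)
    if dialogue_db <= noise_db:
        raise ValueError("dialogue is not louder than the noise floor")
    return (noise_db + dialogue_db) / 2.0
```

If the quietest dialogue sits only a few dB above the noise floor, no threshold will separate them cleanly, and cleaning up the dialogue track first is likely the better fix.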

Adjust attack and release times: Attack determines how quickly music volume drops when dialogue starts. Too fast sounds unnatural; too slow means the first syllables get buried. Release controls how quickly music returns after dialogue ends. A faster release sounds more responsive but can create pumping effects if dialogue is sporadic. Typical values: 10-30 ms attack, 200-500 ms release.
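To get a feel for what these numbers mean, it can help to model the gain change as a simple exponential (one-pole) smoother. That model is an assumption: processors define attack and release time differently, so treat the figures as approximate. Under this model, the gain completes about 63% of its change after one time constant and about 95% after three.

```python
import math

def smoothing_coeff(time_ms, sample_rate):
    """Per-sample coefficient for a one-pole smoother with this time constant."""
    return math.exp(-1.0 / (sample_rate * time_ms / 1000.0))

def fraction_reached(elapsed_ms, time_ms):
    """Fraction of the full gain change completed after elapsed_ms."""
    return 1.0 - math.exp(-elapsed_ms / time_ms)

# With a 20 ms attack, how far has the music dropped 20 ms and 60 ms
# after the dialogue starts?
print(round(fraction_reached(20, 20), 2))  # 0.63
print(round(fraction_reached(60, 20), 2))  # 0.95
```

The same arithmetic explains pumping: with a 200 ms release, the music has already recovered most of its level during a 600 ms pause between sentences, only to be pulled down again when speech resumes.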

Choose appropriate reduction depth: Music does not need to disappear entirely; it only needs to drop far enough to keep dialogue clear. Typical ducking reduces music by 6-12 dB. More aggressive reduction (15 dB or more) effectively silences music and can feel heavy-handed.
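Decibel figures are easier to judge once converted to a linear amplitude factor, using gain = 10^(dB / 20). The short example below does that conversion for the depths mentioned above; the helper name db_to_gain is just an illustrative choice.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

for reduction_db in (-6, -12, -15):
    share = db_to_gain(reduction_db)
    print(f"{reduction_db} dB -> music at {share:.0%} of its original amplitude")
# -6 dB -> music at 50% of its original amplitude
# -12 dB -> music at 25% of its original amplitude
# -15 dB -> music at 18% of its original amplitude
```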

Consider music selection: Dense, busy music with prominent midrange frequencies competes with speech more than sparse, atmospheric music with strong bass and treble. Choose background music that naturally occupies a different frequency space than dialogue, requiring less aggressive ducking.

Common ducking mistakes

Over-ducking: Reducing music so aggressively that the volume pumping becomes distracting. The goal is subtle support for dialogue clarity, not dramatic volume swings.

Ducking everything: Not all background audio needs to duck. Ambient sound (room tone, outdoor atmosphere) typically does not conflict with dialogue the way music does. Ducking everything creates an unnatural vacuum around speech.

Ignoring music phrasing: Automatic ducking follows the dialogue, not the music, so it can cut across musical phrases at arbitrary points. In music-driven content, consider timing dialogue to align with the musical structure when possible, so ducking does not awkwardly interrupt musical moments.

Applications beyond dialogue

Audio ducking applies to any scenario with primary and secondary audio:

Sound effects ducking music during dramatic moments
Voiceover ducking interview audio when both need to coexist
Narration ducking ambient sound in documentaries
Podcast intro music ducking for host voice

How ShotAI relates to audio ducking

ShotAI's audio analysis capabilities can identify sections of video where dialogue is present. Editors can use this to quickly locate and review the segments that will need ducking once music is added, which streamlines the audio post-production workflow.

Related Terms

Automated Video Transcription

Audio Synchronization

Motion Graphics
