A screenshot from Jon Benjamin Has a Van featuring a

It will almost certainly seem counter-intuitive to talk about captioning silences. That’s because nearly every definition of closed captioning begins and ends with measurable, acoustically quantifiable sounds. For example, Wikipedia describes closed captioning as a “transcription of the audio portion of a program as it occurs (either verbatim or in edited form), sometimes including non-speech elements.” The FCC defines closed captioning as “A service for persons with hearing disabilities that translates television program dialog into written words on the television screen.” Whereas the Wikipedia entry seems to leave room for the possibility that the entire audio track (i.e. all sounds) might be captioned, the FCC focuses narrowly on “dialog” or speech. Nevertheless, both definitions are concerned exclusively (and not surprisingly) with sound.

Yet captioning is complex, perhaps more complex than we’ve ever previously considered. In some well-defined situations, silence must be captioned. We need at times to make room on the caption track for sudden quiet, the illusion of speech (i.e. mouthed words), intentional loss of audio, and the cessation of sustained sounds. While a lack of sound would seem, at first blush, to fall outside the captioner’s purview, it becomes significant for the captioner when it is tied to our visual expectations about how sound works.

Let me share a few examples, organized into three categories. The categories are not exhaustive. All of the examples below are pulled from the official versions shown on TV or DVD.

Captioning the illusion of audible speech

When a speaker is only mouthing words and not making any audible sounds, captions need to indicate the absence of sound so that deaf and hard-of-hearing viewers don’t think that the speaker is making audible utterances. The following examples from Monk use “mouthing words” in the captions to indicate mouthed or silent speech. The first Monk clip is from Season 1, episode 12 (2002): “Mr. Monk and the Red-Headed Stranger.” The second clip (which includes two examples) is from Season 1, episode 13 (2002), “Mr. Monk and the Airplane.”


The following clip from a 1993 episode of Star Trek: The Next Generation (Season 6, episode 18: “Starship Mine“) uses “no audio” in the captions to indicate mouthed speech. (This episode was rerun recently on the SyFy channel.) No audio in this case is less effective as a caption than mouthing words because it doesn’t help us understand the character’s intent. No audio could mean that the audio of the original recording was damaged and the captioner simply couldn’t make out what was said.

The following clip from Drive Angry (2011) uses “inaudible” in the captions to indicate that speech between the two characters can not be heard. Their speech is silent to viewers, not just indistinct. The characters aren’t mouthing words here but presumably speaking to each other privately. A caption is needed to indicate that their speech can not be heard, even though it looks like they are audibly speaking.

In the following clip from Nick and Norah’s Infinite Playlist (2008), viewers experience part of the scene from the perspective of a drunk character locked alone in a car who can’t hear her friends yelling at her. The illusion of speech is captioned as “all speaking inaudibly.” The scene momentarily foregrounds mood music and reduces the friends’ speech to silence. (Warning: The end of this clip is NSFW but is included here to show how the speech sounds outside of the car become audible.)

In the following clip from Twins (1988), inaudible speech is captioned as “silence.” When combined with the familiar visual cue of two speakers separated by a pane of prison glass, this caption adequately conveys what’s going on.

Another “between the glass” example occurs in an episode of The Office (Season 5, episode 21: Two Weeks). Michael is saying farewell to his employees from the parking lot, but they can’t hear him because they are inside the building looking through the windows. Because his speech is inaudible but might be mistaken for audible speech by deaf and hard of hearing viewers, it needs to be captioned.

The illusion of speech isn’t always indicated in the caption track as it should be. In this clip from a 2011 episode of South Park (Season 15, episode 7: “You’re Getting Old“), music is indicated in the captions, but it’s not clear in the captions that music is the only audible sound or that Randy’s words of comfort to his son have no audio. Caption viewers will not necessarily know that Randy does not audibly speak in this clip. If some viewers expect music and speech to co-exist (e.g. music in the background, speech in the foreground), then a deviation from what is expected may need to be indicated in the captions. Put simply, it looks like Randy is audibly speaking. Because he isn’t, a caption is needed.

Mouthed/silent speech turns the tables on hearing viewers by asking them to read lips, to wring meaning from soundless but seemingly sound-producing mouths. The illusion of audible speech makes hearing viewers hard-of-hearing, at least temporarily.

Captioning the cessation of sound

Captions may also need to indicate that a sustained sound has terminated if 1) it isn’t visually obvious or 2) the termination of the sound is significant to the narrative. In the following examples, captions alert us to a change in state from sound to no sound. These changes may affect a single sound (e.g. a phone stops ringing in Knight and Day) or encompass the sound track itself (e.g. complete loss of audio in Jon Benjamin Has a Van).

In the following example from 2010’s Knight and Day, the sustained, repetitive sound of a cell phone ring tone comes to an end when Tom Cruise seems to decline the call (indicated by a clicking sound). But because viewers can’t clearly see him silence the ringtone, it needs to be indicated in the captions. Note that the clicking sound isn’t captioned but rather the end of the ringtone itself (“ringtone stops“).

When the fanfare that accompanies the 20th Century Fox logo ends, the captions alert us with “fanfare ends.” (I took this example from the opening to Alien vs. Predator: Requiem (2007), but the same approach to captioning the opening fanfare is also used on other 20th Century Fox movies.)

When it’s not clear that a repetitive or sustained sound has terminated, a caption is needed if the sound is significant. In this clip from a 2010 episode of Two and a Half Men (Season 7, Episode 18: “Ixnay on the Oggie Day“), the repetitive sound of slapping continues unseen in the next room. A caption alerts us when the sound stops (“slapping stops“). (Warning: This clip is NSFW but still a great example of how repetitive or sustained sounds that can’t be seen need to be captioned when they terminate.)

Another repetitive sound comes to an end in Unknown (2011) when the pounding sound of the MRI machine stops (signaled by “thumping stops“), after starting up 43 seconds earlier with a “machine thumping” caption. The termination of the MRI thumping is immediately preceded by a caption for “inaudible dialogue” (misspelled as “dialouge”) to account for the illusion of audible speech in the flashback/dream sequence.

In this music video for Rush’s “Closer to the Heart” (recently broadcast on VH1 Classic), the lengthy jam session at the end of the video finally comes to an end (“music ends“).

Another example of how captions can alert us when music stops or ends: In the final seconds of Alien vs. Predator: Requiem (2007), the “ominous theme building thematically” comes to an end (“music ends“), the movie fades to black, and the credits roll.

Likewise, in 2008’s Cloverfield, the sustained sound of party music is stopped (“music stops“).

What originally interested me about Cloverfield was the intermittent loss of audio that occurs in a couple places as part of the handheld, amateur style of the film. These moments of intermittent, complete silence are not captioned but seem significant nonetheless. Here’s one example:

Total loss of audio is used for humorous effect in a 2011 episode of Jon Benjamin Has a Van (Season 1, episode 4: “Breakdown“). The same amateur style provides the context for a joke about the fourth wall and what TV audiences take for granted. The scene is remarkable from both a captioning and disability perspective. “Audio crackling” turns to “silence” as the sound track changes over to no audio (or smooth static). Because silence is contextually significant (not to mention unexpected), it must be captioned. (I originally found this example when I became unnerved by the static coming from the TV and lifted my head from my laptop to see what was wrong.)

The crew becomes “deaf” without their sound guy. The hearing audience becomes “deaf” as well when they are forced to follow the narrative by reading characters’ lips and the hand-written signs exchanged with a deaf man. It’s challenging for hearing viewers to read the hand-written notes while trying to read lips at the same time, and it’s in this moment of silent communication — assuming we can get past the characterization of the deaf man as a country bumpkin — that hearing viewers can potentially identify with deaf and hard-of-hearing viewers. The same potential exists in those fleeting moments when hearing viewers must read the lips of characters who silently mouth words.

Captioning censored sounds

When curse words need to be sanitized for television audiences, TV programs may take any number of approaches, including the use of a beep sound as a substitute for the profane word. But I’m interested here in the use of silence in place of the curse word, along with some indication in the captions that a word or sound has been crudely excised. In these cases, the soundtrack is fully silenced for the duration of the spoken curse word, and the caption track marks the silence in some way. (Warning: The following examples may be offensive or disturbing to some viewers.)

In the following clip from a 2004 episode of South Park (Season 8, episode 8: “Douche or Turd“), the lyrics to the “Vote or Die” song are sanitized for TV audiences. Audio is replaced with full silence for the duration of each offensive word, and a “bleep” is put in the caption track.

In the following clip from the 2007 movie The Mad, a series of curse words is similarly replaced with silence in order to clean the movie up for TV audiences. On the caption track, “deleted” is used to indicate that offensive audio was removed/silenced. (Again, this clip may be offensive.)

Redefining captioning

Even if we define closed captioning broadly to include “all sounds” (McKee 2006: 335), we still haven’t made room for the array of silences that may require captions depending on context. Captioners are not simply responsible for sound but must also account for our assumptions about how sound works and how it interacts with on-screen visuals. If viewers assume that a sound is indefinitely sustained (music, ring tones), then captioners may need to indicate when a sustained sound terminates (if it isn’t obvious). If viewers assume that moving lips signal audible speech or speech-like sounds, then captioners may need to indicate when moving lips are only mouthing words silently. When one of our basic assumptions about sound is challenged, the captioner may need to step in.

A broader, more accurate definition of captioning would thus take into account the contexts in which sounds occur and the assumptions that viewers make automatically about how different sounds are sustained.

[Fair use notice: The videos on this site are transformative works used in good faith, in keeping with Section 107 of U.S. copyright law, and as such constitute fair use of copyrighted material. Read this site’s full fair use notice.]