Recently, Bill Creswell pointed out that the trailers for two forthcoming movies — Inception and Avatar — are wordless and thus don’t need to be captioned. (For the record, the Avatar trailer I watched included three spoken words: “This is great!”) But both trailers make use of some pretty intense background music. Bill’s tweet got me thinking about ways of visualizing music or background sounds in addition to the traditional way of using lyrics, a summary (e.g. “[slow ballad]”), or nothing at all.
I don’t mean to suggest that captioned music lyrics are insufficient by nature. Captioned lyrics are often adequate. But lyric-less music often needs to be acknowledged as something more than “[pop ballad].” And what about music lyrics that can’t be captioned because there’s not enough space for them in an already cramped caption field? What about other (non-musical) sounds that can’t be adequately translated into words?
Closed captioners should be exploring alternative means of visualizing music. How can we design closed captions that leverage the full resources of multimedia, i.e. the full range of textual, typographic, graphic, animated, kinetic, semiotic, and linguistic resources for translating sounds into visuals? It’s no secret that captions have a long way to go. Innovation in captioning technology and technique is desperately needed. According to Rashid et al. (2008), captioning technology “has suffered from a lack of innovation since its inception in the early 1970s” (Rashid, R., Vy, Q., Hunt, R., & Fels, D. (2008). “Dancing with Words: Using Animated Text for Captioning.” International Journal of Human-Computer Interaction, 24, 505–519):
Closed captioning currently provides only the verbatim equivalent of the dialogue and ignores most nonverbal information such as music, sound effects, and intonation. Much of this missing sound information is used to establish mood, create ambiance, and complete the television and film viewing experience. Translating this sound information into an alternative representation or interpretation in a different modality such as a visual or tactile can be a creative process in itself — an activity not usually associated with captioning or captioners. (p. 505)
Rashid et al. explore the potential of “enhanced” captions — i.e. word captions that are animated and kinetic — “to represent emotions contained in music and speech as well as sound effects” (p. 505).
Let me offer an example of visualized music to complement Rashid et al.’s animated word captions. In the teaser trailer for Inception, the lyric-less music track begs to be acknowledged on the caption layer. The music track is akin to a character — and the only one in the trailer who “speaks.” The music pounds rhythmically and builds to an intense climax that mirrors the intensity of the film’s subject matter (even if it’s not clear what the movie is about).
If the music in a movie, clip, or video doesn’t contain lyrics that can be displayed as word captions, then we need to consider alternative ways of visualizing sound. Although we risk creating enhanced captions that are distracting or not useful to users, it’s a risk we need to take as we engage the process of innovating caption technology and technique.
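To make the idea concrete, here is a toy sketch of one such alternative: deriving caption cues from a track’s loudness envelope rather than from lyrics. Everything here is hypothetical (the function names, the thresholds, the labels); it is only meant to show that mapping sound to something richer than a single “[pop ballad]” tag can be automated in part.

```python
import math

def rms(samples):
    """Root-mean-square loudness of a window of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def music_cues(samples, rate, window_s=1.0):
    """Return (start_seconds, label) cues describing music intensity.

    Thresholds and labels are illustrative placeholders, not a standard.
    """
    win = int(rate * window_s)
    cues = []
    for i in range(0, len(samples) - win + 1, win):
        level = rms(samples[i:i + win])
        if level < 0.1:
            label = "[soft music]"
        elif level < 0.5:
            label = "[music builds]"
        else:
            label = "[music pounds]"
        # Only emit a new cue when the description changes.
        if not cues or cues[-1][1] != label:
            cues.append((i / rate, label))
    return cues

# Synthetic "track": quiet start, rhythmic build, loud climax.
rate = 100
track = ([0.05 * math.sin(i / 3) for i in range(200)] +
         [0.4 * math.sin(i / 3) for i in range(200)] +
         [0.9 * math.sin(i / 3) for i in range(200)])
for start, label in music_cues(track, rate):
    print(f"{start:5.1f}s  {label}")
```

A real system would of course go further — feeding the same envelope into animation, color, or typography rather than bracketed words — but even this crude mapping captures more of the Inception trailer’s arc than a static tag would.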
[Fair use notice: The videos on this site are transformative works used in good faith, in keeping with Section 107 of U.S. copyright law, and as such constitute fair use of copyrighted material. Read this site’s full fair use notice.]