Podcasting 2.0: Towards an accessible Web, part 2

Sean Zdenek

Texas Tech University

Computers & Composition Online (Fall 2009)



My own approach to accessibility emphasizes bodily differences, web accessibility guidelines (especially W3C’s Web Content Accessibility Guidelines), and the growing wealth of accessibility tools (and video modding apps).

Start with the body

Mainstream discourse about podcasting rarely discusses the affordances of the body. It rarely makes explicit the minimum requirements for participating, at the level of embodiment, or the bodily differences among users and producers that threaten to exclude some people from profitably using web audio and video.

Instead, mainstream discourse about podcasting tends to assume a certain ideal body type: a hearing, seeing, speaking, flexible, mouse-moving (as opposed to keyboard-using) user. Because the minimum requirements for participating are assumed, those who write about podcasting are generally not aware of the need to accommodate technology to users with disabilities, or of the importance of doing so.

What if our understanding of web audio and video was grounded on a deep awareness of the body and bodily difference? What if our mainstream discourses did not automatically assume that podcasting, in the absence of transcripts or other accommodations, was accessible to all users? What if we were committed to teasing apart the differences between accessibility and availability, instead of assuming that accessibility was equivalent to making it easy for an ideal user to download files?

In Disability and the Teaching of Writing (2008), Cynthia Lewiecki-Wilson & Brenda Jo Brueggemann ask us to reflect on the body and bodily difference in the context of writing instruction: “How can we better understand learning and writing as embodied practices, foregrounding bodily difference instead of demanding bodily perfection?” (3). Applied to web audio and video, this question urges us to consider the extent to which our understanding of podcasting (as reflected in discourse) is grounded on bodily perfection, the ways in which the body is absent from podcasting discourse, and whether our conceptions of users are normative (e.g. insofar as hearing is required for conformance).

Design for difference at the inception of a project

We need to think of podcasting not as “on the fly” but as situated in the lives of students with diverse abilities and needs. Rather than make our multi-modal texts accessible after the fact (it is illegal under the ADA to make accommodations on an ad hoc or as-needed basis), we need to design for accessibility at the inception of our Web projects. An accessible (universal) Web is good for everyone. “The underpinning principle of universal design is that in designing with disability in mind a better product will be developed that also better serves the needs of all users, including those who are not disabled” (Seale 2006: 83).

To build an accessible Web, we need to immerse ourselves in accessibility standards, principally W3C’s Web Content Accessibility Guidelines (WCAG) and Section 508. WCAG is built on three levels of conformance: A (lowest), AA, and AAA (highest). It is beyond the scope of this text to provide a detailed review of guidelines; a number of good checklists (for 508 and WCAG 2.0) and comparisons are available online. Of particular interest to audio and video podcasters are the following guidelines:

Provide a written transcript for audio-only content.

  • WCAG 2.0, Guideline 1.2.1: “An alternative for time-based media is provided that presents equivalent information for prerecorded audio-only content.” (Level A)
  • Section 508, 1194.22a: “A text equivalent for every non-text element shall be provided (e.g., via “alt”, “longdesc”, or in element content).”

Provide captions for prerecorded web video.

  • WCAG 2.0, Guideline 1.2.1: For prerecorded video-only media, “Either a text alternative or an audio track is provided that presents equivalent information.” (Level A)
  • WCAG 2.0, Guideline 1.2.2: “Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.” (Level A)
  • Section 508, 1194.22b: “Equivalent alternatives for any multimedia presentation shall be synchronized with the presentation.”

Ensure all information is keyboard accessible.

  • WCAG 2.0, Guideline 2.1.1: “All functionality of the content is operable through a keyboard interface without requiring specific timings for individual keystrokes.” (Level A)

Provide audio description of video content (i.e. all visual information that is not conveyed aurally must be described in a separate, synchronized audio track or long text description for screen reader users).

  • WCAG 2.0, Guideline 1.2.5: “Audio description is provided for all prerecorded video content in synchronized media.” (Level AA)
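The guidelines listed above can be read as a short checklist to run against each episode before it is published. The following is only a minimal sketch of that idea, not an official conformance test: the `Episode` record and its field names are hypothetical, and the check covers only the four WCAG 2.0 items quoted in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical description of one podcast episode (not a real API)."""
    title: str
    has_video: bool
    has_transcript: bool = False
    has_captions: bool = False
    has_audio_description: bool = False
    keyboard_operable: bool = False


def unmet_guidelines(ep: Episode) -> List[str]:
    """Return the guidelines quoted in this section that the episode misses."""
    gaps = []
    # Audio-only content needs a written transcript.
    if not ep.has_video and not ep.has_transcript:
        gaps.append("WCAG 2.0 1.2.1 (A): transcript for audio-only content")
    # Prerecorded video with sound needs synchronized captions.
    if ep.has_video and not ep.has_captions:
        gaps.append("WCAG 2.0 1.2.2 (A): captions for prerecorded video")
    # Visual information not conveyed aurally needs audio description.
    if ep.has_video and not ep.has_audio_description:
        gaps.append("WCAG 2.0 1.2.5 (AA): audio description")
    # The player and page must work without a mouse.
    if not ep.keyboard_operable:
        gaps.append("WCAG 2.0 2.1.1 (A): keyboard-accessible functionality")
    return gaps


if __name__ == "__main__":
    draft = Episode(title="Week 3 interview", has_video=False)
    for gap in unmet_guidelines(draft):
        print("Missing:", gap)
```

Running the check at the planning stage, rather than after publication, is one small way to build accessibility into a project from its inception.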

Surround yourself with solutions

Plenty of options are available for transcribing audio and captioning video. While some of these options fit the traditional view of web accessibility as time-consuming and expensive, others challenge that view.

Transcription companies. Transcription companies provide transcripts of audio content for a fee, with a turn-around time between twenty-four hours and a few days. In October 2006, Jeffrey Daniel Frey reviewed nine transcription services on his blog, ultimately recommending Casting Words. This company currently charges $1.50/minute with a guaranteed six-day turnaround. For a typical five-to-eight-minute audio podcast, the cost would run $7.50 – $12.00.
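The arithmetic behind that estimate is simple enough to sketch in a few lines. The function names below are illustrative, and the only figure taken from the text is the $1.50 per-minute rate; the rate would need to be updated for any other service.

```python
RATE_PER_MINUTE = 1.50  # USD per minute, the rate quoted in the text


def transcription_cost(minutes: float, rate: float = RATE_PER_MINUTE) -> float:
    """Cost in dollars of transcribing one recording of the given length."""
    return round(minutes * rate, 2)


def series_cost(episode_minutes, rate: float = RATE_PER_MINUTE) -> float:
    """Total cost for a list of episode lengths (in minutes)."""
    return round(sum(transcription_cost(m, rate) for m in episode_minutes), 2)


if __name__ == "__main__":
    for minutes in (5, 8):
        print(f"{minutes} min: ${transcription_cost(minutes):.2f}")
    # 5 min: $7.50
    # 8 min: $12.00
```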

Captioning companies. Captioning companies provide video captions for a fee. Examples include Automatic Sync Technologies, Visual Data Media Services, Video Caption Corporation, and NCI.

Voice recognition technology. With access to a well-trained version of Dragon NaturallySpeaking (or possibly a web service like Jott), a podcaster can automate the process of creating a transcript from an audio file. In the case of podcast interviews, the podcaster can, at the completion of the interview, play the role of “shadow speaker” for the sole purpose of creating a transcript. A shadow speaker speaks the interviewee’s part with the voice recognition software enabled so that the software, which has been trained on the speaker’s voice, can more accurately transcribe the interviewee’s part. The original audio interview remains unchanged and the separate shadowed audio file is deleted once a written transcript has been acquired.

Captioning software. A number of software solutions are available for captioning web videos. Because every caption file is composed of two main pieces of data (a time stamp and a corresponding text caption), captioning programs tend to be interchangeable, and once one has been mastered the others are easy to learn. Captioning can also be done by hand in a text editor, using basic XML-style markup. The process can be time-consuming, however, even with the assistance of captioning software. Bill Creswell, whose blog is devoted to “captioning the internet one video at a time,” estimates that it takes one hour to caption three minutes of web video. Nevertheless, software solutions are easy to find and many are free, including Subs Factory, NCAM’s MAGpie and CC for Flash, and URUWorks Subtitle Workshop. Software programs for purchase include SynchriMedia’s MovCaptioner ($25), Manitu Group’s Captionate ($60), and Video Toolshed’s SubBits subtitler ($250).
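To make the structure of a caption file concrete, here is a minimal sketch that turns a list of timed captions into XML-style markup. It assumes a stripped-down version of the W3C Timed Text (TTML) layout, in which each caption is a `p` element with `begin` and `end` attributes; a given video player may expect a different format or additional styling elements, and the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"  # namespace of W3C Timed Text


def format_timestamp(seconds: float) -> str:
    """Convert seconds to an HH:MM:SS.mmm clock value."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_captions(cues) -> str:
    """Build a simplified TTML document from (start, end, text) tuples."""
    ET.register_namespace("", TTML_NS)  # emit TTML as the default namespace
    root = ET.Element(f"{{{TTML_NS}}}tt")
    body = ET.SubElement(root, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for start, end, text in cues:
        # One <p> per caption: the time stamps plus the caption text.
        p = ET.SubElement(
            div,
            f"{{{TTML_NS}}}p",
            begin=format_timestamp(start),
            end=format_timestamp(end),
        )
        p.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        (0.0, 2.5, "Welcome to the podcast."),
        (2.5, 6.0, "Today we talk about captioning."),
    ]
    print(build_captions(sample))
```

Because the file is nothing more than time stamps paired with text, the same list of cues could be written out in another caption format by changing only the output function.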

Captioning through crowdsourcing. More and more video players on the web have built-in support for closed captioning (e.g. Google, Fox, Hulu, NBC). A major development in closed captioning on the web was Google’s announcement in September 2006 of support for closed captioning in its video player. Captioned content is also being provided by Project ReadOn, which is partnering with other sites (e.g. BarackObama.com, PetsAmerica.com) to provide caption streams for their videos. (The original videos continue to be hosted on the partner sites, with the ReadOn video player syncing captions with the original source content.) On the ReadOn site, users can suggest videos to be captioned, but users do not perform the actual captioning labor.

A different, potentially more effective, model taps directly into the crowd of users for direction, feedback, and labor. On websites such as dotSUB and Overstream, users not only decide which videos to upload, transcribe or caption, but also perform the captioning work themselves using a simple but effective web interface. Web interfaces for captioning are essentially streamlined versions of stand-alone captioning software programs. With Overstream, the user imports a video from a supported video provider (e.g. Google, YouTube), and then uses the video editor to place a stream of captions or subtitles over the video. Like the videos on Project ReadOn, the original videos on Overstream continue to be hosted at the source site (e.g. YouTube), and users can create multiple overstreams of the same video (akin to multiple “plys” in BubblePly). With dotSUB, the videos are uploaded and hosted by dotSUB, and users can view subtitles in multiple languages from a drop-down menu without changing streams/plys.

While dotSUB was created to address the need for translation services in the global economy (see “About dotSUB”), both dotSUB and Overstream also serve the needs of deaf and hard of hearing caption users. Both also leverage the power of crowdsourcing by giving a very large crowd of web users a simple interface for transcribing and captioning video, and letting the crowd do the rest. Everyone benefits from the crowd’s collective labor, even though each member of the crowd may only make a very small contribution to building the site’s value.
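The idea of a caption stream that is stored separately from the video and synced to it during playback can be illustrated with a short sketch. This is not how ReadOn, Overstream, or dotSUB are actually implemented, which the sources here do not describe; it simply assumes that each stream is a list of non-overlapping (start, end, text) cues sorted by start time, and that several streams (for example, one per language) can be attached to the same video.

```python
from bisect import bisect_right


def active_caption(cues, playback_time):
    """Return the caption text showing at playback_time, or None.

    cues: list of (start, end, text) tuples, sorted by start time and
    assumed not to overlap.
    """
    starts = [cue[0] for cue in cues]
    # Index of the last cue that starts at or before the playback time.
    i = bisect_right(starts, playback_time) - 1
    if i >= 0:
        start, end, text = cues[i]
        if start <= playback_time < end:
            return text
    return None


if __name__ == "__main__":
    # Hypothetical streams for one video; the video itself stays at its source.
    streams = {
        "en": [(0.0, 2.0, "Hello."), (2.5, 5.0, "Welcome to the show.")],
        "es": [(0.0, 2.0, "Hola."), (2.5, 5.0, "Bienvenidos al programa.")],
    }
    for language, cues in streams.items():
        print(language, active_caption(cues, 3.0))
```

Keeping the cues apart from the video file is what lets many contributors add or correct streams without touching the original upload.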

VModding. The crowdsourcing model is also at work on a number of websites that allow users to annotate or modify existing videos with text, “bubbles,” or in-video tags: BubblePly, Veotag, Viddler, Jumpcut. Because these sites allow users to overlay text onto video, they have the potential to be used as captioning tools, even though they are not being marketed specifically as such. For example, Overstream “enabl[es] additional partially transparent dynamic content layers to be displayed over any live streaming content” (“About Overstream”). Moreover, Overstream aims to capitalize on the growing influence of vmodding: “Overstream.net heralds the arrival of a new type of video-related net community, that of video modifiers.”

Tools for video modding and remixing, particularly when writing (as opposed to emoticons or graphical elements) is involved, need to be distinguished from YouTube’s support for annotations. YouTube’s annotations do not fully capitalize on the social affordances of Web 2.0 technologies. Whereas BubblePly, Viddler, and dotSUB allow anyone to add annotations or captions to any video, only authors can add annotations to YouTube videos. Viewers can turn YouTube annotations on and off, but they cannot edit or add them (i.e. viewers are not video modifiers in the YouTube environment). Allowing anyone to annotate (or copy and annotate) any video would provide another means for viewers to comment on videos (in addition to writing text comments and authoring response videos) as well as potentially increase the number of captioned videos available on YouTube.
