16.9 Conversational Gestures For The Audio Desktop

By 1996, Emacspeak was the only piece of adaptive technology I used. In fall of 1995, I had moved from DEC Research to Adobe Systems to focus on enhancing the Portable Document Format (PDF) to make PDF content repurposable. Between 1996 and 1998, I was primarily focused on electronic document formats, and I took this opportunity to step back and evaluate what I had built as an auditory interface within Emacspeak. This retrospective proved extremely useful in gaining a sense of perspective, and it led to formalizing the high-level concept of Conversational Gestures, and of structured browsing and searching, as a means of thinking about user interfaces.

By now, Emacspeak was a complete environment; I formalized what it provided under the moniker Complete Audio Desktop. The fully integrated user experience allowed me to move forward in defining interaction models highly optimized for eyes-free interaction. As an example, see how Emacspeak interfaces with modes like dired (Directory Editor) for browsing and manipulating the filesystem, or proced (Process Editor) for browsing and manipulating running processes. Emacs’ integration with ispell for spell checking, as well as its various completion facilities, ranging from minibuffer completion to other forms of dynamic completion while typing text, provided more opportunities for creating innovative forms of eyes-free interaction.

With respect to what had gone before (and what is still par for the course as far as traditional screen-readers are concerned), these types of highly dynamic interfaces present a challenge. For example, consider handling a completion interface using a screen-reader that speaks the visual display. Deciding what to speak is a significant challenge: when presented with a list of completions, the currently typed text, and the default completion, which of these should you speak, and in what order? The problem gets harder when you consider that the underlying semantics of these items is generally not available from examining the visual presentation in a consistent manner.

By having direct access to the underlying information being presented, Emacspeak had a leg up in addressing the higher-level question: once you do have access to this information, how do you present it effectively in an eyes-free environment? For this and many other cases of dynamic interaction, the combination of audio formatting, auditory icons, and the ability to synthesize succinct messages from several information items, rather than being forced to speak each item as it is rendered visually, made for highly efficient eyes-free interaction.
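To make this concrete, here is a minimal sketch in the spirit of Emacspeak’s advice-based design, though not its actual implementation: instead of speaking the visual *Completions* buffer, it synthesizes one succinct message from the underlying completion state. It assumes Emacspeak is loaded (for dtk-speak and emacspeak-auditory-icon); my-speak-completion-summary is a hypothetical helper name.

    ;; A minimal sketch, not Emacspeak's actual implementation:
    ;; synthesize one succinct spoken message from the underlying
    ;; completion state rather than reading the visual display.
    (defun my-speak-completion-summary ()
      "Speak one succinct summary of the current minibuffer completions."
      (let* ((input (minibuffer-contents))
             (candidates (all-completions input
                                          minibuffer-completion-table
                                          minibuffer-completion-predicate))
             (common (try-completion input
                                     minibuffer-completion-table
                                     minibuffer-completion-predicate)))
        (emacspeak-auditory-icon 'help) ; audio cue: completion help
        (dtk-speak
         (cond
          ((null candidates) (format "No completions for %s" input))
          ((eq common t) (format "%s is complete" input))
          (t (format "%d completions, common prefix %s"
                     (length candidates) common))))))

    ;; Attach the summary after the stock completion command, leaving
    ;; the visual display untouched:
    (advice-add 'minibuffer-complete :after
                (lambda (&rest _) (my-speak-completion-summary)))

Note how the spoken output answers the user’s question directly (how many choices, and how far has my input been extended?) in a single utterance, rather than echoing each visually rendered item.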

This was also when I stepped back to build out Emacspeak’s table browsing facilities; see the online Emacspeak documentation for details on this functionality, which remains one of the richest collections of end-user affordances for working with two-dimensional data. The sketch below illustrates the underlying idea.
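As a toy illustration, and emphatically not Emacspeak’s actual emacspeak-table-ui implementation, the sketch below models a table as two-dimensional data and speaks a cell together with its row and column headers, rather than reading the visual rendering line by line. Here my-table and my-table-speak-cell are hypothetical names, and dtk-speak is again assumed from Emacspeak.

    (defvar my-table
      '(("Quarter" "Revenue" "Expenses")  ; first row holds column headers
        ("Q1"      "120"     "80")
        ("Q2"      "150"     "95"))
      "A toy table: a list of rows whose first row holds column headers.")

    (defun my-table-speak-cell (row col)
      "Speak cell (ROW, COL) of `my-table' along with its headers.
    ROW is a 1-based index into the data rows; COL is a 1-based index
    into the non-header columns."
      (let* ((headers (car my-table))
             (this-row (nth (1- row) (cdr my-table)))
             (row-header (car this-row))
             (col-header (nth col headers))
             (value (nth col this-row)))
        (dtk-speak (format "%s, %s: %s" row-header col-header value))))

    ;; (my-table-speak-cell 2 1) speaks "Q2, Revenue: 150" in a single
    ;; utterance, preserving the cell's row and column context.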