
14 TTS Servers

Emacspeak produces spoken output by communicating with one of many speech servers. This section documents the communication protocol between the client application, i.e., Emacspeak, and the TTS server. This section is primarily intended for developers wishing to: For additional notes on how to log and view TTS server commands when developing a speech server, see

14.1 High-level Overview

The TTS server reads commands from standard input; the script speech-server can be used to make a TTS server communicate via a TCP socket instead. The client application uses speech server commands to make specific requests of the server; the server listens for these requests in a non-blocking read loop and executes them as they become available. Requests can be classified as follows:

All commands are of the form

        commandWord {arguments}

The braces are optional if the command argument contains no white space. The speech server maintains a current state that determines various characteristics of spoken output, such as speech rate, punctuation mode, etc. (see the set of commands that manipulate speech state for a complete list). The client application queues the text and non-speech audio output to be produced before asking the server to dispatch the set of queued requests, i.e., to start producing output.
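As an illustration of the command form above, the following Python sketch builds one command line and adds braces only when the argument contains white space. The helper name format_command is hypothetical and not part of Emacspeak. The sketch also assumes the argument itself contains no braces, since the text above does not say how those would be escaped.

```python
def format_command(word, argument=None):
    """Return one protocol line of the form: commandWord {arguments}."""
    if argument is None or argument == "":
        return word
    # Braces are optional when the argument contains no white space,
    # so they are only added when white space is present.
    if any(ch.isspace() for ch in argument):
        return f"{word} {{{argument}}}"
    return f"{word} {argument}"


print(format_command("q", "hello world"))  # q {hello world}
print(format_command("l", "a"))            # l a
print(format_command("d"))                 # d
```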

Once the server has been asked to produce output, it removes items from the front of the queue, sends the requisite commands to the underlying TTS engine, and waits for the engine to acknowledge that the request has been completely processed. This is a non-blocking operation, i.e., if the client application generates additional requests, these are processed immediately.

The above design allows the Emacspeak TTS server to be highly responsive; client applications can queue large amounts of text (typically queued a clause at a time to achieve the best prosody), ask the TTS server to start speaking, and interrupt the spoken output at any time.
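The queue-then-dispatch behavior described above can be modeled with a short Python sketch. This is a simplified in-memory model, not the actual server: the class ToyServer and its method names are hypothetical, and the "engine" is just a list that records what would have been spoken.

```python
from collections import deque


class ToyServer:
    """Toy model of a server that queues requests until dispatch."""

    def __init__(self):
        self.queue = deque()      # pending requests, oldest first
        self.spoken = []          # stands in for the TTS engine's output
        self.dispatching = False  # True once the client asks for output

    def enqueue(self, text):
        # Queue text; nothing is spoken yet.
        self.queue.append(text)

    def dispatch(self):
        # The client asks the server to start producing output.
        self.dispatching = True

    def speak_next(self):
        # Take one item from the front of the queue and "speak" it.
        if self.dispatching and self.queue:
            self.spoken.append(self.queue.popleft())

    def stop(self):
        # Interrupt: drop everything that is still pending.
        self.queue.clear()
        self.dispatching = False


server = ToyServer()
for clause in ["First clause,", "second clause,", "third clause."]:
    server.enqueue(clause)
server.speak_next()  # no effect: output has not been dispatched yet
server.dispatch()
server.speak_next()  # the first clause is spoken
server.stop()        # the client interrupts; the rest is discarded
print(server.spoken)      # ['First clause,']
print(len(server.queue))  # 0
```

Because the client only appends to the queue and the server consumes it one item at a time, an interruption never has to wait for a long utterance to finish in this model.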

14.1.1 Commands That Queue Output

This section documents commands that either produce spoken output, or queue output to be produced on demand. Commands that place the request on the queue are clearly marked.


Speaks the version of the TTS engine. Produces output immediately.

        tts_say text 

Speaks the specified text immediately. The text is not pre-processed in any way; contrast this with the primary way of speaking text, which is to queue text before asking the server to process the queue.

Note that this command needs to handle the special syntax for morpheme boundaries, ‘[*]’. The ‘[*]’ syntax is specific to the Dectalk family of synthesizers; servers for other TTS engines need to map this pattern to the corresponding engine-specific code. As an example, see ‘servers/outloud’. A morpheme boundary results in compound words such as left bracket being synthesized with the right intonation; using a space would result in that phrase being synthesized as two separate words.
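The mapping described above can be sketched in Python as a plain substitution. The function name map_morpheme_boundary and the replacement string "<MB>" are placeholders invented for this illustration; a real server would substitute whatever code its engine actually uses.

```python
DECTALK_MORPHEME_BOUNDARY = "[*]"


def map_morpheme_boundary(text, engine_code):
    """Replace each Dectalk-style '[*]' with an engine-specific code."""
    return text.replace(DECTALK_MORPHEME_BOUNDARY, engine_code)


# "<MB>" is a made-up placeholder, not a real engine code.
print(map_morpheme_boundary("left[*]bracket", "<MB>"))  # left<MB>bracket
```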

        l c

Speaks c, a single character, as a letter. The character is spoken immediately. This command uses the TTS engine’s capability to speak a single character with the ability to flush speech immediately. Client applications wishing to produce character-at-a-time output, e.g., when providing character echo during keyboard input, should use this command.


        d

This command dispatches all queued requests. It was renamed to a single-character command (like many of the commonly used TTS server commands) to work more effectively over slow (9600) dialup lines. Calling this command causes the TTS server to start processing items that have been queued via earlier requests.


This pauses speech immediately. It does not affect queued requests; when command tts_resume is called, the output resumes at the point where it was paused. Not all TTS engines provide this capability.


        tts_resume

Resumes spoken output if it has been paused earlier.


Stop speech immediately. Spoken output is interrupted, and all pending requests are flushed from the queue.

        q text

Queues text to be spoken. No spoken output is produced until a dispatch request is received via execution of command d.
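To illustrate how a client might combine q and d, the Python sketch below splits text into clauses, emits one q command per clause, and ends with a single d. The function queue_and_dispatch and its clause-splitting rule are assumptions made for this example; Emacspeak's own clause splitting is not described here. Every argument is wrapped in braces, which is always permitted even when the argument contains no white space.

```python
import re


def queue_and_dispatch(text):
    """Return protocol lines that queue text by clause, then dispatch."""
    # Split after clause-ending punctuation that is followed by white space.
    parts = re.split(r"(?<=[,;:.!?])\s+", text.strip())
    clauses = [part for part in parts if part]
    lines = [f"q {{{clause}}}" for clause in clauses]
    lines.append("d")  # nothing is spoken until this dispatch request
    return lines


for line in queue_and_dispatch("Hello, world. This is a test."):
    print(line)
```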

        c codes

Queues synthesis codes to be sent to the TTS engine. Codes are sent to the engine with no further transformation or processing. The codes are inserted into the output queue and will be dispatched to the TTS engine at the appropriate point in the output stream.

        a filename

Queues the audio file identified by filename for playing.

        t freq length

Queues a tone to be played at the specified frequency and having the specified length. Frequency is specified in hertz and length is specified in milliseconds.

        sh duration

Queues the specified duration of silence. Silence is specified in milliseconds.
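Non-speech requests can be interleaved with queued text. The following Python sketch assembles such a sequence; the helper names tone and silence, the queued phrase, and the specific values (a 440 Hz tone lasting 100 ms, followed by 50 ms of silence) are arbitrary choices made for illustration.

```python
def tone(freq_hz, length_ms):
    """Build a tone request: t freq length (hertz, milliseconds)."""
    return f"t {freq_hz} {length_ms}"


def silence(duration_ms):
    """Build a silence request: sh duration (milliseconds)."""
    return f"sh {duration_ms}"


commands = [
    tone(440, 100),      # queued, not yet played
    silence(50),         # queued pause before the text
    "q {line wrapped}",  # queued text
    "d",                 # dispatch everything queued above
]
print("\n".join(commands))
```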

14.1.2 Commands That Set State


Reset TTS engine to default settings.

        tts_set_punctuations mode

Sets TTS engine to the specified punctuation mode. Typically, TTS servers provide at least three modes:

        tts_set_speech_rate rate

Sets speech rate. The interpretation of this value is typically engine specific.

        tts_set_character_scale factor

Scale factor applied to speech rate when speaking individual characters. Thus, setting speech rate to 500 and character scale to 1.2 will cause command l to use a speech rate of 500 * 1.2 = 600.
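The arithmetic above can be written as a one-line Python function. The name character_rate is hypothetical, and whether a given engine rounds or clamps the result is not specified here.

```python
def character_rate(speech_rate, character_scale):
    """Rate used when speaking single characters: rate times scale."""
    return speech_rate * character_scale


# The example from the text: rate 500 with character scale 1.2.
print(character_rate(500, 1.2))
```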

        tts_split_caps flag

Set state of split caps processing. Turn this on to speak mixed-case (AKA Camel Case) identifiers.
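As a rough illustration of what split caps processing means for a mixed-case identifier, the Python sketch below inserts a space at each lower-case to upper-case transition. This is only an assumption about the effect; how a given server or engine actually implements split caps is not specified here, and the function name split_caps is hypothetical.

```python
import re


def split_caps(identifier):
    """Insert a space wherever a lower-case letter precedes an upper-case one."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", identifier)


print(split_caps("camelCaseIdentifier"))  # camel Case Identifier
```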

        tts_capitalize flag

Indicate capitalization via a beep tone or voice pitch.

        tts_allcaps_beep flag

Setting this flag produces a high-pitched beep when speaking words that are in all-caps, e.g. abbreviations.
