Emacspeak produces spoken output by communicating with one of many speech servers. This section documents the communication protocol between the client application, i.e., Emacspeak, and the TTS server. It is primarily intended for developers. For additional notes on how to log and view TTS server commands when developing a speech server, see http://emacspeak.blogspot.com/2015/04/howto-log-speech-server-output-to-aid.html.
The TTS server reads commands from standard input; the script speech-server can be used to make a TTS server communicate via a TCP socket instead. Speech server commands are used by the client application to make specific requests of the server; the server listens for these requests in a non-blocking read loop and executes requests as they become available. Requests can be classified as follows:
All commands are of the form
The braces are optional if the command argument contains no white space. The speech server maintains a current state that determines various characteristics of spoken output such as speech rate, punctuation mode, etc. (see the set of commands that manipulate speech state for the complete list). The client application queues the text and non-speech audio output to be produced before asking the server to dispatch the set of queued requests, i.e., start producing output.
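The brace rule above can be sketched as a small client-side helper. This is only an illustration under stated assumptions: the helper name format_command is hypothetical, each command is assumed to be sent as one newline-terminated line, the argument is assumed to contain no braces of its own, and the command word q used in the usage note is assumed here to be the word for queuing text.

```python
def format_command(word, argument=None):
    """Build one protocol line: a command word plus an optional argument.

    Assumption: commands are newline-terminated. Braces are added only
    when the argument is empty or contains white space, since they are
    optional otherwise.
    """
    if argument is None:
        return word + "\n"
    argument = str(argument)
    if argument == "" or any(ch.isspace() for ch in argument):
        # Doubled braces in an f-string produce literal { and }.
        return f"{word} {{{argument}}}\n"
    return f"{word} {argument}\n"
```

With this helper, format_command("q", "hello world") yields a braced argument, while format_command("l", "a") leaves the single character unbraced.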
Once the server has been asked to produce output, it removes items from the front of the queue, sends the requisite commands to the underlying TTS engine, and waits for the engine to acknowledge that the request has been completely processed. This is a non-blocking operation, i.e., if the client application generates additional requests, these are processed immediately.
The above design allows the Emacspeak TTS server to be highly responsive: client applications can queue large amounts of text (typically queued a clause at a time to achieve the best prosody), ask the TTS server to start speaking, and interrupt the spoken output at any time.
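The queue, dispatch, and interrupt behaviour described above can be modelled with a toy class. This is a simplified sketch, not the code of any actual speech server: the class QueueModel and its method names are hypothetical, and a step() call stands in for the engine acknowledging one completed request.

```python
from collections import deque


class QueueModel:
    """Toy model of the server-side output queue (illustrative only)."""

    def __init__(self):
        self.pending = deque()  # requests queued but not yet processed
        self.spoken = []        # stands in for output sent to the engine
        self.speaking = False   # True once a dispatch has been requested

    def queue(self, item):
        # Queuing alone never produces output.
        self.pending.append(item)

    def dispatch(self):
        # The client asks the server to start producing output.
        self.speaking = True

    def step(self):
        # Process one item from the front of the queue, as if the engine
        # had just acknowledged the previous request.
        if self.speaking and self.pending:
            self.spoken.append(self.pending.popleft())

    def stop(self):
        # Interrupt: flush all pending requests.
        self.pending.clear()
        self.speaking = False
```

Queuing three clauses, dispatching, stepping once, and then stopping leaves only the first clause in spoken; the remaining two are flushed, which mirrors how a client can interrupt long output at any time.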
This section documents commands that either produce spoken output, or queue output to be produced on demand. Commands that place the request on the queue are clearly marked.
Speaks the version of the TTS engine. Produces output immediately.
Speaks the specified text immediately. The text is not pre-processed in any way; contrast this with the primary way of speaking text, which is to queue text before asking the server to process the queue.
Note that this command needs to handle the special syntax for morpheme boundaries ‘[*]’. The ‘[*]’ syntax is specific to the Dectalk family of synthesizers; servers for other TTS engines need to map this pattern to the engine-specific code for each engine. As an example, see ‘servers/outloud’. A morpheme boundary results in synthesizing compound words such as left bracket with the right intonation; using a space would result in that phrase being synthesized as two separate words.
Speaks c, a single character, as a letter. The character is spoken immediately. This command uses the TTS engine’s capability to speak a single character with the ability to flush speech immediately. Client applications wishing to produce character-at-a-time output, e.g., when providing character echo during keyboard input, should use this command.
This command is used to dispatch all queued requests. It was renamed to a single-character command (like many of the commonly used TTS server commands) to work more effectively over slow (9600 baud) dialup lines. The effect of calling this command is for the TTS server to start processing items that have been queued via earlier requests.
This pauses speech immediately. It does not affect queued requests; when command tts_resume is called, the output resumes at the point where it was paused. Not all TTS engines provide this capability.
Resume spoken output if it has been paused earlier.
Stop speech immediately. Spoken output is interrupted, and all pending requests are flushed from the queue.
Queues text to be spoken. No spoken output is produced until a dispatch request is received via execution of command d.
Queues synthesis codes to be sent to the TTS engine. Codes are sent to the engine with no further transformation or processing. The codes are inserted into the output queue and will be dispatched to the TTS engine at the appropriate point in the output stream.
Queues the audio file identified by filename for playing.
t freq length
Queues a tone to be played at the specified frequency and having the specified length. Frequency is specified in hertz and length is specified in milliseconds.
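As a small illustration of the t freq length form, the following sketch builds a tone request on the client side. The function name tone_command is hypothetical, and two details are assumptions rather than documented protocol: that both values are sent as positive integers and that the command is terminated by a newline.

```python
def tone_command(freq_hz, length_ms):
    """Build a tone request of the form 't freq length'.

    freq_hz is the frequency in hertz, length_ms the duration in
    milliseconds. Integer values and newline termination are assumptions.
    """
    freq_hz = int(freq_hz)
    length_ms = int(length_ms)
    if freq_hz <= 0 or length_ms <= 0:
        raise ValueError("frequency and length must be positive")
    return f"t {freq_hz} {length_ms}\n"
```

Because the tone is only queued, a client would still need to send the dispatch command d before anything is heard.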
Queues the specified duration of silence. Silence is specified in milliseconds.
Reset TTS engine to default settings.
Sets TTS engine to the specified punctuation mode. Typically, TTS servers provide at least three modes:
Sets speech rate. The interpretation of this value is typically engine specific.
Scale factor applied to speech rate when speaking individual characters. Thus, setting speech rate to 500 and character scale to 1.2 will cause command l to use a speech rate of 500 * 1.2 = 600.
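The arithmetic above can be written out as a one-line helper. The function name effective_character_rate is hypothetical, and rounding to an integer is an assumption made here for illustration; how a given engine interprets the resulting value is engine specific.

```python
def effective_character_rate(speech_rate, character_scale):
    """Rate used when speaking a single character.

    Multiplies the current speech rate by the character scale factor.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(speech_rate * character_scale)
```

For the values in the text, effective_character_rate(500, 1.2) gives 600.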
Set state of split caps processing. Turn this on to speak mixed-case (AKA Camel Case) identifiers.
Indicate capitalization via a beep tone or voice pitch.
Setting this flag produces a high-pitched beep when speaking words that are in all-caps, e.g. abbreviations.