Text to Speech (TTS)

Clients

Manager

This module contains the Spokestack text to speech manager, which drives a text to speech client, decodes the returned audio, and writes the decoded audio to the specified output.

class spokestack.tts.manager.SequenceIO(sequence)[source]

Wrapper that allows for incrementally received audio to be decoded.
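The idea behind such a wrapper can be sketched with the standard library alone. The ChunkReader class below is a hypothetical stand-in written for illustration, not the SequenceIO implementation: it assumes the audio arrives as an iterable of byte chunks and exposes them as a readable binary stream, so a decoder can read across chunk boundaries.

```python
import io
from typing import Iterable, Iterator


class ChunkReader(io.RawIOBase):
    """Hypothetical stand-in for SequenceIO: exposes an iterable of byte
    chunks as a readable binary stream."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        super().__init__()
        self._chunks: Iterator[bytes] = iter(chunks)
        self._pending = b""  # bytes received but not yet handed to the reader

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Pull the next non-empty chunk only when the leftover is exhausted.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        size = min(len(buffer), len(self._pending))
        buffer[:size] = self._pending[:size]
        self._pending = self._pending[size:]
        return size


# Simulate encoded audio arriving in three separate chunks.
stream = io.BufferedReader(ChunkReader([b"ID3", b"\x00\x01", b"\x02\x03\x04"]))
header = stream.read(4)  # spans the first two chunks
rest = stream.read()     # drains whatever remains
```

Because the reader only asks the iterator for more data when its leftover bytes run out, decoding can begin before the full response has been received.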

class spokestack.tts.manager.TextToSpeechManager(client, output, format_='mp3')[source]

Manages a TTS client and an audio I/O target.

Parameters
  • client (Any) – Text to speech client that returns audio in the format given by format_ (encoded MP3 by default)

  • output (Any) – Audio io target

  • format_ (str) – Audio format, one of FORMAT_MP3 or FORMAT_PCM16

close()[source]

Closes the client and output.

Return type

None

synthesize(utterance, mode='text', voice='demo-male', profile='default')[source]

Synthesizes the given utterance with the voice and format provided.

Text can be formatted as plain text (mode="text"), SSML (mode="ssml"), or Speech Markdown (mode="markdown").
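As an illustration of the three modes, the sketch below writes the same prompt as plain text, SSML, and Speech Markdown. The sample strings and the pick_utterance helper are hypothetical and are not part of the Spokestack API; the half-second pause uses the standard SSML break element and the Speech Markdown short break syntax.

```python
# The same prompt expressed in each synthesis mode. The strings are
# illustrative samples, not taken from the Spokestack documentation.
UTTERANCES = {
    "text": "Hello world!",
    "ssml": '<speak>Hello <break time="500ms"/> world!</speak>',
    "markdown": "Hello [500ms] world!",
}


def pick_utterance(mode: str) -> str:
    """Hypothetical helper: return the sample utterance for a mode."""
    if mode not in UTTERANCES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return UTTERANCES[mode]


for mode in UTTERANCES:
    # With a configured manager named tts, the call would look like:
    #     tts.synthesize(pick_utterance(mode), mode=mode)
    print(f"{mode:>8}: {pick_utterance(mode)}")
```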

This method also supports different formats for the synthesized audio via the profile argument. The supported profiles and their associated formats are:

Parameters
  • utterance (str) – string that needs to be rendered as speech.

  • mode (str) – synthesis mode to use with utterance. text, ssml, markdown, etc.

  • voice (str) – name of the tts voice.

  • profile (str) – name of the audio profile used to create the resulting stream.

Return type

None
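The hand-off between client and output can be sketched with stand-in objects. Everything below is hypothetical and simplified: FakeClient, ListOutput, and SketchManager are illustration-only names, the client is assumed to yield frames that need no decoding (roughly the FORMAT_PCM16 case), and the output is assumed to expose write() and close(). The real manager additionally decodes MP3 audio before writing.

```python
from typing import Iterator, List


class FakeClient:
    """Hypothetical client that yields already-decoded audio frames."""

    def synthesize(
        self, utterance: str, mode: str, voice: str, profile: str
    ) -> Iterator[bytes]:
        # Two silent 4-byte frames stand in for real audio.
        yield b"\x00\x00\x00\x00"
        yield b"\x00\x00\x00\x00"


class ListOutput:
    """Hypothetical io target that records frames instead of playing them."""

    def __init__(self) -> None:
        self.frames: List[bytes] = []
        self.closed = False

    def write(self, frame: bytes) -> None:
        self.frames.append(frame)

    def close(self) -> None:
        self.closed = True


class SketchManager:
    """Simplified stand-in for the manager; performs no audio decoding."""

    def __init__(self, client: FakeClient, output: ListOutput) -> None:
        self._client = client
        self._output = output

    def synthesize(
        self,
        utterance: str,
        mode: str = "text",
        voice: str = "demo-male",
        profile: str = "default",
    ) -> None:
        # Forward each frame to the output as soon as the client yields it.
        for frame in self._client.synthesize(utterance, mode, voice, profile):
            self._output.write(frame)

    def close(self) -> None:
        # Assumption: releasing the client reference and closing the
        # output is enough cleanup for this sketch.
        self._client = None
        self._output.close()


output = ListOutput()
manager = SketchManager(FakeClient(), output)
manager.synthesize("Hello world!")
manager.close()
```

As in the documented signature, synthesize returns None; the audio is observable only through the output target.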

TTS-Lite

Spokestack-Lite Speech Synthesizer

This module contains the SpeechSynthesizer class used to convert text to speech using local TTS models trained on the Spokestack platform. A SpeechSynthesizer instance can be passed to the TextToSpeechManager for playback.

Example

This example assumes that a TTS model was downloaded from the Spokestack platform and extracted to the model directory.

from spokestack.io.pyaudio import PyAudioOutput
from spokestack.tts.manager import TextToSpeechManager, FORMAT_PCM16
from spokestack.tts.lite import SpeechSynthesizer, BLOCK_LENGTH, SAMPLE_RATE

tts = TextToSpeechManager(
    SpeechSynthesizer("./model"),
    PyAudioOutput(sample_rate=SAMPLE_RATE, frames_per_buffer=BLOCK_LENGTH),
    format_=FORMAT_PCM16)

tts.synthesize("Hello world!")

class spokestack.tts.lite.SpeechSynthesizer(model_path)[source]

Initialize a new lightweight speech synthesizer.

Parameters

model_path (str) – Path to the extracted TTS model downloaded from the Spokestack platform

synthesize(utterance, *_args, **_kwargs)[source]

Synthesize a text utterance to speech audio.

Parameters

utterance (str) – The text string to synthesize

Returns

A generator that yields a sequence of PCM-16 NumPy audio blocks for playback, storage, etc.

Return type

Iterator[np.array]
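For the storage case, the yielded blocks can be packed into a WAV container with the standard library. This is a minimal sketch under stated assumptions: blocks_to_wav is a hypothetical helper, the audio is assumed to be mono, and array.array blocks stand in for the NumPy blocks so the example runs without a model (both types provide tobytes()). The 24000 Hz rate in the demo is a placeholder; real code should pass SAMPLE_RATE from spokestack.tts.lite.

```python
import array
import io
import wave
from typing import Iterable


def blocks_to_wav(blocks: Iterable, sample_rate: int) -> bytes:
    """Pack 16-bit PCM blocks into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono audio
        wav.setsampwidth(2)  # PCM-16 means 2 bytes per sample
        wav.setframerate(sample_rate)
        for block in blocks:
            # Works for any block type that provides tobytes().
            wav.writeframes(block.tobytes())
    return buffer.getvalue()


def fake_blocks():
    """Stand-in for SpeechSynthesizer.synthesize: three silent blocks."""
    for _ in range(3):
        yield array.array("h", [0] * 4)


# Placeholder sample rate; use SAMPLE_RATE from spokestack.tts.lite instead.
wav_bytes = blocks_to_wav(fake_blocks(), sample_rate=24000)

# Read the result back to confirm that every sample was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    frame_count = wav.getnframes()
```

With a real model, the generator returned by SpeechSynthesizer("./model").synthesize("Hello world!") would take the place of fake_blocks().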