Automatic Speech Recognition (ASR)

spokestack.asr.spokestack.cloud_client module

This module contains the websocket logic used to communicate with Spokestack’s cloud-based ASR service.

exception spokestack.asr.spokestack.cloud_client.APIError(response)

Spokestack API error passed through from the service

Parameters: response (dict) – message from the API service

class spokestack.asr.spokestack.cloud_client.CloudClient(key_id, key_secret, socket_url='wss://api.spokestack.io', audio_format='PCM16LE', sample_rate=16000, language='en', limit=10, idle_timeout=None)

Spokestack client for cloud-based speech-to-text

Parameters

key_id (str) – identity from Spokestack API credentials
key_secret (str) – secret key from Spokestack API credentials
socket_url (str) – URL for the socket connection
audio_format (str) – format of the input audio
sample_rate (int) – audio sample rate (Hz)
language (str) – language for recognition
limit (int) – limit of messages per API response
idle_timeout (Any) – time before the client times out. Defaults to None
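The default audio_format, 'PCM16LE', denotes signed 16-bit PCM samples in little-endian byte order. The stdlib sketch below illustrates that encoding for floating-point samples in [-1.0, 1.0]. It is not Spokestack's code: the helper name to_pcm16le is hypothetical, and the documented send() accepts an np.ndarray rather than bytes.

```python
import struct
from typing import Sequence


def to_pcm16le(samples: Sequence[float]) -> bytes:
    """Hypothetical helper: encode float samples as 16-bit little-endian PCM.

    Values outside [-1.0, 1.0] are clipped before scaling.
    """
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        # Scale to the signed 16-bit range; 32767 keeps +1.0 representable.
        ints.append(round(clipped * 32767))
    # "<" forces little-endian byte order; "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


encoded = to_pcm16le([0.0, 0.5, -1.0, 2.0])
print(len(encoded))  # 8 bytes: 4 samples x 2 bytes each
```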

connect()

Connects to the websocket.

Return type: None

disconnect()

Disconnects the client socket connection.

Return type: None

end()

Sends an empty binary message to indicate the last frame.

Return type: None

property idle_count

Current count of idle time.

Return type: int

property idle_timeout

Maximum idle time.

Return type: Any

initialize()

Sends the initial API request and receives the service's reply.

Return type: None

property is_connected

Status of the socket connection.

Return type: bool

property is_final

Status of the most recent server response.

Return type: bool

receive()

Receives the API response.

Return type: None

property response

Current response message.

Return type: dict

send(frame)

Sends a single frame of audio.

Parameters: frame (np.ndarray) – segment of PCM-16 encoded audio
Return type: None
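The methods above suggest a session order: connect(), initialize(), one send() per frame, end() to mark the last frame, receive() until is_final, then disconnect(). The sketch below exercises that order against a hypothetical StubCloudClient so it runs without network access or credentials. The helper run_session and its max_polls cap are also hypothetical, and whether receive() should be polled between frames is not specified here.

```python
from typing import Iterable, List


class StubCloudClient:
    """Hypothetical stand-in that mirrors the documented CloudClient names.

    It records each call instead of using a websocket, and reports a final
    response on the first receive() after end() has been called.
    """

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._connected = False
        self._ended = False
        self._final = False

    @property
    def is_connected(self) -> bool:
        return self._connected

    @property
    def is_final(self) -> bool:
        return self._final

    def connect(self) -> None:
        self.calls.append("connect")
        self._connected = True

    def initialize(self) -> None:
        self.calls.append("initialize")

    def send(self, frame: object) -> None:
        # The real send() takes an np.ndarray of PCM-16 audio.
        self.calls.append("send")

    def end(self) -> None:
        self.calls.append("end")
        self._ended = True

    def receive(self) -> None:
        self.calls.append("receive")
        self._final = self._ended

    def disconnect(self) -> None:
        self.calls.append("disconnect")
        self._connected = False


def run_session(client, frames: Iterable[object], max_polls: int = 100) -> bool:
    """Stream frames through a client; return whether a final response arrived."""
    client.connect()
    try:
        client.initialize()  # send the initial API request
        for frame in frames:
            client.send(frame)
        client.end()  # empty binary message marks the last frame
        for _ in range(max_polls):  # cap polling so a silent server cannot hang us
            client.receive()
            if client.is_final:
                break
        return client.is_final
    finally:
        client.disconnect()  # runs before the return value is handed back


stub = StubCloudClient()
print(run_session(stub, [b"\x00\x00"] * 3))  # True
```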

spokestack.asr.spokestack.speech_recognizer module

This module contains the recognizer for cloud-based ASR in the speech pipeline.

class spokestack.asr.spokestack.speech_recognizer.CloudSpeechRecognizer(spokestack_id='', spokestack_secret='', language='en', sample_rate=16000, frame_width=20, idle_timeout=5000, **kwargs)

Speech recognizer for use in the speech pipeline

Parameters

spokestack_id (str) – identity from Spokestack API credentials
spokestack_secret (str) – secret key from Spokestack API credentials
language (str) – language to recognize
sample_rate (int) – audio sample rate (Hz)
frame_width (int) – frame width of the audio (ms)
idle_timeout (int) – the number of iterations before the connection times out
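With the defaults, sample_rate=16000 Hz and frame_width=20 ms give 320 samples per frame. The sketch below does that arithmetic; the helper frame_geometry is hypothetical, and bytes_per_sample=2 assumes 16-bit PCM, as in CloudClient's PCM16LE default.

```python
def frame_geometry(sample_rate: int = 16000, frame_width_ms: int = 20,
                   bytes_per_sample: int = 2) -> dict:
    """Hypothetical helper: samples and bytes per frame, frames per second."""
    if (sample_rate * frame_width_ms) % 1000:
        raise ValueError("frame width does not map to a whole number of samples")
    samples = sample_rate * frame_width_ms // 1000
    return {
        "samples_per_frame": samples,
        "bytes_per_frame": samples * bytes_per_sample,
        "frames_per_second": 1000 / frame_width_ms,
    }


print(frame_geometry())
# {'samples_per_frame': 320, 'bytes_per_frame': 640, 'frames_per_second': 50.0}
```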

close()

Closes the client connection.

Return type: None

reset()

Resets the client connection.

Return type: None

spokestack.asr.google.speech_recognizer module

This module contains the Google ASR speech recognizer.

class spokestack.asr.google.speech_recognizer.GoogleSpeechRecognizer(language, credentials=None, sample_rate=16000, **kwargs)

Transforms speech into text using Google’s ASR.

Parameters

language (str) – The language of given audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: “en-US”
credentials (Union[None, str, dict]) – Dictionary of Google API credentials or path to credentials. If set to None, credentials are pulled from the environment variable GOOGLE_APPLICATION_CREDENTIALS
sample_rate (int) – sample rate of the input audio (Hz)
**kwargs (optional) – additional keyword arguments
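The credentials parameter accepts three forms. As a minimal sketch of how they could be normalized to a dictionary, assuming a string is a path to a JSON key file and that GOOGLE_APPLICATION_CREDENTIALS also holds such a path, the hypothetical helper resolve_credentials below handles each case. It is not the recognizer's implementation; Google's client libraries can also read that environment variable themselves.

```python
import json
import os
import tempfile
from typing import Union


def resolve_credentials(credentials: Union[None, str, dict] = None) -> dict:
    """Hypothetical helper: normalize the three accepted credential forms."""
    if isinstance(credentials, dict):
        return credentials  # already a parsed credentials dictionary
    if credentials is None:
        # Fall back to the environment variable, which holds a file path.
        credentials = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
        if not credentials:
            raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    # Any remaining string is treated as a path to a JSON key file.
    with open(credentials, encoding="utf-8") as handle:
        return json.load(handle)


# Usage with a dummy key file (not a real service-account key).
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "key.json")
    with open(key_path, "w", encoding="utf-8") as handle:
        json.dump({"type": "service_account"}, handle)
    from_path = resolve_credentials(key_path)
```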

close()

Closes the recognizer.

Return type: None

reset()

Resets the recognizer.

Return type: None

spokestack.asr.keyword.tflite module

This module contains the Spokestack KeywordRecognizer, which identifies multiple keywords in an audio stream.

class spokestack.asr.keyword.tflite.KeywordRecognizer(classes, pre_emphasis=0.97, sample_rate=16000, fft_window_type='hann', fft_hop_length=10, model_dir='', posterior_threshold=0.5, **kwargs)

Recognizes keywords in an audio stream.

Parameters

classes (List[str]) – Keyword labels
pre_emphasis (float) – The coefficient of the pre-emphasis filter
sample_rate (int) – The number of audio samples per second (Hz)
fft_window_type (str) – The type of FFT window (only "hann" is supported)
fft_hop_length (int) – Hop length of the sliding window used for the STFT (ms)
model_dir (str) – Path to the directory containing .tflite models
posterior_threshold (float) – Probability threshold for detection
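These parameters map onto standard signal-processing steps. The plain-Python sketch below assumes: pre-emphasis is the first-order filter y[n] = x[n] - 0.97·x[n-1]; the Hann window is the periodic form (whether the library uses the periodic or symmetric form is an assumption); the 10 ms hop becomes 160 samples at 16 kHz; and detection picks the most probable class when its posterior reaches posterior_threshold (the exact comparison the library uses is also an assumption). All function names are hypothetical, and this is not the KeywordRecognizer's implementation.

```python
import math
from typing import Dict, List, Optional, Sequence


def pre_emphasize(x: Sequence[float], coeff: float = 0.97) -> List[float]:
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n - 1]."""
    if not x:
        return []
    # The first sample has no predecessor, so it passes through unchanged.
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]


def hann_window(size: int) -> List[float]:
    """Periodic Hann window: w[i] = 0.5 - 0.5 * cos(2 * pi * i / size)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / size) for i in range(size)]


def ms_to_samples(ms: int, sample_rate: int = 16000) -> int:
    """Convert a duration in milliseconds to a sample count."""
    return sample_rate * ms // 1000


def pick_keyword(posteriors: Dict[str, float],
                 threshold: float = 0.5) -> Optional[str]:
    """Return the most probable class if its posterior reaches the threshold."""
    label = max(posteriors, key=posteriors.__getitem__)
    return label if posteriors[label] >= threshold else None


hop = ms_to_samples(10)  # fft_hop_length=10 ms at 16 kHz -> 160 samples
```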

close()

Closes the recognizer; this is the close interface used by the SpeechPipeline.

Return type: None

reset()

Resets the current KeywordRecognizer state.

Return type: None
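All three recognizers documented here expose the same close() and reset() pair, so a typing.Protocol can describe their shared shape. The sketch below assumes reset() is called between utterances and close() at shutdown, which fits the method descriptions but is not confirmed here for SpeechPipeline. PipelineStage, DummyRecognizer, end_utterance and shutdown are hypothetical names.

```python
from typing import Iterable, List, Protocol


class PipelineStage(Protocol):
    """Structural type for the close()/reset() pair the recognizers share."""

    def reset(self) -> None: ...

    def close(self) -> None: ...


class DummyRecognizer:
    """Hypothetical recognizer used only to exercise the protocol."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def reset(self) -> None:
        self.events.append("reset")

    def close(self) -> None:
        self.events.append("close")


def end_utterance(stages: Iterable[PipelineStage]) -> None:
    """Clear per-utterance state in every stage (hypothetical helper)."""
    for stage in stages:
        stage.reset()


def shutdown(stages: Iterable[PipelineStage]) -> None:
    """Release every stage's resources (hypothetical helper)."""
    for stage in stages:
        stage.close()


recognizer = DummyRecognizer()
end_utterance([recognizer])
shutdown([recognizer])
print(recognizer.events)  # ['reset', 'close']
```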