Automatic Speech Recognition (ASR)

spokestack.asr.spokestack.cloud_client module

This module contains the websocket logic used to communicate with Spokestack’s cloud-based ASR service.

exception spokestack.asr.spokestack.cloud_client.APIError(response)[source]

Passes through an error returned by the Spokestack API

Parameters

response (dict) – message from the api service

class spokestack.asr.spokestack.cloud_client.CloudClient(key_id, key_secret, socket_url='wss://api.spokestack.io', audio_format='PCM16LE', sample_rate=16000, language='en', limit=10, idle_timeout=None)[source]

Spokestack client for cloud based speech to text

Parameters
  • key_id (str) – identity from spokestack api credentials

  • key_secret (str) – secret key from spokestack api credentials

  • socket_url (str) – url for socket connection

  • audio_format (str) – format of input audio

  • sample_rate (int) – audio sample rate (Hz)

  • language (str) – language for recognition

  • limit (int) – Limit of messages per api response

  • idle_timeout (Any) – Time before client timeout. Defaults to None

connect()[source]

connects to websocket

Return type

None

disconnect()[source]

disconnects client socket connection

Return type

None

end()[source]

sends empty string in binary to indicate last frame

Return type

None

property idle_count

current counter of idle time

Return type

int

property idle_timeout

property for maximum idle time

Return type

Any

initialize()[source]

sends/receives the initial api request

Return type

None

property is_connected

status of the socket connection

Return type

bool

property is_final

status of most recent server response

Return type

bool

receive()[source]

receives the api response

Return type

None

property response

current response message

Return type

dict

send(frame)[source]

sends a single frame of audio

Parameters

frame (np.ndarray) – segment of PCM-16 encoded audio

Return type

None
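
The methods above suggest a connect, initialize, send/receive, end, disconnect lifecycle. The sketch below shows one way to drive it. The helper name transcribe and the exact call order are assumptions made for illustration; the reference does not prescribe them. The helper accepts any object that exposes the documented methods, so it runs without network access.

```python
from typing import Any, Iterable, List


def transcribe(client: Any, frames: Iterable[Any]) -> List[dict]:
    """Stream frames through a CloudClient-like object and collect responses.

    `client` is any object exposing the methods documented above. The call
    order used here is an assumption, not something the reference specifies.
    """
    responses: List[dict] = []
    client.connect()  # open the websocket
    try:
        client.initialize()  # send/receive the initial api request
        for frame in frames:
            client.send(frame)  # one frame of PCM-16 audio
            client.receive()  # refresh client.response
            responses.append(client.response)
        client.end()  # empty binary message marks the last frame
        client.receive()  # assumed: one more read for the closing response
        responses.append(client.response)
    finally:
        client.disconnect()  # always release the socket
    return responses
```

With real credentials this would presumably be called as transcribe(CloudClient(key_id, key_secret), frames), where frames yields np.ndarray segments as described under send(). Checking is_final on the client after the last receive() indicates whether the server considers the transcript complete.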

spokestack.asr.spokestack.speech_recognizer module

This module contains the recognizer for cloud based ASR in the speech pipeline

class spokestack.asr.spokestack.speech_recognizer.CloudSpeechRecognizer(spokestack_id='', spokestack_secret='', language='en', sample_rate=16000, frame_width=20, idle_timeout=5000, **kwargs)[source]

Speech recognizer for use in the speech pipeline

Parameters
  • spokestack_id (str) – identity under spokestack api credentials

  • spokestack_secret (str) – secret key from spokestack api credentials

  • language (str) – language recognized

  • sample_rate (int) – audio sample rate (Hz)

  • frame_width (int) – frame width of the audio (ms)

  • idle_timeout (int) – the number of iterations before the connection times out
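
The relationship between sample_rate and frame_width can be worked out with simple arithmetic. The sketch below assumes sample_rate is in samples per second (the default of 16000 only makes sense in Hz) and that audio is mono 16-bit PCM. The helper names are hypothetical and not part of the library.

```python
def samples_per_frame(sample_rate: int = 16000, frame_width: int = 20) -> int:
    """Number of samples in one frame: rate (Hz) * width (ms) / 1000."""
    return sample_rate * frame_width // 1000


def bytes_per_frame(sample_rate: int = 16000, frame_width: int = 20) -> int:
    """Frame size in bytes for 16-bit PCM (2 bytes per sample), mono assumed."""
    return 2 * samples_per_frame(sample_rate, frame_width)


print(samples_per_frame())  # 320
print(bytes_per_frame())  # 640
```

With the defaults, each frame passed through the pipeline would therefore hold 320 samples.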

close()[source]

closes client connection

Return type

None

reset()[source]

resets client connection

Return type

None

spokestack.asr.google.speech_recognizer module

This module contains the Google ASR speech recognizer

class spokestack.asr.google.speech_recognizer.GoogleSpeechRecognizer(language, credentials=None, sample_rate=16000, **kwargs)[source]

Transforms speech into text using Google’s ASR.

Parameters
  • language (str) – The language of given audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: “en-US”

  • credentials (Union[None, str, dict]) – Dictionary of Google API credentials or path to a credentials file. If set to None, credentials are read from the environment variable GOOGLE_APPLICATION_CREDENTIALS

  • sample_rate (int) – sample rate of the input audio (Hz)

  • **kwargs (optional) – additional keyword arguments
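
The three accepted forms of credentials (dictionary, path, or None) can be illustrated with a small resolver. This is a sketch of the documented behavior only: resolve_credentials is a hypothetical helper, and it is not a claim about how the library resolves credentials internally. It assumes the path points to a JSON key file.

```python
import json
import os
from typing import Optional, Union


def resolve_credentials(credentials: Union[None, str, dict] = None) -> Optional[dict]:
    """Hypothetical helper mirroring the three documented credential forms."""
    if isinstance(credentials, dict):
        return credentials  # already a credentials dictionary
    # None (or an empty path) falls back to the documented environment variable.
    path = credentials or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path is None:
        return None  # nothing configured
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)  # assumed: the file is a JSON key file
```

In practice the value would simply be passed through, for example GoogleSpeechRecognizer(language="en-US", credentials="/path/to/key.json"), where the path shown is a placeholder.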

close()[source]

closes recognizer

Return type

None

reset()[source]

resets recognizer

Return type

None

spokestack.asr.keyword.tflite module

This module contains the Spokestack KeywordRecognizer which identifies multiple keywords from an audio stream.

class spokestack.asr.keyword.tflite.KeywordRecognizer(classes, pre_emphasis=0.97, sample_rate=16000, fft_window_type='hann', fft_hop_length=10, model_dir='', posterior_threshold=0.5, **kwargs)[source]

Recognizes keywords in an audio stream.

Parameters
  • classes (List[str]) – Keyword labels

  • pre_emphasis (float) – The value of the pre-emphasis filter

  • sample_rate (int) – The number of audio samples per second (Hz)

  • fft_window_type (str) – The type of FFT window (only hann is supported)

  • fft_hop_length (int) – Audio sliding window for STFT calculation (ms)

  • model_dir (str) – Path to the directory containing .tflite models

  • posterior_threshold (float) – Probability threshold for detection
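
To show how classes and posterior_threshold might interact, the sketch below picks the most probable label and reports it only when its probability reaches the threshold. The helper pick_keyword and the argmax-then-threshold rule are assumptions for illustration. The reference only states that the threshold is a probability cutoff for detection.

```python
from typing import List, Optional, Sequence


def pick_keyword(
    posteriors: Sequence[float],
    classes: List[str],
    posterior_threshold: float = 0.5,
) -> Optional[str]:
    """Return the most probable class label, or None if it is below threshold."""
    if len(posteriors) != len(classes):
        raise ValueError("expected one posterior per class")
    # Index of the highest posterior probability.
    best = max(range(len(classes)), key=lambda i: posteriors[i])
    if posteriors[best] >= posterior_threshold:
        return classes[best]
    return None


print(pick_keyword([0.1, 0.8, 0.1], ["up", "down", "stop"]))  # down
print(pick_keyword([0.4, 0.3, 0.3], ["up", "down", "stop"]))  # None
```

Raising the threshold trades missed detections for fewer false positives.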

close()[source]

Close interface for use in the SpeechPipeline

Return type

None

reset()[source]

Resets the current KeywordRecognizer state

Return type

None