Speech Recognition
Xircuits Speech Recognition Project Template
This template allows you to train a Tensorflow speech recognition model, using a mini version of the speech_commands dataset.
It consists of the components listed below:
Dataset preparation: this section handles the dataset used in this template through multiple components.
DownloadDataset: Mini version of speech commands dataset.ExtractAudioFilesAndLabels: Extract the audio files from each folder & labels from the folder's name.
Preprocessing dataset: Preparing dataset to be fed into the model.
AudioToTensors: Decode the audio .wav file into waveforms.WaveformsToSpectrograms: Convert the waveforms to spectrogram to be fed into the model.PlotSpectrogram: Visualize spectrogram.SplitData: Split the dataset into training, validation and testing set.
Model training: build and compile the model for training.
BuildSpeechModel: building a simple network model.CompileSpeechModel: compile the model with the chosen optimizer.TrainSpeechModel: training and validating the model with the defined epoch number.PlotSpeechMetrics: evaluate training performance, by plotting the training loss and accuracy against the number of training epochs.EvaluateSpeechModel: determine the model accuracy based on the testing dataset, and able to view the confusion matrix.SaveSpeechModel: save model in keras or tensorflow format.ConvertSpeechTFModelToOnnx: convert TF model to onnx model to be used in other platforms.
Inference components: To predict the text of the speech from an audio file.
LoadModel: Load the trained Tensorflow model.LoadAudioFile: Load an audio file and preprocess for prediction.PredictSpeech: Predict the text.
Silero inference components: To predict the text of the speech using pretrained Silero models.
SileroModelInference: Predict the text of audio file using Silero model.
Prerequisites
You will need Python 3.9+.
Installation
- Clone this repository
- Create virtual environments and install the required python packages.
pip install -r requirements.txt
- Run xircuits from the root directory
xircuits
Workflow in this Template
SpeechRecognition.xircuits
- In this template, we used the perform a simple speech recognition. You can further fine tune the model by modifying the hyperparameters.

Inference.xircuits
- Predicts the speech from an audio file and outputs the probability of the prediction.

SileroInference.xircuits
- Predicts the speech from an audio file and outputs the text prediction.

Future work
- Train model on complex speech dataset.