This demo shows Automatic Speech Recognition (ASR) with a pretrained Mozilla* DeepSpeech 0.6.1 model.
The application accepts an audio file as input. After computing audio features, running a neural network to get per-frame character probabilities, and performing CTC decoding, the demo prints the decoded text together with the timings of the processing stages.
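The per-stage timing report described above can be produced with a pattern like the following. This is a generic sketch with stubbed stages, not the demo's actual code; the stage names and the `timed` helper are illustrative assumptions.

```python
import time

def timed(stage, fn, *args):
    """Run one processing stage and print how long it took."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{stage}: {elapsed * 1000:.1f} ms")
    return result

# Stubbed stages standing in for feature extraction, inference, and decoding.
features = timed("Audio features", lambda audio: [0.0] * len(audio), b"\x00" * 320)
probs = timed("Inference", lambda feats: [[1.0]] * len(feats), features)
text = timed("CTC decoding", lambda p: "", probs)
```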
The application depends on the ctcdecode_numpy Python* module; its installation is described below.
You can download and convert a pre-trained Mozilla* DeepSpeech 0.6.1 model with the OpenVINO Model Downloader.
Please pay attention to the model license, Mozilla Public License 2.0.
ASR performance depends heavily on beam width (a.k.a. beam size), the number of candidate strings maintained by beam search on each iteration. Using a larger beam gives better recognition but is slower. The demo depends on the ctcdecode_numpy Python module, which implements CTC decoding in C++ for speed.
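To illustrate what the decoder computes, here is a toy pure-Python CTC prefix beam search over a per-frame probability matrix. This is a hypothetical sketch for explanation only, not the ctcdecode_numpy implementation; `beam_width` is the beam size discussed above.

```python
from collections import defaultdict

def ctc_beam_search(probs, alphabet, beam_width=4, blank=0):
    """Toy CTC prefix beam search.

    probs: per-frame probability rows; column `blank` is the CTC blank,
    column i (i > 0) corresponds to alphabet[i - 1].
    Returns the most probable decoded string.
    """
    # Each beam maps a prefix (tuple of class indices) to a pair:
    # (prob of ending in blank, prob of ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for row in probs:
        next_beams = defaultdict(lambda: (0.0, 0.0))
        for prefix, (p_b, p_nb) in beams.items():
            total = p_b + p_nb
            # Emit a blank: the prefix stays the same.
            nb_b, nb_nb = next_beams[prefix]
            next_beams[prefix] = (nb_b + total * row[blank], nb_nb)
            for c in range(len(row)):
                if c == blank:
                    continue
                p = row[c]
                if prefix and prefix[-1] == c:
                    # Repeats collapse unless separated by a blank.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b, nb_nb + p_nb * p)
                    ext = prefix + (c,)
                    eb, enb = next_beams[ext]
                    next_beams[ext] = (eb, enb + p_b * p)
                else:
                    ext = prefix + (c,)
                    eb, enb = next_beams[ext]
                    next_beams[ext] = (eb, enb + total * p)
        # Keep only the beam_width most probable prefixes.
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: kv[1][0] + kv[1][1],
                            reverse=True)[:beam_width])
    best = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]
    return "".join(alphabet[c - 1] for c in best)
```

Larger `beam_width` keeps more candidate prefixes alive per frame, which improves accuracy at a roughly proportional cost in decoding time.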
To install the ctcdecode_numpy Python module, either follow "Build the Native Python* Extension Modules", or install it with pip:
Create and activate a virtualenv, if you haven't already:
```shell
virtualenv -p python3 --system-site-packages deepspeech-venv
. deepspeech-venv/bin/activate
```
Build and install the ctcdecode_numpy Python module:
```shell
cd ctcdecode-numpy/
python -m pip install .
```
Run the application with the -h option to see the help message. Here are the essential options:
The typical command line is:
Only 16-bit, 16 kHz, mono-channel WAVE audio files are supported.
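You can check whether a file meets these requirements with the standard-library `wave` module. The `check_wav` helper below is an illustrative sketch, not part of the demo:

```python
import wave

def check_wav(path):
    """Return True if the WAVE file is 16-bit PCM, 16 kHz, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getsampwidth() == 2        # 2 bytes = 16-bit samples
                and wav.getframerate() == 16000  # 16 kHz sample rate
                and wav.getnchannels() == 1)     # mono
```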
An example audio file can be taken from
The application shows the time taken by initialization and processing stages, and the decoded text for the audio file.