The sample works with Kaldi ARK or NumPy* uncompressed NPZ files, so it does not cover an end-to-end speech recognition scenario (speech to text): it requires additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores.
The Automatic Speech Recognition Python sample application demonstrates how to use the following Inference Engine Python API in applications:
| Feature | API | Description |
| :--- | :--- | :--- |
| Import/Export Model | `IECore.import_network`, `ExecutableNetwork.export` | The GNA plugin supports loading and saving of the GNA-optimized model |
| Network Operations | `IENetwork.batch_size`, `CDataPtr.shape`, `ExecutableNetwork.input_info`, `ExecutableNetwork.outputs` | Manage the network: configure input and output blobs |
| Network Operations | `IENetwork.add_outputs` | Manage the network: change the names of output layers in the network |
| InferRequest Operations | `InferRequest.query_state`, `VariableState.reset` | Get and reset the state control interface for a given executable network |
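The API calls in the table map onto code along these lines. This is a minimal sketch rather than the sample itself: the model file names, device choice, and output layer name are placeholders, and it assumes a machine where the GNA plugin (hardware or emulation) is available.

```python
from openvino.inference_engine import IECore  # Inference Engine Python API (2021.x)

ie = IECore()
# Read an IR model; file names are placeholders for Model Optimizer output.
net = ie.read_network(model="wsj_dnn5b.xml", weights="wsj_dnn5b.bin")

# Network Operations: set the batch size and expose an extra output layer
# ("affinetransform12" is a hypothetical layer name).
net.batch_size = 8
net.add_outputs("affinetransform12")

exec_net = ie.load_network(network=net, device_name="GNA_AUTO")

# Inspect input and output blobs; each output is a CDataPtr with a .shape.
print(list(exec_net.input_info))
for name, data in exec_net.outputs.items():
    print(name, data.shape)

# Import/Export Model: save the GNA-optimized model, then re-import it
# in a later session to skip quantization.
exec_net.export("optimized.gna")
exec_net = ie.import_network("optimized.gna", device_name="GNA_AUTO")

# InferRequest Operations: reset the state control interface between utterances.
request = exec_net.requests[0]
for state in request.query_state():
    state.reset()
```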
Basic Inference Engine API is covered by Hello Classification Python* Sample.
| Options | Values |
| :--- | :--- |
| Validated Models | Acoustic model based on Kaldi* neural networks (see Model Preparation section) |
| Model Format | Inference Engine Intermediate Representation (.xml + .bin) |
| Supported devices | See Execution Modes section below and List Supported Devices |
| Other language realization | C++ |
At startup, the sample application reads command-line parameters, loads a specified model and input data to the Inference Engine plugin, performs synchronous inference on all speech utterances stored in the input file, and logs each step in a standard output stream.
For an explicit description of each sample step, see the Integration Steps section of the "Integrate the Inference Engine with Your Application" guide.
If the GNA device is selected (for example, using the `-d GNA` flag), the GNA Inference Engine plugin quantizes the model and input feature vector sequence to integer representation before performing inference.

The `-qb` flag provides a hint to the GNA plugin regarding the preferred target weight resolution for all layers. For example, when `-qb 8` is specified, the plugin uses 8-bit weights wherever possible in the network (see the command sketch after the note below).
- It is not always possible to use 8-bit weights due to GNA hardware limitations. For example, convolutional layers always use 16-bit weights (GNA hardware version 1 and 2). This limitation will be removed in GNA hardware version 3 and higher.
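For instance, a run that selects the GNA device and hints at 8-bit weights could look like this; the script name and the model/input file names are placeholders for illustration:

```sh
python speech_sample.py -m wsj_dnn5b.xml -i dev93_10.ark -d GNA_AUTO -qb 8 -o result.npz
```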
Several execution modes are supported via the `-d` flag:

- `CPU` - All calculations are performed on the CPU device using the CPU Plugin.
- `GPU` - All calculations are performed on the GPU device using the GPU Plugin.
- `MYRIAD` - All calculations are performed on an Intel® Neural Compute Stick 2 device using the VPU MYRIAD Plugin.
- `GNA_AUTO` - GNA hardware is used if available and the driver is installed. Otherwise, the GNA device is emulated in fast-but-not-bit-exact mode.
- `GNA_HW` - GNA hardware is used if available and the driver is installed. Otherwise, an error occurs.
- `GNA_SW` - Deprecated. The GNA device is emulated in fast-but-not-bit-exact mode.
- `GNA_SW_FP32` - Substitutes parameters and calculations from low precision to floating point (FP32).
- `GNA_SW_EXACT` - The GNA device is emulated in bit-exact mode.
The GNA plugin supports loading and saving of the GNA-optimized model (non-IR) via the `-rg` and `-wg` flags. Thereby, it is possible to avoid the cost of full model quantization at run time. In addition to performing inference directly from a GNA model file, this option makes it possible to convert a model from IR format to a GNA-optimized model file once and reuse it across runs, as sketched below.
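A possible two-step workflow; the script name and model/input file names are placeholders, and the `-rg`/`-wg` semantics follow the description above:

```sh
# First run: quantize the IR model and save the GNA-optimized model
python speech_sample.py -m wsj_dnn5b.xml -i dev93_10.ark -d GNA_AUTO -wg optimized.gna

# Later runs: load the GNA-optimized model directly, skipping quantization
python speech_sample.py -rg optimized.gna -i dev93_10.ark -d GNA_AUTO -o result.npz
```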
Run the application with the `-h` option to see the usage message:
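Assuming the sample script is named `speech_sample.py`:

```sh
python speech_sample.py -h
```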
You can use the following Model Optimizer command to convert a Kaldi nnet1 or nnet2 neural network to the Inference Engine Intermediate Representation format:
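A sketch of such a command; the `.nnet` and counts file names are placeholders, and `--remove_output_softmax` cuts the final SoftMax layer from the converted model:

```sh
python mo.py --framework kaldi --input_model wsj_dnn5b.nnet --counts wsj_dnn5b.counts --remove_output_softmax --output_dir <OUTPUT_MODEL_DIR>
```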
The following pre-trained models are available:
All of them can be downloaded from https://storage.openvinotoolkit.org/models_contrib/speech/2021.2.
You can do inference on Intel® Processors with the GNA co-processor (or emulation library):
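For example, with placeholder model and input file names:

```sh
python speech_sample.py -m wsj_dnn5b.xml -i dev93_10.ark -d GNA_AUTO -o result.npz
```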
- Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.
- The sample supports input and output in the NumPy file format (.npz); a minimal sketch of creating such an input follows these notes.
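A rough illustration using plain NumPy; the utterance key name and the 100x40 feature matrix shape are assumptions, not requirements of the sample:

```python
import numpy as np

# One array per utterance: rows are frames, columns are feature-vector elements.
features = np.random.rand(100, 40).astype(np.float32)
np.savez("input.npz", utt_0=features)
```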
The sample application logs each step in a standard output stream.