The NeMo project provides the QuartzNet model.
To download the pre-trained model, refer to the NeMo Speech Models Catalog. Here are the instructions on how to obtain QuartzNet in ONNX* format.
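One way to carry out this export is sketched below. It assumes the NeMo toolkit is installed and relies on its `EncDecCTCModel.from_pretrained` and `export` methods with the `QuartzNet15x5Base-En` checkpoint name. The helper names and the `encoder_` / `decoder_` file-name prefixes are illustrative choices; only `qn.onnx` is taken from this document.

```python
from pathlib import Path


def onnx_targets(out_dir=".", combined_name="qn.onnx"):
    """Return output paths for the three exported models.

    The encoder/decoder file names are hypothetical; only the
    combined name (qn.onnx) comes from the text.
    """
    out = Path(out_dir)
    return {
        "encoder": out / f"encoder_{combined_name}",
        "decoder": out / f"decoder_{combined_name}",
        "combined": out / combined_name,
    }


def export_quartznet(out_dir=".", model_name="QuartzNet15x5Base-En"):
    """Download the pre-trained model and export it to ONNX.

    Assumes NeMo is installed; the checkpoint is fetched over the network.
    """
    # Imported lazily so this module loads even without NeMo installed.
    import nemo.collections.asr as nemo_asr

    targets = onnx_targets(out_dir)
    model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=model_name)

    # Export the two sub-networks separately, then the combined model.
    model.encoder.export(str(targets["encoder"]))
    model.decoder.export(str(targets["decoder"]))
    model.export(str(targets["combined"]))
    return targets
```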
This code produces 3 ONNX* model files: a decoder, an encoder, and a combined decoder(encoder(x)) model. The combined model is saved as qn.onnx.
If using a combined model:
If using separate models:
Here the shape is determined by the length of the audio file's Mel-spectrogram: B is the batch dimension, X is the dimension based on the input length, and Y is determined by the encoder output, usually X / 2.
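The relationship between these dimensions can be sketched as follows. This is a rough estimate under stated assumptions, not the exact NeMo preprocessing: it assumes a centered STFT with a hop of 160 samples (10 ms at 16 kHz), and it rounds X / 2 up, which is what a stride-2 convolution with "same" padding would give. The function names and default values are hypothetical.

```python
def spectrogram_frames(num_samples, hop_length=160):
    """Estimate X, the number of Mel-spectrogram frames.

    Assumes a centered STFT, which yields 1 + num_samples // hop_length
    frames. The actual preprocessor may pad or round differently.
    """
    if num_samples < 0 or hop_length <= 0:
        raise ValueError("num_samples must be >= 0 and hop_length > 0")
    return 1 + num_samples // hop_length


def encoder_frames(x):
    """Estimate Y from X, taking 'usually X / 2' as ceil(X / 2) (an assumption)."""
    return (x + 1) // 2


# One second of 16 kHz audio under the assumptions above.
x = spectrogram_frames(16000)
y = encoder_frames(x)
print(f"B=1, X={x}, Y={y}")
```

Because the rounding depends on the preprocessing and padding settings, it is safer to confirm Y by running the exported encoder once and reading the output shape.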