DL Workbench can lower the precision of a model from FP32 to INT8 with a process called calibration. Calibration accelerates certain models on hardware that supports INT8. An INT8 model has a smaller memory footprint and faster inference, at the cost of a small reduction in accuracy.
When INT8 is selected from the Optimize tab, the DL Workbench automatically calibrates the selected model to INT8 by running a calibration procedure and then generating a calibrated version of the model.
NOTE: INT8 calibration is not available in the following cases:
- your project uses a generated dataset
- your project uses a model with an Intermediate Representation (IR) version lower than 10
- your model is already calibrated
- you run the project on an Intel® Processor Graphics, Intel® Movidius™ Neural Compute Stick 2, or Intel® Vision Accelerator Design with Intel® Movidius™ VPUs plugin
DL Workbench supports two calibration methods: maximum performance calibration (optimization method: Default) and maximum accuracy calibration (optimization method: AccuracyAware).
TIP: As a rule, the smaller the calibration subset, the less time the algorithms take. It is recommended to use at least a 3-5% subset of the validation dataset (300-1000 images).
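The arithmetic behind this tip can be sketched in a few lines of Python. The helper name `calibration_subset_size` is hypothetical and is not part of DL Workbench; it only shows how a percentage of the validation dataset translates into a number of images.

```python
import math


def calibration_subset_size(total_images: int, percent: float) -> int:
    """Return how many images a given percentage of the dataset contains.

    Hypothetical helper for illustration only; not a DL Workbench API.
    """
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Round up so that a non-empty dataset never yields an empty subset.
    return max(1, math.ceil(total_images * percent / 100))


# A 10,000-image validation dataset at the recommended 3% and 5%.
print(calibration_subset_size(10_000, 3))  # 300
print(calibration_subset_size(10_000, 5))  # 500
```

For a 10,000-image validation dataset, the recommended 3-5% range corresponds to 300-500 images, which falls inside the 300-1000 image guideline.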
NOTE: The Default method converts all layers that support INT8 execution to INT8 precision, while the AccuracyAware method converts only those layers that both support INT8 execution and do not significantly increase the accuracy drop.
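The difference between the two methods can be illustrated with a toy Python sketch. This is not the algorithm DL Workbench runs: the `Layer` class, the per-layer `est_drop` values, and both selection functions are assumptions made up for illustration, and the real AccuracyAware procedure measures accuracy on the calibration dataset rather than reading a precomputed estimate.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    supports_int8: bool
    # Hypothetical estimate of the accuracy drop (in percentage points)
    # caused by running this layer in INT8.
    est_drop: float


def default_selection(layers):
    """Default: every layer that can run in INT8 is converted."""
    return [layer.name for layer in layers if layer.supports_int8]


def accuracy_aware_selection(layers, max_drop):
    """AccuracyAware (toy version): convert only INT8-capable layers whose
    estimated drop stays within the allowed maximum."""
    return [
        layer.name
        for layer in layers
        if layer.supports_int8 and layer.est_drop <= max_drop
    ]


layers = [
    Layer("conv1", supports_int8=True, est_drop=0.1),
    Layer("conv2", supports_int8=True, est_drop=0.8),
    Layer("softmax", supports_int8=False, est_drop=0.0),
]

print(default_selection(layers))              # ['conv1', 'conv2']
print(accuracy_aware_selection(layers, 0.5))  # ['conv1']
```

In this toy example the Default selection converts both convolution layers, while the AccuracyAware selection leaves `conv2` at its original precision because its estimated drop exceeds the 0.5 limit.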
Maximum performance calibration optimizes your model for the best performance. The algorithm usually produces the fastest model and usually, but not always, keeps the accuracy drop within 1%. It also takes less time than the AccuracyAware optimization method.
NOTE: This method supports both annotated and unannotated datasets. See Dataset Types for details.
Maximum accuracy calibration optimizes your model for the best performance possible within the specified maximum acceptable accuracy drop. It might result in lower performance than maximum performance calibration, but the accuracy drop is predictable. Accuracy drop is the difference between the original model accuracy and the optimized model accuracy. The accuracy of the optimized model is guaranteed to be no lower than the original model accuracy minus the specified maximum accuracy drop.
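As a worked example of this definition, the following Python sketch computes the accuracy drop and the lowest accuracy the optimized model may have. The function names and the accuracy figures are hypothetical and chosen only for illustration.

```python
def accuracy_drop(original_acc: float, optimized_acc: float) -> float:
    """Accuracy drop: original model accuracy minus optimized model accuracy."""
    return original_acc - optimized_acc


def minimum_acceptable_accuracy(original_acc: float, max_drop: float) -> float:
    """Lowest accuracy the optimized model may have for a given maximum drop."""
    return original_acc - max_drop


def within_budget(original_acc: float, optimized_acc: float, max_drop: float) -> bool:
    """True if the optimized model stays within the allowed accuracy drop."""
    return accuracy_drop(original_acc, optimized_acc) <= max_drop


# Hypothetical numbers: the original model scores 76.0% and the
# maximum acceptable accuracy drop is 1.0 percentage point.
print(minimum_acceptable_accuracy(76.0, 1.0))  # 75.0
print(within_budget(76.0, 75.4, 1.0))          # True
print(within_budget(76.0, 74.5, 1.0))          # False
```

With these numbers, any optimized model scoring 75.0% or higher satisfies the constraint.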
NOTE: This method supports only annotated datasets. See Dataset Types for details.
Overall flow for converting a model from FP32 to INT8:
Use the links above to walk through the steps and workflow for creating a calibrated model. Topics specific only to the INT8 calibration process (steps 4-6) are described below.
Once a model has been profiled by the DL Workbench, you can convert it from FP32 to INT8. For non-FP32 models, the INT8 option is grayed out. Go to the Optimize tab on the Configurations page:
NOTE: Using INT8 calibration, you can tune only an original (top-level) model.
Check INT8 and click Optimize. This takes you to the Calibration Options page, where you must select or import a calibration dataset and specify the percentage of images to use and the optimization method.
NOTE: During the calibration process, a model tends to overfit the dataset it is being calibrated on. To avoid overfitting, use separate datasets for calibration and validation.
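One way to obtain separate datasets is to split a pool of images into two disjoint sets before importing them. The Python sketch below is a minimal illustration under that assumption; the function `split_calibration_validation` and the file names are hypothetical and are not part of DL Workbench.

```python
import random


def split_calibration_validation(image_paths, calibration_percent, seed=0):
    """Split image paths into disjoint calibration and validation lists.

    Hypothetical helper for illustration only; not a DL Workbench API.
    """
    if not 0 < calibration_percent < 100:
        raise ValueError("calibration_percent must be between 0 and 100")
    paths = list(image_paths)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(paths)
    n_calibration = len(paths) * calibration_percent // 100
    return paths[:n_calibration], paths[n_calibration:]


images = [f"img_{i:04d}.jpg" for i in range(1000)]
calibration, validation = split_calibration_validation(images, 5)

# No image appears in both sets.
assert not set(calibration) & set(validation)
print(len(calibration), len(validation))  # 50 950
```

Because the two lists never share an image, accuracy measured on the validation set is not inflated by images the model was calibrated on.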
Select a dataset you want to calibrate the model on, or import a calibration dataset by clicking Import:
Once you click Import, the Import Calibration Dataset page appears. Import a dataset, enter its name and click Import:
You are directed back to the Calibration Options page. Specify the percentage of images you will use during the calibration procedure in the Subset of Images box. The default value is 100%.
For the AccuracyAware option, specify the Maximum Accuracy Drop, which is the largest loss of accuracy you are willing to accept. The application converts only those layers that keep the accuracy drop within this value. If converting a layer is estimated to exceed this value, the layer is not calibrated and remains at the original precision.
Click Optimize. Specify the model usage and accuracy parameters in the new window:
NOTE: See Configure Accuracy Settings for details.
After you click Save, you are directed back to the previous window.
Click Calibrate, and a new row for your model appears.
Once the job is done, you can compare an optimized model with the original model. For more details, go to Compare Performance between Two Versions of Models.
For layers of INT8-optimized models, the value of the outputPrecisions parameter in the Layer Name table is U8 (unsigned 8-bit integer):