This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous and asynchronous.
NOTE: This topic describes usage of the C++ implementation of the Benchmark Application. For the Python* implementation, refer to Benchmark Application (Python*).
NOTE: To achieve benchmark results similar to the official published results, set CPU frequency to 2.9 GHz and GPU frequency to 1 GHz.
Upon start-up, the application reads command-line parameters and loads a network and images into the Inference Engine plugin, which is chosen based on the specified device. The number of infer requests and the execution approach depend on the mode defined with the -api command-line parameter.
NOTE: By default, Inference Engine samples and demos expect input in BGR channel order. If you trained your model to work with RGB order, you need to either manually rearrange the default channel order in the sample or demo application, or reconvert your model using the Model Optimizer tool with the --reverse_input_channels argument specified. For more information about the argument, refer to the When to Reverse Input Channels section of Converting a Model Using General Conversion Parameters.
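As an illustration of what manually rearranging the channel order can look like, here is a minimal sketch that swaps the first and third channel of every pixel. It assumes an interleaved 8-bit, 3-channel buffer; the function name swapRedAndBlue is hypothetical and is not part of the samples or of the Inference Engine API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swaps the first and third channel of every pixel in an interleaved
// 3-channel buffer (B,G,R,B,G,R,... <-> R,G,B,R,G,B,...).
// Hypothetical helper for illustration only.
void swapRedAndBlue(std::vector<std::uint8_t>& pixels) {
    // Assumption: the buffer holds whole 3-channel pixels.
    assert(pixels.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Applying the function twice restores the original order, so the same helper converts in either direction.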
If you run the application in the synchronous mode, it creates one infer request and executes the Infer method. If you run the application in the asynchronous mode, it creates as many infer requests as specified in the -nireq command-line parameter and executes the StartAsync method for each of them.
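The difference between the two modes can be sketched with a stand-in request class. MockInferRequest below is hypothetical: it only mimics the Infer, StartAsync, and Wait calls named in this topic using std::async, and it is not the Inference Engine class. The helper names runSync and runAsync are illustrative as well.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for an infer request (not the Inference Engine class).
class MockInferRequest {
public:
    // Blocking execution: returns once the simulated inference is done.
    void Infer() { ++completed_; }

    // Non-blocking execution: the simulated inference runs on another thread.
    void StartAsync() {
        pending_ = std::async(std::launch::async, [this] { ++completed_; });
    }

    // Blocks until the execution started by StartAsync() has finished.
    void Wait() {
        if (pending_.valid()) pending_.get();
    }

    int completed() const { return completed_; }

private:
    int completed_ = 0;
    std::future<void> pending_;
};

// Synchronous mode: a single request executed with Infer().
int runSync(int iterations) {
    MockInferRequest request;
    for (int i = 0; i < iterations; ++i) request.Infer();
    return request.completed();
}

// Asynchronous mode: nireq requests, each started with StartAsync()
// and later collected with Wait().
int runAsync(std::size_t nireq) {
    assert(nireq > 0);
    std::vector<MockInferRequest> requests(nireq);  // sized once, never resized
    for (auto& r : requests) r.StartAsync();
    int total = 0;
    for (auto& r : requests) {
        r.Wait();
        total += r.completed();
    }
    return total;
}
```

In this sketch the asynchronous variant keeps all requests in flight at once, which is the property that the -nireq parameter controls.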
The Wait method is used to wait for a previous execution of an infer request to complete. The number of execution steps is defined by one of two values: the number of iterations specified with the -niter command-line argument, or a predefined duration if -niter is not specified. The predefined duration depends on the device.
During the execution, the application collects latency for each executed infer request.
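A loop that stops either after a fixed number of iterations or after a time budget, while recording the latency of each step, could look like the following sketch. The function collectLatencies and its parameters are hypothetical; the actual application may structure its loop differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Runs `step` repeatedly and returns the latency of each run in milliseconds.
// If niter > 0, exactly niter steps are executed; otherwise steps are executed
// until `duration` has elapsed. Hypothetical helper for illustration only.
std::vector<double> collectLatencies(const std::function<void()>& step,
                                     int niter,
                                     std::chrono::milliseconds duration) {
    assert(niter > 0 || duration.count() > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<double> latenciesMs;
    const auto start = Clock::now();
    while (true) {
        if (niter > 0) {
            // Iteration-limited run.
            if (static_cast<int>(latenciesMs.size()) >= niter) break;
        } else if (Clock::now() - start >= duration) {
            // Time-limited run.
            break;
        }
        const auto t0 = Clock::now();
        step();
        const auto t1 = Clock::now();
        latenciesMs.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return latenciesMs;
}
```

Passing a positive niter ignores the duration argument, mirroring the rule that the duration applies only when -niter is not specified.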
The reported latency is the median of all collected latencies. The reported throughput is expressed in frames per second (FPS) and is derived from the reported latency in the synchronous mode and from the total execution time in the asynchronous mode.
Throughput also depends on the batch size.
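The arithmetic behind these two metrics can be sketched as follows. The formulas are assumptions made for illustration: throughput is taken as batch size divided by the median latency in the synchronous case, and as total frames divided by total execution time in the asynchronous case. All function names are hypothetical, and the application's exact median rule for an even number of samples may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the collected latencies in milliseconds. The vector is taken
// by value because it is sorted in place.
double medianLatencyMs(std::vector<double> latencies) {
    assert(!latencies.empty());
    std::sort(latencies.begin(), latencies.end());
    const std::size_t mid = latencies.size() / 2;
    return latencies.size() % 2 == 1
               ? latencies[mid]
               : (latencies[mid - 1] + latencies[mid]) / 2.0;
}

// Assumed synchronous-mode formula: frames per request over median latency.
double syncThroughputFps(double medianMs, std::size_t batchSize) {
    assert(medianMs > 0.0);
    return batchSize * 1000.0 / medianMs;
}

// Assumed asynchronous-mode formula: all processed frames over total time.
double asyncThroughputFps(double totalMs, std::size_t iterations,
                          std::size_t batchSize) {
    assert(totalMs > 0.0);
    return batchSize * iterations * 1000.0 / totalMs;
}
```

For example, under these assumptions a median latency of 10 ms with a batch size of 1 corresponds to 100 FPS, and doubling the batch size at the same latency doubles the throughput.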
The application also collects per-layer Performance Measurement (PM) counters for each executed infer request if you enable statistics dumping by setting the -report_type parameter to one of the possible values:
no_counters: the report includes the specified configuration options, the resulting FPS, and latency.
median_counters: the report extends the no_counters report and additionally includes median PM counter values for each layer of the network.
detailed_counters: the report extends the median_counters report and additionally includes per-layer PM counters and latency for each executed infer request.
Depending on the type, the report is stored in the benchmark_no_counters_report.csv, benchmark_median_counters_report.csv, or benchmark_detailed_counters_report.csv file located in the path specified with -report_folder.
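The mapping from report type to file name follows a simple pattern, which the sketch below reproduces. The function reportPath is hypothetical and only mirrors the three file names listed above; how the application joins the folder and the file name is an assumption.

```cpp
#include <cassert>
#include <string>

// Builds the CSV report path for a given -report_type value and
// -report_folder value. Hypothetical helper for illustration only.
std::string reportPath(const std::string& reportType,
                       const std::string& reportFolder) {
    // Only the three report types described above are accepted.
    assert(reportType == "no_counters" || reportType == "median_counters" ||
           reportType == "detailed_counters");
    const std::string fileName = "benchmark_" + reportType + "_report.csv";
    if (reportFolder.empty()) return fileName;
    // Assumption: add a '/' separator unless the folder already ends with one.
    const char last = reportFolder.back();
    const bool hasSeparator = (last == '/' || last == '\\');
    return reportFolder + (hasSeparator ? "" : "/") + fileName;
}
```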
The application also saves executable graph information serialized to an XML file if you specify a path to it with the -exec_graph_path parameter.
Running the application with the -h option yields the following usage message:
Running the application with an empty list of options yields the usage message given above and an error message.
You can run the application for models that have a single four-dimensional input layer and accept images as input, for example, the public AlexNet and GoogLeNet models. To download the pre-trained models, use the OpenVINO Model Downloader or go to https://download.01.org/opencv/.
NOTE: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.
For example, to perform inference on CPU in the synchronous mode and get estimated performance metrics for the AlexNet model, run the following command:
For the asynchronous mode:
The application outputs latency and throughput. Additionally, if you set the -report_type parameter, the application outputs a statistics report. If you set -exec_graph_path, the application also outputs the serialized executable graph information. A progress bar shows the progress of each execution step:
All measurements, including per-layer PM counters, are reported in milliseconds.