This sample demonstrates how to build and execute inference in pipelined mode, using classification networks as an example.
Pipelined mode can increase the throughput of picture processing. The latency of a single inference stays the same as in synchronous execution.
Throughput increases for the following reasons:
When two or more devices are involved in the inference of one picture, creating several infer requests and starting asynchronous inference lets the devices be used most efficiently. If two devices are involved in execution, the optimal value for the -nireq option is 2. To do this efficiently, Classification Sample Async uses a round-robin algorithm for infer requests: it starts execution of the current infer request and switches to waiting for the results of the previous one. Once the wait finishes, it swaps the infer requests and repeats the procedure.
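The round-robin scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: it does not use the Inference Engine API, `fake_infer` is a hypothetical stand-in that sleeps instead of running a network, and a thread pool plays the role of the asynchronous infer requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(frame_id, delay=0.01):
    """Hypothetical stand-in for one asynchronous inference (not a real API)."""
    time.sleep(delay)
    return frame_id * 2


def run_pipelined(num_iterations):
    """Start the current request, wait for the previous one, then swap them."""
    results = []
    previous = None
    # Two workers model two infer requests in flight (the -nireq 2 case).
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(num_iterations):
            current = pool.submit(fake_infer, i)    # start the current request
            if previous is not None:
                results.append(previous.result())   # wait for the previous one
            previous = current                      # swap the requests
        if previous is not None:
            results.append(previous.result())       # collect the last request
    return results
```

Because the wait for the previous request overlaps with the execution of the current one, two requests are in flight most of the time, which is where the throughput gain comes from in this toy model.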
Another requirement for seeing good throughput is the number of iterations. Only with a large number of iterations can you emulate the behavior of a real application and see representative performance.
Batch mode is independent of pipelined mode. Pipelined mode works efficiently with any batch size.
Running the application with the -h option yields the following usage message:
Running the application with an empty list of options yields the usage message given above and an error message.
You can run inference on an image using a trained AlexNet network on an FPGA with fallback to Intel® Processors using the following command:
NOTE: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (*.xml + *.bin) using the Model Optimizer tool.
By default, the application outputs the top-10 inference results for each infer request. It also reports the throughput, measured in frames per second.
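As a rough illustration of the two reported quantities, the sketch below ranks class probabilities and computes frames per second. The helper names `top_n` and `throughput_fps` are hypothetical, and counting frames as batch size times iterations is an assumption, not something taken from the sample's source.

```python
def top_n(probabilities, n=10):
    """Return (class_id, probability) pairs for the n highest scores, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


def throughput_fps(batch_size, iterations, elapsed_seconds):
    """Frames per second, assuming frames = batch_size * iterations."""
    return batch_size * iterations / elapsed_seconds
```

For example, `top_n([0.1, 0.7, 0.2], n=2)` ranks class 1 first and class 2 second, and one frame per iteration over 200 iterations in 4 seconds gives 50 frames per second.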
On start-up, the sample application reads the command-line parameters and loads a network and an image into the Inference Engine plugin. The application then creates the number of infer requests specified by the -nireq parameter and loads pictures for inference.
Then, in a loop, it starts inference for the current infer request and switches to waiting for another one. When the results are ready, the infer requests are swapped.
When inference is done, the application outputs data to the standard output stream.