Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors, depending on the device pipeline structure. The Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
- The class has the _pipeline field of type std::vector<std::pair<ITaskExecutor::Ptr, Task> >, which contains pairs of an executor and the task it executes.
- The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for _pipeline to finish in the class destructor. The method does not stop the task executors, and they remain in the running state, because they belong to the executable network instance and are not destroyed.
Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
- _inferRequest - a reference to the synchronous inference request implementation. Its methods are reused in the AsyncInferRequest constructor to define a device pipeline.
- _waitExecutor - a task executor that waits for a response from a device about device task completion.
NOTE: If a plugin can work with several instances of a device, _waitExecutor must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
The main goal of the AsyncInferRequest constructor is to define the device pipeline _pipeline. The example below demonstrates _pipeline creation with the following stages:
- inferPreprocess is a CPU compute task.
- startPipeline is a lightweight CPU task that submits tasks to a remote device.
- waitPipeline is a non-compute CPU task that waits for a response from a remote device.
- inferPostprocess is a CPU compute task.
The stages are distributed among two task executors in the following way:
- inferPreprocess and startPipeline are combined into a single task and run on _requestExecutor, which computes CPU tasks.
- waitPipeline is sent to _waitExecutor, which works with the device.
callbackExecutor is also passed to the constructor and is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of callbackExecutor and the callback function set by the user to the end of the pipeline.
Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which shows how pipelines of multiple asynchronous inference requests are run in parallel via the Intel® VTune™ Profiler tool.
In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.