The heterogeneous plugin enables inference of one network on several devices. The purposes of executing a network in heterogeneous mode are:
* To utilize the power of accelerators: calculate the heaviest parts of the network on the accelerator and execute unsupported layers on fallback devices such as the CPU
* To utilize all available hardware more efficiently during one inference
The execution through the heterogeneous plugin can be divided into two independent steps:
1. Setting affinity to layers (binding them to devices)
2. Loading the network to the heterogeneous plugin and executing it through the dedicated device plugins
These steps are decoupled. Affinity can be set automatically using the fallback policy or manually.
The automatic fallback policy causes "greedy" behavior: it assigns all layers that can be executed on a certain device to that device, according to the priorities you specify (for example, HETERO:GPU,CPU). The automatic policy does not take into account plugin peculiarities, such as the inability to infer some layers without other special layers placed before or after them; solving such cases is the responsibility of the device plugin. If the device plugin does not support the subgraph topology constructed by the Hetero plugin, set the affinity manually.
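The priority list maps directly to the device string passed when loading the network. Below is a minimal sketch: the makeHeteroDevice helper and the model path are illustrative, not part of the Inference Engine API, while Core::ReadNetwork and Core::LoadNetwork are the standard entry points.

```cpp
#include <string>
#include <vector>

// Build a HETERO device string from an ordered priority list,
// e.g. {"GPU", "CPU"} -> "HETERO:GPU,CPU". Illustrative helper,
// not part of the Inference Engine API.
std::string makeHeteroDevice(const std::vector<std::string>& priorities) {
    std::string device = "HETERO:";
    for (size_t i = 0; i < priorities.size(); ++i) {
        if (i) device += ",";
        device += priorities[i];
    }
    return device;
}

// Usage with the Inference Engine API (sketch; "model.xml" is a placeholder):
//   InferenceEngine::Core core;
//   auto network = core.ReadNetwork("model.xml");
//   auto execNetwork = core.LoadNetwork(network, makeHeteroDevice({"GPU", "CPU"}));
```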
Some topologies are not friendly to heterogeneous execution on some devices, or cannot be executed in such mode at all. One example is a network whose activation layers are not supported on the primary device. If transmitting data from one part of a network to another takes more time in heterogeneous mode than inference takes in normal mode, heterogeneous execution may not make sense. In that case, you can determine the heaviest part of the network manually and set the affinity so that data is not sent back and forth many times during one inference.
The default fallback policy decides which layer goes to which device automatically, according to layer support in the dedicated plugins (FPGA, GPU, CPU, MYRIAD).
Another way to annotate a network is to set the affinity manually using ngraph::Node::get_rt_info with the "affinity" key.
The fallback policy does not apply if even one layer has an initialized affinity: run the automatic affinity assignment first, then adjust the result manually.
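A manual annotation pass can be sketched as follows. The exampleAffinities helper and the layer names are hypothetical; the commented ngraph calls assume the runtime-info map exposed by ngraph::Node::get_rt_info and the "affinity" key.

```cpp
#include <map>
#include <string>

// Illustrative affinity annotation: layer name -> target device.
// Layer names here are placeholders, not from a real network.
std::map<std::string, std::string> exampleAffinities() {
    return {{"conv1", "GPU"}, {"prob", "CPU"}};
}

// With ngraph (sketch, assuming the rt_info map and "affinity" key):
//   for (auto&& node : function->get_ops()) {
//       auto& rtInfo = node->get_rt_info();
//       rtInfo["affinity"] =
//           std::make_shared<ngraph::VariantWrapper<std::string>>("CPU");
//   }
```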
NOTE: If you set affinity manually, be careful: Inference Engine plugins currently do not support constant (Constant -> Result) and empty (Parameter -> Result) networks. Avoid such subgraphs when you set affinity manually.
InferenceEngine::Core::QueryNetwork does not depend on affinities set by the user; it queries for layer support based on device capabilities.
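The query result can then be used to decide affinities. The supportedLayers helper below is illustrative; the commented call assumes Core::QueryNetwork returning a QueryNetworkResult with a supportedLayersMap member (layer name to device), as in the Inference Engine API.

```cpp
#include <map>
#include <set>
#include <string>

// Collect the names of layers a device reported as supported.
// Illustrative helper over the layer-name -> device map.
std::set<std::string> supportedLayers(
        const std::map<std::string, std::string>& supportedLayersMap) {
    std::set<std::string> layers;
    for (const auto& entry : supportedLayersMap) {
        layers.insert(entry.first);
    }
    return layers;
}

// With the Inference Engine API (sketch):
//   auto res = core.QueryNetwork(network, "HETERO:GPU,CPU");
//   auto layers = supportedLayers(res.supportedLayersMap);
```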
During loading of the network to the heterogeneous plugin, the network is divided into separate parts and loaded to the dedicated plugins. Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.
Precision for inference in the heterogeneous plugin is defined by the precision of the IR.
Samples can be used with a device argument such as HETERO:FPGA,CPU, where:
* HETERO stands for the heterogeneous plugin
* FPGA,CPU points to the fallback policy with a priority on FPGA and a fallback to CPU
You can also specify more than two devices, for example: HETERO:FPGA,GPU,CPU
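As an illustration, such an invocation might look like the command printed below; the sample name and file paths are placeholders, not real artifacts from this document.

```shell
# Hypothetical sample invocation; binary, model, and input paths are placeholders.
SAMPLE=./classification_sample
MODEL=model.xml
INPUT=input.bmp
# -d HETERO:FPGA,CPU selects the heterogeneous plugin: FPGA first, CPU as fallback
echo "$SAMPLE -m $MODEL -i $INPUT -d HETERO:FPGA,CPU"
```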
After enabling the KEY_HETERO_DUMP_GRAPH_DOT config key, you can dump GraphViz* .dot files with annotations of devices per layer.
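Enabling the dump can be sketched as follows. The dumpGraphConfig helper is illustrative, and the raw key string "HETERO_DUMP_GRAPH_DOT" is an assumption derived from the KEY_HETERO_DUMP_GRAPH_DOT macro name.

```cpp
#include <map>
#include <string>

// Build the config map that enables per-layer .dot dumps.
// Key and value strings are assumptions based on the macro names
// KEY_HETERO_DUMP_GRAPH_DOT and CONFIG_VALUE(YES).
std::map<std::string, std::string> dumpGraphConfig() {
    return {{"HETERO_DUMP_GRAPH_DOT", "YES"}};
}

// With the Inference Engine API (sketch):
//   core.SetConfig(dumpGraphConfig(), "HETERO");
```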
The heterogeneous plugin can generate two files:
* hetero_affinity_<network name>.dot - annotation of affinities per layer. This file is written to disk only if the default fallback policy was executed.
* hetero_subgraphs_<network name>.dot - annotation of affinities per graph. This file is written to disk during execution of ICNNNetwork::LoadNetwork() for the heterogeneous plugin.
You can use the GraphViz* utility or converters to .png format. On the Ubuntu* operating system, you can use the following utility:
sudo apt-get install xdot
You can use performance data (in samples, it is the -pc option) to get performance data for each subgraph. Here is an example of the output for GoogleNet v1 running on FPGA with a fallback to CPU: