Executable Network

ExecutableNetwork class functionality:

  • Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
  • Create an arbitrary number of InferRequest objects
  • Hold some common resources shared between different instances of InferRequest. For example:
    • InferenceEngine::ExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
    • InferenceEngine::ExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread

ExecutableNetwork Class

Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class, which is recommended as a base class for an executable network. Based on that, a declaration of an executable network class can look as follows:

```cpp
class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    ExecutableNetwork(std::istream& model,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    ~ExecutableNetwork() override = default;

    // Methods from a base class ExecutableNetworkThreadSafeDefault
    void ExportImpl(std::ostream& model) override;
    InferenceEngine::InferRequestInternal::Ptr CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                      InferenceEngine::OutputsDataMap networkOutputs) override;
    InferenceEngine::IInferRequest::Ptr CreateInferRequest() override;
    InferenceEngine::Parameter GetMetric(const std::string &name) const override;
    InferenceEngine::Parameter GetConfig(const std::string &name) const override;

private:
    friend class TemplateInferRequest;

    void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function);
    void InitExecutor();

    std::atomic<std::size_t>            _requestId = {0};
    Configuration                       _cfg;
    std::shared_ptr<Plugin>             _plugin;
    std::shared_ptr<ngraph::Function>   _function;
    std::map<std::string, std::size_t>  _inputIndex;
    std::map<std::string, std::size_t>  _outputIndex;
};
```

Class Fields

The example class has several fields:

  • _requestId - Tracks the number of created inference requests. The value is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
  • _cfg - Defines the configuration the executable network was compiled with.
  • _plugin - Refers to a plugin instance.
  • _function - Keeps a reference to the transformed ngraph::Function, which is used in ngraph reference backend computations. Note that for other backends with a backend specific graph representation, _function has a different type and represents a backend specific graph or just a set of computational kernels used to perform inference.
  • _inputIndex - Maps an input name to its index among all network inputs.
  • _outputIndex - Maps an output name to its index among all network outputs.
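Because _requestId is a std::atomic counter, incrementing it is safe even when inference requests are created from several threads at once, and every request still receives a distinct ID. The following standard-C++ sketch illustrates that property only; RequestCounterNetwork, NextRequestId, ProfilingLabel, and CreateRequestsConcurrently are hypothetical names, and the "InferRequest<N>" label format is an assumption, not the name the plugin actually passes to ITT.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the executable network: only the request counter is modelled.
class RequestCounterNetwork {
public:
    // fetch_add is a single atomic step, so concurrent callers never get the same ID.
    std::size_t NextRequestId() { return _requestId.fetch_add(1); }

    // Hypothetical profiling label built from a request ID, e.g. "InferRequest3".
    static std::string ProfilingLabel(std::size_t id) { return "InferRequest" + std::to_string(id); }

private:
    std::atomic<std::size_t> _requestId = {0};
};

// Creates `perThread` requests on each of `threads` threads and collects every ID handed out.
std::multiset<std::size_t> CreateRequestsConcurrently(RequestCounterNetwork& net,
                                                      int threads, int perThread) {
    std::multiset<std::size_t> ids;
    std::mutex idsMutex;  // protects `ids`; the counter itself needs no lock
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < perThread; ++i) {
                std::size_t id = net.NextRequestId();
                std::lock_guard<std::mutex> lock(idsMutex);
                ids.insert(id);
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return ids;
}
```

With 4 threads creating 250 requests each, the IDs are exactly 0 through 999 with no duplicates, although their interleaving across threads is unspecified.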

ExecutableNetwork Constructor with ICNNNetwork

This constructor accepts a generic representation of a neural network as a shared pointer to an ngraph::Function (the nGraph representation behind an InferenceEngine::ICNNNetwork) and compiles it into a backend specific device graph:

```cpp
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr, nullptr), // Disable default threads creation
    _cfg(cfg),
    _plugin(plugin) {
    // TODO: if your plugin supports device ID (more than a single instance of a device can be on the host machine)
    // you should select the proper device based on KEY_DEVICE_ID or automatic behavior.
    // In this case, _waitExecutor should also be created per device.
    try {
        CompileNetwork(function);
        InitExecutor(); // creates thread-based executor used for async requests
    } catch (const InferenceEngine::details::InferenceEngineException&) {
        throw;  // Inference Engine exceptions are propagated unchanged
    } catch (const std::exception & e) {
        THROW_IE_EXCEPTION << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        THROW_IE_EXCEPTION << "Generic exception is thrown";
    }
}
```
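The catch chain in the constructor normalizes every compilation failure into an Inference Engine exception: exceptions that already have the plugin's type are rethrown unchanged with a bare `throw;`, other standard exceptions are wrapped together with their message, and anything else becomes a generic error. A self-contained sketch of the same pattern in plain C++ follows; PluginException is a hypothetical stand-in for InferenceEngine::details::InferenceEngineException, and RunGuarded and GuardedMessage are hypothetical helpers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical plugin exception type standing in for the Inference Engine one.
struct PluginException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs a compilation step and converts every failure into a PluginException.
template <typename Step>
void RunGuarded(Step&& step) {
    try {
        step();
    } catch (const PluginException&) {
        throw;  // already the plugin's own type: rethrow the original object
    } catch (const std::exception& e) {
        throw PluginException(std::string("Standard exception from compilation library: ") + e.what());
    } catch (...) {
        throw PluginException("Generic exception is thrown");
    }
}

// Returns the message of the PluginException that `step` ends up throwing,
// or an empty string if the step succeeds.
template <typename Step>
std::string GuardedMessage(Step&& step) {
    try {
        RunGuarded(std::forward<Step>(step));
    } catch (const PluginException& e) {
        return e.what();
    }
    return "";
}
```

The order of the handlers matters: PluginException derives from std::exception, so it must be caught first, otherwise it would be wrapped a second time. An empty handler in place of `throw;` would silently swallow the error and leave a half-constructed network.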

CompileNetwork()

The implementation of CompileNetwork is fully device-specific.


The function accepts a const shared pointer to ngraph::Function object and performs the following steps:

  1. Applies ngraph passes using the TransformNetwork function, which defines a plugin-specific conversion pipeline.
  2. Maps the transformed graph to a backend specific graph representation (for example, to MKLDNN graph for Intel CPU).
  3. Allocates and fills memory for graph weights, backend specific memory handles and so on.

```cpp
// forward declaration
std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function);

void TemplatePlugin::ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function) {
    // TODO: perform actual graph compilation / mapping to backend graph representation / kernels

    // apply plugin transformations
    _function = TransformNetwork(function);

    // Generate backend specific blob mappings. For example, Inference Engine uses as inference request output
    // names not the friendly names of ngraph::Result nodes but the names of the layers before them.
    for (auto&& result : _function->get_results()) {
        auto previousOutput = result->get_input_source_output(0);
        auto outputName = previousOutput.get_node()->get_friendly_name();
        if (previousOutput.get_node()->get_output_size() > 1) {
            // multi-output layers: append the output port index
            outputName += '.' + std::to_string(previousOutput.get_index());
        }
        _outputIndex.emplace(outputName, _function->get_result_index(result));
    }
    for (auto&& parameter : _function->get_parameters()) {
        _inputIndex.emplace(parameter->get_friendly_name(), _function->get_parameter_index(parameter));
    }

    // Perform any other steps like allocation and filling backend specific memory handles and so on
}
```

NOTE: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
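The output-naming rule in CompileNetwork can be checked in isolation. The sketch below models only the information the loop reads for each ngraph::Result input: the producing node's friendly name, its number of outputs, and the index of the connected output. ResultSource, OutputName, and BuildOutputIndex are hypothetical names, and the position of an element in the vector stands in for get_result_index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of what feeds one Result node.
struct ResultSource {
    std::string producerName;         // friendly name of the layer before the Result
    std::size_t producerOutputCount;  // how many outputs that layer has
    std::size_t producerOutputIndex;  // which of them is connected to the Result
};

// Same rule as in CompileNetwork: use the producer's name and append
// ".<index>" only when the producer has more than one output.
std::string OutputName(const ResultSource& src) {
    std::string name = src.producerName;
    if (src.producerOutputCount > 1) {
        name += '.' + std::to_string(src.producerOutputIndex);
    }
    return name;
}

// Builds the equivalent of _outputIndex: output name -> result position.
std::map<std::string, std::size_t> BuildOutputIndex(const std::vector<ResultSource>& results) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < results.size(); ++i) {
        index.emplace(OutputName(results[i]), i);
    }
    return index;
}
```

For a network whose results are fed by a single-output "softmax" layer and by both outputs of a "split" layer, the names are "softmax", "split.1", and "split.0".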

ExecutableNetwork Constructor Importing from Stream

This constructor creates a backend specific graph by importing from a stream object:

NOTE: The backend specific graph is exported in the ExportImpl method, so the data format must be the same for both import and export.

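```cpp
TemplatePlugin::ExecutableNetwork::ExecutableNetwork(std::istream& model,
                                                     const Configuration& cfg,
                                                     const Plugin::Ptr& plugin) :
    _cfg(cfg),
    _plugin(plugin) {
    // read XML content: a uint64 size followed by that many bytes
    std::string xmlString;
    std::uint64_t dataSize = 0;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    xmlString.resize(dataSize);  // allocate storage before reading into it
    model.read(const_cast<char*>(xmlString.c_str()), dataSize);

    // read blob content (weights); a zero size means there are no weights
    InferenceEngine::Blob::Ptr dataBlob;
    model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    if (0 != dataSize) {
        dataBlob = InferenceEngine::make_shared_blob<std::uint8_t>(
            InferenceEngine::TensorDesc(InferenceEngine::Precision::U8,
                                        {static_cast<std::size_t>(dataSize)},
                                        InferenceEngine::Layout::C));
        dataBlob->allocate();
        model.read(dataBlob->buffer(), dataSize);
    }

    // TODO: implement Import / Export of configuration options and merge with `cfg`
    // TODO: implement Import / Export of network precisions, layouts, preprocessing info

    auto cnnnetwork = _plugin->GetCore()->ReadNetwork(xmlString, std::move(dataBlob));

    // inputs and outputs are later used by CreateInferRequest
    setNetworkInputs(cnnnetwork.getInputsInfo());
    setNetworkOutputs(cnnnetwork.getOutputsInfo());

    try {
        CompileNetwork(cnnnetwork.getFunction());
        InitExecutor(); // creates thread-based executor used for async requests
    } catch (const InferenceEngine::details::InferenceEngineException&) {
        throw;  // Inference Engine exceptions are propagated unchanged
    } catch (const std::exception & e) {
        THROW_IE_EXCEPTION << "Standard exception from compilation library: " << e.what();
    } catch (...) {
        THROW_IE_EXCEPTION << "Generic exception is thrown";
    }
}
```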


ExportImpl()

Implementation details: the base InferenceEngine::ExecutableNetworkThreadSafeDefault class implements the public InferenceEngine::ExecutableNetworkThreadSafeDefault::Export method as follows:

  • Writes _plugin->GetName() to the model stream.
  • Calls the ExportImpl method defined in a derived class to dump a backend specific graph.

The ExportImpl implementation must write to the model stream all the data required to import the backend specific graph later in the Plugin::Import method:

```cpp
void TemplatePlugin::ExecutableNetwork::ExportImpl(std::ostream& modelStream) {
    OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "ExecutableNetwork::ExportImpl");

    // Note: custom ngraph extensions are not supported
    std::map<std::string, ngraph::OpSet> custom_opsets;
    std::stringstream xmlFile, binFile;
    ngraph::pass::Serialize serializer(xmlFile, binFile,
                                       ngraph::pass::Serialize::Version::IR_V10, custom_opsets);
    serializer.run_on_function(_function);  // fills xmlFile (topology) and binFile (weights)

    auto m_constants = binFile.str();
    auto m_model = xmlFile.str();

    // XML section: uint64 size followed by the raw bytes
    auto dataSize = static_cast<std::uint64_t>(m_model.size());
    modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    modelStream.write(m_model.c_str(), dataSize);

    // weights section in the same size-prefixed format
    dataSize = static_cast<std::uint64_t>(m_constants.size());
    modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
    modelStream.write(reinterpret_cast<char*>(&m_constants[0]), dataSize);

    // TODO: implement network precision, layout, preprocessing info serialization
}
```
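ExportImpl and the importing constructor agree on a simple layout: each section is a std::uint64_t byte count followed by that many raw bytes, first the XML and then the weights. A self-contained sketch of writing and reading one such section with standard streams follows; WriteSection and ReadSection are hypothetical helpers, and the truncation checks are an addition that the plugin code above does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Writes one section as [uint64 size][size raw bytes].
void WriteSection(std::ostream& out, const std::string& payload) {
    auto size = static_cast<std::uint64_t>(payload.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(payload.data(), static_cast<std::streamsize>(size));
}

// Reads one section written by WriteSection. The string is sized *before*
// reading, so the bytes land in storage the string actually owns.
std::string ReadSection(std::istream& in) {
    std::uint64_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof(size))) {
        throw std::runtime_error("truncated stream: missing section size");
    }
    std::string payload(static_cast<std::size_t>(size), '\0');
    if (size != 0 && !in.read(&payload[0], static_cast<std::streamsize>(size))) {
        throw std::runtime_error("truncated stream: section shorter than its size");
    }
    return payload;
}
```

As in the plugin code, the size is written in the host's native byte order, so a stream exported on a little-endian machine cannot be read on a big-endian one without an explicit conversion.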


CreateInferRequest()

The CreateInferRequest method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference requests, which can be executed in both synchronous and asynchronous modes, a plugin library implementation has two separate classes:

  • Synchronous inference request, which defines pipeline stages and runs them synchronously in the Infer method.
  • Asynchronous inference request, which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on the device pipeline structure, it can have one or several stages:
    • For single-stage pipelines, there is no need to define this method or to create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. The default implementation of this method creates an InferenceEngine::AsyncInferRequestThreadSafeDefault that wraps a synchronous inference request and runs it asynchronously in the _taskExecutor executor.
    • For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on the device, or downloading and postprocessing output data, schedule the stages on several task executors to achieve better device utilization and performance. To benefit from this, create a sufficient number of inference requests running in parallel: the device stages of different inference requests then overlap with the preprocessing and postprocessing stages, which improves performance.

      IMPORTANT: It is up to you to decide how many task executors you need to optimally execute a device pipeline.

```cpp
InferenceEngine::IInferRequest::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
    InferenceEngine::IInferRequest::Ptr asyncRequest;
    // synchronous request that defines the pipeline stages
    auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    // asynchronous wrapper that schedules the stages on the task executors
    auto asyncThreadSafeImpl = std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest),
                                                                           _taskExecutor, _plugin->_waitExecutor, _callbackExecutor);
    // expose it through the public IInferRequest interface; the deleter calls Release()
    asyncRequest.reset(new InferenceEngine::InferRequestBase(asyncThreadSafeImpl),
                       [](InferenceEngine::IInferRequest *p) { p->Release(); });
    return asyncRequest;
}
```
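To make the multi-stage idea concrete, here is a plain-C++ sketch of two pipeline stages running on two separate single-thread executors, so that the "device" stage of one request can run while the "host" stage of the next request is in progress. It does not use the Inference Engine task executors or AsyncInferRequestThreadSafeDefault; SerialExecutor and RunTwoStagePipeline are hypothetical, and the arithmetic in each stage is only a placeholder for preprocessing and inference.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-thread task executor: tasks run one at a time, in submission order.
class SerialExecutor {
public:
    SerialExecutor() : _worker([this] { Loop(); }) {}
    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopping = true;
        }
        _cv.notify_one();
        _worker.join();  // the worker drains the remaining tasks before exiting
    }
    void Run(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _tasks.push(std::move(task));
        }
        _cv.notify_one();
    }

private:
    void Loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cv.wait(lock, [this] { return _stopping || !_tasks.empty(); });
                if (_tasks.empty()) return;  // stopping and nothing left to run
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
    bool _stopping = false;
    std::thread _worker;  // declared last so it starts after the members above exist
};

// Runs each input through a host stage (x + 1, placeholder for preprocessing) and then
// a device stage (x * 10, placeholder for inference), each stage on its own executor.
std::vector<int> RunTwoStagePipeline(const std::vector<int>& inputs) {
    SerialExecutor device;  // declared first, destroyed last: host tasks submit work to it
    SerialExecutor host;
    std::vector<std::future<int>> results;
    for (int x : inputs) {
        // std::function needs a copyable callable, so the move-only promise is shared
        auto promise = std::make_shared<std::promise<int>>();
        results.push_back(promise->get_future());
        host.Run([x, promise, &device] {
            int preprocessed = x + 1;  // host stage
            device.Run([preprocessed, promise] {
                promise->set_value(preprocessed * 10);  // device stage
            });
        });
    }
    std::vector<int> outputs;
    for (auto& result : results) outputs.push_back(result.get());
    return outputs;
}
```

The executors are declared in this order deliberately: host tasks call device.Run, so the device executor must outlive the host executor, and local objects are destroyed in reverse order of declaration.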


CreateInferRequestImpl()

The CreateInferRequestImpl method is a helper used by CreateInferRequest to create a synchronous inference request, which is later wrapped with the asynchronous inference request class:

```cpp
InferenceEngine::InferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
                                                                                                       InferenceEngine::OutputsDataMap networkOutputs) {
    return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, std::static_pointer_cast<ExecutableNetwork>(shared_from_this()));
}
```
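CreateInferRequestImpl hands each synchronous request a shared pointer to the executable network, obtained through shared_from_this(), so the network stays alive for as long as any request uses it. A minimal sketch of that ownership pattern with std::enable_shared_from_this follows; Network and SyncRequest are hypothetical classes. In the plugin code, the extra std::static_pointer_cast<ExecutableNetwork> is presumably needed because shared_from_this() there returns a pointer to a base class.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Network;

// Hypothetical synchronous request that co-owns the network it was created from.
class SyncRequest {
public:
    explicit SyncRequest(std::shared_ptr<const Network> network) : _network(std::move(network)) {}
    const std::shared_ptr<const Network>& GetNetwork() const { return _network; }

private:
    std::shared_ptr<const Network> _network;
};

// Hypothetical executable network that creates requests referring back to itself.
class Network : public std::enable_shared_from_this<Network> {
public:
    std::shared_ptr<SyncRequest> CreateRequest() const {
        // shared_from_this() is valid only if *this is already owned by a std::shared_ptr,
        // so the network itself must be created with std::make_shared (or a shared_ptr).
        return std::make_shared<SyncRequest>(shared_from_this());
    }
};
```

Even after the caller drops its own pointer to the network, a live request keeps the network alive; the network is destroyed only when the last request is released.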


GetMetric()

The GetMetric method returns the value of the metric with the given name. A metric is static information about an executable network. Examples of metrics:

  • EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
  • EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
  • Any other executable network metric specific for a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, template/template_config.hpp
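
```cpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetMetric(const std::string &name) const {
    // TODO: return more supported values for metrics
    if (METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
        std::vector<std::string> configKeys = { CONFIG_KEY(DEVICE_ID), CONFIG_KEY(PERF_COUNT) };  // keys the plugin supports
        auto streamExecutorConfigKeys = InferenceEngine::IStreamsExecutor::Config{}.SupportedKeys();
        for (auto&& configKey : streamExecutorConfigKeys) configKeys.emplace_back(configKey);
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
    } else if (METRIC_KEY(NETWORK_NAME) == name) {
        auto networkName = _function->get_friendly_name();
        IE_SET_METRIC_RETURN(NETWORK_NAME, networkName);
    } else if (METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
        unsigned int value = _cfg._streamsExecutorConfig._streams;
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, value);
    } else {
        THROW_IE_EXCEPTION << "Unsupported ExecutableNetwork metric: " << name;
    }
}
```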

The IE_SET_METRIC_RETURN helper macro sets the metric value and checks that the actual metric type matches the type of the specified value.
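A minimal sketch of the idea behind a typed metric return: each metric key is associated with one declared C++ type, the compiler checks that the returned value converts to that type, and unknown names are rejected at run time. This is not the actual definition of IE_SET_METRIC_RETURN; std::any (C++17) stands in for InferenceEngine::Parameter, and NetworkNameMetric, OptimalRequestsMetric, SetMetricReturn, and GetMetricSketch are hypothetical.

```cpp
#include <any>
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical metric descriptors: each key is tied to exactly one value type.
struct NetworkNameMetric {
    using type = std::string;
    static constexpr const char* key = "NETWORK_NAME";
};
struct OptimalRequestsMetric {
    using type = unsigned int;
    static constexpr const char* key = "OPTIMAL_NUMBER_OF_INFER_REQUESTS";
};

// Converts the value to the metric's declared type before type-erasing it.
// Copy-initialization compiles only if Value converts implicitly to Metric::type.
template <typename Metric, typename Value>
std::any SetMetricReturn(Value&& value) {
    typename Metric::type typed = std::forward<Value>(value);
    return std::any(std::move(typed));
}

// Hypothetical GetMetric: dispatches on the metric name and rejects unknown names.
std::any GetMetricSketch(const std::string& name, const std::string& networkName, unsigned int streams) {
    if (name == NetworkNameMetric::key) {
        return SetMetricReturn<NetworkNameMetric>(networkName);
    }
    if (name == OptimalRequestsMetric::key) {
        return SetMetricReturn<OptimalRequestsMetric>(streams);
    }
    throw std::invalid_argument("Unsupported ExecutableNetwork metric: " + name);
}
```

Because the stored type is always the declared one, a caller can retrieve the value with std::any_cast using that type, without guessing which type a particular branch happened to return.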


GetConfig()

The GetConfig method returns the current value of the configuration key with the given name. It extracts the configuration values the executable network was compiled with.

```cpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetConfig(const std::string &name) const {
    return _cfg.Get(name);
}
```

This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).
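GetConfig only reads values that were fixed when the network was compiled. A minimal sketch of such a read-only lookup, assuming string-valued settings; this Configuration class is a hypothetical stand-in, not the plugin's Configuration type, and throwing on an unknown key is an assumption about how such a lookup might report errors.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the plugin's Configuration: values are captured once,
// when the network is compiled, and only read afterwards.
class Configuration {
public:
    explicit Configuration(std::map<std::string, std::string> values) : _values(std::move(values)) {}

    // Returns the value the network was compiled with; unknown keys are an error.
    std::string Get(const std::string& name) const {
        auto it = _values.find(name);
        if (it == _values.end()) {
            throw std::out_of_range("Unsupported configuration key: " + name);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> _values;
};
```

Since the object offers no setters, any tool that later inspects an imported network sees exactly the values the network was compiled with.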

The next step in plugin library implementation is the Synchronous Inference Request class.