Executable Network

ExecutableNetwork class functionality:

  • Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
  • Create an arbitrary number of InferRequest objects
  • Hold some common resources shared between different instances of InferRequest. For example:
    • InferenceEngine::IExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
    • InferenceEngine::IExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread

ExecutableNetwork Class

Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class recommended to use as a base class for an executable network. Based on that, a declaration of an executable network class can look as follows:

class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function, const InferenceEngine::InputsDataMap& inputInfoMap,
const InferenceEngine::OutputsDataMap& outputsInfoMap, const Configuration& cfg, const std::shared_ptr<Plugin>& plugin);
ExecutableNetwork(std::istream& model, const Configuration& cfg, const std::shared_ptr<Plugin>& plugin);
// Methods from a base class ExecutableNetworkThreadSafeDefault
void Export(std::ostream& model) override;
InferenceEngine::OutputsDataMap networkOutputs) override;
InferenceEngine::Parameter GetMetric(const std::string& name) const override;
InferenceEngine::Parameter GetConfig(const std::string& name) const override;
private:
friend class TemplateInferRequest;
void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function, const InferenceEngine::InputsDataMap& inputInfoMap,
const InferenceEngine::OutputsDataMap& outputsInfoMap);
void InitExecutor();
std::atomic<std::size_t> _requestId = {0};
Configuration _cfg;
std::shared_ptr<Plugin> _plugin;
std::shared_ptr<ngraph::Function> _function;
std::map<std::string, std::size_t> _inputIndex;
std::map<std::string, std::size_t> _outputIndex;
};
This class provides optimal thread safe default implementation. The class is recommended to be used a...
Definition: ie_executable_network_thread_safe_default.hpp:23
IInferRequestInternal::Ptr CreateInferRequest() override
Given optional implementation of creating asynchronous inference request to avoid need for it to be i...
Definition: ie_executable_network_thread_safe_default.hpp:50
virtual void Export(const std::string &modelFileName)
Export the current created executable network so it can be used later in the Import() main API.
virtual std::shared_ptr< IInferRequestInternal > CreateInferRequestImpl(InputsDataMap networkInputs, OutputsDataMap networkOutputs)
Creates an inference request internal implementation.
virtual Parameter GetConfig(const std::string &name) const
Gets configuration dedicated to plugin behaviour.
std::shared_ptr< IInferencePlugin > _plugin
A pointer to a IInferencePlugin interface.
Definition: ie_iexecutable_network_internal.hpp:150
virtual Parameter GetMetric(const std::string &name) const
Gets general runtime metric for dedicated hardware.
std::shared_ptr< IInferRequestInternal > Ptr
A shared pointer to a IInferRequestInternal interface.
Definition: ie_iinfer_request_internal.hpp:33
std::map< std::string, InputInfo::Ptr > InputsDataMap
std::map< std::string, DataPtr > OutputsDataMap

Class Fields

The example class has several fields:

  • _requestId - Tracks a number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
  • _cfg - Defines a configuration an executable network was compiled with.
  • _plugin - Refers to a plugin instance.
  • _function - Keeps a reference to transformed ngraph::Function which is used in ngraph reference backend computations. Note, in case of other backends with backend specific graph representation _function has different type and represents backend specific graph or just a set of computational kernels to perform an inference.
  • _inputIndex - maps a name of input with its index among all network inputs.
  • _outputIndex - maps a name of output with its index among all network outputs.

ExecutableNetwork Constructor with ICNNNetwork

This constructor accepts a generic representation of a neural network as an InferenceEngine::ICNNNetwork reference and is compiled into a backend specific device graph:

TemplatePlugin::ExecutableNetwork::ExecutableNetwork(const std::shared_ptr<const ngraph::Function>& function,
const InferenceEngine::InputsDataMap& inputInfoMap, const InferenceEngine::OutputsDataMap& outputsInfoMap,
const Configuration& cfg, const Plugin::Ptr& plugin)
: InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr, nullptr), // Disable default threads creation
_cfg(cfg),
_plugin(plugin) {
// TODO: if your plugin supports device ID (more that single instance of device can be on host machine)
// you should select proper device based on KEY_DEVICE_ID or automatic behavior
// In this case, _waitExecutor should also be created per device.
try {
CompileNetwork(function, inputInfoMap, outputsInfoMap);
InitExecutor(); // creates thread-based executor using for async requests
} catch (const InferenceEngine::Exception&) {
throw;
} catch (const std::exception& e) {
IE_THROW(Unexpected) << "Standard exception from compilation library: " << e.what();
} catch (...) {
IE_THROW(Unexpected) << "Generic exception is thrown";
}
}
#define IE_THROW(...)
Inference Engine Plugin API namespace.

The implementation CompileNetwork is fully device-specific.

CompileNetwork()

The function accepts a const shared pointer to ngraph::Function object and performs the following steps:

  1. Applies ngraph passes using TransformNetwork function, which defines plugin-specific conversion pipeline.
  2. Maps the transformed graph to a backend specific graph representation (for example, to MKLDNN graph for Intel CPU).
  3. Allocates and fills memory for graph weights, backend specific memory handles and so on.
// forward declaration
std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function, const InferenceEngine::InputsDataMap& inputInfoMap,
const InferenceEngine::OutputsDataMap& outputsInfoMap);
void TemplatePlugin::ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function,
const InferenceEngine::InputsDataMap& inputInfoMap,
const InferenceEngine::OutputsDataMap& outputsInfoMap) {
// TODO: perform actual graph compilation / mapping to backend graph representation / kernels
// apply plugins transformations
_function = TransformNetwork(function, inputInfoMap, outputsInfoMap);
// Generate backend specific blob mappings. For example Inference Engine uses not ngraph::Result nodes friendly name
// as inference request output names but the name of the layer before.
for (auto&& result : _function->get_results()) {
auto previousOutput = result->get_input_source_output(0);
auto outputName = previousOutput.get_node()->get_friendly_name();
if (previousOutput.get_node()->get_output_size() > 1) {
outputName += '.' + std::to_string(previousOutput.get_index());
}
_outputIndex.emplace(outputName, _function->get_result_index(result));
}
for (auto&& parameter : _function->get_parameters()) {
_inputIndex.emplace(parameter->get_friendly_name(), _function->get_parameter_index(parameter));
}
// Perform any other steps like allocation and filling backend specific memory handles and so on
}

NOTE: After all these steps, the backend specific graph is ready to create inference requests and perform inference.

ExecutableNetwork Constructor Importing from Stream

This constructor creates a backend specific graph by importing from a stream object:

NOTE: The export of backend specific graph is done in the Export method, and data formats must be the same for both import and export.

TemplatePlugin::ExecutableNetwork::ExecutableNetwork(std::istream& model, const Configuration& cfg, const Plugin::Ptr& plugin): _cfg(cfg), _plugin(plugin) {
// read XML content
std::string xmlString;
std::uint64_t dataSize = 0;
model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
xmlString.resize(dataSize);
model.read(const_cast<char*>(xmlString.c_str()), dataSize);
// read blob content
model.read(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
if (0 != dataSize) {
dataBlob = InferenceEngine::make_shared_blob<std::uint8_t>(
InferenceEngine::TensorDesc(InferenceEngine::Precision::U8, {static_cast<std::size_t>(dataSize)}, InferenceEngine::Layout::C));
dataBlob->allocate();
model.read(dataBlob->buffer(), dataSize);
}
// TODO: implement Import / Export of configuration options and merge with `cfg`
// TODO: implement Import / Export of network precisions, layouts, preprocessing info
auto cnnnetwork = _plugin->GetCore()->ReadNetwork(xmlString, std::move(dataBlob));
setNetworkInputs(cnnnetwork.getInputsInfo());
setNetworkOutputs(cnnnetwork.getOutputsInfo());
SetPointerToPlugin(_plugin->shared_from_this());
try {
CompileNetwork(cnnnetwork.getFunction(), inputInfoMap, outputInfoMap);
InitExecutor(); // creates thread-based executor using for async requests
} catch (const InferenceEngine::Exception&) {
throw;
} catch (const std::exception& e) {
IE_THROW(Unexpected) << "Standard exception from compilation library: " << e.what();
} catch (...) {
IE_THROW(Unexpected) << "Generic exception is thrown";
}
}
std::shared_ptr< Blob > Ptr

Export()

The implementation of the method should write all data to the model stream, which is required to import a backend specific graph later in the Plugin::Import method:

void TemplatePlugin::ExecutableNetwork::Export(std::ostream& modelStream) {
OV_ITT_SCOPED_TASK(itt::domains::TemplatePlugin, "ExecutableNetwork::Export");
// Note: custom ngraph extensions are not supported
std::map<std::string, ngraph::OpSet> custom_opsets;
std::stringstream xmlFile, binFile;
ngraph::pass::Serialize serializer(xmlFile, binFile, ngraph::pass::Serialize::Version::IR_V10, custom_opsets);
serializer.run_on_function(_function);
auto m_constants = binFile.str();
auto m_model = xmlFile.str();
auto dataSize = static_cast<std::uint64_t>(m_model.size());
modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
modelStream.write(m_model.c_str(), dataSize);
dataSize = static_cast<std::uint64_t>(m_constants.size());
modelStream.write(reinterpret_cast<char*>(&dataSize), sizeof(dataSize));
modelStream.write(reinterpret_cast<char*>(&m_constants[0]), dataSize);
// TODO: implement network precision, layout, preprocessing info serialization
}
Serialize transformation converts ngraph::Function into IR files.
Definition: serialize.hpp:30

CreateInferRequest()

The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:

  • Synchronous inference request, which defines pipeline stages and runs them synchronously in the Infer method.
  • Asynchronous inference request, which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can has one or several stages:
    • For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. For single stage pipelines, a default implementation of this method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping a synchronous inference request and runs it asynchronously in the _taskExecutor executor.
    • For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device use and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stage giving better performance.

      IMPORTANT: It is up to you to decide how many task executors you need to optimally execute a device pipeline.

      InferenceEngine::IInferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequest() {
      auto internalRequest = CreateInferRequestImpl(_networkInputs, _networkOutputs);
      return std::make_shared<TemplateAsyncInferRequest>(std::static_pointer_cast<TemplateInferRequest>(internalRequest), _taskExecutor, _plugin->_waitExecutor,
      _callbackExecutor);
      }

      CreateInferRequestImpl()

This is a helper method used by CreateInferRequest to create a synchronous inference request, which is later wrapped with the asynchronous inference request class:

InferenceEngine::IInferRequestInternal::Ptr TemplatePlugin::ExecutableNetwork::CreateInferRequestImpl(InferenceEngine::InputsDataMap networkInputs,
return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, std::static_pointer_cast<ExecutableNetwork>(shared_from_this()));
}

GetMetric()

Returns a metric value for a metric with the name name. A metric is a static type of information about an executable network. Examples of metrics:

  • EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
  • EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
  • Any other executable network metric specific for a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, template/template_config.hpp
InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetMetric(const std::string& name) const {
// TODO: return more supported values for metrics
if (EXEC_NETWORK_METRIC_KEY(SUPPORTED_METRICS) == name) {
IE_SET_METRIC_RETURN(SUPPORTED_METRICS, std::vector<std::string> {METRIC_KEY(NETWORK_NAME), METRIC_KEY(SUPPORTED_METRICS),
METRIC_KEY(SUPPORTED_CONFIG_KEYS), METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)});
} else if (EXEC_NETWORK_METRIC_KEY(SUPPORTED_CONFIG_KEYS) == name) {
std::vector<std::string> configKeys = {CONFIG_KEY(DEVICE_ID), CONFIG_KEY(PERF_COUNT), TEMPLATE_CONFIG_KEY(THROUGHPUT_STREAMS)};
auto streamExecutorConfigKeys = InferenceEngine::IStreamsExecutor::Config {}.SupportedKeys();
for (auto&& configKey : streamExecutorConfigKeys) {
configKeys.emplace_back(configKey);
}
IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, configKeys);
} else if (EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) == name) {
auto networkName = _function->get_friendly_name();
IE_SET_METRIC_RETURN(NETWORK_NAME, networkName);
} else if (EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
unsigned int value = _cfg._streamsExecutorConfig._streams;
IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, value);
} else {
IE_THROW() << "Unsupported ExecutableNetwork metric: " << name;
}
}
#define IE_SET_METRIC_RETURN(name,...)
Return metric value with specified name and arguments .... Example:
Definition: ie_metric_helpers.hpp:52
#define METRIC_KEY(name)
#define CONFIG_KEY(name)
#define EXEC_NETWORK_METRIC_KEY(name)
Defines IStreamsExecutor configuration.
Definition: ie_istreams_executor.hpp:52
std::vector< std::string > SupportedKeys()
Supported Configuration keys.

The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.

GetConfig()

Returns a current value for a configuration key with the name name. The method extracts configuration values an executable network is compiled with.

InferenceEngine::Parameter TemplatePlugin::ExecutableNetwork::GetConfig(const std::string& name) const {
return _cfg.Get(name);
}

This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).

The next step in plugin library implementation is the Synchronous Inference Request class.