Introduction to nGraph Flow in Inference Engine

New Run-Time Intermediate Representation (IR): nGraph

Starting from the OpenVINO™ release 2020.1, the Inference Engine integrates the nGraph Core. This means that, underneath the conventional CNNNetwork API, the Inference Engine uses a new run-time representation of a model: an instance of ngraph::Function.

Besides the representation update, nGraph integration resulted in the following changes and new features:

  1. New operation sets. Combining operations from the nGraph Core with conventional layers from CNNNetwork produced new sets of operations called `opset1`, `opset2`, and so on, which cover both interfaces except for a few minor cases. Operations from `opset3` are generated by the Model Optimizer and are accepted by the Inference Engine.
  2. A new versioning approach that attaches a version to each operation rather than to the entire IR file format. The IR is still versioned, but the version has a different meaning. For details, see Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™.
  3. Creating models at run time without loading an IR from an XML/binary file. To do this, create an ngraph::Function and pass it to CNNNetwork.
  4. Run-time reshape capability and constant folding are implemented through the nGraph code for more operations compared to previous releases. As a result, more models can be reshaped. For details, see the dedicated guide about the reshape capability.
  5. Loading a model from the ONNX format without converting it to the Inference Engine IR.

The conventional flow that is not based on nGraph is still available. The rest of this document describes how the legacy and new flows coexist, as shown in the picture below:

TopLevelNGraphFlow.png

Read the Intermediate Representation to CNNNetwork

With the new operation set in place, the Model Optimizer generates IR version 10 using the new operations by default. Each layer generated in the IR has semantics matching the corresponding operation from the nGraph namespace opset3. IR version 10 automatically triggers the nGraph flow inside the Inference Engine. When such an IR is read in an application, the Inference Engine IR reader produces a CNNNetwork that encapsulates the ngraph::Function instance underneath. Thus the OpenVINO IR becomes a new serialization format for the nGraph IR, and it can be deserialized by reading it into a CNNNetwork.

IMPORTANT: Conventional interfaces are used (CNNNetwork, the reader), so no changes are required in most applications.

NOTE: While you can still use old APIs, the Inference Engine API is being improved continuously and independently. For example, the Core::ReadNetwork API is recommended instead of CNNNetworkReader. These changes are independent of the nGraph integration and do not enable or disable new features.

IR version 10 is interpreted differently from old IR versions. Besides using a different operation set, the shapes and data types assigned to the ports in the XML file are ignored for IR version 10. Both shapes and types are re-inferred while loading into the Inference Engine, using the shape and type propagation function that is part of each nGraph operation.
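The re-inference described above can be pictured with a small self-contained sketch. It does not use the nGraph API: `ToyNode`, `inferShapes`, and the three operation kinds are hypothetical stand-ins, and the sketch assumes that graph inputs carry their own declared shape while every other stored shape is treated as untrusted. Only shapes are modeled here; data types would be propagated in the same manner.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical stand-in for one operation; not an nGraph class.
struct ToyNode {
    std::string type;         // "Parameter", "Relu" or "MatMul"
    std::vector<int> inputs;  // indices of producer nodes earlier in the list
    Shape shape;              // Parameter: declared input shape;
                              // other nodes: value read from the file, ignored
};

// Recompute the shape of every non-Parameter node from its inputs.
// Nodes are assumed to be stored in topological order.
void inferShapes(std::vector<ToyNode>& nodes) {
    for (auto& node : nodes) {
        if (node.type == "Parameter") {
            continue;  // graph inputs keep their declared shape
        }
        if (node.type == "Relu") {
            // Element-wise: output shape equals input shape.
            node.shape = nodes.at(node.inputs.at(0)).shape;
        } else if (node.type == "MatMul") {
            const Shape a = nodes.at(node.inputs.at(0)).shape;
            const Shape b = nodes.at(node.inputs.at(1)).shape;
            if (a.size() != 2 || b.size() != 2 || a[1] != b[0]) {
                throw std::runtime_error("MatMul: incompatible input shapes");
            }
            node.shape = Shape{a[0], b[1]};
        } else {
            throw std::runtime_error("unknown operation type: " + node.type);
        }
    }
}
```

In this sketch, a MatMul node stored with a stale shape such as {99, 99} ends up with the shape computed from its two inputs once `inferShapes` has run, which mirrors how port shapes written in the XML file have no effect on the loaded model.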

Legacy IR Versions

You can read old versions of the IR in the Inference Engine. Every version up to and including 7 is treated as an old one. When the Inference Engine reader reads an old version of the IR, it does not use the nGraph representation. There is no way to activate the nGraph flow with an old IR version. The rest of this document does not apply in this case.

The Model Optimizer generates IR version 10 by default. The command-line key --generate_deprecated_IR_V7 switches generation to the legacy IR version 7, which is useful when the new nGraph flow does not work for some reason.
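The version rule stated in this document can be summarized in a few lines. `ToyFlow` and `selectFlow` are hypothetical names introduced for illustration and are not Inference Engine functions. The text only describes versions up to 7 and version 10, so every other version is reported as not covered rather than guessed.

```cpp
// Hypothetical helper that restates the rule from the text.
enum class ToyFlow { Legacy, NGraph, NotCovered };

ToyFlow selectFlow(int irVersion) {
    if (irVersion <= 7) {
        return ToyFlow::Legacy;   // old IR: the nGraph representation is not used
    }
    if (irVersion == 10) {
        return ToyFlow::NGraph;   // IR version 10 triggers the nGraph flow
    }
    return ToyFlow::NotCovered;   // other versions are not described in the text
}
```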

Build a Model in the Application

An alternative way to feed the Inference Engine with a model is to create the model at run time. To do this, construct an ngraph::Function using nGraph operation classes and, optionally, user-defined operations. For details, see Add Custom nGraph Operations and examples. At this stage, the code is completely independent of the rest of the Inference Engine code and can be built separately. After you construct an instance of ngraph::Function, you can use it to create a CNNNetwork by passing it to the new constructor of this class.

Initializing CNNNetwork from the nGraph function means encapsulating the object, not converting it to a conventional representation. At a low level, this is achieved by using a different class for the CNNNetwork internals. The old representation, which is used for IR versions before 10, is based on CNNNetworkImpl. The new representation, which is built around nGraph, is based on CNNNetworkNGraphImpl.
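The idea of one public class with two interchangeable internals can be sketched as follows. This is a toy model, not the Inference Engine source: every type name starting with `Toy` is hypothetical, and the real classes have far richer interfaces. The sketch only shows that constructing the facade from a function stores the very same object instead of converting it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are Inference Engine or nGraph types.
struct ToyFunction {   // plays the role of ngraph::Function
    std::vector<std::string> ops;
};
struct ToyLayerList {  // plays the role of the legacy layer graph
    std::vector<std::string> layers;
};

// Common interface behind the facade.
struct ToyNetworkImplBase {
    virtual ~ToyNetworkImplBase() = default;
    virtual std::string kind() const = 0;
};

// Counterpart of CNNNetworkImpl: owns legacy layers.
struct ToyLegacyImpl : ToyNetworkImplBase {
    explicit ToyLegacyImpl(ToyLayerList l) : layers(std::move(l)) {}
    std::string kind() const override { return "legacy"; }
    ToyLayerList layers;
};

// Counterpart of CNNNetworkNGraphImpl: keeps the function object itself.
struct ToyNGraphImpl : ToyNetworkImplBase {
    explicit ToyNGraphImpl(std::shared_ptr<ToyFunction> f) : function(std::move(f)) {}
    std::string kind() const override { return "ngraph"; }
    std::shared_ptr<ToyFunction> function;
};

// Facade in the spirit of CNNNetwork: same public type, different internals.
class ToyNetwork {
public:
    explicit ToyNetwork(ToyLayerList layers)
        : impl_(std::make_shared<ToyLegacyImpl>(std::move(layers))) {}
    explicit ToyNetwork(std::shared_ptr<ToyFunction> function)
        : impl_(std::make_shared<ToyNGraphImpl>(std::move(function))) {}

    std::string kind() const { return impl_->kind(); }

    // Returns the wrapped function, or nullptr for a legacy-backed network.
    std::shared_ptr<ToyFunction> function() const {
        if (auto ngraphImpl = std::dynamic_pointer_cast<ToyNGraphImpl>(impl_)) {
            return ngraphImpl->function;
        }
        return nullptr;
    }

private:
    std::shared_ptr<ToyNetworkImplBase> impl_;
};
```

A caller that builds a `ToyNetwork` from a `std::shared_ptr<ToyFunction>` gets the same pointer back from `function()`, while a network built from a `ToyLayerList` returns `nullptr`.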

NewAndOldCNNNetworkImpl.png

Automatic Conversion to the Old Representation

The old representation is still required in the cases listed below. When the old representation is required, the conversion from ngraph::Function to the old representation is called automatically. The following actions lead to the automatic conversion:

  1. Using the old API, which is expected to produce an old representation. Such calls are guaranteed to be read-only. Once you call such a method, the original nGraph representation is preserved and continues to be used in subsequent calls.

    1.1. CNNNetwork::serialize. Dumps the old representation after the automatic conversion. Cannot be used to dump IR V10. For details, see Graph Debug Capabilities.

  2. Calling CNNNetwork methods that modify the model. After such a call, the nGraph representation is lost and cannot be used.

    2.1. CNNNetwork::addLayer

    2.2. CNNNetwork::setBatchSize. Still implemented through old logic for backward compatibility without using nGraph capabilities. For details, see Using Shape Inference.

  3. Using methods that return objects inside an old representation. Calling these methods does not by itself modify the model, but the API does not restrict you to read-only access. Use these methods in read-only mode with respect to the model representation. If the model is changed, for example an attribute of some layer is changed or layers are reconnected, the modification is lost whenever any method that uses nGraph is called, such as CNNNetwork::reshape or methods inside plugins. It is hard to predict whether the nGraph function is used in a plugin or in other CNNNetwork methods, so modifying a network through the following methods is strongly discouraged. This important limitation applies to the old API calls listed below:

    3.1. `Data::getInputTo`

    3.2. `Data::getCreatorLayer`

    3.3. `CNNNetwork::getLayerByName`

    3.4. Iterating over `CNNLayer` objects in `CNNNetwork`: `CNNNetwork::begin`, `details::CNNNetworkIterator` class.
    
  4. Using a conventional plugin that accepts the old representation only.

The conversion is always a one-way process: there is no method to convert back. Even so, there are important caveats.

In cases [1] and [3], both representations are held underneath, and the caller should use the old representation in read-only mode only. It is hard for the Inference Engine to track whether the API is used in read-only mode or to modify the model.

For this reason, do not modify the model through the potentially modifying methods listed in case [3] above. Manipulate the nGraph function directly instead.
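The rules for cases [1] to [3] can be modeled with a small state sketch. It is a toy under stated assumptions, not the Inference Engine implementation: all `Toy` names are hypothetical, `reshape` stands for any call that works on the function, and the way an edit gets lost here (the cached legacy view is discarded and rebuilt from the function) is only one plausible mechanism chosen for illustration.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not Inference Engine or nGraph classes.
struct ToyFunction { std::vector<std::string> ops; };
struct ToyLegacyGraph { std::vector<std::string> layers; };

class ToyNetwork {
public:
    explicit ToyNetwork(std::shared_ptr<ToyFunction> f) : function_(std::move(f)) {
        if (!function_) {
            throw std::invalid_argument("function must not be null");
        }
    }

    bool hasFunction() const { return function_ != nullptr; }

    // Cases [1] and [3]: hand out the legacy view, converting on first use.
    // The function is kept, so both representations coexist afterwards.
    ToyLegacyGraph& legacyView() {
        if (!legacy_) {
            legacy_ = convert();
        }
        return *legacy_;
    }

    // Case [2]: a modifying legacy call. The function cannot reflect the
    // change, so it is dropped for good.
    void addLayer(const std::string& name) {
        legacyView().layers.push_back(name);
        function_.reset();
    }

    // A function-based call. While the function exists it is the source of
    // truth, so the cached legacy view is discarded and any edit made through
    // legacyView() disappears with it.
    void reshape() {
        if (function_) {
            legacy_.reset();
        }
    }

private:
    // One-way conversion: there is no counterpart that converts back.
    std::unique_ptr<ToyLegacyGraph> convert() const {
        std::unique_ptr<ToyLegacyGraph> graph(new ToyLegacyGraph());
        graph->layers = function_->ops;
        return graph;
    }

    std::shared_ptr<ToyFunction> function_;
    std::unique_ptr<ToyLegacyGraph> legacy_;
};
```

In this model, renaming a layer through `legacyView()` and then calling `reshape()` silently restores the original layer, whereas `addLayer` makes `hasFunction()` return false from then on. Both behaviors correspond to the warnings above.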

Conversion Function

The Inference Engine implements a conversion function that is used when the nGraph function is transformed to the old CNNNetworkImpl representation. This conversion function is hidden, and you cannot call it directly from the application. Nevertheless, it is an important component of the model transformation pipeline in the Inference Engine. Some model issues may be caught during the conversion process. Because exceptions can be thrown from this function, knowing what it does helps you find the root cause.

The conversion function performs the following steps:

  1. Converts and decomposes some operations as the first step of preparing the nGraph function for optimization. This reduces the operation set so that it is easier to optimize at the next stages. For example, BatchNormInference is decomposed at this stage.
  2. Calls optimizing transformations that usually happen in the Model Optimizer, because the nGraph function is not always read from an already optimized IR.
  3. Changes the operation set from opsetX to the legacy layer semantics described in the Legacy Layers Catalog. The model is still represented as the nGraph function at this stage, but the operation set is completely different.
  4. Performs a one-to-one conversion of the nGraph representation to the corresponding CNNNetworkImpl without changing its semantics. You can see the result of the conversion by calling the CNNNetwork::serialize method, which produces legacy IR semantics that are not nGraph-based, even when it is applied to a CNNNetwork constructed from an nGraph function. This may help in debugging; see Graph Debug Capabilities for all options for dumping new and old IR representations.
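The four steps above can be sketched as a staged pipeline over a graph reduced to operation names. Everything here is a hypothetical toy, not the hidden conversion function: the Multiply plus Add result of the decomposition, the `Nop` operation removed by the optimization stage, the name table in step 3, and the stage name added to the exception message are all assumptions made for illustration.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using ToyGraph = std::vector<std::string>;  // a graph reduced to operation names

// Step 1: decompose composite operations (Multiply + Add is an assumed result).
ToyGraph decompose(const ToyGraph& in) {
    ToyGraph out;
    for (const auto& op : in) {
        if (op == "BatchNormInference") {
            out.push_back("Multiply");
            out.push_back("Add");
        } else {
            out.push_back(op);
        }
    }
    return out;
}

// Step 2: optimizations; here a hypothetical "Nop" operation is removed.
ToyGraph optimize(const ToyGraph& in) {
    ToyGraph out;
    for (const auto& op : in) {
        if (op != "Nop") {
            out.push_back(op);
        }
    }
    return out;
}

// Step 3: switch to legacy layer names; the table is illustrative only.
ToyGraph toLegacySemantics(const ToyGraph& in) {
    static const std::map<std::string, std::string> table = {
        {"Parameter", "Input"}, {"Relu", "ReLU"},
        {"Multiply", "Eltwise"}, {"Add", "Eltwise"}};
    ToyGraph out;
    for (const auto& op : in) {
        const auto it = table.find(op);
        if (it == table.end()) {
            throw std::runtime_error("no legacy counterpart for " + op);
        }
        out.push_back(it->second);
    }
    return out;
}

struct ToyLegacyNetwork { std::vector<std::string> layers; };

// Runs the stages in order and reports which stage failed.
ToyLegacyNetwork convertToLegacy(const ToyGraph& function) {
    ToyGraph graph = function;
    const char* stage = "decompose";
    try {
        graph = decompose(graph);
        stage = "optimize";
        graph = optimize(graph);
        stage = "legacy semantics";
        graph = toLegacySemantics(graph);
    } catch (const std::exception& e) {
        throw std::runtime_error(std::string("conversion failed at '") + stage +
                                 "': " + e.what());
    }
    // Step 4: one-to-one copy into the legacy container.
    return ToyLegacyNetwork{graph};
}
```

Tagging the failure with a stage name is a design choice of this sketch. It illustrates why knowing the order of the steps helps when an exception surfaces from the conversion.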

Deprecation Notice

Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.

Therefore, the ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are advised to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.