Introduction to Intel® Deep Learning Deployment Toolkit

Deployment Challenges

Deploying deep learning networks from the training environment to embedded platforms for inference can be a complex task that introduces a number of technical challenges that must be addressed.

Deployment Workflow

The process assumes that you have a network model trained using one of the supported frameworks. The diagram below illustrates the typical workflow for deploying a trained deep learning model:

[Figure: typical deployment workflow (workflow_steps.png)]

The steps are:

  1. Configure Model Optimizer for the framework that was used to train your model.
  2. Run Model Optimizer to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weight and bias values, and other optional parameters.
  3. Test the model in the IR format using the Inference Engine in the target environment with the provided Inference Engine sample applications.
  4. Integrate the Inference Engine into your application to deploy the model in the target environment (see the sketch after this list).
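
Below is a minimal sketch of step 4, using the Inference Engine Python API that ships with the toolkit. It is an illustration rather than a complete application: the IR file names, the CPU target, and the random input tensor are placeholders for your own model and preprocessed data.

    # Minimal sketch: load an IR and run a single synchronous inference.
    # File names, device, and input data are placeholders.
    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()                                                   # entry point to the runtime
    net = ie.read_network(model="model.xml", weights="model.bin")   # read the IR pair

    input_blob = next(iter(net.inputs))                             # first input layer name
    out_blob = next(iter(net.outputs))                              # first output layer name

    exec_net = ie.load_network(network=net, device_name="CPU")      # compile for a target device

    n, c, h, w = net.inputs[input_blob].shape                       # input layout expected by the model
    dummy = np.random.rand(n, c, h, w).astype(np.float32)           # stand-in for real preprocessed data

    result = exec_net.infer(inputs={input_blob: dummy})             # synchronous inference
    print(result[out_blob].shape)                                   # raw output tensor shape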

Model Optimizer

Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and automatically adjusts deep learning models for optimal execution on end-point target devices.

Model Optimizer is designed to support multiple deep learning frameworks and model formats.

When running Model Optimizer, you do not need to consider which target device you plan to use: the same Model Optimizer output can be used on all targets.
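
For example, the same IR can be loaded on different devices simply by changing the device_name argument, with no re-conversion. The sketch below uses the Inference Engine Python API; the file names are placeholders, and which devices are available depends on your hardware and installation.

    # The same IR, compiled for different targets by changing only device_name.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")       # placeholder file names

    exec_on_cpu = ie.load_network(network=net, device_name="CPU")
    exec_on_gpu = ie.load_network(network=net, device_name="GPU")       # same IR, no re-conversion
    # exec_on_vpu = ie.load_network(network=net, device_name="MYRIAD")  # likewise for a VPU, if present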

Model Optimizer Workflow

The Model Optimizer workflow assumes that you have a network model trained using one of the supported frameworks: the trained model is taken as input and an optimized Intermediate Representation (IR) is produced as output.
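
As a hedged illustration, converting a hypothetical frozen TensorFlow model might look like the following. The call is wrapped in subprocess only so the example stays in Python; in practice mo.py is usually run directly from a shell, and the script path, model file, and options are placeholders for your installation.

    # Illustrative Model Optimizer invocation (paths and model are placeholders).
    import subprocess

    subprocess.run([
        "python3", "mo.py",                            # Model Optimizer entry point
        "--input_model", "frozen_inference_graph.pb",  # trained model (TensorFlow example)
        "--input_shape", "[1,224,224,3]",              # pin the input shape if it is dynamic
        "--data_type", "FP16",                         # optional: produce an FP16 IR
        "--output_dir", "ir",                          # the .xml and .bin files are written here
    ], check=True)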

Supported Frameworks and Formats

Supported Models

For the list of supported models, refer to the framework- or format-specific pages.

Intermediate Representation

The Intermediate Representation (IR) describing a deep learning model plays an important role in connecting the OpenVINO™ toolkit components. The IR is a pair of files:

  - an .xml file that describes the network topology
  - a .bin file that contains the weights and biases binary data

Intermediate Representation (IR) files can be read and loaded by the Inference Engine, which then runs inference on them. The Inference Engine offers a unified API across a number of supported Intel® platforms. The IR is also consumed, modified, and written by the Post-Training Optimization Tool, which provides quantization capabilities.
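
For example, the pair of files can be read together and inspected through the Inference Engine Python API before running inference. In this sketch the file names are placeholders, and the attribute names follow the Python API of this toolkit generation.

    # Read the .xml/.bin pair and inspect the resulting network.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")

    print("network:", net.name)
    for name, info in net.inputs.items():      # input layers described by the .xml
        print("input ", name, info.shape)
    for name, data in net.outputs.items():     # output layers
        print("output", name, data.shape)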

Refer to the dedicated description of the Intermediate Representation and Operation Sets for further details.

nGraph Integration

The OpenVINO™ toolkit is powered by nGraph capabilities for the graph construction API, the graph transformation engine, and reshape. An nGraph Function is used as the intermediate representation of a model at run time, underneath the CNNNetwork API. The conventional representation of CNNNetwork is still available for backward compatibility when certain conventional API methods are used. Refer to the Overview of nGraph Flow for details of the nGraph integration into the Inference Engine and its co-existence with the conventional representation.
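
A hedged sketch of what this looks like from Python, assuming the nGraph Python bindings (the ngraph package) are installed alongside the Inference Engine as in recent toolkit releases: the nGraph Function behind a CNNNetwork read from an IR can be obtained and traversed. File names are placeholders.

    # Obtain and walk the nGraph Function that backs a CNNNetwork (assumes the
    # ngraph Python bindings are available; file names are placeholders).
    import ngraph as ng
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="model.xml", weights="model.bin")

    func = ng.function_from_cnn(net)           # nGraph Function behind the CNNNetwork
    for op in func.get_ordered_ops():          # operations in topological order
        print(op.get_type_name(), op.get_friendly_name())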

Deprecation Notice

Deprecation begins: June 1, 2020
Removal date: December 1, 2020

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020, and will be completely removed on December 1, 2020. Users are advised to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.

Inference Engine

Inference Engine is a runtime that delivers a unified API to integrate inference with your application logic.

The Inference Engine supports inference of multiple image classification networks, including the AlexNet, GoogLeNet, VGG, and ResNet families of networks; fully convolutional networks like FCN8 used for image segmentation; and object detection networks like Faster R-CNN.
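
As an illustration, a classification network such as those listed above can be run with a few lines of the Python API. The sketch below assumes an IR whose output is a vector of class scores; the file names, device, and input preparation are placeholders.

    # Run a classification IR and report the five highest-scoring class ids.
    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model="alexnet.xml", weights="alexnet.bin")   # placeholder IR
    exec_net = ie.load_network(network=net, device_name="CPU")

    input_blob = next(iter(net.inputs))
    out_blob = next(iter(net.outputs))

    n, c, h, w = net.inputs[input_blob].shape
    image = np.random.rand(n, c, h, w).astype(np.float32)    # stand-in for a real, preprocessed image

    scores = exec_net.infer(inputs={input_blob: image})[out_blob]
    top5 = np.argsort(scores.flatten())[-5:][::-1]            # indices of the five highest scores
    print("top-5 class ids:", top5)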

For the full list of supported hardware, refer to the Supported Devices section.

For Intel® Distribution of OpenVINO™ toolkit, the Inference Engine package contains headers, runtime libraries, and sample console applications demonstrating how you can use the Inference Engine in your applications.

The open source version is available in the OpenVINO™ toolkit GitHub repository and can be built for supported platforms using the Inference Engine Build Instructions.

See Also

Optimization Notice

For complete information about compiler optimizations, see our Optimization Notice.