After you have used the Model Optimizer to create an Intermediate Representation (IR), use the Inference Engine to infer input data.

The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.

To learn about how to use the Inference Engine API for your application, see the Integrating Inference Engine in Your Application documentation.

The complete API Reference is available in the full offline package documentation:

Go to <INSTALL_DIR>/deployment_tools/documentation/, where <INSTALL_DIR> is the OpenVINO toolkit installation directory.
Open index.html in an Internet browser.
Select API References from the menu at the top of the screen.
From the API References page, select Inference Engine API References.

NOTE: To read about the "legacy" Inference Engine API from previous releases (earlier than 2018 R1), see Integrating Inference Engine in Your Application (legacy API). Avoid the legacy API in new code, because it will be removed in a future product release.

The Inference Engine uses a plugin architecture. An Inference Engine plugin is a software component that contains a complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, FPGA, and so on. Each plugin implements the unified API and provides additional hardware-specific APIs.

Modules in the Inference Engine component

Core Inference Engine Libraries

Your application must link to the core Inference Engine library:

Linux* OS: libinference_engine.so
Windows* OS: inference_engine.dll

The required C++ header files are located in the include directory.

This library contains the classes to:

Read the network (InferenceEngine::CNNNetReader)
Manipulate network information (InferenceEngine::CNNNetwork)
Create and use the different plugins (InferenceEngine::PluginDispatcher)
Execute and pass inputs and outputs (InferenceEngine::ExecutableNetwork and InferenceEngine::InferRequest)

Device-specific Plugin Libraries

For each supported target device, the Inference Engine provides a plugin: a DLL or shared library that contains a complete implementation for inference on that device. The following plugins are available:

Plugin	Device Type
CPU	Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE
GPU	Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
FPGA	Intel® Arria® 10 GX FPGA Development Kit, Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA
MYRIAD	Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2, Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X
GNA	Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver processor J5005, Intel® Celeron® processor J4005, Intel® Core™ i3-8121U processor
HETERO	Enables distributing a calculation workload across several devices

The table below shows the plugin libraries and dependencies for Linux and Windows platforms.

Plugin	Library name for Linux	Dependency libraries for Linux	Library name for Windows	Dependency libraries for Windows
CPU	`libMKLDNNPlugin.so`	`libmklml_tiny.so`, `libiomp5.so`	`MKLDNNPlugin.dll`	`mklml_tiny.dll`, `libiomp5md.dll`
GPU	`libclDNNPlugin.so`	`libclDNN64.so`	`clDNNPlugin.dll`	`clDNN64.dll`
FPGA	`libdliaPlugin.so`	`libdla_compiler_core.so`	`dliaPlugin.dll`	`dla_compiler_core.dll`
MYRIAD	`libmyriadPlugin.so`	No dependencies	`myriadPlugin.dll`	No dependencies
HDDL	`libHDDLPlugin.so`	`libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`	`HDDLPlugin.dll`	`bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`
GNA	`libGNAPlugin.so`	`libgna_api.so`	`GNAPlugin.dll`	`gna.dll`
HETERO	`libHeteroPlugin.so`	Same as for selected plugins	`HeteroPlugin.dll`	Same as for selected plugins
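
The naming pattern in the table is regular: the Linux library name is the Windows base name with a `lib` prefix and an `.so` suffix. The following is a minimal sketch of that mapping, not part of the Inference Engine API. The function `plugin_library_name` and the enum `TargetOs` are hypothetical names introduced for illustration; the base names are copied from the table above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum class TargetOs { Linux, Windows };

// Hypothetical helper (not an Inference Engine function): returns the plugin
// library file name for a plugin listed in the table above.
std::string plugin_library_name(const std::string& plugin, TargetOs os) {
    // Base names as they appear in the table, without prefix or suffix.
    static const std::map<std::string, std::string> base_names = {
        {"CPU", "MKLDNNPlugin"}, {"GPU", "clDNNPlugin"},
        {"FPGA", "dliaPlugin"},  {"MYRIAD", "myriadPlugin"},
        {"HDDL", "HDDLPlugin"},  {"GNA", "GNAPlugin"},
        {"HETERO", "HeteroPlugin"},
    };
    const auto it = base_names.find(plugin);
    if (it == base_names.end()) {
        throw std::invalid_argument("unknown plugin: " + plugin);
    }
    // Linux: lib<base>.so, Windows: <base>.dll
    return os == TargetOs::Linux ? "lib" + it->second + ".so"
                                 : it->second + ".dll";
}
```

For example, `plugin_library_name("CPU", TargetOs::Linux)` yields `libMKLDNNPlugin.so`. A lookup like this can be handy in packaging scripts or diagnostics that verify the expected files are present; the dependency libraries are not covered by this sketch.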

Make sure these libraries are on your system's library search path or in the location you passed to the plugin loader. Make sure each plugin's dependencies can be found through the following environment variable:

Linux: LD_LIBRARY_PATH
Windows: PATH

On Linux, use the script bin/setupvars.sh to set the environment variables.

On Windows, run the bin\setupvars.bat batch file to set the environment variables.
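
If a plugin fails to load, a quick diagnostic is to check whether the directory that holds the plugin libraries is actually listed in the relevant environment variable. The following is a minimal sketch using only the C++ standard library. The function names `path_list_contains` and `library_dir_on_search_path` are hypothetical and are not part of the Inference Engine. The comparison is an exact string match, so a trailing slash or a symbolic link would not be recognized.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Returns true if `dir` is one of the entries of a separator-delimited list
// such as the value of LD_LIBRARY_PATH (':') or PATH on Windows (';').
bool path_list_contains(const std::string& path_list,
                        const std::string& dir, char separator) {
    std::istringstream entries(path_list);
    std::string entry;
    while (std::getline(entries, entry, separator)) {
        if (entry == dir) {
            return true;
        }
    }
    return false;
}

// Checks the variable that applies to the platform this code is compiled on.
bool library_dir_on_search_path(const std::string& dir) {
#ifdef _WIN32
    const char* value = std::getenv("PATH");
    const char separator = ';';
#else
    const char* value = std::getenv("LD_LIBRARY_PATH");
    const char separator = ':';
#endif
    // getenv returns nullptr when the variable is not set.
    return value != nullptr && path_list_contains(value, dir, separator);
}
```

Calling `library_dir_on_search_path` with the directory that contains the plugin libraries before creating a plugin gives a clearer error message than a failed library load.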

To learn more about supported devices and corresponding plugins, see the Supported Devices chapter.

Common Workflow for Using the Inference Engine API

The common workflow contains the following steps:

Read the Intermediate Representation - Using the InferenceEngine::CNNNetReader class, read an Intermediate Representation file into a CNNNetwork object. This object represents the network in host memory.
Prepare inputs and outputs format - After loading the network, specify the input and output precision and the layout on the network. For these specifications, use CNNNetwork::getInputsInfo() and CNNNetwork::getOutputsInfo().
Select Plugin - Select the plugin on which to load your network. Create the plugin with the InferenceEngine::PluginDispatcher load helper class. Pass per-device loading configurations specific to this device, and register extensions for this device.
Compile and Load - Use the plugin interface wrapper class InferenceEngine::InferencePlugin to call the LoadNetwork() API to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.
Set input data - With the network loaded, you have an ExecutableNetwork object. Use this object to create an InferRequest, in which you specify the buffers to use for input and output. Either have the device allocate the memory and copy your data into it directly, or tell the device to use your application memory to avoid a copy.
Execute - With the input and output memory now defined, choose your execution mode:
- Synchronously - Infer() method. Blocks until inference finishes.
- Asynchronously - StartAsync() method. Check status with the Wait() method (0 timeout), wait for completion, or specify a completion callback.
Get the output - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the IInferRequest::GetBlob() API.
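
The steps above can be sketched in a single function. This is an illustrative outline under several assumptions, not a definitive implementation: it assumes the IR is a pair of files with the same base name (`.xml` for the topology, `.bin` for the weights), 4D image inputs, and the plural accessor names `getInputsInfo()` and `getOutputsInfo()` on the `CNNNetwork` wrapper. The names `weights_path_for` and `run_once` are hypothetical. The Inference Engine part is compiled only when `<inference_engine.hpp>` is found, so the file still builds on a machine without the toolkit; it targets the API generation described in this section and has not been validated against other releases.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: derives the weights file path from the topology path,
// assuming both files share a base name (model.xml -> model.bin).
std::string weights_path_for(const std::string& xml_path) {
    const std::size_t slash = xml_path.find_last_of("/\\");
    const std::size_t dot = xml_path.rfind('.');
    const bool has_ext = dot != std::string::npos &&
                         (slash == std::string::npos || dot > slash);
    return (has_ext ? xml_path.substr(0, dot) : xml_path) + ".bin";
}

#if defined(__has_include)
#if __has_include(<inference_engine.hpp>)
#include <inference_engine.hpp>
#define HAVE_INFERENCE_ENGINE 1
#endif
#endif

#ifdef HAVE_INFERENCE_ENGINE
// Hypothetical function: runs one inference and returns the first output.
std::vector<float> run_once(const std::string& xml_path,
                            const std::string& device, bool async) {
    using namespace InferenceEngine;

    // 1. Read the IR into a CNNNetwork held in host memory.
    CNNNetReader reader;
    reader.ReadNetwork(xml_path);
    reader.ReadWeights(weights_path_for(xml_path));
    CNNNetwork network = reader.getNetwork();

    // 2. Prepare input and output formats (assumption: 4D image inputs).
    InputsDataMap inputs = network.getInputsInfo();
    OutputsDataMap outputs = network.getOutputsInfo();
    for (auto& in : inputs) {
        in.second->setPrecision(Precision::U8);
        in.second->setLayout(Layout::NCHW);
    }
    for (auto& out : outputs) {
        out.second->setPrecision(Precision::FP32);
    }

    // 3-4. Select the plugin by device name, then compile and load.
    // The empty string asks the dispatcher to use the default search path.
    InferencePlugin plugin(PluginDispatcher({""}).getPluginByDevice(device));
    ExecutableNetwork executable = plugin.LoadNetwork(network, {});

    // 5. Create a request and obtain the input blob.
    InferRequest request = executable.CreateInferRequest();
    Blob::Ptr input = request.GetBlob(inputs.begin()->first);
    // Fill input->buffer() with image data here (omitted in this sketch).

    // 6. Execute synchronously, or start asynchronously and wait.
    if (async) {
        request.StartAsync();
        request.Wait(IInferRequest::WaitMode::RESULT_READY);
    } else {
        request.Infer();
    }

    // 7. Copy the first output blob into a vector.
    Blob::Ptr output = request.GetBlob(outputs.begin()->first);
    const float* data = output->buffer().as<float*>();
    return std::vector<float>(data, data + output->size());
}
#endif
```

With the toolkit installed, a call such as `run_once("model.xml", "CPU", false)` would follow the synchronous path, where `model.xml` is a placeholder file name. The input precision, layout, and the choice of the first input and output are placeholders to adapt to the actual model.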
