OpenVINO API Tutorial¶
This tutorial is also available as a Jupyter notebook that can be cloned directly from GitHub. See the installation guide for instructions to run this tutorial locally on Windows, Linux or macOS. To run without installing anything, click the launch binder button.
This notebook explains the basics of the OpenVINO Inference Engine API.
The notebook is divided into sections with headers. Each section is standalone and does not depend on previous sections. A segmentation and classification IR model and a segmentation ONNX model are provided as examples. You can replace these model files with your own models. The exact outputs will be different, but the process is the same.
Load Inference Engine and Show Info¶
Initialize Inference Engine with IECore()
from openvino.inference_engine import IECore
ie = IECore()
Inference Engine can load a network on a device. A device in this context means a CPU, an Intel GPU, a Neural Compute Stick 2, etc. The available_devices property shows the devices that are available on your system. The “FULL_DEVICE_NAME” option to ie.get_metric() shows the name of the device.
In this notebook the CPU device is used. To use an integrated GPU, use device_name="GPU" instead. Note that loading a network on GPU will be slower than loading a network on CPU, but inference will likely be faster.
devices = ie.available_devices
for device in devices:
    device_name = ie.get_metric(device_name=device, metric_name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz
Loading a Model¶
After initializing Inference Engine, first read the model file with read_network(), then load it to the specified device with load_network().
IR Model¶
An IR (Intermediate Representation) model consists of an .xml file, containing information about network topology, and a .bin file, containing the weights and biases binary data. read_network() expects the weights file to be located in the same directory as the xml file, with the same filename and the extension .bin: model_weights_file == Path(model_xml).with_suffix(".bin"). If this is the case, specifying the weights file is optional. If the weights file has a different filename, it can be specified with the weights parameter to read_network().
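The default weights path that read_network() derives can be sketched with pathlib (the filename here is illustrative):

```python
from pathlib import Path

# Illustrative model path; read_network() looks for a .bin file with the
# same stem in the same directory when no weights argument is given
model_xml = "model/classification.xml"
default_weights = Path(model_xml).with_suffix(".bin")
print(default_weights)  # model/classification.bin
```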
See the tensorflow-to-openvino and pytorch-onnx-to-openvino notebooks for information on how to convert your existing TensorFlow, PyTorch or ONNX model to OpenVINO’s IR format with OpenVINO’s Model Optimizer. For exporting ONNX models to IR with default settings, the .serialize() method can also be used.
from openvino.inference_engine import IECore
ie = IECore()
classification_model_xml = "model/classification.xml"
net = ie.read_network(model=classification_model_xml)
exec_net = ie.load_network(network=net, device_name="CPU")
ONNX Model¶
An ONNX model is a single file. Reading and loading an ONNX model works the same way as reading and loading an IR model. The model argument points to the ONNX filename.
from openvino.inference_engine import IECore
ie = IECore()
onnx_model = "model/segmentation.onnx"
net_onnx = ie.read_network(model=onnx_model)
exec_net_onnx = ie.load_network(network=net_onnx, device_name="CPU")
The ONNX model can be exported to IR with .serialize():
net_onnx.serialize("model/exported_onnx_model.xml")
Getting Information about a Model¶
The OpenVINO IENetwork instance stores information about the model. Information about the inputs and outputs of the model is in net.input_info and net.outputs. These are also properties of the ExecutableNetwork instance. Where we use net.input_info and net.outputs in the cells below, you can also use exec_net.input_info and exec_net.outputs.
Model Inputs¶
from openvino.inference_engine import IECore
ie = IECore()
classification_model_xml = "model/classification.xml"
net = ie.read_network(model=classification_model_xml)
net.input_info
{'input': <openvino.inference_engine.ie_api.InputInfoPtr at 0x7f3942bf5ea0>}
The cell above shows that the loaded model expects one input, with the name input. If you loaded a different model, you may see a different input layer name, and you may see more inputs.
It is often useful to have a reference to the name of the first input layer. For a model with one input, next(iter(net.input_info)) gets this name.
input_layer = next(iter(net.input_info))
input_layer
'input'
Information for this input layer is stored in input_info. The next cell prints the input layout, precision and shape.
print(f"input layout: {net.input_info[input_layer].layout}")
print(f"input precision: {net.input_info[input_layer].precision}")
print(f"input shape: {net.input_info[input_layer].tensor_desc.dims}")
input layout: NCHW
input precision: FP32
input shape: [1, 3, 224, 224]
This cell output tells us that the model expects inputs with a shape of [1,3,224,224], and that this is in NCHW layout. This means that the model expects input data with a batch size (N) of 1, 3 channels (C), and images of a height (H) and width (W) of 224. The input data is expected to be of FP32 (floating point) precision.
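As a quick sanity check, the expected input can be modeled with a NumPy array of that shape (dummy zeros here, not a real image):

```python
import numpy as np

# A dummy NCHW input matching the shape the model reports: batch size 1,
# 3 channels, 224x224 pixels, FP32 precision
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)
N, C, H, W = dummy_input.shape
print(N, C, H, W)  # 1 3 224 224
```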
Model Outputs¶
from openvino.inference_engine import IECore
ie = IECore()
classification_model_xml = "model/classification.xml"
net = ie.read_network(model=classification_model_xml)
net.outputs
{'MobilenetV3/Predictions/Softmax': <openvino.inference_engine.ie_api.DataPtr at 0x7f3861ac6bf0>}
Model output info is stored in net.outputs. The cell above shows that the model returns one output, with the name MobilenetV3/Predictions/Softmax. If you loaded a different model, you will probably see a different output layer name, and you may see more outputs.
Since this model has one output, follow the same method as for the input layer to get its name.
output_layer = next(iter(net.outputs))
output_layer
'MobilenetV3/Predictions/Softmax'
Getting the output layout, precision and shape is similar to getting the input layout, precision and shape.
print(f"output layout: {net.outputs[output_layer].layout}")
print(f"output precision: {net.outputs[output_layer].precision}")
print(f"output shape: {net.outputs[output_layer].shape}")
output layout: NC
output precision: FP32
output shape: [1, 1001]
This cell output shows that the model returns outputs with a shape of [1, 1001], where 1 is the batch size (N) and 1001 the number of classes (C). The output is returned as 32-bit floating point.
Doing Inference on a Model¶
To do inference on a model, call the infer()
method of the
ExecutableNetwork, the exec_net
that we loaded with
load_network()
. infer()
expects one argument: inputs. This is
a dictionary, mapping input layer names to input data.
Preparation: load network
from openvino.inference_engine import IECore
ie = IECore()
classification_model_xml = "model/classification.xml"
net = ie.read_network(model=classification_model_xml)
exec_net = ie.load_network(network=net, device_name="CPU")
input_layer = next(iter(net.input_info))
output_layer = next(iter(net.outputs))
Preparation: load image and convert to input shape
To propagate an image through the network, it needs to be loaded into an array, resized to the shape that the network expects, and converted to the network’s input layout.
import cv2
image_filename = "data/coco_hollywood.jpg"
image = cv2.imread(image_filename)
image.shape
(663, 994, 3)
The image has a shape of (663,994,3). It is 663 pixels in height, 994 pixels in width, and has 3 color channels. We get a reference to the height and width that the network expects and resize the image to that size.
# N,C,H,W = batch size, number of channels, height, width
N, C, H, W = net.input_info[input_layer].tensor_desc.dims
# OpenCV resize expects the destination size as (width, height)
resized_image = cv2.resize(src=image, dsize=(W, H))
resized_image.shape
(224, 224, 3)
Now the image has the width and height that the network expects. It is still in H,W,C format. We change it to N,C,H,W format (where N=1) by first calling np.transpose() to change to C,H,W and then adding the N dimension by calling np.expand_dims(). Convert the data to FP32 with the .astype() method.
import numpy as np
input_data = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0).astype(np.float32)
input_data.shape
(1, 3, 224, 224)
Do inference
Now that the input data is in the right shape, doing inference is one simple command:
result = exec_net.infer({input_layer: input_data})
result
{'MobilenetV3/Predictions/Softmax': array([[1.9758525e-04, 5.8728176e-05, 6.4592241e-05, ..., 4.0716415e-05,
1.7331471e-04, 1.3031582e-04]], dtype=float32)}
.infer() returns a dictionary, mapping output layers to data. Since we know this network returns one output, and we stored the reference to the output layer in the output_layer variable, we can get the data with result[output_layer].
output = result[output_layer]
output.shape
(1, 1001)
The output shape is (1,1001), which we saw is the expected shape of the output. This output shape indicates that the network returns probabilities for 1001 classes. To transform this into meaningful information, check out the hello world notebook.
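For instance, the index of the highest-probability class can be found with np.argmax(); mapping that index to a human-readable label requires a labels file, which is not included here. This sketch uses normalized random data as a stand-in for the real softmax result:

```python
import numpy as np

# Stand-in for the (1, 1001) softmax output returned by the network
output = np.random.rand(1, 1001).astype(np.float32)
output /= output.sum()

# Index of the most probable class; with a labels file, this index
# could be looked up to get a class name
class_index = int(np.argmax(output))
print(class_index)
```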
Reshaping and Resizing¶
Change Image Size¶
Instead of reshaping the image to fit the model, you can also reshape the model to fit the image. Note that not all models support reshaping, and models that do may not support all input shapes. The model accuracy may also suffer if you reshape the model input shape.
We first check the input shape of the model, and then reshape to the new input shape.
from openvino.inference_engine import IECore
ie = IECore()
segmentation_model_xml = "model/segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
print("~~~~ ORIGINAL MODEL ~~~~")
print(f"input layout: {segmentation_net.input_info[segmentation_input_layer].layout}")
print(f"input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(f"output shape: {segmentation_net.outputs[segmentation_output_layer].shape}")
new_shape = (1, 3, 544, 544)
segmentation_net.reshape({segmentation_input_layer: new_shape})
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
print("~~~~ RESHAPED MODEL ~~~~")
print(f"net input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(
    f"exec_net input shape: "
    f"{segmentation_exec_net.input_info[segmentation_input_layer].tensor_desc.dims}"
)
print(f"output shape: {segmentation_net.outputs[segmentation_output_layer].shape}")
~~~~ ORIGINAL MODEL ~~~~
input layout: NCHW
input shape: [1, 3, 512, 512]
output shape: [1, 1, 512, 512]
~~~~ RESHAPED MODEL ~~~~
net input shape: [1, 3, 544, 544]
exec_net input shape: [1, 3, 544, 544]
output shape: [1, 1, 544, 544]
The input shape for the segmentation network is [1,3,512,512], with an NCHW layout: the network expects 3-channel images with a width and height of 512 and a batch size of 1. We reshape the network to make it accept input images with a width and height of 544 with the .reshape() method of IENetwork. This segmentation network always returns arrays with the same width and height as the input width and height, so setting the input dimensions to 544x544 also modifies the output dimensions. After reshaping, load the network to the device again.
Change Batch Size¶
We can also use .reshape() to set the batch size, by increasing the first element of new_shape. For example, to set a batch size of two, set new_shape = (2,3,544,544) in the cell above. If you only want to change the batch size, you can also set the batch_size property directly.
from openvino.inference_engine import IECore
ie = IECore()
segmentation_model_xml = "model/segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
segmentation_net.batch_size = 2
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
print(f"input layout: {segmentation_net.input_info[segmentation_input_layer].layout}")
print(f"input shape: {segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims}")
print(f"output shape: {segmentation_net.outputs[segmentation_output_layer].shape}")
input layout: NCHW
input shape: [2, 3, 512, 512]
output shape: [2, 1, 512, 512]
The output shows that by setting the batch size to 2, the first element (N) of the input and output shape now has a value of 2. Let’s see what happens if we propagate our input image through the network:
import numpy as np
from openvino.inference_engine import IECore
ie = IECore()
segmentation_model_xml = "model/segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
input_data = np.random.rand(*segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims)
segmentation_net.batch_size = 2
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
result_batch = segmentation_exec_net.infer({segmentation_input_layer: input_data})
print(f"input data shape: {input_data.shape}")
print(f"result data shape: {result_batch[segmentation_output_layer].shape}")
input data shape: (1, 3, 512, 512)
result data shape: (2, 1, 512, 512)
The output shows that if batch_size is set to 2, the network output will have a batch size of 2, even if only one image was propagated through the network. Regardless of batch size, you can always do inference on one image; in that case, only the first network output contains meaningful information.
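Alternatively, a single image can be zero-padded into an array with the network's batch dimension, so the input shape matches exactly. A NumPy-only sketch with dummy data:

```python
import numpy as np

# Dummy single image in NCHW layout (N=1), matching the segmentation input
single = np.random.rand(1, 3, 512, 512).astype(np.float32)

# Zero-pad into a batch of 2 so the input matches a network with batch_size=2;
# only the first slot holds real data
batch = np.zeros((2, 3, 512, 512), dtype=np.float32)
batch[0] = single[0]
print(batch.shape)  # (2, 3, 512, 512)
```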
Verify that inference on two images works by creating random data with a batch size of 2:
import numpy as np
from openvino.inference_engine import IECore
ie = IECore()
segmentation_model_xml = "model/segmentation.xml"
segmentation_net = ie.read_network(model=segmentation_model_xml)
segmentation_input_layer = next(iter(segmentation_net.input_info))
segmentation_output_layer = next(iter(segmentation_net.outputs))
segmentation_net.batch_size = 2
input_data = np.random.rand(*segmentation_net.input_info[segmentation_input_layer].tensor_desc.dims)
segmentation_exec_net = ie.load_network(network=segmentation_net, device_name="CPU")
result_batch = segmentation_exec_net.infer({segmentation_input_layer: input_data})
print(f"input data shape: {input_data.shape}")
print(f"result data shape: {result_batch[segmentation_output_layer].shape}")
input data shape: (2, 3, 512, 512)
result data shape: (2, 1, 512, 512)
Caching a Model¶
For some devices, like GPU, loading a model can take some time. Model Caching solves this issue by caching the model in a cache directory. If ie.set_config({"CACHE_DIR": cache_dir}, device_name=device_name) is set, caching will be used. This option checks if a model exists in the cache. If so, it loads it from the cache. If not, it loads the model regularly and stores it in the cache, so that the next time the model is loaded with this option set, it will be loaded from the cache.
In the cell below, we create a model_cache directory as a subdirectory of model, where the model will be cached for the specified device. The model will be loaded to the GPU. After running this cell once, the model will be cached, so subsequent runs of this cell will load the model from the cache. Note: Model Caching is not available on CPU devices.
import time
from pathlib import Path
from openvino.inference_engine import IECore
ie = IECore()
device_name = "GPU" # Model Caching is not available for CPU
if device_name in ie.available_devices and device_name != "CPU":
    cache_path = Path("model/model_cache")
    cache_path.mkdir(exist_ok=True)
    # Enable caching for Inference Engine. Comment out this line to disable caching
    ie.set_config({"CACHE_DIR": str(cache_path)}, device_name=device_name)
    classification_model_xml = "model/classification.xml"
    net = ie.read_network(model=classification_model_xml)
    start_time = time.perf_counter()
    exec_net = ie.load_network(network=net, device_name=device_name)
    end_time = time.perf_counter()
    print(f"Loading the network to the {device_name} device took {end_time-start_time:.2f} seconds.")
else:
    print("Model caching is not available on CPU devices.")
Model caching is not available on CPU devices.
After running the previous cell, we know the model exists in the cache directory. We delete exec_net and load it again, and measure the time it takes now.
if device_name in ie.available_devices and device_name != "CPU":
    del exec_net
    start_time = time.perf_counter()
    exec_net = ie.load_network(network=net, device_name=device_name)
    end_time = time.perf_counter()
    print(f"Loading the network to the {device_name} device took {end_time-start_time:.2f} seconds.")