Post-Training Optimization Toolkit API


This document describes Python* API of the Post-Training Optimization Toolkit (POT) that allows implementing a custom optimization pipeline for a single or cascaded/composite DL model (set of joint models). By the optimization pipeline, we mean the consecutive application of optimization algorithms to the model. The input for the optimization pipeline is a full-precision model, and the result is an optimized model. The optimization pipeline is configured to sequentially apply optimization algorithms in the order they are specified. The key requirement for applying the optimization algorithm is the availability of the calibration dataset for statistics collection and validation dataset for accuracy validation which in practice can be the same. The Python* POT API provides simple interfaces for implementing custom model inference with data loading and pre-processing on an arbitrary dataset and implementing custom accuracy metrics to make it possible to use optimization algorithms from the POT.

The Python* POT API provides Pipeline class for creating and configuring the optimization pipeline and applying it to the model. The Pipeline class depends on the implementation of the following model specific interfaces which should be implemented according to the custom DL model:

The pipeline with implemented model specific interfaces such as Engine, DataLoader and Metric we will call the custom optimization pipeline (see the picture below that shows relationships between classes).


Use Cases

The main and easiest way to get the optimized model is to use the Post-Training Optimization Command-line Tool where you just need to prepare the configuration file. Before diving into the Python* POT API, it is highly recommended to read Best Practices document where various scenarios of using the Post-Training Optimization Command-line Tool are described.

The POT Python* API for model optimization can be used in the following cases:

API Description

Below is a detailed explanation of POT Python* APIs which should be implemented in order to create a custom optimization pipeline.


class compression.api.DataLoader(config)

The base class for all DataLoaders.

DataLoader loads data from a dataset and applies pre-processing to them providing access to the pre-processed data by index.

All subclasses should override __len__() function, which should return the size of the dataset, and __getitem__(), which supports integer indexing in range of 0 to len(self)


class compression.api.Metric()

An abstract class representing an accuracy metric.

All subclasses should override the following properties:

All subclasses should override the following methods:


class compression.api.Engine(config, data_loader=None, metric=None)

Base class for all Engines.

The engine provides model inference, statistics collection for activations and calculation of accuracy metrics for a dataset.


All subclasses should override the following methods:

Helpers and Internal Model Representation

In order to simplify implementation of optimization pipelines we provide a set of ready-to-use helpers. Here we also describe internal representation of the DL model and how to work with it.


class compression.engines.IEEngine(config, data_loader=None, metric=None)

IEEngine is a helper which implements Engine class based on OpenVINO™ Inference Engine Python* API. This class support inference in synchronous and asynchronous modes and can be reused as-is in the custom pipeline or with some modifications, e.g. in case of custom post-processing of inference results.

The following methods can be overridden in subclasses:

IEEngine supports data returned by DataLoader in the format:

(img_id, img_annotation), image)


((img_id, img_annotation), image, image_metadata)

Metric values returned by a Metric instance are expected to be in the format:

In order to implement a custom Engine class you may need to get familiar with the following interfaces:


The Python* POT API provides the NXModelWrapper class as one interface for working with single and cascaded DL model. It is used to load, save and access the model, in case of the cascaded model, access each model of the cascaded model.

class compression.graph.nx_wrapper.NXModelWrapper(**kwargs)

The NXModelWrapper class provides a representation of the DL model. A single model and cascaded model can be represented as an instance of this class. The cascaded model is stored as a list of models.


Loading model from IR

The Python* POT API provides the utility function to load model from the OpenVINO™ Intermediate Representation (IR):




Saving model to IR

The Python* POT API provides the utility function to save model in the OpenVINO™ Intermediate Representation (IR):

compression.graph.model_utils.save_model(model, save_path, model_name=None, for_stat_collection=False)




class compression.samplers.Sampler(data_loader=None, batch_size=1, subset_indices=None)

Base class for all Samplers.

Sampler provides a way to iterate over the dataset.

All subclasses overwrite __iter__() method, providing a way to iterate over the dataset, and a __len__() method that returns the length of the returned iterators.



class compression.samplers.batch_sampler.BatchSampler(data_loader, batch_size=1, subset_indices=None):

Sampler provides an iterable over the dataset subset if subset_indices is specified or over the whole dataset with given batch_size. Returns a list of data items.


class compression.pipline.pipeline.Pipeline(engine)

Pipeline class represents the optimization pipeline.


The pipeline can be applied to the DL model by calling run(model) method where model is the NXModelWrapper instance.

Create a pipeline

The POT Python* API provides the utility function to create and configure the pipeline:

compression.pipline.initializer.create_pipeline(algo_config, engine)



Usage Example

Before running the optimization tool it's highly recommended to make sure that

As was described above, DataLoader, Metric and Engine interfaces should be implemented in order to create the custom optimization pipeline for your model. There might be a case you have the Python* validation script for your model using the OpenVINO™ Inference Engine, which in practice includes loading a dataset, model inference, and calculating the accuracy metric. So you just need to wrap the existing functions of your validation script in DataLoader, Metric and Engine interfaces. In another case, you need to implement interfaces from scratch.

For facilitation of using Python* POT API, we implemented IEEngine class providing the model inference of the most models from the Vision Domain which can be reused for an arbitrary model.

After YourDataLoader, YourMetric, YourEngine interfaces are implemented, the custom optimization pipeline can be created and applied to the model as follows:

# Step 1: Load the model.
model_config = {
'model_name': 'your_model',
'model': <PATH_TO_MODEL>/your_model.xml,
'weights': <PATH_TO_WEIGHTS/your_model.bin>
model = load_model(model_config)
# Step 2: Initialize the data loader.
dataset_config = {} # dictionary with the dataset parameters
data_loader = YourDataLoader(dataset_config)
# Step 3 (Optional. Required for AccuracyAwareQuantization): Initialize the metric.
metric = YourMetric()
# Step 4: Initialize the engine for metric calculation and statistics collection.
engine_config = {} # dictionary with the engine parameters
engine = YourEngine(engine_config, data_loader, metric)
# Step 5: Create a pipeline of compression algorithms.
pipeline = create_pipeline(algorithms, engine)
# Step 6: Execute the pipeline.
compressed_model =
# Step 7: Save the compressed model.
save_model(compressed_model, "path_to_save_model")

For in-depth examples of using Python* POT API, browse the samples included into the OpenVINO™ toolkit installation and available in the <INSTALL_DIR>/deployment_tools/tools/post_training_optimization_toolkit/sample directory.