Custom Operations Guide

The Intel® Distribution of OpenVINO™ toolkit supports neural network models trained with multiple frameworks including TensorFlow*, Caffe*, MXNet*, Kaldi* and ONNX* file format. The list of supported operations (layers) is different for each of the supported frameworks. To see the operations supported by your framework, refer to Supported Framework Layers.

Custom operations are operations that are not included in the list of known operations. If your model contains any operation that is not in the list of known operations, the Model Optimizer is not able to generate an Intermediate Representation (IR) for this model.

This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to plug in your own implementation for existing or completely new operations.

NOTE: Layer — The legacy term for an operation which came from Caffe* framework. Currently it is not used. Refer to the Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™ for more information on the topic.

Terms Used in This Guide

  • Intermediate Representation (IR) — The neural network format used only by the Inference Engine in OpenVINO. It abstracts the different frameworks and describes the model topology, operation parameters and weights.
  • Operation — The abstract concept of a math function that is selected for a specific purpose. Operations supported by OpenVINO™ are listed in the supported operation set provided in the Available Operations Sets. Examples of the operations are: ReLU, Convolution, Add, etc.
  • Kernel — The implementation of an operation function in the OpenVINO™ plugin; in this case, the math programmed (in C++ and OpenCL) to perform the operation on the target hardware (CPU or GPU).
  • Inference Engine Extension — Device-specific module implementing custom operations (a set of kernels).

Custom Operation Support Overview

There are three steps to support inference of a model with custom operation(s):

  1. Add support for a custom operation in the Model Optimizer so the Model Optimizer can generate the IR with the operation.
  2. Create an operation set and implement a custom nGraph operation in it as described in the Custom nGraph Operation.
  3. Implement a custom operation in one of the Inference Engine plugins to support inference of this operation using a particular target hardware (CPU, GPU or VPU).

To see the operations that are supported by each device plugin for the Inference Engine, refer to the Supported Devices.

NOTE: If a device doesn't support a particular operation, an alternative to creating a new operation is to target an additional device using the HETERO plugin. The Heterogeneous Plugin may be used to run an inference model on multiple devices, allowing the operations unsupported on one device to fall back to another device (e.g., CPU) that does support them.

Custom Operation Support for the Model Optimizer

The Model Optimizer model conversion pipeline is described in detail in the "Model Conversion Pipeline" section of the Model Optimizer Extensibility. It is recommended to read that article first for a better understanding of the following material.

Model Optimizer provides an extension mechanism to support new operations and implement custom model transformations to generate an optimized IR. This mechanism is described in the "Model Optimizer Extensions" section of the Model Optimizer Extensibility.

At a minimum, two types of Model Optimizer extensions must be implemented to support a custom operation:

  1. Operation class for a new operation. This class stores information about the operation: its attributes, shape inference function, attributes to be saved to an IR, and some other internally used attributes. Refer to the "Model Optimizer Operation" section of the Model Optimizer Extensibility for detailed instructions on how to implement it.
  2. Operation attributes extractor. The extractor is responsible for parsing the framework-specific representation of the operation and uses the corresponding operation class to update graph node attributes with the necessary attributes of the operation. Refer to the "Operation Extractor" section of the Model Optimizer Extensibility for detailed instructions on how to implement it.

NOTE: In some cases you may need to implement some transformation to support the operation. This topic is covered in the "Graph Transformation Extensions" section on the Model Optimizer Extensibility.

Custom Operations Extensions for the Inference Engine

Inference Engine provides an extension mechanism to support new operations. This mechanism is described in the Inference Engine Extensibility Mechanism.

Each device plugin includes a library of optimized implementations to execute known operations which must be extended to execute a custom operation. The custom operation extension is implemented according to the target device:

  • Custom Operation CPU Extension
  • Custom Operation GPU Extension
    • OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the GPU, along with an operation description file (.xml) needed by the GPU Plugin for the custom operation kernel. Refer to the How to Implement Custom GPU Operations for more details.
  • Custom Operation VPU Extension
    • OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the VPU, along with an operation description file (.xml) needed by the VPU Plugin for the custom operation kernel. Refer to the How to Implement Custom Operations for VPU for more details.

It is also necessary to implement a custom nGraph operation according to the Custom nGraph Operation so that the Inference Engine can read an IR with this operation and correctly infer the shapes and types of the output tensors.

Enabling Magnetic Resonance Image Reconstruction Model

This chapter provides step-by-step instructions on how to enable the magnetic resonance image reconstruction model implemented in the repository using a custom operation on CPU. The example is prepared for a model generated from the repository with hash 2ede2f96161ce70dcdc922371fe6b6b254aafcc8.

Download and Convert the Model to a Frozen TensorFlow* Model Format

The original pre-trained model is provided in the hdf5 format which is not supported by OpenVINO directly and needs to be converted to TensorFlow* frozen model format first.

  1. Download the repository:
    git clone
    git checkout 2ede2f96161ce70dcdc922371fe6b6b254aafcc8
  2. Convert pre-trained .hdf5 to a frozen .pb graph using the following script (tested with TensorFlow==1.15.0 and Keras==2.2.4) which should be executed from the root of the cloned repository:
    import keras as K
    import numpy as np
    import Modules.frequency_spatial_network as fsnet
    import tensorflow as tf

    under_rate = '20'
    stats = np.load("Data/stats_fs_unet_norm_" + under_rate + ".npy")
    var_sampling_mask = np.load("Data/sampling_mask_" + under_rate + "perc.npy")

    # Build the network and load the pre-trained weights
    model = fsnet.wnet(stats[0], stats[1], stats[2], stats[3], kshape=(5, 5), kshape2=(3, 3))
    model_name = "Models/wnet_" + under_rate + ".hdf5"
    model.load_weights(model_name)

    # Save a random input tensor to inp.npy
    inp = np.random.standard_normal([1, 256, 256, 2]).astype(np.float32)
    np.save('inp', inp)

    # Freeze the graph and write it to wnet_20.pb
    sess = K.backend.get_session()
    graph_def = sess.graph.as_graph_def()
    graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, ['conv2d_44/BiasAdd'])
    with tf.gfile.FastGFile('wnet_20.pb', 'wb') as f:
        f.write(graph_def.SerializeToString())

As a result the TensorFlow* frozen model file "wnet_20.pb" is generated.

Convert the Frozen TensorFlow* Model to Intermediate Representation

First, open the model in TensorBoard or another TensorFlow* model visualization tool. The model supports a dynamic batch dimension because the value for the batch dimension is not hardcoded in the model. Model Optimizer needs to set all dynamic dimensions to specific values to create the IR, so specify the command line parameter -b 1 to set the batch dimension to 1. The actual batch size can be changed at runtime using the Inference Engine API described in the Using Shape Inference. Also refer to Converting a Model Using General Conversion Parameters and Convert Your TensorFlow* Model for more details and the command line parameters used for model conversion.

./<MO_INSTALL_DIR>/ --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1

Model Optimizer produces the following error:

[ ERROR ] List of operations that cannot be converted to Inference Engine IR:
[ ERROR ] Complex (1)
[ ERROR ] lambda_2/Complex
[ ERROR ] IFFT2D (1)
[ ERROR ] lambda_2/IFFT2D
[ ERROR ] ComplexAbs (1)
[ ERROR ] lambda_2/Abs
[ ERROR ] Part of the nodes was not converted to IR. Stopped.

The error means that the Model Optimizer doesn't know how to handle three types of TensorFlow* operations: "Complex", "IFFT2D" and "ComplexAbs". To see more details about the conversion process, run the model conversion with the additional parameter --log_level DEBUG. The following lines from the detailed output are worth noting:

[ INFO ] Called "tf_native_tf_node_infer" for node "lambda_2/Complex"
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_3/strided_slice' with input 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '0' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_4/strided_slice' with input 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '1' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:148 ] Inferred shape of the output tensor with index '0' of the node 'lambda_2/Complex': '[ 1 256 256]'
[ <TIMESTAMP> ] [ DEBUG ] [ infer:145 ] Outputs:
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] output[0]: shape = [ 1 256 256], value = <UNKNOWN>
[ <TIMESTAMP> ] [ DEBUG ] [ infer:129 ] --------------------
[ <TIMESTAMP> ] [ DEBUG ] [ infer:130 ] Partial infer for lambda_2/IFFT2D
[ <TIMESTAMP> ] [ DEBUG ] [ infer:131 ] Op: IFFT2D
[ <TIMESTAMP> ] [ DEBUG ] [ infer:132 ] Inputs:
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] input[0]: shape = [ 1 256 256], value = <UNKNOWN>

This is a part of the log of the partial inference phase of the model conversion. See the "Partial Inference" section on the Model Optimizer Extensibility for more information about this phase. Model Optimizer inferred output shape for the unknown operation of type "Complex" using a "fallback" to TensorFlow*. However, it is not enough to generate the IR because Model Optimizer doesn't know which attributes of the operation should be saved to IR. So it is necessary to implement Model Optimizer extensions to support these operations.

Before going into the extension development it is necessary to understand what these unsupported operations do according to the TensorFlow* framework specification.

  • "Complex" - returns a tensor of complex type constructed from two real input tensors specifying real and imaginary part of a complex number.
  • "IFFT2D" - returns a tensor with inverse 2-dimensional discrete Fourier transform over the inner-most 2 dimensions of an input.
  • "ComplexAbs" - returns a tensor with absolute values of input tensor with complex numbers.
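The semantics of the three operations can be illustrated with a minimal pure-Python sketch. This is not TensorFlow code: the helper names `complex_op`, `idft1`, `ifft2d` and `complex_abs` are hypothetical, the tensors are plain nested lists for a single H x W image, and the inverse transform is a naive DFT applied along each of the two innermost dimensions with the usual 1/n scaling.

```python
import cmath

def complex_op(real, imag):
    """Combine two real H x W grids into one grid of complex numbers."""
    return [[complex(r, i) for r, i in zip(real_row, imag_row)]
            for real_row, imag_row in zip(real, imag)]

def idft1(seq):
    """Naive inverse 1-D DFT with 1/n scaling (illustration only)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]

def ifft2d(grid):
    """Inverse 2-D DFT: transform along W, then along H."""
    rows = [idft1(row) for row in grid]
    cols = [idft1(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def complex_abs(grid):
    """Element-wise absolute value of a complex grid."""
    return [[abs(z) for z in row] for row in grid]

# A spectrum with only the zero-frequency component set.
real = [[4.0, 0.0], [0.0, 0.0]]
imag = [[0.0, 0.0], [0.0, 0.0]]
result = complex_abs(ifft2d(complex_op(real, imag)))
print(result)
```

A spectrum that contains only the zero-frequency term corresponds to a constant image, so every element of `result` has the same magnitude.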

The part of the model with all three unsupported operations is depicted below:

This model uses complex numbers during inference, but the Inference Engine does not support tensors of this data type, so it is necessary to avoid using such tensors in the model. Fortunately, the complex tensor appears as the result of the "Complex" operation, is used as input to the "IFFT2D" operation, and is then passed to "ComplexAbs", which produces a real-valued tensor as output. So there are just three operations consuming or producing complex tensors in the model.

Let's design an OpenVINO operation "FFT" which gets a single real-valued tensor describing the complex input and produces a single real-valued tensor describing the complex output. This way, the fact that the model uses complex numbers is hidden inside the "FFT" operation implementation. The operation gets a tensor of shape [N, H, W, 2] and produces an output tensor with the same shape, where the innermost dimension contains pairs of real numbers describing a complex number (its real and imaginary parts). As we will see further, this operation allows us to support the model. The implementation of the Model Optimizer operation should be saved to mo_extensions/ops/ file:

from mo.front.common.partial_infer.elemental import copy_shape_infer
from mo.graph.graph import Node, Graph
from mo.ops.op import Op

class FFT(Op):
    op = 'FFT'
    enabled = False

    def __init__(self, graph: Graph, attrs: dict):
        super().__init__(graph, {
            'type': self.op,
            'op': self.op,
            'version': 'fft_extension',
            'inverse': None,
            'in_ports_count': 1,
            'out_ports_count': 1,
            'infer': copy_shape_infer
        }, attrs)

    def backend_attrs(self):
        return ['inverse']

The attribute inverse is a flag specifying the type of FFT to apply: forward or inverse.

See the "Model Optimizer Operation" section on the Model Optimizer Extensibility for the detailed instruction on how to implement the operation.

Now it is necessary to implement an extractor for the "IFFT2D" operation according to the "Operation Extractor" section of the Model Optimizer Extensibility. The following snippet provides two extractors: one for "IFFT2D" and another for "FFT2D"; however, only one of them is used in this example. The implementation should be saved to the file mo_extensions/front/tf/

from ...ops.FFT import FFT
from mo.front.extractor import FrontExtractorOp
from mo.utils.error import Error

class FFT2DFrontExtractor(FrontExtractorOp):
    op = 'FFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        # Forward transform
        attrs = {
            'inverse': 0
        }
        FFT.update_node_stat(node, attrs)
        return cls.enabled

class IFFT2DFrontExtractor(FrontExtractorOp):
    op = 'IFFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        # Inverse transform
        attrs = {
            'inverse': 1
        }
        FFT.update_node_stat(node, attrs)
        return cls.enabled

NOTE: The graph is in an inconsistent state after extracting node attributes: according to the semantics of the original "IFFT2D" operation, its input should consume a tensor of complex numbers, but the extractor instantiated an "FFT" operation, which expects a real tensor with a specific layout. The inconsistency is resolved by the front phase transformations discussed below.

The output shape of the operation "AddV2" from the picture above is [N, H, W, 2], where the innermost dimension contains pairs of real numbers describing a complex number (its real and imaginary parts). The following "StridedSlice" operations split the input tensor into two parts to get a tensor of real parts and a tensor of imaginary parts, which are then consumed by the "Complex" operation to produce a tensor of complex numbers. These "StridedSlice" and "Complex" operations can be removed so that the "FFT" operation gets a real-valued tensor encoding complex numbers. To achieve this, we implement a front phase transformation which searches for a pattern of two "StridedSlice" operations with specific attributes producing data for the "Complex" operation and removes it from the graph. Refer to the "Pattern-Defined Front Phase Transformations" section of the Model Optimizer Extensibility for more information on how this type of transformation works. The code snippet should be saved to the file mo_extensions/front/tf/

import logging as log
import numpy as np
from mo.front.common.replacement import FrontReplacementSubgraph
from mo.graph.graph import Graph

class Complex(FrontReplacementSubgraph):
    enabled = True

    def pattern(self):
        return dict(
            nodes=[
                ('strided_slice_real', dict(op='StridedSlice')),
                ('strided_slice_imag', dict(op='StridedSlice')),
                ('complex', dict(op='Complex')),
            ],
            edges=[
                ('strided_slice_real', 'complex', {'in': 0}),
                ('strided_slice_imag', 'complex', {'in': 1}),
            ])

    @staticmethod
    def replace_sub_graph(graph: Graph, match: dict):
        strided_slice_real = match['strided_slice_real']
        strided_slice_imag = match['strided_slice_imag']
        complex_node = match['complex']
        # make sure that both strided slice operations get the same data as input
        assert strided_slice_real.in_port(0).get_source() == strided_slice_imag.in_port(0).get_source()
        # identify the output port of the operation producing data for strided slice nodes
        input_node_output_port = strided_slice_real.in_port(0).get_source()
        # detach the strided slice nodes from the producer
        input_node_output_port.disconnect()
        # change the connection so now all consumers of "complex_node" get data from input node of strided slice nodes
        complex_node.out_port(0).get_connection().set_source(input_node_output_port)
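The rewiring performed by this transformation can be pictured on a toy graph. The sketch below does not use the Model Optimizer API: the graph is a plain dictionary mapping each node name to the list of its producers, and the function `bypass_complex` and all node names are hypothetical.

```python
def bypass_complex(inputs_of):
    """Return a copy of the graph with the StridedSlice/Complex pattern removed.

    inputs_of maps a node name to the ordered list of its producer names.
    """
    graph = {name: list(producers) for name, producers in inputs_of.items()}
    real_slice, imag_slice = graph["complex"]
    # Both slices must read the same tensor, as asserted in the transformation.
    assert graph[real_slice] == graph[imag_slice]
    producer = graph[real_slice][0]
    # Every consumer of "complex" now reads the producer directly.
    for name in graph:
        graph[name] = [producer if p == "complex" else p for p in graph[name]]
    # The matched nodes are no longer needed.
    for dead in ("complex", real_slice, imag_slice):
        del graph[dead]
    return graph

toy = {
    "add": [],
    "slice_real": ["add"],
    "slice_imag": ["add"],
    "complex": ["slice_real", "slice_imag"],
    "fft": ["complex"],
    "abs": ["fft"],
}
print(bypass_complex(toy))  # {'add': [], 'fft': ['add'], 'abs': ['fft']}
```

After the rewrite, the "fft" node consumes the [N, H, W, 2] tensor directly from "add", which mirrors the state of the real graph once the front phase transformation has been applied.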

NOTE: The graph is still in an inconsistent state because the "ComplexAbs" operation consumes a complex-valued tensor but "FFT" produces a real-valued tensor.

Now let's implement a transformation which replaces the "ComplexAbs" operation with a sub-graph of primitive operations that calculates the result using the following formula: \(abs(z) = \sqrt{real(z) \cdot real(z) + imag(z) \cdot imag(z)}\). The original "IFFT2D" operation produces a tensor of complex values, but the "FFT" operation produces a real-valued tensor with the same format and shape as its input. So the input shape for "ComplexAbs" will be [N, H, W, 2], with the innermost dimension containing a pair with the real and imaginary parts of a complex number. To calculate absolute values for the complex tensor, we do the following:

  1. Raise all elements to the power of 2.
  2. Calculate a reduced sum over the innermost dimension.
  3. Calculate a square root.
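The three steps can be checked numerically with a minimal pure-Python sketch. The function name `complex_abs_decomposed` is hypothetical and the input is a flat list of [real, imag] pairs standing in for the innermost dimension of the [N, H, W, 2] tensor; the result is compared against Python's built-in `abs` for complex numbers.

```python
import math

def complex_abs_decomposed(pairs):
    """Mirror the sub-graph Pow(2) -> ReduceSum(axis=-1) -> Pow(0.5)."""
    squared = [[v ** 2.0 for v in pair] for pair in pairs]  # step 1
    summed = [sum(pair) for pair in squared]                # step 2
    return [s ** 0.5 for s in summed]                       # step 3

pairs = [[3.0, 4.0], [0.0, -2.0], [1.0, 1.0]]
decomposed = complex_abs_decomposed(pairs)
reference = [abs(complex(re, im)) for re, im in pairs]
print(all(math.isclose(a, b) for a, b in zip(decomposed, reference)))
```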

The Model Optimizer transformation should be saved to the file mo_extensions/front/tf/ and is provided below:

import numpy as np
from extensions.ops.elementwise import Pow
from extensions.ops.ReduceOps import ReduceSum
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Graph, Node
from mo.ops.const import Const

class ComplexAbs(FrontReplacementOp):
    op = "ComplexAbs"
    enabled = True

    def replace_op(self, graph: Graph, node: Node):
        pow_2 = Const(graph, {'value': np.float32(2.0)}).create_node()
        reduce_axis = Const(graph, {'value': np.int32(-1)}).create_node()
        pow_0_5 = Const(graph, {'value': np.float32(0.5)}).create_node()
        # square every element, sum over the innermost dimension, take the square root
        sq = Pow(graph, dict(name=node.in_node(0).name + '/sq', power=2.0)).create_node([node.in_node(0), pow_2])
        sum = ReduceSum(graph, dict(name=sq.name + '/sum')).create_node([sq, reduce_axis])
        sqrt = Pow(graph, dict(name=sum.name + '/sqrt', power=0.5)).create_node([sum, pow_0_5])
        # the output of the last node replaces the output of "ComplexAbs"
        return [sqrt.id]

Now it is possible to convert the model using the following command line:

./<MO_INSTALL_DIR>/ --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1 --extensions mo_extensions/

The sub-graph corresponding to the originally non-supported one is depicted on the image below:

NOTE: Model Optimizer converted the model from NHWC to NCHW layout, which is why the dimension with the value 2 moved to another position.
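The effect of the layout change on the "FFT" input can be sketched in plain Python. The helpers `nhwc_to_nchw` and `shape` are hypothetical and operate on nested lists; the point is that the (real, imag) pairs stored in the innermost dimension of the NHWC tensor become two separate H x W planes in NCHW.

```python
def nhwc_to_nchw(t):
    """Transpose a nested-list tensor from [N, H, W, C] to [N, C, H, W]."""
    n, h, w, c = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[b][y][x][ch] for x in range(w)]
              for y in range(h)]
             for ch in range(c)]
            for b in range(n)]

def shape(t):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

# One 2 x 3 image whose last axis holds (real, imag) pairs.
nhwc = [[[[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]],
         [[4.0, -4.0], [5.0, -5.0], [6.0, -6.0]]]]
nchw = nhwc_to_nchw(nhwc)
print(shape(nhwc), "->", shape(nchw))  # [1, 2, 3, 2] -> [1, 2, 2, 3]
print(nchw[0][0])  # plane of real parts
print(nchw[0][1])  # plane of imaginary parts
```

This planar arrangement is what a CPU kernel for the "FFT" operation receives at inference time.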

Inference Engine Extension Implementation

Now it is necessary to implement the extension for the CPU plugin with operation "FFT" introduced previously. The code below is based on the template extension described on the Inference Engine Extensibility Mechanism.

CMake Build File

The first step is to create a CMake configuration file which builds the extension. The content of the "CMakeLists.txt" file is the following:

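find_package(ngraph REQUIRED OPTIONAL_COMPONENTS onnx_importer)
find_package(InferenceEngine REQUIRED)
find_package(OpenCV REQUIRED COMPONENTS core)
set(TARGET_NAME fft_cpu_extension)
# Collect the extension sources
file(GLOB SRC ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
add_library(${TARGET_NAME} SHARED ${SRC})
# Export CreateExtension from the shared library
target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_INFERENCE_EXTENSION_API)
target_link_libraries(${TARGET_NAME} PRIVATE ${InferenceEngine_LIBRARIES}
                                             ${NGRAPH_LIBRARIES}
                                             opencv_core)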

The CPU FFT kernel implementation uses OpenCV to perform the FFT, which is why the extension library is linked with "opencv_core", which comes with OpenVINO.

Custom nGraph Operation "FFT" Implementation

The next step is to create the nGraph operation FFT. The header file "fft_op.hpp" has the following content:

#pragma once
#include <ngraph/ngraph.hpp>

namespace FFTExtension {

class FFTOp : public ngraph::op::Op {
public:
    static constexpr ngraph::NodeTypeInfo type_info{"FFT", 0};
    const ngraph::NodeTypeInfo& get_type_info() const override { return type_info; }

    FFTOp() = default;
    FFTOp(const ngraph::Output<ngraph::Node>& inp, bool inverse);
    void validate_and_infer_types() override;
    std::shared_ptr<ngraph::Node> clone_with_new_inputs(const ngraph::OutputVector& new_args) const override;
    bool visit_attributes(ngraph::AttributeVisitor& visitor) override;

    bool inverse;
};

}  // namespace FFTExtension

The operation has just one boolean attribute, inverse. The necessary nGraph operation functions are implemented in the "fft_op.cpp" file with the following content:

#include "fft_op.hpp"

using namespace FFTExtension;

constexpr ngraph::NodeTypeInfo FFTOp::type_info;

FFTOp::FFTOp(const ngraph::Output<ngraph::Node>& inp, bool _inverse) : Op({inp}) {
    constructor_validate_and_infer_types();
    inverse = _inverse;
}

// The output has the same shape and element type as the input
void FFTOp::validate_and_infer_types() {
    auto outShape = get_input_partial_shape(0);
    set_output_type(0, get_input_element_type(0), outShape);
}

std::shared_ptr<ngraph::Node> FFTOp::clone_with_new_inputs(const ngraph::OutputVector &new_args) const {
    if (new_args.size() != 1) {
        throw ngraph::ngraph_error("Incorrect number of new arguments");
    }
    return std::make_shared<FFTOp>(new_args.at(0), inverse);
}

bool FFTOp::visit_attributes(ngraph::AttributeVisitor &visitor) {
    visitor.on_attribute("inverse", inverse);
    return true;
}

Refer to the Custom nGraph Operation for more details.

CPU FFT Kernel Implementation

The operation implementation for CPU plugin uses OpenCV to perform the FFT. The header file "fft_kernel.hpp" has the following content:

#pragma once
#include <ie_iextension.h>
#include <ngraph/ngraph.hpp>

namespace FFTExtension {

class FFTImpl : public InferenceEngine::ILayerExecImpl {
public:
    explicit FFTImpl(const std::shared_ptr<ngraph::Node>& node);
    InferenceEngine::StatusCode getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
                                                           InferenceEngine::ResponseDesc *resp) noexcept override;
    InferenceEngine::StatusCode init(InferenceEngine::LayerConfig &config,
                                     InferenceEngine::ResponseDesc *resp) noexcept override;
    InferenceEngine::StatusCode execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
                                        std::vector<InferenceEngine::Blob::Ptr> &outputs,
                                        InferenceEngine::ResponseDesc *resp) noexcept override;

private:
    ngraph::Shape inpShape;
    ngraph::Shape outShape;
    bool inverse;
    std::string error;
};

}  // namespace FFTExtension

The "fft_kernel.cpp" file with the CPU kernel implementation has the following content:

#include "fft_kernel.hpp"
#include "fft_op.hpp"
#include <ie_layouts.h>
#include <opencv2/opencv.hpp>
#include <cstring>
#include <limits>
#include <numeric>

using namespace FFTExtension;

FFTImpl::FFTImpl(const std::shared_ptr<ngraph::Node> &node) {
    auto castedNode = std::dynamic_pointer_cast<FFTOp>(node);
    if (!castedNode)
        THROW_IE_EXCEPTION << "Cannot create implementation for unknown operation!";
    if (castedNode->inputs().size() != 1 || castedNode->outputs().size() != 1)
        THROW_IE_EXCEPTION << "Cannot create implementation for operation with incorrect number of inputs or outputs!";
    if (castedNode->get_input_partial_shape(0).is_dynamic() || castedNode->get_output_partial_shape(0).is_dynamic())
        THROW_IE_EXCEPTION << "Cannot create implementation for op with dynamic shapes!";
    if (castedNode->get_input_element_type(0) != ngraph::element::f32 || castedNode->get_output_element_type(0) != ngraph::element::f32)
        THROW_IE_EXCEPTION << "Operation supports only FP32 tensors.";
    inpShape = castedNode->get_input_shape(0);
    outShape = castedNode->get_output_shape(0);
    inverse = castedNode->inverse;
}
InferenceEngine::StatusCode FFTImpl::getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
                                                                InferenceEngine::ResponseDesc *resp) noexcept {
    std::vector<InferenceEngine::DataConfig> inDataConfig;
    std::vector<InferenceEngine::DataConfig> outDataConfig;
    InferenceEngine::SizeVector order(inpShape.size());
    std::iota(order.begin(), order.end(), 0);
    // Allow any offset before data
    size_t offset((std::numeric_limits<size_t>::max)());
    // Input shape
    InferenceEngine::DataConfig inpConf;
    inpConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, inpShape, {inpShape, order, offset});
    inDataConfig.push_back(inpConf);
    // Output shape
    InferenceEngine::DataConfig outConf;
    outConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, outShape, {outShape, order, offset});
    outDataConfig.push_back(outConf);
    InferenceEngine::LayerConfig layerConfig;
    layerConfig.inConfs = inDataConfig;
    layerConfig.outConfs = outDataConfig;
    conf.push_back(layerConfig);
    return InferenceEngine::StatusCode::OK;
}

InferenceEngine::StatusCode FFTImpl::init(InferenceEngine::LayerConfig &config,
                                          InferenceEngine::ResponseDesc *resp) noexcept {
    try {
        if (config.inConfs.size() != 1 || config.outConfs.size() != 1) {
            THROW_IE_EXCEPTION << "Operation cannot be initialized with incorrect number of inputs/outputs!";
        }
        if (config.outConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32 ||
            config.inConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32) {
            THROW_IE_EXCEPTION << "Operation supports only FP32 precisions!";
        }
    } catch (InferenceEngine::details::InferenceEngineException& ex) {
        // Keep the message and report it through the response descriptor
        error = ex.what();
        if (resp) {
            strncpy(resp->msg, error.c_str(), sizeof(resp->msg) - 1);
            resp->msg[sizeof(resp->msg)-1] = 0;
        }
        return InferenceEngine::GENERAL_ERROR;
    }
    return InferenceEngine::OK;
}
// Wrap the blob memory into a cv::Mat without copying the data
static cv::Mat infEngineBlobToMat(const InferenceEngine::Blob::Ptr& blob)
{
    // NOTE: Inference Engine sizes are reversed.
    std::vector<size_t> dims = blob->getTensorDesc().getDims();
    std::vector<int> size(dims.begin(), dims.end());
    auto precision = blob->getTensorDesc().getPrecision();
    CV_Assert(precision == InferenceEngine::Precision::FP32);
    return cv::Mat(size, CV_32F, (void*)blob->buffer());
}

InferenceEngine::StatusCode FFTImpl::execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
                                             std::vector<InferenceEngine::Blob::Ptr> &outputs,
                                             InferenceEngine::ResponseDesc *resp) noexcept {
    cv::Mat inp = infEngineBlobToMat(inputs[0]);
    cv::Mat out = infEngineBlobToMat(outputs[0]);
    const int n = inp.size[0];
    const int h = inp.size[2];
    const int w = inp.size[3];
    cv::Mat complex(h, w, CV_32FC2), interleavedOut(h, w, CV_32FC2);
    for (int i = 0; i < n; ++i) {
        // Planes 0 and 1 hold the real and imaginary parts
        std::vector<cv::Mat> components = {
            cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 0)),
            cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 1))
        };
        cv::merge(components, complex);
        if (!inverse)
            cv::dft(complex, interleavedOut);
        else
            cv::idft(complex, interleavedOut, cv::DFT_SCALE);
        components = {
            cv::Mat(h, w, CV_32F, out.ptr<float>(i, 0)),
            cv::Mat(h, w, CV_32F, out.ptr<float>(i, 1))
        };
        cv::split(interleavedOut, components);
    }
    return InferenceEngine::OK;
}

Refer to the How to Implement Custom CPU Operations for more details.
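The data flow of the CPU kernel for one batch item (merge the planes, transform, split the planes) can be mirrored in a minimal pure-Python sketch that is handy as a reference when checking the extension output on tiny inputs. It is not the extension code: the names `dft2` and `fft_kernel` are hypothetical, the transform is a naive DFT rather than OpenCV, and the inverse pass is scaled by 1/(H·W) on the assumption that this matches what cv::idft does with cv::DFT_SCALE.

```python
import cmath

def dft2(x, inverse=False):
    """Naive 2-D DFT of an H x W grid of complex numbers.

    The inverse transform is scaled by 1/(H*W).
    """
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for k in range(h):
                for l in range(w):
                    angle = sign * 2 * cmath.pi * (u * k / h + v * l / w)
                    acc += x[k][l] * cmath.exp(1j * angle)
            out[u][v] = acc / (h * w) if inverse else acc
    return out

def fft_kernel(planes, inverse):
    """planes: [real_plane, imag_plane] for one batch item (the C axis in NCHW)."""
    real, imag = planes
    # Planar -> interleaved complex values (the role of cv::merge).
    merged = [[complex(r, i) for r, i in zip(rr, ir)] for rr, ir in zip(real, imag)]
    transformed = dft2(merged, inverse)
    # Interleaved -> planar (the role of cv::split).
    return [[[z.real for z in row] for row in transformed],
            [[z.imag for z in row] for row in transformed]]

planes = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [-1.0, 2.0]]]
forward = fft_kernel(planes, inverse=False)
round_trip = fft_kernel(forward, inverse=True)
print(round_trip)
```

Applying the forward and then the inverse transform should reproduce the input planes up to floating-point error, which is a quick sanity check for the scaling convention.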

Extension Implementation

The source code of the extension itself consists of the "extension.hpp" and "extension.cpp" files, listed below in that order.


#pragma once
#include <ie_iextension.h>
#include <ie_api.h>
#include <ngraph/ngraph.hpp>
#include <memory>
#include <vector>
#include <string>
#include <map>

namespace FFTExtension {

class Extension : public InferenceEngine::IExtension {
public:
    Extension() = default;
    void GetVersion(const InferenceEngine::Version*& versionInfo) const noexcept override;
    void Unload() noexcept override {}
    void Release() noexcept override { delete this; }

    std::map<std::string, ngraph::OpSet> getOpSets() override;
    std::vector<std::string> getImplTypes(const std::shared_ptr<ngraph::Node>& node) override;
    InferenceEngine::ILayerImpl::Ptr getImplementation(const std::shared_ptr<ngraph::Node>& node, const std::string& implType) override;
};

}  // namespace FFTExtension


#include "extension.hpp"
#include "fft_kernel.hpp"
#include "fft_op.hpp"
#include <ngraph/factory.hpp>
#include <ngraph/opsets/opset.hpp>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using namespace FFTExtension;

void Extension::GetVersion(const InferenceEngine::Version *&versionInfo) const noexcept {
    static InferenceEngine::Version ExtensionDescription = {
        {1, 0},  // extension API version
        "1.0",   // build number
        "The CPU plugin extension with FFT operation"  // extension description message
    };
    versionInfo = &ExtensionDescription;
}

std::map<std::string, ngraph::OpSet> Extension::getOpSets() {
    std::map<std::string, ngraph::OpSet> opsets;
    ngraph::OpSet opset;
    // Register the custom operation in the "fft_extension" operation set
    opset.insert<FFTOp>();
    opsets["fft_extension"] = opset;
    return opsets;
}

std::vector<std::string> Extension::getImplTypes(const std::shared_ptr<ngraph::Node> &node) {
    if (std::dynamic_pointer_cast<FFTOp>(node)) {
        return {"CPU"};
    }
    return {};
}

InferenceEngine::ILayerImpl::Ptr Extension::getImplementation(const std::shared_ptr<ngraph::Node> &node, const std::string &implType) {
    if (std::dynamic_pointer_cast<FFTOp>(node) && implType == "CPU") {
        return std::make_shared<FFTImpl>(node);
    }
    return nullptr;
}

// Exported function that creates the default instance of the extension
INFERENCE_EXTENSION_API(InferenceEngine::StatusCode) InferenceEngine::CreateExtension(InferenceEngine::IExtension *&ext,
                                                                                      InferenceEngine::ResponseDesc *resp) noexcept {
    try {
        ext = new Extension();
        return OK;
    } catch (std::exception &ex) {
        if (resp) {
            std::string err = ((std::string) "Couldn't create extension: ") + ex.what();
            err.copy(resp->msg, 255);
        }
        return InferenceEngine::GENERAL_ERROR;
    }
}

Building and Running the Custom Extension

To build the extension, run the following commands:

mkdir build && cd build
source /opt/intel/openvino/bin/
cmake .. -DCMAKE_BUILD_TYPE=Release
make --jobs=$(nproc)

The result of these commands is a compiled shared library (.so, .dylib or .dll). Load it in the application using the AddExtension method of a Core class instance, for example: core.AddExtension(make_so_pointer<IExtension>(compiled_library_file_name), "CPU");

To test that the extension is implemented correctly, run the Benchmark App as follows:

python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/benchmark_tool/ \
-m <PATH_TO_IR>/wnet_20.xml \
-l <PATH_TO_BUILD_DIR>/libfft_cpu_extension.so \
-d CPU

Additional Resources

Converting Models:
