Custom Operations Guide

The Intel® Distribution of OpenVINO™ toolkit supports neural network models trained with multiple frameworks, including TensorFlow*, Caffe*, MXNet*, and Kaldi*, as well as models in the ONNX* file format. The list of supported operations (layers) is different for each of the supported frameworks. To see the operations supported by your framework, refer to Supported Framework Layers.

Custom operations are operations that are not included in the list of known operations. If your model contains any operation that is not in the list of known operations, the Model Optimizer is not able to generate an Intermediate Representation (IR) for this model.

This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to plug in your own implementation for an existing or a completely new operation.

NOTE: Layer — The legacy term for an operation, which came from the Caffe* framework. It is no longer used. Refer to the Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™ for more information on the topic.

## Terms Used in This Guide

• Intermediate Representation (IR) — A neural network representation used only by the Inference Engine in OpenVINO™. It abstracts the different frameworks and describes the model topology, operation parameters, and weights.
• Operation — The abstract concept of a math function that is selected for a specific purpose. Operations supported by OpenVINO™ are listed in the supported operation set provided in the Available Operations Sets. Examples of the operations are: ReLU, Convolution, Add, etc.
• Kernel — The implementation of an operation function in the OpenVINO™ plugin, in this case, the math programmed (in C++ and OpenCL) to perform the operation for a target hardware (CPU or GPU).
• Inference Engine Extension — Device-specific module implementing custom operations (a set of kernels).

## Custom Operation Support Overview

There are three steps to support inference of a model with custom operation(s):

1. Add support for a custom operation in the Model Optimizer so the Model Optimizer can generate the IR with the operation.
2. Create an operation set and implement a custom nGraph operation in it as described in the Custom nGraph Operation.
3. Implement a custom operation in one of the Inference Engine plugins to support inference of this operation using a particular target hardware (CPU, GPU or VPU).

To see the operations that are supported by each device plugin for the Inference Engine, refer to the Supported Devices.

NOTE: If a device doesn't support a particular operation, an alternative to creating a new operation is to target an additional device using the HETERO plugin. The Heterogeneous Plugin can run inference of a model on multiple devices, allowing operations that are unsupported on one device to fall back to another device (for example, CPU) that does support them.
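
As a small illustration of the fallback idea, the device string for the HETERO plugin is a priority list of device names. The sketch below composes such a string; the helper name `hetero_device` is hypothetical and not part of the OpenVINO API, and only the `HETERO:<device>,<device>` format comes from the plugin itself.

```python
def hetero_device(preferred: str, fallback: str = "CPU") -> str:
    """Compose a device string that prefers one device and falls back to another.

    `hetero_device` is a hypothetical helper used for illustration only.
    """
    if preferred == fallback:
        # No fallback is needed when the preferred device is the fallback itself.
        return preferred
    # HETERO takes a comma-separated priority list: earlier devices are tried first.
    return "HETERO:" + preferred + "," + fallback


print(hetero_device("GPU"))
print(hetero_device("CPU"))
```

The resulting string would be passed wherever a device name is expected when loading a network, so operations without a GPU implementation run on the CPU instead.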

### Custom Operation Support for the Model Optimizer

The Model Optimizer model conversion pipeline is described in detail in the "Model Conversion Pipeline" section of Model Optimizer Extensibility. It is best to read that article first for a better understanding of the following material.

Model Optimizer provides an extensions mechanism to support new operations and implement custom model transformations to generate optimized IR. This mechanism is described in the "Model Optimizer Extensions" section of Model Optimizer Extensibility.

At a minimum, two types of Model Optimizer extensions must be implemented to support a custom operation:

1. Operation class for a new operation. This class stores information about the operation, its attributes, the shape inference function, the attributes to be saved to an IR, and some other internally used attributes. Refer to the "Model Optimizer Operation" section of Model Optimizer Extensibility for detailed instructions on how to implement it.
2. Operation attributes extractor. The extractor is responsible for parsing the framework-specific representation of the operation and uses the corresponding operation class to update graph node attributes with the necessary attributes of the operation. Refer to the "Operation Extractor" section of Model Optimizer Extensibility for detailed instructions on how to implement it.

NOTE: In some cases you may need to implement some transformation to support the operation. This topic is covered in the "Graph Transformation Extensions" section of Model Optimizer Extensibility.

## Custom Operations Extensions for the Inference Engine

The Inference Engine provides an extension mechanism to support new operations. This mechanism is described in Inference Engine Extensibility Mechanism.

Each device plugin includes a library of optimized implementations to execute known operations which must be extended to execute a custom operation. The custom operation extension is implemented according to the target device:

• Custom Operation CPU Extension
  • A compiled shared library (.so or .dll) needed by the CPU Plugin for executing the custom operation on a CPU. Refer to the How to Implement Custom CPU Operations for more details.
• Custom Operation GPU Extension
  • OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the GPU along with an operation description file (.xml) needed by the GPU Plugin for the custom operation kernel. Refer to the How to Implement Custom GPU Operations for more details.
• Custom Operation VPU Extension
  • OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the VPU along with an operation description file (.xml) needed by the VPU Plugin for the custom operation kernel. Refer to How to Implement Custom Operations for VPU for more details.

Also, it is necessary to implement a custom nGraph operation as described in Custom nGraph Operation so that the Inference Engine can read an IR with this operation and correctly infer the output tensor shape and type.

## Enabling Magnetic Resonance Image Reconstruction Model

This chapter provides step-by-step instructions on how to enable the magnetic resonance image reconstruction model implemented in the repository using a custom operation on CPU. The example is prepared for a model generated from the repository with hash 2ede2f96161ce70dcdc922371fe6b6b254aafcc8.

### Download and Convert the Model to a Frozen TensorFlow* Model Format

The original pre-trained model is provided in the hdf5 format, which is not supported by OpenVINO directly and needs to be converted to the TensorFlow* frozen model format first.

1. Download repository https://github.com/rmsouza01/Hybrid-CS-Model-MRI:

git clone https://github.com/rmsouza01/Hybrid-CS-Model-MRI
cd Hybrid-CS-Model-MRI
git checkout 2ede2f96161ce70dcdc922371fe6b6b254aafcc8

2. Convert the pre-trained .hdf5 model to a frozen .pb graph using the following script (tested with TensorFlow==1.15.0 and Keras==2.2.4), which should be executed from the root of the cloned repository:

import keras as K
import numpy as np
import Modules.frequency_spatial_network as fsnet
import tensorflow as tf

under_rate = '20'

# Normalization statistics shipped with the repository
stats = np.load("Data/stats_fs_unet_norm_" + under_rate + ".npy")

# Build the network and load the pre-trained weights
model = fsnet.wnet(stats[0], stats[1], stats[2], stats[3], kshape = (5,5), kshape2=(3,3))
model_name = "Models/wnet_" + under_rate + ".hdf5"
model.load_weights(model_name)

# Save a random input so that results can be compared later
inp = np.random.standard_normal([1, 256, 256, 2]).astype(np.float32)
np.save('inp', inp)

sess = K.backend.get_session()
sess.as_default()
graph_def = sess.graph.as_graph_def()
# Freeze the graph: fold variables into constants up to the model output node
graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, [model.output.op.name])
with tf.gfile.FastGFile('wnet_20.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())

As a result the TensorFlow* frozen model file "wnet_20.pb" is generated.

### Convert the Frozen TensorFlow* Model to Intermediate Representation

First, open the model in TensorBoard or another TensorFlow* model visualization tool. The model supports a dynamic batch dimension because the value for the batch dimension is not hardcoded in the model. Model Optimizer needs to set all dynamic dimensions to specific values to create the IR, so specify the command line parameter -b 1 to set the batch dimension to 1. The actual batch size can be changed at runtime using the Inference Engine API described in Using Shape Inference. Also refer to Converting a Model Using General Conversion Parameters and Convert Your TensorFlow* Model for more details and the command line parameters used for the model conversion.

./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1

NOTE: This conversion guide applies to the 2021.3 release of OpenVINO. Starting from the 2021.4 release, OpenVINO supports this model out of the box.

Model Optimizer produces the following error:

[ ERROR ] List of operations that cannot be converted to Inference Engine IR:
[ ERROR ] Complex (1)
[ ERROR ] lambda_2/Complex
[ ERROR ] IFFT2D (1)
[ ERROR ] lambda_2/IFFT2D
[ ERROR ] ComplexAbs (1)
[ ERROR ] lambda_2/Abs
[ ERROR ] Part of the nodes was not converted to IR. Stopped.

The error means that the Model Optimizer doesn't know how to handle 3 types of TensorFlow* operations: "Complex", "IFFT2D" and "ComplexAbs". To see more details about the conversion process, run the model conversion with the additional parameter --log_level DEBUG. The following lines from the detailed output are worth noting:

[ INFO ] Called "tf_native_tf_node_infer" for node "lambda_2/Complex"
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_3/strided_slice' with input 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '0' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_4/strided_slice' with input 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '1' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ] [ DEBUG ] [ tf:148 ] Inferred shape of the output tensor with index '0' of the node 'lambda_2/Complex': '[ 1 256 256]'
[ <TIMESTAMP> ] [ DEBUG ] [ infer:145 ] Outputs:
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] output[0]: shape = [ 1 256 256], value = <UNKNOWN>
[ <TIMESTAMP> ] [ DEBUG ] [ infer:129 ] --------------------
[ <TIMESTAMP> ] [ DEBUG ] [ infer:130 ] Partial infer for lambda_2/IFFT2D
[ <TIMESTAMP> ] [ DEBUG ] [ infer:131 ] Op: IFFT2D
[ <TIMESTAMP> ] [ DEBUG ] [ infer:132 ] Inputs:
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] input[0]: shape = [ 1 256 256], value = <UNKNOWN>

This is a part of the log of the partial inference phase of the model conversion. See the "Partial Inference" section of Model Optimizer Extensibility for more information about this phase. Model Optimizer inferred the output shape for the unknown operation of type "Complex" using a "fallback" to TensorFlow*. However, this is not enough to generate the IR because Model Optimizer doesn't know which attributes of the operation should be saved to the IR. So it is necessary to implement Model Optimizer extensions to support these operations.

Before going into the extension development it is necessary to understand what these unsupported operations do according to the TensorFlow* framework specification.

• "Complex" - returns a tensor of complex type constructed from two real input tensors specifying the real and imaginary parts of a complex number.
• "IFFT2D" - returns a tensor with the inverse 2-dimensional discrete Fourier transform over the inner-most 2 dimensions of an input.
• "ComplexAbs" - returns a tensor with the absolute values of an input tensor of complex numbers.
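
To make the semantics of the three operations listed above concrete, here is a minimal pure-Python sketch that chains them on a tiny 2 x 2 grid. It is an illustration only, not TensorFlow* code: the function names are made up for this example, the inverse transform is a naive direct summation, and it assumes the usual 1/(H*W) scaling of the inverse DFT.

```python
import cmath


def complex_op(real, imag):
    """Mimic "Complex": combine two real H x W grids into one complex grid."""
    return [[complex(r, i) for r, i in zip(r_row, i_row)]
            for r_row, i_row in zip(real, imag)]


def ifft2d(x):
    """Mimic "IFFT2D" with a naive inverse DFT over an H x W grid."""
    h, w = len(x), len(x[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for r in range(h):
                for c in range(w):
                    angle = 2 * cmath.pi * (u * r / h + v * c / w)
                    acc += x[r][c] * cmath.exp(1j * angle)
            out[u][v] = acc / (h * w)  # inverse transform is scaled by 1/(H*W)
    return out


def complex_abs(x):
    """Mimic "ComplexAbs": element-wise modulus of a complex grid."""
    return [[abs(z) for z in row] for row in x]


# A spectrum with only the zero-frequency bin set transforms to a constant image.
real = [[4.0, 0.0], [0.0, 0.0]]
imag = [[0.0, 0.0], [0.0, 0.0]]
image = complex_abs(ifft2d(complex_op(real, imag)))
print(image)
```

Note that only the intermediate values are complex: the chain takes real tensors as input and produces a real tensor as output.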

The part of the model with all three unsupported operations is depicted below:

This model uses complex numbers during inference, but the Inference Engine does not support tensors of this data type. So it is necessary to find a way to avoid using tensors of this type in the model. Fortunately, the complex tensor appears as a result of the "Complex" operation, is used as input to the "IFFT2D" operation, and is then passed to "ComplexAbs", which produces a real-valued tensor as output. So there are just 3 operations consuming or producing complex tensors in the model.

Let's design an OpenVINO operation "FFT" which takes a single real-valued tensor describing the complex input and produces a single real-valued tensor describing the complex output. This way, the fact that the model uses complex numbers is hidden inside the "FFT" operation implementation. The operation takes a tensor of shape [N, H, W, 2] and produces an output tensor of the same shape, where the innermost dimension contains pairs of real numbers describing a complex number (its real and imaginary parts). As we will see further, this operation allows us to support the model. The implementation of the Model Optimizer operation should be saved to the mo_extensions/ops/FFT.py file:

from mo.front.common.partial_infer.elemental import copy_shape_infer
from mo.graph.graph import Node, Graph
from mo.ops.op import Op


class FFT(Op):
    op = 'FFT'
    enabled = False

    def __init__(self, graph: Graph, attrs: dict):
        super().__init__(graph, {
            'type': self.op,
            'op': self.op,
            'version': 'custom_opset',
            'inverse': None,
            'in_ports_count': 1,
            'out_ports_count': 1,
            'infer': copy_shape_infer
        }, attrs)

    def backend_attrs(self):
        return ['inverse']

The attribute inverse is a flag specifying the type of FFT to apply: forward or inverse.
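
The following pure-Python sketch illustrates the idea of an all-real interface controlled by an inverse flag. It is deliberately simplified and makes several assumptions: it is 1-D rather than 2-D, it uses a naive direct summation, and the function name `fft_pairs` is hypothetical. Complex numbers exist only inside the function, while inputs and outputs are lists of [real, imag] pairs.

```python
import cmath


def fft_pairs(pairs, inverse):
    """Naive 1-D DFT over a list of [real, imag] pairs; returns pairs again."""
    n = len(pairs)
    values = [complex(re, im) for re, im in pairs]  # unpack pairs into complex numbers
    sign = 1 if inverse else -1                     # the inverse transform flips the exponent sign
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        if inverse:
            acc /= n                                # the inverse transform is scaled by 1/n
        out.append([acc.real, acc.imag])            # pack the result back into a real pair
    return out


signal = [[1.0, 0.0], [2.0, -1.0], [0.0, 3.0], [-1.0, 0.5]]
spectrum = fft_pairs(signal, inverse=False)
restored = fft_pairs(spectrum, inverse=True)
print(restored)
```

Applying the forward transform and then the inverse one returns the original pairs up to floating-point rounding, which is a convenient sanity check for any implementation of such an operation.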

See the "Model Optimizer Operation" section of Model Optimizer Extensibility for detailed instructions on how to implement the operation.

Now it is necessary to implement an extractor for the "IFFT2D" operation according to the "Operation Extractor" section of Model Optimizer Extensibility. The following snippet provides two extractors: one for "IFFT2D" and another one for "FFT2D". However, only one of them is used in this example. The implementation should be saved to the file mo_extensions/front/tf/FFT_ext.py.

from ...ops.FFT import FFT
from mo.front.extractor import FrontExtractorOp
from mo.utils.error import Error


class FFT2DFrontExtractor(FrontExtractorOp):
    op = 'FFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        attrs = {
            'inverse': 0
        }
        FFT.update_node_stat(node, attrs)
        return cls.enabled


class IFFT2DFrontExtractor(FrontExtractorOp):
    op = 'IFFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        attrs = {
            'inverse': 1
        }
        FFT.update_node_stat(node, attrs)
        return cls.enabled

NOTE: The graph is in an inconsistent state after extracting node attributes because, according to the semantics of the original "IFFT2D" operation, it should have an input consuming a tensor of complex numbers, but the extractor instantiated an "FFT" operation which expects a real tensor with a specific layout. The inconsistency will be resolved when applying the front phase transformations discussed below.

The output shape of the "AddV2" operation from the picture above is [N, H, W, 2], where the innermost dimension contains pairs of real numbers describing a complex number (its real and imaginary parts). The following "StridedSlice" operations split the input tensor into 2 parts to get a tensor of real parts and a tensor of imaginary parts, which are then consumed by the "Complex" operation to produce a tensor of complex numbers. These "StridedSlice" and "Complex" operations can be removed so that the "FFT" operation gets a real-valued tensor encoding complex numbers. To achieve this, we implement a front phase transformation which searches for a pattern of two "StridedSlice" operations with specific attributes producing data for the "Complex" operation and removes it from the graph. Refer to the "Pattern-Defined Front Phase Transformations" section of Model Optimizer Extensibility for more information on how this type of transformation works. The code snippet should be saved to the file mo_extensions/front/tf/Complex.py.

import logging as log

import numpy as np

from mo.front.common.replacement import FrontReplacementSubgraph
from mo.graph.graph import Graph


class Complex(FrontReplacementSubgraph):
    enabled = True

    def pattern(self):
        return dict(
            nodes=[
                ('strided_slice_real', dict(op='StridedSlice')),
                ('strided_slice_imag', dict(op='StridedSlice')),
                ('complex', dict(op='Complex')),
            ],
            edges=[
                ('strided_slice_real', 'complex', {'in': 0}),
                ('strided_slice_imag', 'complex', {'in': 1}),
            ])

    @staticmethod
    def replace_sub_graph(graph: Graph, match: dict):
        strided_slice_real = match['strided_slice_real']
        strided_slice_imag = match['strided_slice_imag']
        complex_node = match['complex']

        # make sure that both strided slice operations get the same data as input
        assert strided_slice_real.in_port(0).get_source() == strided_slice_imag.in_port(0).get_source()

        # identify the output port of the operation producing data for strided slice nodes
        input_node_output_port = strided_slice_real.in_port(0).get_source()
        input_node_output_port.disconnect()

        # change the connection so now all consumers of "complex_node" get data from input node of strided slice nodes
        complex_node.out_port(0).get_connection().set_source(input_node_output_port)
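
The rewiring performed by the transformation above can be sketched on a toy graph. The dictionary-based graph and the function name `remove_complex_pattern` below are assumptions made for illustration only and are unrelated to the Model Optimizer Graph class. The sketch also assumes that the two slice nodes feed only the "Complex" node, and it deletes the matched nodes itself.

```python
def remove_complex_pattern(graph):
    """Rewire consumers of Complex(StridedSlice(x), StridedSlice(x)) to read x directly.

    `graph` maps a node name to {"op": str, "inputs": [producer names]}.
    """
    for name, node in list(graph.items()):
        if name not in graph or node["op"] != "Complex" or len(node["inputs"]) != 2:
            continue
        real, imag = (graph[i] for i in node["inputs"])
        if real["op"] != "StridedSlice" or imag["op"] != "StridedSlice":
            continue
        # Both slices must read the same tensor, mirroring the assert in the transformation.
        if real["inputs"][0] != imag["inputs"][0]:
            continue
        source = real["inputs"][0]
        # Redirect every consumer of the Complex node to the original producer.
        for other in graph.values():
            other["inputs"] = [source if i == name else i for i in other["inputs"]]
        # Drop the matched nodes, which no longer have consumers.
        for dead in [name] + node["inputs"]:
            graph.pop(dead, None)
    return graph


toy_graph = {
    "add": {"op": "AddV2", "inputs": []},
    "slice_real": {"op": "StridedSlice", "inputs": ["add"]},
    "slice_imag": {"op": "StridedSlice", "inputs": ["add"]},
    "complex": {"op": "Complex", "inputs": ["slice_real", "slice_imag"]},
    "fft": {"op": "FFT", "inputs": ["complex"]},
}
remove_complex_pattern(toy_graph)
print(toy_graph)
```

After the rewrite, the "FFT" node reads the [N, H, W, 2] tensor produced by "AddV2" directly, which is exactly the real-valued layout the custom operation expects.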

NOTE: The graph is in an inconsistent state because the "ComplexAbs" operation consumes a complex-valued tensor but "FFT" produces a real-valued tensor.

Now let's implement a transformation which replaces a "ComplexAbs" operation with a sub-graph of primitive operations which calculate the result using the following formula: $$|z| = \sqrt{real(z) \cdot real(z) + imag(z) \cdot imag(z)}$$. The original "IFFT2D" operation produces a tensor of complex values, but the "FFT" operation produces a real-valued tensor with the same format and shape as its input. So the input shape for "ComplexAbs" will be [N, H, W, 2], with the innermost dimension containing a pair with the real and imaginary parts of a complex number. To calculate absolute values for the complex tensor, we do the following:

1. Raise all elements to the power of 2.
2. Calculate a reduced sum over the innermost dimension.
3. Calculate a square root.
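
The three steps above can be checked with a few lines of plain Python. This is only a sketch on a flat list of [real, imag] pairs rather than a full [N, H, W, 2] tensor, and the function name `complex_abs_pairs` is hypothetical.

```python
def complex_abs_pairs(pairs):
    """Modulus of complex numbers stored as [real, imag] pairs."""
    squared = [[re ** 2, im ** 2] for re, im in pairs]  # 1. raise all elements to the power of 2
    summed = [sum(p) for p in squared]                   # 2. reduced sum over the innermost dimension
    return [s ** 0.5 for s in summed]                    # 3. square root, expressed as power 0.5


pairs = [[3.0, 4.0], [0.0, -2.0], [1.0, 1.0]]
result = complex_abs_pairs(pairs)
expected = [abs(complex(re, im)) for re, im in pairs]
print(result, expected)
```

Expressing the square root as a power of 0.5 mirrors the transformation, which builds the whole sub-graph from "Pow" and "ReduceSum" operations only.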

The implementation should be saved to the file mo_extensions/front/tf/ComplexAbs.py and is provided below:

import numpy as np

from extensions.ops.elementwise import Pow
from extensions.ops.ReduceOps import ReduceSum
from mo.front.common.replacement import FrontReplacementOp
from mo.graph.graph import Graph, Node
from mo.ops.const import Const


class ComplexAbs(FrontReplacementOp):
    op = "ComplexAbs"
    enabled = True

    def replace_op(self, graph: Graph, node: Node):
        pow_2 = Const(graph, {'value': np.float32(2.0)}).create_node()
        reduce_axis = Const(graph, {'value': np.int32(-1)}).create_node()
        pow_0_5 = Const(graph, {'value': np.float32(0.5)}).create_node()

        sq = Pow(graph, dict(name=node.in_node(0).name + '/sq', power=2.0)).create_node([node.in_node(0), pow_2])
        sum = ReduceSum(graph, dict(name=sq.name + '/sum')).create_node([sq, reduce_axis])
        sqrt = Pow(graph, dict(name=sum.name + '/sqrt', power=0.5)).create_node([sum, pow_0_5])
        return [sqrt.id]

Now it is possible to convert the model using the following command line:

./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1 --extensions mo_extensions/

The sub-graph corresponding to the originally non-supported one is depicted in the image below:

NOTE: Model Optimizer converted the model from the NHWC to the NCHW layout, which is why the dimension with the value 2 moved to another position.
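
To visualize what the layout change means for the data, the sketch below transposes a small nested-list tensor from [N, H, W, 2] to [N, 2, H, W]. It is plain Python for illustration only, and the helper names `nhwc_to_nchw` and `shape_of` are hypothetical.

```python
def nhwc_to_nchw(tensor):
    """Transpose a nested-list tensor from [N, H, W, C] to [N, C, H, W]."""
    n, h, w, c = len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0])
    return [[[[tensor[b][y][x][ch] for x in range(w)]
              for y in range(h)]
             for ch in range(c)]
            for b in range(n)]


def shape_of(tensor):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims


# One 2 x 3 image whose innermost dimension holds (real, imag) pairs.
nhwc = [[[[float(10 * y + x), float(-(10 * y + x))] for x in range(3)] for y in range(2)]]
nchw = nhwc_to_nchw(nhwc)
print(shape_of(nhwc), "->", shape_of(nchw))
print("real plane:", nchw[0][0])
print("imag plane:", nchw[0][1])
```

In the NCHW layout the real and imaginary parts are no longer interleaved: channel 0 holds a whole plane of real parts and channel 1 a whole plane of imaginary parts, which is the form in which an operation implementation receives the data.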

### Inference Engine Extension Implementation

Now it is necessary to implement the extension for the CPU plugin with the "FFT" operation introduced previously. The code below is based on the template extension described in Inference Engine Extensibility Mechanism.

#### CMake Build File

The first step is to create a CMake configuration file which builds the extension. The content of the "CMakeLists.txt" file is the following:

set(CMAKE_CXX_STANDARD 11)

set(TARGET_NAME "template_extension")

find_package(ngraph REQUIRED OPTIONAL_COMPONENTS onnx_importer)
find_package(InferenceEngine REQUIRED)
find_package(OpenCV QUIET COMPONENTS core)

set(SRC cpu_kernel.cpp extension.cpp op.cpp)

if (OpenCV_FOUND)
    set(SRC ${SRC} fft_kernel.cpp fft_op.cpp)
endif()

add_library(${TARGET_NAME} MODULE ${SRC})

if (OpenCV_FOUND)
    target_compile_definitions(${TARGET_NAME} PRIVATE OPENCV_IMPORT_ENABLED)
    target_link_libraries(${TARGET_NAME} PRIVATE opencv_core)
endif()

target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_INFERENCE_EXTENSION_API)
target_link_libraries(${TARGET_NAME} PRIVATE IE::inference_engine ${NGRAPH_LIBRARIES})

if (ngraph_onnx_importer_FOUND)
    target_link_libraries(${TARGET_NAME} PRIVATE ${ONNX_IMPORTER_LIBRARIES})
    target_compile_definitions(${TARGET_NAME} PRIVATE NGRAPH_ONNX_IMPORT_ENABLED)
endif()

The CPU FFT kernel implementation uses OpenCV to perform the FFT, which is why the extension library is linked with "opencv_core", which comes with OpenVINO.

#### Custom nGraph Operation "FFT" Implementation

The next step is to create the nGraph operation FFT. The header file "fft_op.hpp" has the following content:

#pragma once

#include <ngraph/ngraph.hpp>

namespace TemplateExtension {

class FFTOp : public ngraph::op::Op {
public:
    static constexpr ngraph::NodeTypeInfo type_info {"FFT", 0};
    const ngraph::NodeTypeInfo& get_type_info() const override {
        return type_info;
    }

    FFTOp() = default;
    FFTOp(const ngraph::Output<ngraph::Node>& inp, bool inverse);
    void validate_and_infer_types() override;
    std::shared_ptr<ngraph::Node> clone_with_new_inputs(const ngraph::OutputVector& new_args) const override;
    bool visit_attributes(ngraph::AttributeVisitor& visitor) override;

    bool inverse;
};

}  // namespace TemplateExtension

The operation has just one boolean attribute inverse. The implementation of the necessary nGraph operation functions is in the "fft_op.cpp" file with the following content:

#include "fft_op.hpp"

using namespace TemplateExtension;

constexpr ngraph::NodeTypeInfo FFTOp::type_info;

FFTOp::FFTOp(const ngraph::Output<ngraph::Node>& inp, bool _inverse): Op({inp}) {
    constructor_validate_and_infer_types();
    inverse = _inverse;
}

void FFTOp::validate_and_infer_types() {
    // The output has the same shape and element type as the input
    auto outShape = get_input_partial_shape(0);
    set_output_type(0, get_input_element_type(0), outShape);
}

std::shared_ptr<ngraph::Node> FFTOp::clone_with_new_inputs(const ngraph::OutputVector& new_args) const {
    if (new_args.size() != 1) {
        throw ngraph::ngraph_error("Incorrect number of new arguments");
    }
    return std::make_shared<FFTOp>(new_args.at(0), inverse);
}

bool FFTOp::visit_attributes(ngraph::AttributeVisitor& visitor) {
    visitor.on_attribute("inverse", inverse);
    return true;
}

Refer to the Custom nGraph Operation for more details.

#### CPU FFT Kernel Implementation

The operation implementation for the CPU plugin uses OpenCV to perform the FFT. The header file "fft_kernel.hpp" has the following content:

#pragma once

#include <ie_iextension.h>
#include <ngraph/ngraph.hpp>

namespace TemplateExtension {

class FFTImpl : public InferenceEngine::ILayerExecImpl {
public:
    explicit FFTImpl(const std::shared_ptr<ngraph::Node>& node);
    InferenceEngine::StatusCode getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig>& conf,
                                                           InferenceEngine::ResponseDesc* resp) noexcept override;
    InferenceEngine::StatusCode init(InferenceEngine::LayerConfig& config,
                                     InferenceEngine::ResponseDesc* resp) noexcept override;
    InferenceEngine::StatusCode execute(std::vector<InferenceEngine::Blob::Ptr>& inputs,
                                        std::vector<InferenceEngine::Blob::Ptr>& outputs,
                                        InferenceEngine::ResponseDesc* resp) noexcept override;

private:
    ngraph::Shape inpShape;
    ngraph::Shape outShape;
    bool inverse;
    std::string error;
};

}  // namespace TemplateExtension

The "fft_kernel.cpp" file with the CPU implementation has the following content:

#include "fft_kernel.hpp"

#include <cstring>
#include <limits>
#include <numeric>

#include <ie_layouts.h>
#include <opencv2/opencv.hpp>

#include "fft_op.hpp"

using namespace TemplateExtension;

FFTImpl::FFTImpl(const std::shared_ptr<ngraph::Node>& node) {
    auto castedNode = std::dynamic_pointer_cast<FFTOp>(node);
    if (!castedNode)
        IE_THROW() << "Cannot create implementation for unknown operation!";
    if (castedNode->inputs().size() != 1 || castedNode->outputs().size() != 1)
        IE_THROW() << "Cannot create implementation for operation with incorrect number of inputs or outputs!";
    if (castedNode->get_input_partial_shape(0).is_dynamic() || castedNode->get_output_partial_shape(0).is_dynamic())
        IE_THROW() << "Cannot create implementation for op with dynamic shapes!";
    if (castedNode->get_input_element_type(0) != ngraph::element::f32 || castedNode->get_output_element_type(0) != ngraph::element::f32)
        IE_THROW() << "Operation supports only FP32 tensors.";
    inpShape = castedNode->get_input_shape(0);
    outShape = castedNode->get_output_shape(0);
    inverse = castedNode->inverse;
}

InferenceEngine::StatusCode FFTImpl::getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig>& conf,
                                                                InferenceEngine::ResponseDesc* resp) noexcept {
    std::vector<InferenceEngine::DataConfig> inDataConfig;
    std::vector<InferenceEngine::DataConfig> outDataConfig;
    InferenceEngine::SizeVector order(inpShape.size());
    std::iota(order.begin(), order.end(), 0);

    // Allow any offset before data
    size_t offset((std::numeric_limits<size_t>::max)());

    // Input shape
    InferenceEngine::DataConfig inpConf;
    inpConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, inpShape, {inpShape, order, offset});
    inDataConfig.push_back(inpConf);

    // Output shape
    InferenceEngine::DataConfig outConf;
    outConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, outShape, {outShape, order, offset});
    outDataConfig.push_back(outConf);

    InferenceEngine::LayerConfig layerConfig;
    layerConfig.inConfs = inDataConfig;
    layerConfig.outConfs = outDataConfig;

    conf.push_back(layerConfig);
    return InferenceEngine::StatusCode::OK;
}

InferenceEngine::StatusCode FFTImpl::init(InferenceEngine::LayerConfig& config, InferenceEngine::ResponseDesc* resp) noexcept {
    try {
        if (config.inConfs.size() != 1 || config.outConfs.size() != 1) {
            IE_THROW() << "Operation cannot be initialized with incorrect number of inputs/outputs!";
        }

        if (config.outConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32 ||
            config.inConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32) {
            IE_THROW() << "Operation supports only FP32 precisions!";
        }
    } catch (InferenceEngine::Exception& ex) {
        if (resp) {
            strncpy(resp->msg, error.c_str(), sizeof(resp->msg) - 1);
            resp->msg[sizeof(resp->msg) - 1] = 0;
        }
        return InferenceEngine::GENERAL_ERROR;
    }
    return InferenceEngine::OK;
}

static cv::Mat infEngineBlobToMat(const InferenceEngine::Blob::Ptr& blob) {
    // NOTE: Inference Engine sizes are reversed.
    std::vector<size_t> dims = blob->getTensorDesc().getDims();
    std::vector<int> size(dims.begin(), dims.end());
    auto precision = blob->getTensorDesc().getPrecision();
    CV_Assert(precision == InferenceEngine::Precision::FP32);
    return cv::Mat(size, CV_32F, (void*)blob->buffer());
}

InferenceEngine::StatusCode FFTImpl::execute(std::vector<InferenceEngine::Blob::Ptr>& inputs,
                                             std::vector<InferenceEngine::Blob::Ptr>& outputs,
                                             InferenceEngine::ResponseDesc* resp) noexcept {
    cv::Mat inp = infEngineBlobToMat(inputs[0]);
    cv::Mat out = infEngineBlobToMat(outputs[0]);

    // Tensors are in NCHW layout: channel 0 holds real parts, channel 1 holds imaginary parts
    const int n = inp.size[0];
    const int h = inp.size[2];
    const int w = inp.size[3];
    cv::Mat complex(h, w, CV_32FC2), interleavedOut(h, w, CV_32FC2);
    for (int i = 0; i < n; ++i) {
        std::vector<cv::Mat> components = {cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 0)),
                                           cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 1))};
        cv::merge(components, complex);

        if (!inverse)
            cv::dft(complex, interleavedOut);
        else
            cv::idft(complex, interleavedOut, cv::DFT_SCALE);

        components = {cv::Mat(h, w, CV_32F, out.ptr<float>(i, 0)),
                      cv::Mat(h, w, CV_32F, out.ptr<float>(i, 1))};
        cv::split(interleavedOut, components);
    }
    return InferenceEngine::OK;
}

Refer to the How to Implement Custom CPU Operations for more details.

#### Extension Library Implementation

The last step is to create the extension library files "extension.cpp" and "extension.hpp", which will include the FFT operation for the CPU plugin. The code of the library is described in the Extension Library.

### Building and Running the Custom Extension

To build the extension, run the following:

mkdir build && cd build
source /opt/intel/openvino_2021/bin/setupvars.sh
cmake .. -DCMAKE_BUILD_TYPE=Release
make --jobs=$(nproc)

The result of these commands is a compiled shared library (.so or .dll). Load it in the application using the AddExtension method of a Core class instance, like this: core.AddExtension(std::make_shared<Extension>(compiled_library_file_name), "CPU");

To test that the extension is implemented correctly we can run the "mri_reconstruction_demo.py" with the following content:

import numpy as np
import cv2 as cv
import argparse
import time
from openvino.inference_engine import IECore


def kspace_to_image(kspace):
    assert(len(kspace.shape) == 3 and kspace.shape[-1] == 2)
    fft = cv.idft(kspace, flags=cv.DFT_SCALE)
    img = cv.magnitude(fft[:,:,0], fft[:,:,1])
    return cv.normalize(img, dst=None, alpha=255, beta=0, norm_type=cv.NORM_MINMAX, dtype=cv.CV_8U)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='MRI reconstruction demo for network from https://github.com/rmsouza01/Hybrid-CS-Model-MRI (https://arxiv.org/abs/1810.12473)')
    parser.add_argument('-i', '--input', dest='input', help='Path to input .npy file with MRI scan data.')
    parser.add_argument('-p', '--pattern', dest='pattern', help='Path to sampling mask in .npy format.')
    parser.add_argument('-m', '--model', dest='model', help='Path to .xml file of OpenVINO IR.')
    parser.add_argument('-l', '--cpu_extension', dest='cpu_extension', help='Path to extensions library with FFT implementation.')
    parser.add_argument('-d', '--device', dest='device', default='CPU',
                        help='Optional. Specify the target device to infer on; CPU, '
                             'GPU, HDDL or MYRIAD is acceptable. For non-CPU targets, '
                             'HETERO plugin is used with CPU fallbacks to FFT implementation. '
                             'Default value is CPU')
    args = parser.parse_args()

    xml_path = args.model
    assert(xml_path.endswith('.xml'))
    bin_path = xml_path[:xml_path.rfind('.xml')] + '.bin'

    # Load the custom FFT extension and the network
    ie = IECore()
    ie.add_extension(args.cpu_extension, "CPU")
    net = ie.read_network(xml_path, bin_path)

    device = 'CPU' if args.device == 'CPU' else ('HETERO:' + args.device + ',CPU')
    exec_net = ie.load_network(net, device)

    # Hybrid-CS-Model-MRI/Data/stats_fs_unet_norm_20.npy
    stats = np.array([2.20295299e-01, 1.11048916e+03, 4.16997984e+00, 4.71741395e+00], dtype=np.float32)
    # Sampling mask selecting the k-space samples to zero out
    var_sampling_mask = np.load(args.pattern)

    data = np.load(args.input)
    num_slices, height, width = data.shape[0], data.shape[1], data.shape[2]
    pred = np.zeros((num_slices, height, width), dtype=np.uint8)
    data /= np.sqrt(height * width)

    print('Compute...')
    start = time.time()
    for slice_id, kspace in enumerate(data):
        kspace = kspace.copy()

        # Apply sampling
        kspace[var_sampling_mask] = 0
        kspace = (kspace - stats[0]) / stats[1]

        # Forward through network
        input = np.expand_dims(kspace.transpose(2, 0, 1), axis=0)
        outputs = exec_net.infer(inputs={'input_1': input})
        output = next(iter(outputs.values()))
        output = output.reshape(height, width)

        # Save predictions
        pred[slice_id] = cv.normalize(output, dst=None, alpha=255, beta=0, norm_type=cv.NORM_MINMAX, dtype=cv.CV_8U)

    print('Elapsed time: %.1f seconds' % (time.time() - start))

    WIN_NAME = 'MRI reconstruction with OpenVINO'

    slice_id = 0

    def callback(pos):
        global slice_id
        slice_id = pos

        kspace = data[slice_id]
        img = kspace_to_image(kspace)

        # Image reconstructed directly from the undersampled k-space
        kspace[var_sampling_mask] = 0
        masked = kspace_to_image(kspace)

        rec = pred[slice_id]

        # Place the three images side by side and add a header
        border_size = 20
        render = cv.hconcat((img, masked, rec))
        render = cv.copyMakeBorder(render, border_size, 0, 0, 0, cv.BORDER_CONSTANT, value=255)
        cv.putText(render, 'Original', (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, color=0)
        cv.putText(render, 'Sampled (PSNR %.1f)' % cv.PSNR(img, masked), (width, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, color=0)
        cv.putText(render, 'Reconstructed (PSNR %.1f)' % cv.PSNR(img, rec), (width*2, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, color=0)

        cv.imshow(WIN_NAME, render)
        cv.waitKey(1)

    cv.namedWindow(WIN_NAME, cv.WINDOW_NORMAL)
    print(num_slices)
    cv.createTrackbar('Slice', WIN_NAME, num_slices // 2, num_slices - 1, callback)
    callback(num_slices // 2)  # Trigger initial visualization
    cv.waitKey()

The script can be executed using the following command line:

python3 mri_reconstruction_demo.py \
-m <PATH_TO_IR>/wnet_20.xml \
-i <PATH_TO_SAMPLE_MRI_IMAGE>.npy \