Inference Engine Kernels Extensibility

The Inference Engine extensibility workflow involves creating custom kernels for either custom or existing layers.

A Layer is a convolutional neural network (CNN) building block implemented in the training framework, for example, Convolution in Caffe*. A Kernel is defined as the corresponding implementation in Inference Engine.

Refer to the Custom Layers in the Model Optimizer section for details on how a mapping between framework layers and Inference Engine kernels is registered.

In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework.

The rest of this section covers custom kernels and how to integrate them into the Inference Engine.

Example of Custom Kernels Support in the Samples

Every sample uses the Inference Engine API to load custom kernels depending on the device type. Specifically, for the CPU, the extension is a shared library that exports a certain interface that registers the kernels. For GPU or MYRIAD, it is an .xml file that lists the kernels along with the parameters that the kernels accept and how these map to the specific Intermediate Representation (IR) values.
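
For illustration, the following rough sketch shows how these artifacts can be passed to a plugin through the API covered later in this section; the plugin library name and file paths are placeholders, and a real application would use only the variant matching its target device:

// Load a plugin for the target device (library name is a placeholder)
InferenceEngine::InferenceEnginePluginPtr plugin_ptr("<plugin_library>.so");
InferencePlugin plugin(plugin_ptr);
// CPU (the -l option in the samples): a shared library that registers the kernels
plugin.AddExtension(make_so_pointer<InferenceEngine::IExtension>("<path_to_cpu_extension>.so"));
// GPU or MYRIAD (the -c option in the samples): an .xml file that describes the kernels
plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path_to_kernel_config>.xml"}});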

Example Custom Kernels

You can find examples of CPU-targeted kernels in the <INSTALL_DIR>/deployment_tools/inference_engine/src/extension directory. You can also use the global GPU kernels delivered with the OpenVINO toolkit as an example.

Several GPU-targeted kernels are also added to the binaries upon samples compilation so that the sample applications can easily load them. Refer to the cldnn_global_custom_kernels folder in the GPU plugin installation directory.

How to Implement Custom GPU Layers

The GPU codepath abstracts many details of OpenCL™. You need to provide the kernel code in OpenCL C and a configuration file that connects the kernel and its parameters to the parameters of the layer.

There are two options for using a custom layer configuration file: add your kernels to the global, automatically loaded configuration file in the cldnn_global_custom_kernels folder mentioned above, or provide a separate configuration file and load it at run time.

All Inference Engine samples (except the trivial hello_classification) feature a dedicated command-line option -c to load custom GPU kernels. For example, to load custom layers for the classification sample:

$ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU \
  -c <absolute_path_to_config>/custom_layer_example.xml

Configuration File Format

The configuration file is expected to follow the .xml file structure with a node of type CustomLayer for every custom layer you provide.

The following definitions use these notations for how many times an attribute or sub-node can appear: (1) exactly once, (0/1) zero or one time, (0+) zero or more times, (1+) one or more times.

CustomLayer Node and Sub-node Structure

CustomLayer node contains the entire configuration for a single custom layer.

Attribute Name # Description
name (1) The name of the layer type to be used. This name should be identical to the type used in the IR.
type (1) Must be SimpleGPU
version (1) Must be 1

Sub-nodes: Kernel (1), Buffers (1), CompilerOptions (0+), WorkSizes (0/1)

Kernel Node and Sub-node Structure

Kernel node contains all kernel source code configuration. Its entry attribute names the kernel function to call, as defined in the source file.

Sub-nodes: Source (1+), Define (0+)

Source Node and Sub-node Structure

Source node points to a single OpenCL source file.

Attribute Name # Description
filename (1) Name of the file containing OpenCL source code. Note that the path is relative to your executable. Multiple Source nodes will have their sources concatenated in order.

Sub-nodes: None

Define Node and Sub-node Structure

Define node configures a single #define instruction to be added to the sources during compilation (JIT).

Attribute Name # Description
name (1) The name of the defined JIT constant. For static constants, this can include the value as well (taken as a string).
param (0/1) The name of the layer parameter in the IR whose value will be used as the value of this JIT definition.
type (0/1) The parameter type. Accepted values: int, float, and int[], float[] for arrays.
default (0/1) The default value to be used if the specified parameter is missing from the layer in the IR.

Sub-nodes: None

The resulting JIT will be of the form: #define [name] [type] [value/default].
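
For example, the Define node used in the example configuration below, <Define name="neg_slope" type="float" param="negative_slope" default="0.0"/>, would produce a JIT constant roughly of the following form when the layer in the IR carries negative_slope="0.33" (this rendering is an illustration of the form above, not verbatim compiler output); if the parameter is missing from the IR, the default value 0.0 is used instead:

#define neg_slope (float) 0.33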

Buffers Node and Sub-node Structure

Buffers node configures all input/output buffers for the OpenCL entry function. The node itself has no attributes.

Sub-nodes: Data (0+), Tensor (1+)

Data Node and Sub-node Structure

Data node configures a single input with static data (for example, weights or biases).

Attribute Name # Description
name (1) Name of a blob attached to a layer in the IR
arg-index (1) 0-based index in the entry function arguments to be bound to

Sub-nodes: None

Tensor Node and Sub-node Structure

Tensor node configures a single input or output tensor.

Attribute Name # Description
arg-index (1) 0-based index in the entry function arguments to be bound to.
type (1) input or output
port-index (1) 0-based index in the layer’s input/output ports in the IR
format (0/1) Data layout declaration for the tensor. Accepted values: BFYX, BYXF, YXFB, FYXB (also in all lowercase). Default value: BFYX

CompilerOptions Node and Sub-node Structure

CompilerOptions node configures the compilation flags for the OpenCL sources.

Attribute Name # Description
options (1) Options string to be passed to the OpenCL compiler

Sub-nodes: None

WorkSizes Node and Sub-node Structure

WorkSizes node configures the global/local work sizes to be used when queuing the OpenCL program for execution.

Attribute Name # Description
global (0/1) An array of up to three integers (or formulas) defining the OpenCL work sizes to be used during execution. The formulas can use the values of the B, F, Y, X dimensions and contain the operators +, -, /, *, % (all evaluated in integer arithmetic). Default value: global="B*F*Y*X"
local (0/1) Same format as global. Default value: local=""
dim (0/1) A tensor to take the work size from. Accepted values: input N, output, where N is an index of an input tensor starting with 0. Default value: output

Sub-nodes: None

Example Configuration File

The following code sample provides an example configuration file (in .xml format). For information on configuration file structure, see Configuration File Format.

<CustomLayer name="ReLU" type="SimpleGPU" version="1">
    <Kernel entry="example_relu_kernel">
        <Source filename="custom_layer_kernel.cl"/>
        <Define name="neg_slope" type="float" param="negative_slope" default="0.0"/>
    </Kernel>
    <Buffers>
        <Tensor arg-index="0" type="input" port-index="0" format="BFYX"/>
        <Tensor arg-index="1" type="output" port-index="0" format="BFYX"/>
    </Buffers>
    <CompilerOptions options="-cl-mad-enable"/>
    <WorkSizes global="X,Y,B*F"/>
</CustomLayer>

Built-In Defines for Custom Layers

The following table includes definitions that are attached before the user sources, where <TENSOR> stands for the actual input or output tensor name (for example, INPUT0 or OUTPUT0).

For an example, see Example Kernel.

Name Value
NUM_INPUTS Number of the input tensors bound to this kernel
GLOBAL_WORKSIZE An array of global work sizes used to execute this kernel
GLOBAL_WORKSIZE_SIZE The size of the GLOBAL_WORKSIZE array
LOCAL_WORKSIZE An array of local work sizes used to execute this kernel
LOCAL_WORKSIZE_SIZE The size of the LOCAL_WORKSIZE array
<TENSOR>_DIMS An array of the tensor dimension sizes. Always ordered as BFYX
<TENSOR>_DIMS_SIZE The size of the <TENSOR>_DIMS array.
<TENSOR>_TYPE The data-type of the tensor: float, half or char
<TENSOR>_FORMAT_ The format of the tensor: BFYX, BYXF, YXFB, FYXB, or ANY. The format will be concatenated to the defined name. You can use the tensor format to define codepaths in your code with #ifdef/#endif.
<TENSOR>_LOWER_PADDING An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.
<TENSOR>_LOWER_PADDING_SIZE The size of the <TENSOR>_LOWER_PADDING array
<TENSOR>_UPPER_PADDING An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX.
<TENSOR>_UPPER_PADDING_SIZE The size of the <TENSOR>_UPPER_PADDING array
<TENSOR>_PITCHES The number of elements between adjacent elements in each dimension. Always ordered as BFYX.
<TENSOR>_PITCHES_SIZE The size of the <TENSOR>_PITCHES array
<TENSOR>_OFFSET The number of elements from the start of the tensor to the first valid element (bypassing the lower padding)

All <TENSOR> values will be automatically defined for every tensor bound to this layer (INPUT0, INPUT1, OUTPUT0, and so on), as shown in the following example:

#define INPUT0_DIMS_SIZE 4
#define INPUT0_DIMS (int []){ 1,96,55,55, }
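
The format defines can be used to select a layout-specific code path at compile time, as mentioned for <TENSOR>_FORMAT_ above. A minimal sketch, assuming the tensor was declared with format BFYX in the configuration file:

#ifdef INPUT0_FORMAT_BFYX
    // code path tuned for the BFYX layout
#elif defined(INPUT0_FORMAT_BYXF)
    // code path tuned for the BYXF layout
#else
    #error "Unsupported input format"
#endif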

Example Kernel

#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void example_relu_kernel(
    const __global INPUT0_TYPE* input0,
          __global OUTPUT0_TYPE* output)
{
    const uint idx = get_global_id(0);
    const uint idy = get_global_id(1);
    const uint idbf = get_global_id(2); // batches*features, as OpenCL supports 3D nd-ranges only
    const uint feature = idbf % OUTPUT0_DIMS[1];
    const uint batch = idbf / OUTPUT0_DIMS[1];
    // notice that pitches are in elements, not in bytes!
    const uint in_id = batch*INPUT0_PITCHES[0] + feature*INPUT0_PITCHES[1] + idy*INPUT0_PITCHES[2] + idx*INPUT0_PITCHES[3] + INPUT0_OFFSET;
    const uint out_id = batch*OUTPUT0_PITCHES[0] + feature*OUTPUT0_PITCHES[1] + idy*OUTPUT0_PITCHES[2] + idx*OUTPUT0_PITCHES[3] + OUTPUT0_OFFSET;
    INPUT0_TYPE value = input0[in_id];
    // neg_slope (which is non-zero for leaky ReLU) is put automatically as #define, refer to the config xml
    output[out_id] = value < 0 ? value * neg_slope : value;
}

NOTE: As described in the previous section, identifiers such as INPUT0_TYPE are defined as OpenCL (pre-)compiler inputs by the Inference Engine for efficiency reasons. See Debugging Tips for information on debugging the results.

Debugging Tips

How to Implement Custom CPU Layers

Since the primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), new CPU kernels extend the Inference Engine plugin for Intel MKL-DNN. Implementing InferenceEngine::ILayerImplFactory defines a general CPU-side extension; there are no Intel MKL-DNN specifics in the way you need to implement a kernel.

All Inference Engine samples (except the trivial hello_classification) feature a dedicated command-line option -l to load custom CPU kernels. Use the following command-line code to execute the Classification Sample with custom CPU kernels:

$ ./classification_sample -m <path_to_model>/CustomAlexNet.xml -i <path_to_image>/inputImage.bmp -d CPU \
  -l <absolute_path_to_library>/libmy_sample_extension.so

Consider a simple CustomLayerFactory class that registers an example kernel which squares its input data (raises each value to the power of two) and does not change the dimensions:

  1. Create your custom layer factory CustomLayerFactory class:
    // custom_layer.h
    // A CustomLayerFactory class is an example factory for a layer that squares its input and does not change dimensions
    class CustomLayerFactory {
    };
  2. Inherit it from the abstract class InferenceEngine::ILayerImplFactory:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    };
  3. Create a constructor, a virtual destructor, and a data member to keep the layer info:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {}
        virtual ~CustomLayerFactory() = default;
    private:
        CNNLayer cnnLayer;
    };
  4. Override and implement the abstract methods getShapes and getImplementations of the InferenceEngine::ILayerImplFactory class:
    // custom_layer.h
    class CustomLayerFactory: public InferenceEngine::ILayerImplFactory {
    public:
        // ... constructor and destructor
        StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
            if (inShapes.size() != 1) {
                std::string errorMsg = "Incorrect input shapes!";
                errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                return GENERAL_ERROR;
            }
            // The output shape matches the input shape
            outShapes.clear();
            outShapes.emplace_back(inShapes[0]);
            return OK;
        }
        StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
            // Pass the stored layer information to the implementation
            impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl(&cnnLayer)));
            return OK;
        }
    };
  5. Create your custom layer implementation CustomLayerImpl class:
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl {
    };
  6. Because the layer uses the execute method to change data, inherit it from the abstract class InferenceEngine::ILayerExecImpl, and override and implement the abstract methods of this class:
    // custom_layer.h
    // A CustomLayerImpl class is an example implementation
    class CustomLayerImpl: public ILayerExecImpl {
    public:
        explicit CustomLayerImpl(const CNNLayer *layer): cnnLayer(*layer) {}
        StatusCode getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept override;
        StatusCode init(LayerConfig& config, ResponseDesc *resp) noexcept override;
        StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept override;
    private:
        CNNLayer cnnLayer;
    };
  7. Implement the getSupportedConfigurations virtual method, which returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the Memory Primitives section for instructions on how to do it.
    // custom_layer.cpp
    StatusCode CustomLayerImpl::getSupportedConfigurations(std::vector<LayerConfig>& conf, ResponseDesc *resp) noexcept {
        try {
            // This layer can be in-place but not constant
            if (cnnLayer.insData.size() != 1 || cnnLayer.outData.empty())
                THROW_IE_EXCEPTION << "Incorrect number of input/output edges";
            LayerConfig config;
            DataPtr dataPtr = cnnLayer.insData[0].lock();
            if (!dataPtr)
                THROW_IE_EXCEPTION << "Cannot get input data";
            DataConfig dataConfig;
            dataConfig.inPlace = -1;
            dataConfig.constant = false;
            SizeVector order;
            for (size_t i = 0; i < dataPtr->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            // Planar formats for N dimensions
            dataConfig.desc = TensorDesc(dataPtr->getTensorDesc().getPrecision(),
                                         dataPtr->getTensorDesc().getDims(),
                                         {dataPtr->getTensorDesc().getDims(), order});
            config.inConfs.push_back(dataConfig);
            DataConfig outConfig;
            outConfig.constant = false;
            outConfig.inPlace = 0;
            order.clear();
            for (size_t i = 0; i < cnnLayer.outData[0]->getTensorDesc().getDims().size(); i++) {
                order.push_back(i);
            }
            outConfig.desc = TensorDesc(cnnLayer.outData[0]->getTensorDesc().getPrecision(),
                                        cnnLayer.outData[0]->getDims(),
                                        {cnnLayer.outData[0]->getDims(), order});
            config.outConfs.push_back(outConfig);
            config.dynBatchSupport = false;
            conf.push_back(config);
            return OK;
        } catch (InferenceEngine::details::InferenceEngineException& ex) {
            std::string errorMsg = ex.what();
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
    }
  8. Implement the init method, which gets a runtime-selected configuration from the vector populated in the previous step and checks the parameters:
    // custom_layer.cpp
    StatusCode CustomLayerImpl::init(LayerConfig& config, ResponseDesc *resp) noexcept {
        StatusCode rc = OK;
        if (config.dynBatchSupport) {
            config.dynBatchSupport = false;
            rc = NOT_IMPLEMENTED;
        }
        for (auto& input : config.inConfs) {
            if (input.inPlace >= 0) {
                input.inPlace = -1;
                rc = NOT_IMPLEMENTED;
            }
            for (auto& offset : input.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;  // our simplified implementation does not support data offsets
                }
            }
            if (input.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;  // our simplified implementation does not support padding
            }
            for (size_t i = 0; i < input.desc.getBlockingDesc().getOrder().size(); i++) {
                if (input.desc.getBlockingDesc().getOrder()[i] != i) {
                    // our simplified tensors support only 4D dimensions with regular order
                    if (i != 4 || input.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        for (auto& output : config.outConfs) {
            if (output.inPlace < 0) {
                // this implementation expects the output to be configured in-place with the input
                return GENERAL_ERROR;
            }
            for (auto& offset : output.desc.getBlockingDesc().getOffsetPaddingToData()) {
                if (offset) {
                    return GENERAL_ERROR;
                }
            }
            if (output.desc.getBlockingDesc().getOffsetPadding()) {
                return GENERAL_ERROR;
            }
            for (size_t i = 0; i < output.desc.getBlockingDesc().getOrder().size(); i++) {
                if (output.desc.getBlockingDesc().getOrder()[i] != i) {
                    if (i != 4 || output.desc.getBlockingDesc().getOrder()[i] != 1)
                        return GENERAL_ERROR;
                }
            }
        }
        return rc;
    }
  9. Implement the execute method, which accepts and processes the actual tensors as input/output blobs:
    // custom_layer.cpp
    StatusCode CustomLayerImpl::execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, ResponseDesc *resp) noexcept {
        if (inputs.size() != 1 || outputs.empty()) {
            std::string errorMsg = "Incorrect number of input or output edges!";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
            return GENERAL_ERROR;
        }
        const float* src_data = inputs[0]->buffer();
        float* dst_data = outputs[0]->buffer();
        // Square every element of the blob
        for (size_t o = 0; o < outputs[0]->size(); o++) {
            if (dst_data == src_data) {
                dst_data[o] *= dst_data[o];
            } else {
                dst_data[o] = src_data[o] * src_data[o];
            }
        }
        return OK;
    }
  10. Pack the kernels into a shared library:
    1. Create a factory for your own primitives inherited from the abstract class InferenceEngine::IExtension, which defines the functions that you need to implement:
      // custom_extension.h
      class CustomExtension : public InferenceEngine::IExtension {
      };
    2. Implement the utility methods Unload, Release, SetLogCallback:
      // custom_extension.h
      class CustomExtension : public InferenceEngine::IExtension {
      public:
          // cleans up resources, in this case, does nothing
          void Unload() noexcept override {
          }
          // is used when destruction happens
          void Release() noexcept override {
              delete this;
          }
          // logging is used to track what is going on inside
          void SetLogCallback(InferenceEngine::IErrorListener &listener) noexcept override {}
      };
    3. Implement the utility method GetVersion:
      // custom_extension.h
      class CustomExtension : public InferenceEngine::IExtension {
      public:
          // gets extension version information
          void GetVersion(const InferenceEngine::Version *& versionInfo) const noexcept override {
              // keep the version information in a function-local static so it outlives the call
              static const InferenceEngine::Version ExtensionDescription = {
                  {1, 0},            // extension API version
                  "1.0",             // build number
                  "CustomExtension"  // extension description message
              };
              versionInfo = &ExtensionDescription;
          }
      };
    4. Implement the main extension methods:
      // custom_extension.h
      class CustomExtension : public InferenceEngine::IExtension {
      public:
          // ... utility methods
          // returns the list of supported kernels/layers
          StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
              std::string type_name = "CustomLayer";
              types = new char *[1];
              size = 1;
              types[0] = new char[type_name.size() + 1];
              std::copy(type_name.begin(), type_name.end(), types[0]);
              types[0][type_name.size()] = '\0';
              return OK;
          }
          // main function: creates a factory for the requested layer type
          StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
              if (cnnLayer->type != "CustomLayer") {
                  std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
                  errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
                  return NOT_FOUND;
              }
              factory = new CustomLayerFactory(cnnLayer);
              return OK;
          }
      };
  11. To use your custom layers, you need to compile the code as a shared library (a sample build command is sketched after this step). After that, use the AddExtension method of the general plugin interface to load your primitives:
    // Load the CPU plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr(selectPlugin(…, "CPU"));
    InferencePlugin plugin(plugin_ptr);
    // Load CPU extension as a shared library
    auto extension_ptr = make_so_pointer<InferenceEngine::IExtension>("<shared lib path>");
    // Add extension to the plugin list
    plugin.AddExtension(extension_ptr);
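
The exact build command depends on your compiler and installation layout. A minimal sketch for g++ on Linux is shown below; the source file names, the Inference Engine include and library paths, and the output library name are placeholders to adjust for your project:

$ g++ -std=c++11 -fPIC -shared custom_layer.cpp custom_extension.cpp \
  -I<IE_INCLUDE_DIR> -L<IE_LIB_DIR> -linference_engine \
  -o libmy_sample_extension.so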

How to Implement Custom MYRIAD Layers

  1. Since the OpenCL toolchain for MYRIAD supports offline compilation only, you should first compile the OpenCL C code using the standalone clc compiler with the following command:
    ./clc --strip-binary-header custom_layer.cl -o custom_layer.bin
  2. Write a configuration file with a kernel parameter description and bindings. For example, given the following OpenCL kernel signature:

    __kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);

    The configuration file for this kernel might be the following:

    <CustomLayer name="ReorgYolo" type="MVCL" version="1">
        <Kernel entry="reorg_nhwc">
            <Source filename="reorg.bin"/>
        </Kernel>
        <Parameters>
            <Tensor arg-name="src" type="input" port-index="0" format="BYXF"/>
            <Tensor arg-name="out" type="output" port-index="0" format="BYXF"/>
            <Scalar arg-name="w" type="int" port-index="0" source="I.X"/>
            <Scalar arg-name="h" type="int" port-index="0" source="I.Y"/>
            <Scalar arg-name="c" type="int" port-index="0" source="I.F"/>
            <Scalar arg-name="stride" type="int" source="stride"/>
        </Parameters>
        <WorkSizes dim="input,0" global="(Y+7)/8*8,1,1" local="8,1,1"/>
    </CustomLayer>

    Each custom layer is described with a CustomLayer node. It has the following required nodes and attributes:

    • Root node CustomLayer must contain the following attributes:
      • name, which is the name of the Inference Engine layer to bind kernel with
      • type and version. Set them to MVCL and 1.
    • Sub-node Kernel must contain the following attributes:
      • entry, which is a name of your kernel function as you defined it in a source file (in the example above, it is reorg_nhwc)
      • Node Source must contain the following attributes:
        • filename, which is a path to a compiled binary relative to the .xml binding file
    • Sub-node Parameters, which describes parameters bindings
    • Sub-node WorkSizes, which describes local and global work group sizes and the source for dimension deduction as a direction,port pair. In the example above, the work group is described relative to the dimensions of the input tensor that comes through port 0 in the IR. The global and local work group configurations support any simple math expressions with +,-,*,/, and () over B(batch), Y(height), X(width) and F(channels).

    Parameter description supports Tensor and Scalar nodes and has the following format:

    • Each Tensor node must contain the following attributes:
      • arg-name, which is a name of a kernel parameter in the kernel signature
      • type, which is input or output as in the IR
      • port-index, which is a number of input/output port as in the IR
      • format, which specifies channel order in tensor. Optional repacks are generated if custom layer format is not compatible with formats of neighboring layers.
    • Each Scalar node must contain the following attributes:
      • arg-name, which is a name of a kernel parameter in the kernel signature
      • type, which is int or float. It is used for correct argument extraction from IR parameters.
      • source, which contains the name of the parameter in IR file or input/output (I/O, In/On, where n is a port number) followed by dimension B(batch), Y(height), X(width) or F(channels).
  3. Before loading the network that features the custom layers, provide a separate configuration file and load it using IInferencePlugin::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as the value:
    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr plugin_ptr("libmyriadPlugin.so");
    InferencePlugin plugin(plugin_ptr);
    // Load custom layers
    plugin.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, "<path to the xml file>"}});
    Optionally, you can set the path to the custom layers description with a pair of VPU_CUSTOM_LAYERS and /path/to/your/customLayers.xml as a network configuration:
    // Load MYRIAD plugin
    InferenceEngine::InferenceEnginePluginPtr myriad("libmyriadPlugin.so");
    std::map<std::string, std::string> networkConfig;
    networkConfig["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml";
    // Load custom layers in network config
    IECALL(myriad->LoadNetwork(exeNetwork, cnnNetwork, networkConfig, &resp));

    NOTE: If both native and custom layer implementations are present, the custom kernel takes priority over the native implementation.

See Also