Intermediate Representation Notation Reference Catalog

Activation Layer

Name: Activation

Category: Activation

Short description: Activation layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow.

Detailed description: Reference

Parameters: Activation layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

Inputs:

Example

<layer ... type="Activation" ... >
<data type="sigmoid" />
<input> ... </input>
<output> ... </output>
</layer>

ArgMax Layer

Name: ArgMax

Category: Layer

Short description: ArgMax layer computes indices and values of the top_k maximum values for each datum across all dimensions CxHxW.

Detailed description: Intended for use after a classification layer to produce a prediction. If the out_max_val parameter is set to true, the output is a vector of pairs (max_ind, max_val) for each batch item. The axis parameter specifies the axis along which to maximize.

Parameters: ArgMax layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

ArgMax generally does the following with the input blobs:

\[ o_{i} = \left\{ x| x \in S \wedge \forall y \in S : f(y) \leq f(x) \right\} \]
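As a rough illustration of the formula above, the following Python sketch selects the top_k largest values of one flattened datum (all C*H*W values) and optionally returns (index, value) pairs, mirroring out_max_val. The function name argmax_top_k is hypothetical, axis handling is omitted, and breaking ties in favor of the lower index is an assumption.

```python
def argmax_top_k(values, top_k=1, out_max_val=False):
    """Return the indices of the top_k largest values, largest first.

    If out_max_val is True, return (index, value) pairs instead.
    """
    # sorted() stays stable with reverse=True, so equal values keep
    # ascending index order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = order[:top_k]
    if out_max_val:
        return [(i, values[i]) for i in top]
    return top


scores = [0.1, 0.7, 0.2]
print(argmax_top_k(scores, top_k=2))                    # [1, 2]
print(argmax_top_k(scores, top_k=2, out_max_val=True))  # [(1, 0.7), (2, 0.2)]
```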

Example

<layer ... type="ArgMax" ... >
<data top_k="10" out_max_val="1" axis="-1"/>
<input> ... </input>
<output> ... </output>
</layer>

BatchNormalization Layer

Name: BatchNormalization

Category: Normalization

Short description: Reference

Detailed description: Reference

Parameters: BatchNormalization layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

BatchNormalization normalizes the output of each hidden layer.
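This section does not spell out the formula. A minimal sketch, assuming the standard inference-time batch normalization y = gamma * (x - mean) / sqrt(variance + epsilon) + beta with per-channel statistics, where epsilon corresponds to the epsilon attribute in the example below. Where the IR stores mean, variance, gamma, and beta is not described here, so they are plain arguments of the hypothetical batch_norm function.

```python
import math


def batch_norm(x, mean, variance, epsilon=9.99e-06, gamma=None, beta=None):
    """Normalize x[c] (flat list of channel c values) with per-channel statistics."""
    channels = len(x)
    # Assumption: identity scale/shift when gamma/beta are not given.
    gamma = gamma if gamma is not None else [1.0] * channels
    beta = beta if beta is not None else [0.0] * channels
    out = []
    for c in range(channels):
        inv_std = 1.0 / math.sqrt(variance[c] + epsilon)
        out.append([gamma[c] * (v - mean[c]) * inv_std + beta[c] for v in x[c]])
    return out


print(batch_norm([[1.0, 3.0]], mean=[2.0], variance=[1.0], epsilon=0.0))  # [[-1.0, 1.0]]
```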

Example

<layer ... type="BatchNormalization" ... >
<data epsilon="9.99e-06" />
<input> ... </input>
<output> ... </output>
</layer>

BinaryConvolution Layer

Name: BinaryConvolution

Category: Layer

Short description: BinaryConvolution layer computes a convolution with binary weights.

Parameters: BinaryConvolution layer parameters should be specified as the data node, which is a child of the layer node. The layer has the same parameters as the regular Convolution layer and several unique ones.

Inputs:


Clamp Layer

Name: Clamp

Category: Layer

Short description: Clamp layer represents clipping activation operation.

Detailed description: Reference

Parameters: Clamp layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Clamp generally does the following with the input blobs:

\[ out_i=\left\{\begin{array}{ll} max\_value & \mbox{if } \quad input_i>max\_value \\ min\_value & \mbox{if } \quad input_i<min\_value \\ input_i & \mbox{otherwise} \end{array}\right. \]
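The piecewise definition reduces to min(max(input_i, min_value), max_value). A short Python sketch using the min and max values from the example below; the function name clamp is illustrative only.

```python
def clamp(values, min_value, max_value):
    """Clip every element to the closed range [min_value, max_value]."""
    return [min(max(v, min_value), max_value) for v in values]


print(clamp([5, 10, 30, 50, 70], min_value=10, max_value=50))  # [10, 10, 30, 50, 50]
```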

Example

<layer ... type="Clamp" ... >
<data min="10" max="50" />
<input> ... </input>
<output> ... </output>
</layer>

Concat Layer

Name: Concat

Category: Layer

Short description: Reference

Parameters: Concat layer parameters should be specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The axis parameter specifies the blob dimension along which the input blobs are concatenated. For example, for two input blobs of shapes B1xC1xH1xW1 and B2xC2xH2xW2 with axis=1, the output blob shape is B1x(C1+C2)xH1xW1. This is only possible if B1=B2, H1=H2, and W1=W2.
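The shape rule can be sketched in Python as follows. The function name concat_shape is hypothetical, the sketch generalizes the example to any number of inputs of equal rank, and accepting negative axis values is an assumption not stated in this section.

```python
def concat_shape(shapes, axis):
    """Output shape of concatenating blobs along axis (sizes on axis are summed)."""
    rank = len(shapes[0])
    if axis < 0:
        axis += rank  # assumption: negative axes count from the end
    out = list(shapes[0])
    out[axis] = 0
    for shape in shapes:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for d in range(rank):
            # Every dimension other than axis must match the first input.
            if d != axis and shape[d] != shapes[0][d]:
                raise ValueError(f"dimension {d} differs: {shape[d]} vs {shapes[0][d]}")
        out[axis] += shape[axis]
    return out


print(concat_shape([[1, 3, 8, 8], [1, 5, 8, 8]], axis=1))  # [1, 8, 8, 8]
```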

Example

<layer ... type="Concat" ... >
<data axis="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Const Layer

Name: Const

Category: Layer

Short description: Const layer produces a blob with constant values specified in the blobs section.

Parameters: Const layer does not have parameters.

Example

<layer ... type="Const" ...>
<output>
<port id="1">
<dim>3</dim>
<dim>100</dim>
</port>
</output>
<blobs>
<custom offset="..." size="..."/>
</blobs>
</layer>

Convolution Layer

Name: Convolution

Category: Layer

Short description: Reference

Detailed description: Reference

Parameters: Convolution layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Weights Layout

Weights layout is GOIYX (GOIZYX for 3D convolution), which means that X changes the fastest, then Y (then Z for 3D convolution), then Input, then Output, and Group changes the slowest.
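To make the layout concrete, the sketch below computes the flat offset of one weight element in a GOIYX buffer, treating the O and I sizes as the output and input channels per group (an assumption; this section does not state how group splits the channels). For GOIZYX, a Z term would be inserted between I and Y in the same way. The function name is hypothetical.

```python
def goiyx_offset(g, o, i, y, x, out_per_group, in_per_group, kernel_y, kernel_x):
    """Flat offset of weight (g, o, i, y, x) in a GOIYX-ordered buffer.

    X varies fastest, then Y, then I, then O; G varies slowest.
    """
    return (((g * out_per_group + o) * in_per_group + i) * kernel_y + y) * kernel_x + x


# Hypothetical sizes: 8 outputs and 1 input per group, 7x7 kernel.
dims = dict(out_per_group=8, in_per_group=1, kernel_y=7, kernel_x=7)
print(goiyx_offset(0, 0, 0, 0, 1, **dims))  # 1   (next X)
print(goiyx_offset(0, 0, 0, 1, 0, **dims))  # 7   (next Y row)
print(goiyx_offset(1, 0, 0, 0, 0, **dims))  # 392 (next group: 8 * 1 * 7 * 7)
```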

Mathematical Formulation

Example

<layer ... type="Convolution" ... >
<data auto_pad="same_upper" dilations="1,1" group="3" kernel="7,7" output="24" pads_begin="2,2" pads_end="3,3" strides="2,2"/>
<input> ... </input>
<output> ... </output>
<weights ... />
<biases ... />
</layer>

Crop (Type 1) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Crop Type 1 layer takes two input blobs, and the shape of the second blob specifies the crop size. The layer has two attributes: axis and offset. The Crop layer of this type supports shape inference.

Inputs

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 2) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Crop Type 2 layer takes one input blob to crop and has three attributes: axis, offset, and dim. The Crop layer of this type supports shape inference only when shape propagation is applied to dimensions that are not specified in the axis attribute.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" offset="0,0" dim="34,34"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>

Crop (Type 3) Layer

Name: Crop

Category: Layer

Short description: Crop layer changes selected dimensions of the input blob according to the specified parameters.

Parameters: Crop layer parameters should be specified in the data section, which is placed as a child of the layer node. Crop Type 3 layer takes one input blob to crop and has three attributes: axis, crop_begin, and crop_end. The Crop layer of this type supports shape inference.

Example

<layer id="39" name="score_pool4c" precision="FP32" type="Crop">
<data axis="2,3" crop_begin="4,4" crop_end="6,6"/>
<input>
<port id="0">
<dim>1</dim>
<dim>21</dim>
<dim>44</dim>
<dim>44</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>21</dim>
<dim>34</dim>
<dim>34</dim>
</port>
</output>
</layer>
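The three Crop variants differ only in how the size of each cropped axis is derived. A shape-only Python sketch that reproduces the three examples above; the function names are hypothetical, and the bounds checks on offsets are an assumption about what counts as a valid input.

```python
def crop_type1_shape(in_shape, ref_shape, axis, offset):
    """Type 1: each listed axis takes its size from the second input."""
    out = list(in_shape)
    for a, off in zip(axis, offset):
        if off + ref_shape[a] > in_shape[a]:
            raise ValueError(f"crop on axis {a} exceeds the input size")
        out[a] = ref_shape[a]
    return out


def crop_type2_shape(in_shape, axis, offset, dim):
    """Type 2: each listed axis is cropped to dim, starting at offset."""
    out = list(in_shape)
    for a, off, d in zip(axis, offset, dim):
        if off + d > in_shape[a]:
            raise ValueError(f"crop on axis {a} exceeds the input size")
        out[a] = d
    return out


def crop_type3_shape(in_shape, axis, crop_begin, crop_end):
    """Type 3: crop_begin/crop_end elements are removed from each listed axis."""
    out = list(in_shape)
    for a, begin, end in zip(axis, crop_begin, crop_end):
        out[a] = in_shape[a] - begin - end
    return out


src = [1, 21, 44, 44]
print(crop_type1_shape(src, [1, 21, 34, 34], axis=[2, 3], offset=[0, 0]))      # [1, 21, 34, 34]
print(crop_type2_shape(src, axis=[2, 3], offset=[0, 0], dim=[34, 34]))         # [1, 21, 34, 34]
print(crop_type3_shape(src, axis=[2, 3], crop_begin=[4, 4], crop_end=[6, 6]))  # [1, 21, 34, 34]
```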

CTCGreedyDecoder Layer

Name: CTCGreedyDecoder

Category: Layer

Short description: CTCGreedyDecoder performs greedy (best path) decoding of the logits given in the input.

Detailed description: Reference

Parameters: CTCGreedyDecoder layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Given an input sequence $X$ of length $T$, CTCGreedyDecoder assumes that the probability of a length $T$ character sequence $C$ is given by

\[ p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) \]
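Because p(C|X) factorizes over time steps, picking the most probable class at every step (the best path) maximizes it. A minimal Python sketch of greedy decoding; merging repeated labels and then dropping blanks follows the usual CTC convention and is an assumption here, as is passing the blank class index explicitly, since this section does not say which class is the blank. The function name is hypothetical.

```python
def ctc_greedy_decode(probs, blank_index):
    """probs[t][k] is the probability of class k at time step t."""
    # Best path: the most probable class at each time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    decoded = []
    previous = None
    for k in best_path:
        # Collapse consecutive repeats, then drop blank symbols.
        if k != previous and k != blank_index:
            decoded.append(k)
        previous = k
    return decoded


probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.8, 0.1],
]
print(ctc_greedy_decode(probs, blank_index=2))  # [1, 0, 1]
```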

Example

<layer ... type="CTCGreedyDecoder" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>

Deconvolution Layer

Name: Deconvolution

Category: Layer

Short description: Deconvolution layer is used to upsample the output to a higher image resolution.

Detailed description: Reference

Parameters: Deconvolution layer parameters should be specified in the data node, which is a child of the layer node.

Inputs:

Weights Layout

Weights layout is GOIYX, which means that X changes the fastest, then Y, then Input, then Output, and Group changes the slowest.

Mathematical Formulation

Deconvolution is also called transposed convolution and performs the operation that is the reverse of convolution. The output size along each spatial dimension is calculated as:

\[S_{o}=stride(S_{i} - 1 ) + S_{f} - 2pad \]

where $S_{o}$, $S_{i}$, and $S_{f}$ are the sizes of the output, input, and filter, respectively. Each output value is calculated in the same way as for the Convolution layer:

\[out = \sum_{i = 0}^{n}w_{i}x_{i} + b\]
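A quick Python check of the size formula against the example below. The sketch assumes that the 2pad term is split into pads_begin + pads_end and ignores dilation; the function name is illustrative.

```python
def deconv_output_shape(in_spatial, kernel, strides, pads_begin, pads_end):
    """S_o = stride * (S_i - 1) + S_f - (pad_begin + pad_end) for each spatial axis."""
    return [
        s * (i - 1) + k - (pb + pe)
        for i, k, s, pb, pe in zip(in_spatial, kernel, strides, pads_begin, pads_end)
    ]


# Values from the Deconvolution example: 8x8x8 input, kernel 2, stride 2, no padding.
print(deconv_output_shape([8, 8, 8], [2, 2, 2], [2, 2, 2], [0, 0, 0], [0, 0, 0]))  # [16, 16, 16]
```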

Example

<layer ... type="Deconvolution" ...>
<data auto_pad="valid" kernel="2,2,2" output="512" pads_begin="0,0,0" pads_end="0,0,0" strides="2,2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>512</dim>
<dim>8</dim>
<dim>8</dim>
<dim>8</dim>
</port>
</input>
<output>
<port id="3">
<dim>1</dim>
<dim>512</dim>
<dim>16</dim>
<dim>16</dim>
<dim>16</dim>
</port>
</output>
<blobs>
<weights offset="..." size="..."/>
<biases offset="..." size="..."/>
</blobs>
</layer>

DepthToSpace Layer

Name: DepthToSpace

Category: Layer

Short description: DepthToSpace permutes data from the depth dimension of the input blob into spatial dimensions.

Detailed description: DepthToSpace layer produces a copy of the input blob where values from the depth (features) dimension are moved into spatial blocks. Refer to the ONNX* specification for an example of the 4D input blob case.

Parameters: DepthToSpace layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The operation is equivalent to the following transformation of the input blob x with K spatial dimensions of shape [N, C, D1, D2, D3, ..., DK]:

x' = reshape(x, [N, block_size, block_size, ..., block_size, C / (block_size ^ K), D1, D2, ..., DK])
x'' = transpose(x', [0, K + 1, K + 2, 1, K + 3, 2, K + 4, 3, ..., K + (K + 1), K])
y = reshape(x'', [N, C / (block_size ^ K), D1 * block_size, D2 * block_size, D3 * block_size, ..., DK * block_size])
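For K = 2 (an NCHW input), the reshape/transpose/reshape sequence amounts to reading output element y[n][c][h][w] from input channel (bh * block_size + bw) * C' + c, where C' = C / block_size^2 and (bh, bw) is the position of (h, w) inside its block. A pure-Python sketch on nested lists (the function name is hypothetical), checked against the shapes in the example below:

```python
def depth_to_space(x, block_size):
    """x: nested list [N][C][H][W]; returns [N][C / bs^2][H * bs][W * bs]."""
    n_batch, c_in = len(x), len(x[0])
    h_in, w_in = len(x[0][0]), len(x[0][0][0])
    if c_in % (block_size * block_size) != 0:
        raise ValueError("C must be divisible by block_size ** 2")
    c_out = c_in // (block_size * block_size)
    y = []
    for n in range(n_batch):
        batch = []
        for c in range(c_out):
            plane = []
            for h in range(h_in * block_size):
                row = []
                for w in range(w_in * block_size):
                    # The position inside the block selects the source channel group.
                    bh, bw = h % block_size, w % block_size
                    c_src = (bh * block_size + bw) * c_out + c
                    row.append(x[n][c_src][h // block_size][w // block_size])
                plane.append(row)
            batch.append(plane)
        y.append(batch)
    return y


x = [[[[0]], [[1]], [[2]], [[3]]]]  # shape [1, 4, 1, 1]
print(depth_to_space(x, 2))         # [[[[0, 1], [2, 3]]]]

z = [[[[0] * 3 for _ in range(2)] for _ in range(4)] for _ in range(5)]  # [5, 4, 2, 3]
y = depth_to_space(z, 2)
print(len(y), len(y[0]), len(y[0][0]), len(y[0][0][0]))  # 5 1 4 6
```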

Example

<layer ... type="DepthToSpace">
<data block_size="2"/>
<input>
<port id="0">
<dim>5</dim>
<dim>4</dim>
<dim>2</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="1">
<dim>5</dim>
<dim>1</dim>
<dim>4</dim>
<dim>6</dim>
</port>
</output>
</layer>

DetectionOutput Layer

Name: DetectionOutput

Category: Layer

Short description: DetectionOutput layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions.

Detailed description: Reference. The layer has 3 mandatory inputs: a blob with box logits, a blob with confidence predictions, and a blob with box coordinates (proposals). It can have 2 additional inputs with additional confidence predictions and box coordinates, as described in the article. The 5-input version of the layer is supported by the Myriad plugin only. The output blob contains information about filtered detections described with 7-element tuples: [batch_id, class_id, confidence, x_1, y_1, x_2, y_2]. The first tuple with batch_id equal to -1 marks the end of the output.
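Consumers of the output typically walk the 7-element tuples until the -1 marker. A minimal Python sketch, assuming the output arrives as a flat list of floats; the function name parse_detections is hypothetical.

```python
def parse_detections(output):
    """Split a flat DetectionOutput buffer into (batch_id, class_id, confidence, box) tuples."""
    detections = []
    for start in range(0, len(output) - 6, 7):
        batch_id, class_id, confidence, x1, y1, x2, y2 = output[start:start + 7]
        if batch_id == -1:  # the first tuple with batch_id == -1 ends the output
            break
        detections.append((int(batch_id), int(class_id), confidence, (x1, y1, x2, y2)))
    return detections


raw = [0, 1, 0.9, 0.1, 0.1, 0.5, 0.5,
       -1, 0, 0, 0, 0, 0, 0,
       0, 2, 0.8, 0.2, 0.2, 0.4, 0.4]  # entries after the marker are ignored
print(parse_detections(raw))  # [(0, 1, 0.9, (0.1, 0.1, 0.5, 0.5))]
```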

Parameters: DetectionOutput layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

At each feature map cell, DetectionOutput predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each of the k boxes at a given location, DetectionOutput computes class scores and the four offsets relative to the original default box shape. This results in a total of $(c + 4)k$ filters that are applied around each location in the feature map, yielding $(c + 4)kmn$ outputs for an $m \times n$ feature map.

Example

<layer ... type="DetectionOutput" ... >
<data num_classes="21" share_location="1" background_label_id="0" nms_threshold="0.450000" top_k="400" input_height="1" input_width="1" code_type="caffe.PriorBoxParameter.CENTER_SIZE" variance_encoded_in_target="0" keep_top_k="200" confidence_threshold="0.010000"/>
<input> ... </input>
<output> ... </output>
</layer>

Eltwise Layer

Name: Eltwise

Category: Layer

Short description: Eltwise layer performs the element-wise operation specified in the parameters over the given inputs.

Parameters: Eltwise layer parameters should be specified in the data node, which is placed as a child of the layer node. Eltwise accepts 2 inputs with an arbitrary number of dimensions. The operation supports broadcasting of input blobs according to the NumPy specification.

Inputs

Mathematical Formulation

Eltwise does the following with the input blobs:

\[ o_{i} = f(b_{i}^{1}, b_{i}^{2}) \]

where $b_{i}^{1}$ is the $i$-th element of the first blob, $b_{i}^{2}$ is the $i$-th element of the second blob, $o_{i}$ is the $i$-th element of the output blob, and $f(a, b)$ is a function that performs an operation over its two arguments $a$ and $b$.
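With broadcasting, the index i in the formula addresses the broadcast output shape. A Python sketch of the NumPy-style shape rule (trailing dimensions aligned; each pair must be equal or contain 1), with a hypothetical function name:

```python
def broadcast_shape(shape_a, shape_b):
    """Output shape of a NumPy-style broadcast of two shapes."""
    result = []
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        # Missing leading dimensions behave as size 1.
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"incompatible dimensions {a} and {b}")
        result.append(b if a == 1 else a)
    return result[::-1]


print(broadcast_shape([2, 3, 1], [3, 4]))  # [2, 3, 4]
```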

Example

<layer ... type="Eltwise" ... >
<data operation="sum"/>
<input> ... </input>
<output> ... </output>
</layer>

Fill Layer

Name: Fill

Category: Layer

Short description: Fill layer generates a blob of the specified shape filled with the specified value.

Parameters: Fill layer has no parameters.

Inputs:

Example

<layer ... type="Fill">
<input>
<port id="0">
<dim>2</dim>
</port>
<port id="1"/>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>4</dim>
</port>
</output>
</layer>

Flatten Layer

Name: Flatten

Category: Layer

Short description: Flatten layer performs flattening of specific dimensions of the input blob.

Parameters: Flatten layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Example

<layer ... type="Flatten" ...>
<data axis="1" end_axis="-1"/>
<input>
<port id="0">
<dim>7</dim>
<dim>19</dim>
<dim>19</dim>
<dim>12</dim>
</port>
</input>
<output>
<port id="1">
<dim>7</dim>
<dim>4332</dim>
</port>
</output>
</layer>
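The example merges axes 1 through the last one (19 * 19 * 12 = 4332). A Python sketch of the shape rule, assuming end_axis is inclusive and negative values count from the end, as the example suggests; the function name is hypothetical.

```python
from math import prod


def flatten_shape(shape, axis=1, end_axis=-1):
    """Merge dimensions axis..end_axis (inclusive) into one."""
    rank = len(shape)
    axis = axis + rank if axis < 0 else axis
    end_axis = end_axis + rank if end_axis < 0 else end_axis
    merged = prod(shape[axis:end_axis + 1])
    return list(shape[:axis]) + [merged] + list(shape[end_axis + 1:])


print(flatten_shape([7, 19, 19, 12], axis=1, end_axis=-1))  # [7, 4332]
```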

FullyConnected Layer

Name: FullyConnected

Category: Layer

Short description: Reference

Detailed description: Reference

Parameters: Specify FullyConnected layer parameters in the data node, which is a child of the layer node.

Inputs

Weights Layout

Weights layout is OI, which means that Input changes the fastest, then Output.
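With the OI layout, the weight connecting input i to output o sits at offset o * I + i. A Python sketch of the resulting computation out_o = sum_i w[o * I + i] * x_i + b_o; the function name is hypothetical, out_size plays the role of the out-size attribute, and the presence of a bias term is an assumption.

```python
def fully_connected(x, weights, biases, out_size):
    """x: input vector; weights: flat OI-ordered list; biases: one per output."""
    in_size = len(x)
    if len(weights) != out_size * in_size:
        raise ValueError("weights must hold out_size * in_size values")
    return [
        sum(weights[o * in_size + i] * x[i] for i in range(in_size)) + biases[o]
        for o in range(out_size)
    ]


w = [1.0, 0.0,   # output 0
     0.0, 1.0,   # output 1
     1.0, 1.0]   # output 2
print(fully_connected([1.0, 2.0], w, [0.0, 0.0, 0.5], out_size=3))  # [1.0, 2.0, 3.5]
```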

Mathematical Formulation

Example

<layer ... type="FullyConnected" ... >
<data out-size="4096"/>
<input> ... </input>
<output> ... </output>
</layer>

Gather Layer

Name: Gather

Category: Layer

Short description: Gather layer takes slices of data in the second input blob according to the indices specified in the first input blob. The output blob shape is input2.shape[:axis] + input1.shape + input2.shape[axis + 1:].

Parameters: Gather layer parameters are specified in the data section, which is placed as a child of the layer node.

Mathematical Formulation

\[ output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:] \]
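A Python sketch of the output shape rule, checked against the example below (input1 holds the indices, input2 the data), plus a gather along axis 0 for a 1-D index list. Both function names are hypothetical.

```python
def gather_output_shape(indices_shape, data_shape, axis):
    """input2.shape[:axis] + input1.shape + input2.shape[axis + 1:]."""
    return list(data_shape[:axis]) + list(indices_shape) + list(data_shape[axis + 1:])


def gather_axis0(indices, data):
    """Gather rows of data (axis 0) using a 1-D list of indices."""
    return [data[i] for i in indices]


print(gather_output_shape([15, 4, 20, 28], [6, 12, 10, 24], axis=1))
# [6, 15, 4, 20, 28, 10, 24]
print(gather_axis0([2, 0], [[1, 2], [3, 4], [5, 6]]))  # [[5, 6], [1, 2]]
```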

Inputs

Example

<layer id="1" name="gather_node" precision="FP32" type="Gather">
<data axis="1" />
<input>
<port id="0">
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
</port>
<port id="1">
<dim>6</dim>
<dim>12</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</input>
<output>
<port id="2">
<dim>6</dim>
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
<dim>10</dim>
<dim>24</dim>
</port>
</output>
</layer>

GRN Layer

Name: GRN

Category: Normalization

Short description: GRN is Global Response Normalization with L2 norm (across channels only).

Parameters: GRN layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

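GRN computes the L2 norm across the channels of the input blob. GRN generally does the following with the input blob:

\[ output_{i} = \frac{input_{i}}{\sqrt{\sum_{j=1}^{C} input_{j}^{2} + bias}} \]

where the sum runs over the $C$ channel values at the same spatial position as $input_{i}$.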
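A Python sketch of the formula for one batch item, stored as x[c][k] (channel c, flattened spatial position k). Placing bias inside the square root follows the formula as written here; the function name is hypothetical.

```python
import math


def grn(x, bias=1.0):
    """Global Response Normalization across channels for each spatial position."""
    channels, positions = len(x), len(x[0])
    out = [[0.0] * positions for _ in range(channels)]
    for k in range(positions):
        # L2 norm over channels at position k, with bias under the square root.
        norm = math.sqrt(sum(x[c][k] ** 2 for c in range(channels)) + bias)
        for c in range(channels):
            out[c][k] = x[c][k] / norm
    return out


print(grn([[3.0], [4.0]], bias=0.0))  # [[0.6], [0.8]]
```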

Example

<layer ... type="GRN" ... >
<data bias="1.0"/>
<input> ... </input>
<output> ... </output>
</layer>

GRUCell Layer

Name: GRUCell

Category: Layer

Short description: GRUCell layer computes the output using the formula described in the paper.

Parameters: GRUCell layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Outputs


Input Layer

Name: Input

Category: Layer

Short description: Input layer specifies input to the model.

Parameters: Input layer does not have parameters.

Example

<layer ... type="Input" ...>
<output>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
</layer>

Interp Layer

Name: Interp

Category: Layer

Short description: Interp layer performs bilinear interpolation of the input blob by the specified parameters.

Parameters: Interp layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Example

<layer ... type="Interp" ...>
<data align_corners="0" factor="2.0" pad_beg="0" pad_end="0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>2</dim>
<dim>48</dim>
<dim>80</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>2</dim>
<dim>96</dim>
<dim>160</dim>
</port>
</output>
</layer>

LSTMCell Layer

Name: LSTMCell

Category: Layer

Short description: LSTMCell layer computes the output using the formula described in the original paper Long Short-Term Memory.

Parameters: LSTMCell layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Outputs

Mathematical Formulation

Formula:
* - matrix mult
(.) - eltwise mult
[,] - concatenation
sigm - 1/(1 + e^{-x})
tanh - (e^{2x} - 1)/(e^{2x} + 1)
f = sigm(Wf*[Hi, X] + Bf)
i = sigm(Wi*[Hi, X] + Bi)
c = tanh(Wc*[Hi, X] + Bc)
o = sigm(Wo*[Hi, X] + Bo)
Co = f (.) Ci + i (.) c
Ho = o (.) tanh(Co)
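The formulas translate directly into Python for a single time step. In this sketch the weight matrices are plain nested lists whose rows have length hidden + input, ordered to match the [Hi, X] concatenation; how the IR actually packs Wf, Wi, Wc, Wo and the biases into blobs is not described here and is an assumption, as is the function name.

```python
import math


def sigm(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_cell(x, h_i, c_i, w, b):
    """One LSTM step. w and b are dicts keyed by gate name: 'f', 'i', 'c', 'o'."""
    hx = list(h_i) + list(x)  # [Hi, X]

    def gate(name, activation):
        # activation(W * [Hi, X] + B), computed row by row
        return [activation(sum(wv * v for wv, v in zip(row, hx)) + bias)
                for row, bias in zip(w[name], b[name])]

    f = gate("f", sigm)
    i = gate("i", sigm)
    c = gate("c", math.tanh)
    o = gate("o", sigm)
    c_o = [fv * cv + iv * cc for fv, cv, iv, cc in zip(f, c_i, i, c)]  # Co = f (.) Ci + i (.) c
    h_o = [ov * math.tanh(cv) for ov, cv in zip(o, c_o)]                # Ho = o (.) tanh(Co)
    return h_o, c_o


# Hidden size 1, input size 1, all-zero weights: every sigmoid gate is 0.5.
zero_w = {k: [[0.0, 0.0]] for k in "fico"}
zero_b = {k: [0.0] for k in "fico"}
h_o, c_o = lstm_cell([1.0], [0.0], [2.0], zero_w, zero_b)
print(c_o)  # [1.0]
```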

Example

<layer ... type="LSTMCell" ... >
<input> ... </input>
<output> ... </output>
</layer>

Memory Layer

Name: Memory

Category: Layer

Short description: Memory layer represents a delay layer in LSTM terminology. To read more about LSTM topologies, refer to this link.

Detailed description: Memory layer saves state between two infer requests. In the topology, it is a single layer; however, in the Intermediate Representation, it is always represented as a pair of Memory layers. One of these layers does not have outputs, and the other does not have inputs (in terms of the Intermediate Representation).

Parameters: Memory layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Memory saves data from the input blob.

Example

<layer ... type="Memory" ... >
<data id="r_27-28" index="0" size="2" />
<input> ... </input>
<output> ... </output>
</layer>

MVN Layer

Name: MVN

Category: Normalization

Short description: Reference

Parameters: MVN layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

MVN subtracts the mean from the input blob:

\[ o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W} \]

If normalize_variance is set to 1, the output blob is divided by the standard deviation:

\[ o_{i}=\frac{o_{i}}{\sqrt{\frac{\sum{o_{k}^{2}}}{C * H * W}}+\epsilon} \]
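A Python sketch for one normalization group: all C*H*W values of a batch item when across_channels is 1 (the case the formulas above describe), or, by assumption, the H*W values of one channel otherwise. eps mirrors the eps attribute; the function name is hypothetical.

```python
import math


def mvn(values, normalize_variance=True, eps=1e-9):
    """Mean-variance normalization of one group of values."""
    count = len(values)
    mean = sum(values) / count
    out = [v - mean for v in values]
    if normalize_variance:
        # Standard deviation of the centered values; eps is added outside the root.
        std = math.sqrt(sum(o * o for o in out) / count)
        out = [o / (std + eps) for o in out]
    return out


print(mvn([1.0, 2.0, 3.0], normalize_variance=False))  # [-1.0, 0.0, 1.0]
```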

Example

<layer ... type="MVN">
<data across_channels="1" eps="9.999999717180685e-10" normalize_variance="1"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Norm Layer

Name: Norm

Category: Normalization

Short description: Reference

Detailed description: Reference

Parameters: Norm layer parameters should be specified in the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

\[o_{i} = \frac{x_{i}}{\left( 1 + \left( \frac{\alpha}{n} \right)\sum_{j}x_{j}^{2} \right)^{\beta}}\]

where $n$ is the size of each local region, and the sum runs over the values $x_{j}$ in the local region centered at $x_{i}$.
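A Python sketch of the across-channel case (region="across") at one spatial position, with n taken from local-size. Centering the window on the current channel and clipping it at the first and last channels are assumptions based on common LRN implementations, and the function name is hypothetical.

```python
def lrn_across(x, local_size=5, alpha=1e-4, beta=0.75):
    """x[c] is the value of channel c at one spatial position."""
    half = local_size // 2
    out = []
    for c in range(len(x)):
        # Local region of channels centered at c, clipped to valid indices.
        window = x[max(0, c - half):min(len(x), c + half + 1)]
        sq_sum = sum(v * v for v in window)
        out.append(x[c] / (1.0 + alpha / local_size * sq_sum) ** beta)
    return out


print(lrn_across([1.0], local_size=1, alpha=1.0, beta=1.0))  # [0.5]
```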

Example

<layer ... type="Norm" ... >
<data alpha="9.9999997e-05" beta="0.75" local-size="5" region="across"/>
<input> ... </input>
<output> ... </output>
</layer>

Normalize Layer

Name: Normalize

Category: Normalization

Short description: Normalize layer performs L2 normalization of the input blob.

Parameters: Normalize layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Mathematical Formulation

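\[ o_{i} = scale \cdot \frac{x_{i}}{\sqrt{\sum_{j} x_{j}^{2} + \epsilon}} \]

where the sum runs over all C*H*W values of a batch item when across_spatial is 1, or over the C channel values at the same spatial position as $x_{i}$ when across_spatial is 0, and scale is shared across channels when channel_shared is 1 or set per channel otherwise.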
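A Python sketch of the across_spatial = 0 case, storing one batch item as x[c][k]. Passing a single float as scale corresponds to channel_shared = 1 and a per-channel list to channel_shared = 0; this reading of the attributes, like the function name, is an assumption.

```python
import math


def normalize_across_channels(x, scale, eps=1e-10):
    """L2-normalize each spatial position k over channels, then apply scale."""
    channels, positions = len(x), len(x[0])
    scales = [scale] * channels if isinstance(scale, (int, float)) else scale
    out = [[0.0] * positions for _ in range(channels)]
    for k in range(positions):
        norm = math.sqrt(sum(x[c][k] ** 2 for c in range(channels)) + eps)
        for c in range(channels):
            out[c][k] = scales[c] * x[c][k] / norm
    return out


print(normalize_across_channels([[3.0], [4.0]], scale=1.0, eps=0.0))  # [[0.6], [0.8]]
```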

Example

<layer ... type="Normalize" ... >
<data across_spatial="0" channel_shared="0" eps="0.000000"/>
<input> ... </input>
<output> ... </output>
</layer>

Pad Layer

Name: Pad

Category: Layer

Short description: Pad layer extends an input blob on edges. New element values are generated based on the Pad layer parameters described below.

Parameters: Pad layer parameters should be specified in the data section, which is placed as a child of the layer node. The parameters specify the number of elements to add along each axis and a rule by which new element values are generated: for example, whether they are filled with a given constant or generated based on the input blob content.

Inputs

Outputs

pad_mode Examples

The following examples illustrate how the output blob is generated by the Pad layer for a given input blob:

INPUT =
[[ 1 2 3 4 ]
[ 5 6 7 8 ]
[ 9 10 11 12 ]]

with the following parameters:

pads_begin = [0, 1]
pads_end = [2, 3]

depending on the pad_mode.
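The resulting outputs can be generated with the Python sketch below for the INPUT above. It assumes the pad_mode values constant, edge, reflect, and symmetric with their usual meanings (reflect mirrors without repeating the edge element, symmetric repeats it). The function name is hypothetical, and the sketch handles only 2D inputs with pads smaller than the input size in the mirrored modes.

```python
def pad_2d(x, pads_begin, pads_end, pad_mode="constant", pad_value=0):
    """Pad a 2D list of rows according to pad_mode."""
    rows, cols = len(x), len(x[0])

    def source(i, n):
        # Map a padded coordinate to a source coordinate (None = use pad_value).
        if 0 <= i < n:
            return i
        if pad_mode == "constant":
            return None
        if pad_mode == "edge":
            return 0 if i < 0 else n - 1
        if pad_mode == "reflect":
            return -i if i < 0 else 2 * (n - 1) - i
        if pad_mode == "symmetric":
            return -i - 1 if i < 0 else 2 * n - 1 - i
        raise ValueError(f"unknown pad_mode: {pad_mode}")

    out = []
    for r in range(-pads_begin[0], rows + pads_end[0]):
        row = []
        for c in range(-pads_begin[1], cols + pads_end[1]):
            sr, sc = source(r, rows), source(c, cols)
            row.append(pad_value if sr is None or sc is None else x[sr][sc])
        out.append(row)
    return out


INPUT = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for mode in ("constant", "edge", "reflect", "symmetric"):
    print(mode)
    for row in pad_2d(INPUT, [0, 1], [2, 3], pad_mode=mode):
        print(row)
```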

Example

<layer ... type="Pad" ...>
<data pads_begin="0,5,2,1" pads_end="1,0,3,7" pad_mode="constant" pad_value="666.0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>32</dim>
<dim>40</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>8</dim>
<dim>37</dim>
<dim>48</dim>
</port>
</output>
</layer>

Permute Layer

Name: Permute

Category: Layer

Short description: Permute layer performs reordering of input blob dimensions.

Detailed description: Reference

Parameters: Permute layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Permute layer reorders the dimensions of the input blob. Source and destination indexes are bound by the formula:

\[ src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) \]

\[ n \in ( 0, order[0] ) \]

\[ h \in ( 0, order[2] ) \]

\[ w \in ( 0, order[3] ) \]
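The usual reading of order is that output axis k comes from input axis order[k], so order="0,2,3,1" turns an NCHW blob into NHWC. A Python sketch of that reading (an assumption, since the formula above does not spell it out), with hypothetical function names:

```python
def permute_shape(shape, order):
    """Output dimension k equals input dimension order[k]."""
    return [shape[axis] for axis in order]


def permute_source_index(out_index, order):
    """Input coordinate that output coordinate out_index is copied from."""
    in_index = [0] * len(order)
    for out_axis, in_axis in enumerate(order):
        in_index[in_axis] = out_index[out_axis]
    return in_index


print(permute_shape([1, 3, 4, 5], [0, 2, 3, 1]))         # [1, 4, 5, 3]
print(permute_source_index([0, 1, 2, 0], [0, 2, 3, 1]))  # [0, 0, 1, 2]
```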

Example

<layer ... type="Permute" ... >
<data order="0,2,3,1"/>
<input> ... </input>
<output> ... </output>
</layer>

Pooling Layer

Name: Pooling

Category: Pool

Short description: Reference

Detailed description: Reference

Parameters: Pooling layer parameters are specified in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Example

<layer ... type="Pooling" ... >
<data auto_pad="same_upper" exclude-pad="true" kernel="3,3" pads_begin="0,0" pads_end="1,1" pool-method="max" strides="2,2"/>
<input> ... </input>
<output> ... </output>
</layer>

Power Layer

Name: Power

Category: Layer

Short description: Power layer computes the output as (shift + scale * x) ^ power for each input element x.

Parameters: Power layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

\[ p = (shift + scale * x)^{power} \]

Example

<layer ... type="Power" ... >
<data power="2" scale="0.1" shift="5"/>
<input> ... </input>
<output> ... </output>
</layer>

PReLU Layer

Name: PReLU

Category: Activation

Short description: PReLU is the Parametric Rectified Linear Unit. The difference from ReLU is that negative slopes can vary across channels.

Parameters: PReLU layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

PReLU accepts one input with four dimensions. The produced blob has the same dimensions as the input. PReLU does the following with the input blob:

\[ o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) \]

where $w_{i}$ is taken from the weights blob.

Example

<layer ... type="PReLU" ... >
<data channel_shared="1"/>
<input> ... </input>
<output> ... </output>
</layer>

PriorBox Layer

Name: PriorBox

Category: Layer

Short description: PriorBox layer generates prior boxes of specified sizes and aspect ratios across all dimensions.

Parameters: PriorBox layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation:

PriorBox computes coordinates of prior boxes as follows:

  1. First, calculates center_x and center_y of the prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    • If step equals 0:

      \[ center_x=(w+0.5) \]

      \[ center_y=(h+0.5) \]

    • else:

      \[ center_x=(w+offset)*step \]

      \[ center_y=(h+offset)*step \]

      \[ w \in \left[ 0, W \right) \]

      \[ h \in \left[ 0, H \right) \]

  2. Then, for each size $s$ in min_size, calculates the coordinates of prior boxes:

    \[ xmin = \frac{center_x - \frac{s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{s}{2}}{H} \]

Example

<layer ... type="PriorBox" ... >
<data step="64.000000" min_size="162.000000" max_size="213.000000" offset="0.500000" flip="1" clip="0" aspect_ratio="2.000000,3.000000" variance="0.100000,0.100000,0.200000,0.200000" />
<input> ... </input>
<output> ... </output>
</layer>

PriorBoxClustered Layer

Name: PriorBoxClustered

Category: Layer

Short description: PriorBoxClustered layer generates prior boxes of specified sizes normalized to the input image size.

Parameters: PriorBoxClustered layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

PriorBoxClustered computes coordinates of prior boxes as follows:

  1. Calculates center_x and center_y of the prior box:

    \[ W \equiv Width \quad Of \quad Image \]

    \[ H \equiv Height \quad Of \quad Image \]

    \[ center_x=(w+offset)*step \]

    \[ center_y=(h+offset)*step \]

    \[ w \in \left[ 0, W \right) \]

    \[ h \in \left[ 0, H \right) \]

  2. For each index $s$ of the width and height lists, calculates the prior box coordinates:

    \[ xmin = \frac{center_x - \frac{width_s}{2}}{W} \]

    \[ ymin = \frac{center_y - \frac{height_s}{2}}{H} \]

    \[ xmax = \frac{center_x + \frac{width_s}{2}}{W} \]

    \[ ymax = \frac{center_y + \frac{height_s}{2}}{H} \]

    If clip is defined, the coordinates of prior boxes are recalculated with the formula: $coordinate = \min(\max(coordinate,0), 1)$
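A Python sketch of steps 1 and 2 plus clipping. It assumes that w and h iterate over the cells of a layer_w x layer_h feature map while W and H are the image width and height, and it ignores the variance attribute; the loop order and the function name are assumptions.

```python
def prior_box_clustered(layer_w, layer_h, img_w, img_h, widths, heights,
                        step, offset=0.5, clip=False):
    """Return [xmin, ymin, xmax, ymax] boxes normalized by the image size."""
    boxes = []
    for h in range(layer_h):
        for w in range(layer_w):
            center_x = (w + offset) * step
            center_y = (h + offset) * step
            for box_w, box_h in zip(widths, heights):
                box = [(center_x - box_w / 2) / img_w, (center_y - box_h / 2) / img_h,
                       (center_x + box_w / 2) / img_w, (center_y + box_h / 2) / img_h]
                if clip:
                    # coordinate = min(max(coordinate, 0), 1)
                    box = [min(max(v, 0.0), 1.0) for v in box]
                boxes.append(box)
    return boxes


print(prior_box_clustered(1, 1, 32, 32, widths=[16.0], heights=[8.0], step=16.0))
# [[0.0, 0.125, 0.5, 0.375]]
```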

Example

<layer ... type="PriorBoxClustered">
<data clip="0" flip="0" height="44.0,10.0,30.0,19.0,94.0,32.0,61.0,53.0,17.0" offset="0.5" step="16.0" variance="0.1,0.1,0.2,0.2"
width="86.0,13.0,57.0,39.0,68.0,34.0,142.0,50.0,23.0"/>
<input>
...
</input>
<output>
...
</output>
</layer>

Proposal Layer

Name: Proposal

Category: Layer

Short description: Proposal layer filters bounding boxes and outputs only those with the highest prediction confidence.

Parameters: Proposal layer parameters should be specified as the data node, which is a child of the layer node. The layer has three inputs: a blob with probabilities of whether a particular bounding box corresponds to the background or foreground, a blob with logits for each of the bounding boxes, and a blob with the input image size: [image_height, image_width, scale_height_and_width] or [image_height, image_width, scale_height, scale_width].

Mathematical Formulation

Proposal layer accepts three inputs with four dimensions. The produced blob has two dimensions; the first one equals batch_size * post_nms_topn. Proposal layer does the following with the input blob:

  1. Generates initial anchor boxes. The left top corner of all boxes is at (0, 0). The width and height of the boxes are calculated from base_size with the scale and ratio parameters.
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob, which contains four deltas for each box: for x and y of the center, for width, and for height
    • finds the score in the first input blob
  3. Filters out boxes with size less than min_size
  4. Sorts all proposals (box, score) by score from highest to lowest
  5. Takes top pre_nms_topn proposals
  6. Calculates intersections for boxes and filters out all with $intersection/union > nms\_thresh$
  7. Takes top post_nms_topn proposals
  8. Returns top proposals
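Steps 4 to 7 form a greedy non-maximum suppression. A Python sketch, assuming boxes are (x1, y1, x2, y2) in continuous coordinates (some implementations add 1 to widths and heights in pixel units); the function names are hypothetical, and steps 1 to 3 are omitted.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals, nms_thresh, pre_nms_topn, post_nms_topn):
    """proposals: list of (box, score). Returns the kept proposals, best first."""
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)[:pre_nms_topn]
    kept = []
    for box, score in ranked:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(box, k) <= nms_thresh for k, _ in kept):
            kept.append((box, score))
            if len(kept) == post_nms_topn:
                break
    return kept


props = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
print(nms(props, nms_thresh=0.6, pre_nms_topn=6000, post_nms_topn=200))
# [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.7)]
```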

Inputs:

Example

<layer ... type="Proposal" ... >
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.6" post_nms_topn="200" pre_nms_topn="6000"
ratio="2.67" scale="4.0,6.0,9.0,16.0,24.0,32.0"/>
<input> ... </input>
<output> ... </output>
</layer>

PSROIPooling Layer

Name: PSROIPooling

Category: Pool

Short description: PSROIPooling layer computes position-sensitive pooling on regions of interest specified by the input.

Detailed description: Reference

Parameters: PSROIPooling layer parameters should be specified as the data node, which is a child of the layer node. PSROIPooling layer takes two input blobs: one with feature maps and one with regions of interest (box coordinates). The latter are specified with 5-element tuples: [batch_id, x_1, y_1, x_2, y_2]. ROI coordinates are specified in absolute values for the "average" mode and in normalized values (in the [0, 1] interval) for bilinear interpolation.

Inputs:

Example

<layer ... type="PSROIPooling" ... >
<data group_size="6" mode="bilinear" output_dim="360" spatial_bins_x="3" spatial_bins_y="3" spatial_scale="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3240</dim>
<dim>38</dim>
<dim>38</dim>
</port>
<port id="1">
<dim>100</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>100</dim>
<dim>360</dim>
<dim>6</dim>
<dim>6</dim>
</port>
</output>
</layer>

Quantize Layer

Name: Quantize

Category: Layer

Short description: Element-wise linear quantization of floating-point input values into a discrete set of floating-point values.

Detailed description: Input and output ranges as well as the number of quantization levels are specified by dedicated inputs and attributes. There can be different limits for each element or group of elements (channels) of the input blobs; otherwise, one limit applies to all elements. This depends on the shape of the inputs that specify the limits and on the regular broadcasting rules applied to input blobs. The output of the operator is a floating-point number of the same type as the input blob. In general, four values specify quantization for each element: input_low, input_high, output_low, and output_high. Values input_low and input_high specify the input range of quantization. All input values outside this range are clipped to the range before the actual quantization. Values output_low and output_high define the minimum and maximum quantized values at the output.

Parameters: Quantize layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

Each element of the output is defined as the result of the following expression:

if x <= input_low:
    output = output_low
elif x > input_high:
    output = output_high
else:
    # input_low < x <= input_high
    output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low

Range Layer

Name: Range

Category: Layer

Short description: Range layer generates a sequence of numbers according to the input values.

Detailed description: Range layer generates a sequence of numbers starting from the value specified in the first input up to, but not including, the value in the second input, with a step equal to the value in the third input.
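A Python sketch of this rule with a hypothetical function name. Computing the element count as ceil((limit - start) / step) clamped at zero, and supporting negative steps, are assumptions consistent with the description; for instance, inputs 0, 10, and 1 would give a 10-element output like the one in the example below.

```python
import math


def range_layer(start, limit, delta):
    """Numbers from start up to (not including) limit with step delta."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    count = max(math.ceil((limit - start) / delta), 0)
    return [start + k * delta for k in range(count)]


print(range_layer(0, 10, 1))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(range_layer(2, 3, 0.25))   # [2.0, 2.25, 2.5, 2.75]
print(range_layer(5, 0, -2))     # [5, 3, 1]
```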

Parameters: Range layer has no parameters.

Inputs:

Example

<layer ... type="Range">
<input>
<port id="0"/>
<port id="1"/>
<port id="2"/>
</input>
<output>
<port id="3">
<dim>10</dim>
</port>
</output>
</layer>
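The output length follows from the three inputs as ceil((limit - start) / delta). The sketch below assumes scalar inputs named start, limit, and delta; these names are chosen here for illustration only. It also assumes a negative step counts down. With start = 0, limit = 10, and delta = 1, it produces 10 values, matching the output dimension in the example.

```python
import math


def range_layer(start, limit, delta):
    """Sketch of Range: start, start + delta, ... stopping before limit."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    # Number of elements is ceil((limit - start) / delta), clamped at zero.
    count = max(0, math.ceil((limit - start) / delta))
    return [start + i * delta for i in range(count)]


print(range_layer(0, 10, 1))   # ten values, 0 through 9
print(range_layer(10, 0, -3))  # a negative step counts down
```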

RegionYolo Layer

Name: RegionYolo

Category: Layer

Short description: RegionYolo computes the coordinates of regions together with the probability of each class.

Detailed description: Reference

Parameters: RegionYolo layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="RegionYolo" ... >
<data axis="1" classes="80" coords="4" do_softmax="0" end_axis="3" mask="0,1,2" num="9"/>
<input> ... </input>
<output> ... </output>
<weights .../>
</layer>
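The attributes in the example are consistent with a YOLOv3 output head. The sketch below assumes the usual YOLO layout, in which each predicted box carries coords box coordinates, one objectness score, and classes class scores. Under that assumption it computes how many channels the input blob should have. With do_softmax="0", only the anchors selected by mask are counted; otherwise all num anchors are counted. This channel rule comes from the YOLO design and is not stated in this catalog.

```python
def region_yolo_channels(classes, coords, num, mask=None, do_softmax=False):
    """Expected input channel count for RegionYolo (assumed YOLO layout)."""
    # YOLOv3-style heads (do_softmax=0) use only the anchors listed in mask;
    # YOLOv2-style heads (do_softmax=1) use all `num` anchors.
    boxes = num if do_softmax or not mask else len(mask)
    # Each box: `coords` coordinates + 1 objectness score + `classes` scores.
    return boxes * (coords + 1 + classes)


# Attributes from the example: classes=80, coords=4, num=9, mask="0,1,2".
print(region_yolo_channels(80, 4, 9, mask=[0, 1, 2], do_softmax=False))
```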

ReLU Layer

Name: ReLU

Category: Activation

Short description: Reference

Detailed description: Reference

Parameters: ReLU layer parameters are optional. If present, they are specified in the data node, which is a child of the layer node.

Mathematical Formulation

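\[ Y_{i}^{( l )} = \max(0, Y_{i}^{( l - 1 )}) + negative\_slope \cdot \min(0, Y_{i}^{( l - 1 )}) \]

When negative_slope equals 0, this reduces to $\max(0, Y_{i}^{( l - 1 )})$.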

Inputs:

Example

<layer ... type="ReLU" ... >
<data negative_slope="0.100000"/>
<input> ... </input>
<output> ... </output>
</layer>
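The ReLU formula, including the negative_slope term used in the example's data node, can be applied element-wise to a flat Python list. This is an illustration of the formula and not the runtime implementation.

```python
def relu(values, negative_slope=0.0):
    """Element-wise max(0, x) + negative_slope * min(0, x)."""
    return [max(0.0, x) + negative_slope * min(0.0, x) for x in values]


print(relu([-2.0, 0.0, 3.0]))                      # plain ReLU
print(relu([-2.0, 0.0, 3.0], negative_slope=0.1))  # as in the example data node
```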

ReorgYolo Layer

Name: ReorgYolo

Category: Layer

Short description: ReorgYolo reorganizes the input blob according to the specified stride.

Detailed description: Reference

Parameters: ReorgYolo layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="ReorgYolo" ... >
<data stride="1"/>
<input> ... </input>
<output> ... </output>
</layer>
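ReorgYolo behaves roughly like a space-to-depth rearrangement: each stride x stride spatial block is moved into the channel dimension. The sketch below only computes the resulting shape for an [N, C, H, W] input. It assumes H and W are divisible by stride and does not reproduce the element ordering of the reference implementation.

```python
def reorg_yolo_output_shape(shape, stride):
    """Output shape of ReorgYolo for an [N, C, H, W] input (shape-only sketch)."""
    n, c, h, w = shape
    if h % stride or w % stride:
        raise ValueError("H and W must be divisible by stride")
    # Spatial size shrinks by `stride`; channels grow by stride * stride.
    return [n, c * stride * stride, h // stride, w // stride]


print(reorg_yolo_output_shape([1, 64, 26, 26], 2))
```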

Resample (Type 1) Layer

Name: Resample

Category: Layer

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node. Resample Type 1 layer has one input blob containing the image to resample.

Inputs:

Example

<layer type="Resample">
<data antialias="0" factor="2" type="caffe.ResampleParameter.LINEAR"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>25</dim>
<dim>30</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>3</dim>
<dim>50</dim>
<dim>60</dim>
</port>
</output>
</layer>
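In the Type 1 example, factor="2" doubles the spatial size from 25 x 30 to 50 x 60, while batch and channels stay the same. The shape-only sketch below assumes that only H and W are scaled. How non-integer results are rounded is an assumption; truncation is used here.

```python
def resample_output_shape(shape, factor):
    """Output shape of Resample Type 1 for an [N, C, H, W] input (sketch)."""
    n, c, h, w = shape
    # Only the spatial dimensions are scaled; int() truncation is assumed.
    return [n, c, int(h * factor), int(w * factor)]


print(resample_output_shape([1, 3, 25, 30], 2))
```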

Resample (Type 2) Layer

Name: Resample

Category: Layer

Short description: Resample layer scales the input blob by the specified parameters.

Parameters: Resample layer parameters should be specified as the data node, which is a child of the layer node. Resample Type 2 layer has two input blobs: the image to resample and the output dimensions.

Inputs:

Example

<layer type="Resample">
<data antialias="0" factor="1" type="caffe.ResampleParameter.LINEAR"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>25</dim>
<dim>30</dim>
</port>
<port id="1">
<dim>4</dim>
</port>
</input>
<output>
<port id="2">
<dim>1</dim>
<dim>3</dim>
<dim>50</dim>
<dim>60</dim>
</port>
</output>
</layer>

Reshape Layer

Name: Reshape

Category: Layer

Short description: Reshape layer changes the dimensions of the input blob according to the specified shape. The input blob volume equals the output blob volume, where volume is the product of the dimensions.

Detailed description: Reference

Parameters: Reshape layer does not have parameters. Reshape layer takes two input blobs: the blob to be reshaped and the output blob shape. The values in the second blob can be -1, 0, or any positive integer. The two special values -1 and 0:

Inputs:

Example

<layer ... type="Reshape" ...>
<input>
<port id="0">
<dim>2</dim>
<dim>5</dim>
<dim>5</dim>
<dim>24</dim>
</port>
<port id="1">
<dim>3</dim>
</port>
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>150</dim>
<dim>4</dim>
</port>
</output>
</layer>
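The sketch below shows how a target shape containing the special values might be resolved. It assumes the convention of Caffe's Reshape layer: 0 copies the input dimension at the same position, and a single -1 is inferred so that the volume is preserved. Under that assumption, the shape input [0, -1, 4] turns the 2x5x5x24 input of the example into the 2x150x4 output.

```python
from math import prod


def resolve_reshape(input_shape, target):
    """Resolve 0 and -1 entries of a Reshape target shape (assumed semantics)."""
    # 0 copies the input dimension at the same position.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(target)]
    if out.count(-1) > 1:
        raise ValueError("at most one -1 is allowed")
    volume = prod(input_shape)
    if -1 in out:
        # -1 absorbs whatever is left of the input volume.
        known = prod(d for d in out if d != -1)
        if known == 0 or volume % known:
            raise ValueError("cannot infer the -1 dimension")
        out[out.index(-1)] = volume // known
    if prod(out) != volume:
        raise ValueError("input and output volumes differ")
    return out


print(resolve_reshape([2, 5, 5, 24], [0, -1, 4]))
```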

ReverseSequence Layer

Name: ReverseSequence

Category: Layer

Short description: ReverseSequence reverses variable length slices of data.

Detailed description: ReverseSequence first slices the input along dimension batch_axis. For each slice i, it reverses the first lengths[i] elements along dimension seq_axis, where lengths is the second input.

Parameters: ReverseSequence layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="ReverseSequence">
<data batch_axis="0" seq_axis="1"/>
<input>
<port id="0">
<dim>3</dim>
<dim>10</dim>
<dim>100</dim>
<dim>200</dim>
</port>
<port id="1">
<dim>3</dim>
</port>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>10</dim>
<dim>100</dim>
<dim>200</dim>
</port>
</output>
</layer>
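The nested-list sketch below is restricted to a 2D input with batch_axis=0 and seq_axis=1; the general N-dimensional case is not covered. Each row is one batch slice, and its first lengths[i] elements are reversed while the rest stay in place.

```python
def reverse_sequence(data, lengths):
    """ReverseSequence for a 2D list with batch_axis=0 and seq_axis=1 (sketch)."""
    if len(lengths) != len(data):
        raise ValueError("need one length per batch slice")
    result = []
    for row, n in zip(data, lengths):
        # Reverse the first n elements; keep the remainder in place.
        result.append(list(reversed(row[:n])) + list(row[n:]))
    return result


print(reverse_sequence([[1, 2, 3, 4], [5, 6, 7, 8]], [2, 3]))
```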

RNNCell Layer

Name: RNNCell

Category: Layer

Short description: RNNCell layer computes the output using the formula described in the article.

Parameters: RNNCell layer parameters should be specified as the data node, which is a child of the layer node.

Inputs

Outputs
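The sketch below computes one cell step. It assumes the common Elman-style formulation $H_t = f(W X_t + R H_{t-1} + B)$ with f = tanh, as in the ONNX RNN operator. The names W, R, and B, the activation choice, and the plain-list matrix layout are assumptions for illustration and may not match the article referenced above.

```python
import math


def rnn_cell(x, h_prev, W, R, B, activation=math.tanh):
    """One RNN step: h = f(W @ x + R @ h_prev + B) on plain lists (sketch)."""
    h = []
    for i in range(len(B)):
        acc = B[i]
        acc += sum(W[i][j] * x[j] for j in range(len(x)))
        acc += sum(R[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(activation(acc))
    return h


W = [[1.0, 0.0], [0.0, 1.0]]  # input weights, hidden_size x input_size
R = [[0.5, 0.0], [0.0, 0.5]]  # recurrent weights, hidden_size x hidden_size
B = [0.0, 0.0]                # bias, hidden_size
print(rnn_cell([1.0, -1.0], [0.0, 0.0], W, R, B))
```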


ROIPooling Layer

Name: ROIPooling

Category: Pool

Short description: ROIPooling is a pooling layer used over feature maps of non-uniform input sizes. It outputs feature maps of a fixed size.

Detailed description: deepsense.io reference

Parameters: Specify ROIPooling layer parameters in the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

\[ output_{j} = \max\{ x_{0}, \ldots, x_{i} \} \]

where $x_{0}, \ldots, x_{i}$ are the input elements that fall into output bin j.

Example

<layer ... type="ROIPooling" ... >
<data pooled_h="6" pooled_w="6" spatial_scale="0.062500"/>
<input> ... </input>
<output> ... </output>
</layer>
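The single-channel sketch below follows the style of the Caffe/Fast R-CNN reference. It assumes the ROI is given as (x1, y1, x2, y2) in input-image coordinates and mapped onto the feature map with spatial_scale. The rounding and bin-boundary rules follow that reference and may differ in other implementations. Empty bins produce 0.

```python
import math


def roi_pool(feature, roi, pooled_h, pooled_w, spatial_scale):
    """Max-pool one ROI of a 2D feature map into pooled_h x pooled_w bins."""
    height, width = len(feature), len(feature[0])
    # Map ROI corners onto the feature map (round half up).
    x1, y1, x2, y2 = (math.floor(v * spatial_scale + 0.5) for v in roi)
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    bin_h = roi_h / pooled_h
    bin_w = roi_w / pooled_w
    out = []
    for ph in range(pooled_h):
        row = []
        for pw in range(pooled_w):
            # Bin boundaries, offset by the ROI origin and clipped to the map.
            hs = min(max(math.floor(ph * bin_h) + y1, 0), height)
            he = min(max(math.ceil((ph + 1) * bin_h) + y1, 0), height)
            ws = min(max(math.floor(pw * bin_w) + x1, 0), width)
            we = min(max(math.ceil((pw + 1) * bin_w) + x1, 0), width)
            cells = [feature[h][w] for h in range(hs, he) for w in range(ws, we)]
            row.append(max(cells) if cells else 0)
        out.append(row)
    return out


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(roi_pool(fmap, (0, 0, 3, 3), pooled_h=2, pooled_w=2, spatial_scale=1.0))
```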

ScaleShift Layer

Name: ScaleShift

Category: Layer

Short description: ScaleShift layer performs a linear transformation of the input blobs. The weights are the scaling parameter, and the biases are the shift.

Parameters: ScaleShift layer does not have additional parameters.

Inputs:

Mathematical Formulation

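\[ o_{i} = \gamma b_{i} + \beta \]

where $b_{i}$ is an element of the input blob, $\gamma$ is the scale (weights), and $\beta$ is the shift (biases).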

Example

<layer ... type="ScaleShift" ... >
<input> ... </input>
<output> ... </output>
</layer>
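The sketch below reads $\gamma$ as the weights and $\beta$ as the biases. It assumes one weight and one bias per channel, applied to a single batch item stored as nested lists [C][H*W]. Whether the layer broadcasts per channel or per element is not stated above.

```python
def scale_shift(channels, weights, biases):
    """o = gamma * x + beta, with one (gamma, beta) pair per channel (sketch)."""
    if not (len(channels) == len(weights) == len(biases)):
        raise ValueError("need one weight and one bias per channel")
    return [
        [gamma * x + beta for x in plane]
        for plane, gamma, beta in zip(channels, weights, biases)
    ]


# Two channels with two spatial positions each.
print(scale_shift([[1.0, 2.0], [3.0, 4.0]], weights=[2.0, 0.5], biases=[0.0, 1.0]))
```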

Shape Layer

Name: Shape

Category: Layer

Short description: Shape produces a 1D blob containing the shape of the input blob.

Parameters: Shape layer has no parameters.

Inputs:

Example

<layer ... type="Shape">
<input>
<port id="0">
<dim>2</dim>
<dim>3</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</input>
<output>
<port id="1">
<dim>4</dim>
</port>
</output>
</layer>

ShuffleChannels Layer

Name: ShuffleChannels

Category: Layer

Short description: ShuffleChannels permutes data in the channel dimension of the input blob.

Parameters: ShuffleChannels layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

The operation is equivalent to the following transformation of the input blob x of shape [N, C, H, W]:

x' = reshape(x, [N, group, C / group, H * W])
x'' = transpose(x', [0, 2, 1, 3])
y = reshape(x'', [N, C, H, W])

where group is the group attribute of the data node.

Example

<layer ... type="ShuffleChannels" ...>
<data group="3" axis="1"/>
<input>
<port id="0">
<dim>3</dim>
<dim>12</dim>
<dim>200</dim>
<dim>400</dim>
</port>
</input>
<output>
<port id="1">
<dim>3</dim>
<dim>12</dim>
<dim>200</dim>
<dim>400</dim>
</port>
</output>
</layer>
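The reshape/transpose/reshape sequence amounts to a fixed permutation of channel indices: output channel j * group + i takes input channel i * (C / group) + j. The sketch below works on one batch item, where each list entry stands in for a whole H x W channel plane.

```python
def shuffle_channels(channels, group):
    """Permute channels as reshape -> transpose -> reshape (sketch)."""
    c = len(channels)
    if c % group:
        raise ValueError("C must be divisible by group")
    per_group = c // group
    out = [None] * c
    for i in range(group):
        for j in range(per_group):
            # Element [i, j] of the [group, C / group] view moves to [j, i].
            out[j * group + i] = channels[i * per_group + j]
    return out


print(shuffle_channels([0, 1, 2, 3, 4, 5], group=3))
```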

SimplerNMS Layer

Name: SimplerNMS

Category: Layer

Short description: SimplerNMS layer performs filtering of bounding boxes and outputs only those with the highest confidence of prediction.

Parameters: SimplerNMS layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Mathematical Formulation

SimplerNMS accepts three inputs with four dimensions. The produced blob has two dimensions, and the first equals post_nms_topn. SimplerNMS does the following with the input blob:

  1. Generates initial anchor boxes. The top-left corner of all boxes is (0, 0). The width and height of the boxes are calculated from the default widths and heights, scaled according to the scale parameter.
  2. For each point in the first input blob:
    • pins anchor boxes to the image according to the second input blob, which contains four deltas for each box: for x and y of the center, for the width, and for the height
    • finds the score in the first input blob
  3. Filters out boxes with a size less than min_bbox_size.
  4. Sorts all proposals (box, score) by score from highest to lowest.
  5. Takes the top pre_nms_topn proposals.
  6. Calculates the intersection over union of boxes and filters out every box that overlaps a higher-scoring box with $intersection/union > iou\_threshold$.
  7. Takes the top post_nms_topn proposals.
  8. Returns the top proposals.

Example

<layer ... type="SimplerNMS" ... >
<data iou_threshold="0.700000" min_bbox_size="16" feat_stride="16" pre_nms_topn="6000" post_nms_topn="150"/>
<input> ... </input>
<output> ... </output>
</layer>
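Steps 4 to 7 of the procedure amount to greedy non-maximum suppression. The sketch below covers only that part and omits anchor generation, delta decoding, and size filtering (steps 1 to 3). It assumes boxes are (x1, y1, x2, y2) corner tuples and measures width as x2 - x1; some implementations add 1 pixel.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold, pre_nms_topn, post_nms_topn):
    """Greedy NMS returning indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_topn]  # keep the top pre_nms_topn proposals
    keep = []
    for i in order:
        # Drop a box if it overlaps an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
        if len(keep) == post_nms_topn:
            break
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_threshold=0.7, pre_nms_topn=6000, post_nms_topn=150))
```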

Slice Layer

Name: Slice

Category: Layer

Short description: Slice layer splits the input blob into several pieces over the specified axis.

Parameters: Slice layer parameters should be specified as the data node, which is a child of the layer node.

Inputs:

Example

<layer ... type="Slice" ...>
<data axis="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>1048</dim>
<dim>14</dim>
<dim>14</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>1024</dim>
<dim>14</dim>
<dim>14</dim>
</port>
<port id="2">
<dim>1</dim>
<dim>24</dim>
<dim>14</dim>
<dim>14</dim>
</port>
</output>
</layer>

SoftMax Layer

Name: SoftMax

Category: Activation

Short description: Reference

Detailed description: Reference

Parameters: SoftMax layer parameters are optional. If present, they are specified in the data node, which is a child of the layer node.

Mathematical Formulation

\[ y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} \]

where $C$ is the number of classes, that is, the size of the dimension given by axis.

Example

<layer ... type="SoftMax" ... >
<data axis="1" />
<input> ... </input>
<output> ... </output>
</layer>
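The sketch below applies the SoftMax formula to a 1D list of scores, standing in for the values along the chosen axis. Subtracting the maximum score before exponentiating is a standard trick to avoid overflow. It does not change the result mathematically.

```python
import math


def softmax(z):
    """Numerically stable softmax over a 1D list of scores."""
    m = max(z)  # shifting all scores by the max leaves the ratios unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1.0, 2.0, 3.0]))
```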

Inputs:


Split Layer

Name: Split

Category: Layer

Short description: Split layer splits the input along the specified axis into several output pieces.

Detailed description: Reference

Parameters: Split layer parameters should be specified in the data node, which is a child of the layer node.

Mathematical Formulation

For example, if the input blob is Bx(C+C)xHxW, axis=1, and num_split=2, then each output blob is BxCxHxW.

Inputs:

Example

<layer ... type="Split" ... >
<data axis="0" num_split="2"/>
<input> ... </input>
<output> ... </output>
</layer>
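The output shapes can be computed directly from axis and num_split. The shape-only sketch below assumes, as in the Bx(C+C)xHxW example, that the split dimension is divided into num_split equal pieces.

```python
def split_shape(shape, axis, num_split):
    """Output shapes of Split: equal pieces along `axis` (sketch)."""
    if shape[axis] % num_split:
        raise ValueError("dimension must be divisible by num_split")
    piece = list(shape)
    piece[axis] = shape[axis] // num_split
    return [list(piece) for _ in range(num_split)]


print(split_shape([2, 6, 4, 4], axis=1, num_split=2))
```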

StridedSlice Layer

Name: StridedSlice

Short description: StridedSlice layer extracts a strided slice of a blob. It is similar to the generalized array indexing in Python*.

Parameters:

Inputs:

Example

<layer ... type="StridedSlice" ...>
<data begin_mask="0,1,0,0,0" ellipsis_mask="0,0,0,0,0" end_mask="0,1,0,0,0" new_axis_mask="0,0,0,0,0" shrink_axis_mask="0,1,0,0,0"/>
<input>
<port id="0">
<dim>1</dim>
<dim>2</dim>
<dim>384</dim>
<dim>640</dim>
<dim>8</dim>
</port>
<port id="1">
<dim>5</dim>
</port>
<port id="2">
<dim>5</dim>
</port>
<port id="3">
<dim>5</dim>
</port>
</input>
<output>
<port id="4">
<dim>1</dim>
<dim>384</dim>
<dim>640</dim>
<dim>8</dim>
</port>
</output>
</layer>

TensorIterator Layer

Name: TensorIterator

Category: Layer

Short description: TensorIterator (TI) layer performs recurrent sub-graph execution iterating through the data.

Parameters: port_map and back_edges sections specifying data mapping rules:

Example

<layer ... type="TensorIterator" ... >
<input> ... </input>
<output> ... </output>
<port_map>
<input external_port_id="0" internal_layer_id="0" internal_port_id="0" axis="1" start="-1" end="0" stride="-1"/>
<input external_port_id="1" internal_layer_id="1" internal_port_id="1"/>
...
<output external_port_id="3" internal_layer_id="2" internal_port_id="1" axis="1" start="-1" end="0" stride="-1"/>
...
</port_map>
<back_edges>
<edge from-layer="1" from-port="1" to-layer="1" to-port="1"/>
...
</back_edges>
<body>
<layers> ... </layers>
<edges> ... </edges>
</body>
</layer>
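The sketch below illustrates conceptually what the port_map and back_edges sections describe. It assumes a body that maps (slice, state) to (output, new_state). The input is iterated slice by slice along the chosen axis, in reverse when stride is negative as in the example. Each slice is fed to the body, the state is carried to the next iteration as a back edge would, and the per-iteration outputs are collected. The function and argument names are hypothetical, and start/end handling is simplified to iterating over the whole axis.

```python
def tensor_iterator(slices, body, initial_state, stride=1):
    """Run `body` over `slices`, carrying state between iterations (sketch).

    `slices` stands for the input split along the iteration axis; a negative
    stride walks it from the last slice to the first.
    """
    n = len(slices)
    order = range(n) if stride > 0 else range(n - 1, -1, -1)
    state = initial_state
    outputs = [None] * n
    for t in order:
        # The body consumes one slice plus the state from the back edge.
        out, state = body(slices[t], state)
        # Each output is written back at the position of its input slice.
        outputs[t] = out
    return outputs, state


def running_sum(x, acc):
    """Hypothetical body: accumulate the input and emit the running total."""
    total = acc + x
    return total, total


print(tensor_iterator([1, 2, 3], running_sum, 0))             # forward
print(tensor_iterator([1, 2, 3], running_sum, 0, stride=-1))  # reversed
```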

Tile Layer

Name: Tile

Category: Layer

Short description: Tile layer extends the input blob with copies of data along a specified axis.

Detailed description: Reference

Parameters: Tile layer parameters should be specified as the data node, which is a child of the layer node.

Mathematical Formulation

Tile extends the input blob and fills the output blob according to the following rule:

\[ out_i=input_i[inner\_dim*t] \]

\[ t \in \left [ 0, \quad tiles \right ) \]

Inputs:

Example

<layer ... type="Tile" ... >
<data axis="3" tiles="88"/>
<input> ... </input>
<output> ... </output>
</layer>
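Tiling copies the whole input block tiles times along axis, so that output dimension is tiles times larger. In the example, axis="3" and tiles="88" make the last dimension 88 times larger. The nested-list sketch below assumes block repetition, like NumPy's tile, rather than repeating each element in place.

```python
def tile(data, axis, tiles):
    """Repeat a nested-list blob `tiles` times along `axis` (sketch)."""
    if axis == 0:
        # Concatenate `tiles` copies of the whole block along this axis
        # (inner lists are shared between copies, not duplicated).
        return [item for _ in range(tiles) for item in data]
    return [tile(sub, axis - 1, tiles) for sub in data]


print(tile([[1, 2], [3, 4]], axis=1, tiles=2))
```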