**Category**: Matrix multiplication

**Short description**: Generalized matrix multiplication

**Detailed description**

*MatMul* operation takes two tensors and performs usual matrix-matrix multiplication, matrix-vector multiplication or vector-matrix multiplication depending on argument shapes. Input tensors can have any rank >= 1. Two right-most axes in each tensor are interpreted as matrix rows and columns dimensions while all left-most axes (if present) are interpreted as multi-dimensional batch: [BATCH_DIM_1, BATCH_DIM_2,..., BATCH_DIM_K, ROW_INDEX_DIM, COL_INDEX_DIM]. The operation supports usual broadcast semantics for batch dimensions. It enables multiplication of batch of pairs of matrices in a single shot.

Before matrix multiplication, there is an implicit shape alignment for input arguments. It consists of the following steps:

- Applying transpositions specified by optional
`transpose_a`

and`transpose_b`

attributes. Only the two right-most dimensions are transposed, other dimensions remain the same. Transpose attributes are ignored for 1D tensors. - One-dimensional tensors unsqueezing is applied for each input independently. The axes inserted in this step are not included in the output shape.
- If rank of the
**first**input is equal to 1, it is always unsqueezed to 2D tensor**row vector**(regardless of`transpose_a`

) by adding axes with size 1 at ROW_INDEX_DIM, to the**left**of the shape. For example`[S]`

will be reshaped to`[1, S]`

. - If rank of the
**second**input is equal to 1, it is always unsqueezed to 2D tensor**column vector**(regardless of`transpose_b`

) by adding axes with size 1 at COL_INDEX_DIM, to the**right**of the shape. For example`[S]`

will be reshaped to`[S, 1]`

.

- If rank of the
- If ranks of input arguments are different after steps 1 and 2, the tensor with a smaller rank is unsqueezed from the left side of the shape by necessary number of axes to make both shapes of the same rank.
- Usual rules of the broadcasting are applied for batch dimensions.

Temporary axes inserted in **step 2** are removed from the final output shape after multiplying. After vector-matrix multiplication, the temporary axis inserted at ROW_INDEX_DIM is removed. After matrix-vector multiplication, the temporary axis inserted at COL_INDEX_DIM is removed. Output shape of two 1D tensors multiplication `[S] x [S]`

is squeezed to scalar.

Output shape inference logic examples (ND here means bigger than 1D):

- 1D x 1D:
`[X] x [X] -> [1, X] x [X, 1] -> [1, 1] => []`

(scalar) - 1D x ND:
`[X] x [B, ..., X, Y] -> [1, X] x [B, ..., X, Y] -> [B, ..., 1, Y] => [B, ..., Y]`

- ND x 1D:
`[B, ..., X, Y] x [Y] -> [B, ..., X, Y] x [Y, 1] -> [B, ..., X, 1] => [B, ..., X]`

- ND x ND:
`[B, ..., X, Y] x [B, ..., Y, Z] => [B, ..., X, Z]`

Two attributes, `transpose_a`

and `transpose_b`

specify embedded transposition for two right-most dimensions for the first and the second input tensors correspondingly. It implies swapping of ROW_INDEX_DIM and COL_INDEX_DIM in the corresponding input tensor. Batch dimensions and 1D tensors are not affected by these attributes.

**Attributes**

*transpose_a***Description**: transposes dimensions ROW_INDEX_DIM and COL_INDEX_DIM of the 1st input;**False**means no transpose,**True**means transpose. It is ignored if first input is 1D tensor.**Range of values**: False or True**Type**: boolean**Default value**: False**Required**:*no*

*transpose_b***Description**: transposes dimensions ROW_INDEX_DIM and COL_INDEX_DIM of the 2nd input;**False**means no transpose,**True**means transpose. It is ignored if second input is 1D tensor.**Range of values**: False or True**Type**: boolean**Default value**: False**Required**:*no*

**Inputs**:

**1**: Input batch of matrices A. Rank >= 1. Required.**2**: Input batch of matrices B. Rank >= 1. Required.

**Outputs**

**1**: Tensor with results of the multiplication.

**Example**

*Vector-matrix multiplication*

<layer ... type="MatMul">

<input>

<port id="0">

<dim>1024</dim>

</port>

<port id="1">

<dim>1024</dim>

<dim>1000</dim>

</port>

</input>

<output>

<port id="2">

<dim>1000</dim>

</port>

</output>

</layer>

*Matrix-vector multiplication*

<layer ... type="MatMul">

<input>

<port id="0">

<dim>1000</dim>

<dim>1024</dim>

</port>

<port id="1">

<dim>1024</dim>

</port>

</input>

<output>

<port id="2">

<dim>1000</dim>

</port>

</output>

</layer>

*Matrix-matrix multiplication (like FullyConnected with batch size 1)*

<layer ... type="MatMul">

<input>

<port id="0">

<dim>1</dim>

<dim>1024</dim>

</port>

<port id="1">

<dim>1024</dim>

<dim>1000</dim>

</port>

</input>

<output>

<port id="2">

<dim>1</dim>

<dim>1000</dim>

</port>

</output>

</layer>

*Vector-matrix multiplication with embedded transposition of the second matrix*

<layer ... type="MatMul">

<data transpose_b="true"/>

<input>

<port id="0">

<dim>1024</dim>

</port>

<port id="1">

<dim>1000</dim>

<dim>1024</dim>

</port>

</input>

<output>

<port id="2">

<dim>1000</dim>

</port>

</output>

</layer>

*Matrix-matrix multiplication (like FullyConnected with batch size 10)*

<layer ... type="MatMul">

<input>

<port id="0">

<dim>10</dim>

<dim>1024</dim>

</port>

<port id="1">

<dim>1024</dim>

<dim>1000</dim>

</port>

</input>

<output>

<port id="2">

<dim>10</dim>

<dim>1000</dim>

</port>

</output>

</layer>

*Multiplication of batch of 5 matrices by a one matrix with broadcasting*

<layer ... type="MatMul">

<input>

<port id="0">

<dim>5</dim>

<dim>10</dim>

<dim>1024</dim>

</port>

<port id="1">

<dim>1024</dim>

<dim>1000</dim>

</port>

</input>

<output>

<port id="2">

<dim>5</dim>

<dim>10</dim>

<dim>1000</dim>

</port>

</output>

</layer>