This network is designed for text recognition. It consists of a ResNeXt50-like backbone (stages 1 and 2) and a bidirectional LSTM encoder-decoder. The network recognizes case-insensitive alphanumeric text (36 unique symbols).


Metric                                            Value
Accuracy on the alphanumeric subset of ICDAR13    0.8828
Text location requirements                        Tight aligned crop
GFlops                                            0.2726
MParams                                           1.4187
Source framework                                  PyTorch


The input tensor is named imgs and has the shape 1, 1, 32, 120. It holds an input image in the format B, C, H, W, where:

  • B - batch size
  • C - number of channels
  • H - image height
  • W - image width

Note that the source image should be a tightly aligned crop of the detected text, converted to grayscale.
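
To illustrate the input layout, here is a minimal sketch that assumes the text crop is already available as an H x W x 3 NumPy array. The function name to_model_input, the channel averaging used for the grayscale conversion, and the nearest-neighbor resize are assumptions made for this example. The model description does not state the exact conversion, interpolation, or value range, so check these against the demo code before relying on them.

```python
import numpy as np


def to_model_input(crop, height=32, width=120):
    """Turn an H x W x 3 crop into a 1 x 1 x height x width float32 array.

    Hypothetical helper: grayscale by channel averaging and
    nearest-neighbor resizing are assumptions, not the documented pipeline.
    """
    # Average the colour channels to get a single grayscale channel.
    gray = crop.astype(np.float32).mean(axis=2)

    # Nearest-neighbor resize: pick one source row/column per target pixel.
    rows = np.arange(height) * gray.shape[0] // height
    cols = np.arange(width) * gray.shape[1] // width
    resized = gray[rows][:, cols]

    # Add the batch (B) and channel (C) dimensions: B, C, H, W.
    return resized[np.newaxis, np.newaxis, :, :]


# Example: a dummy 40 x 200 colour crop becomes a 1, 1, 32, 120 tensor.
dummy_crop = np.zeros((40, 200, 3), dtype=np.uint8)
blob = to_model_input(dummy_crop)
print(blob.shape)
```

In practice an image library would usually handle the grayscale conversion and resizing; plain NumPy is used here only to keep the sketch self-contained.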


The net outputs 2 blobs

  • logits with the shape 30, 1, 37 in the format W, B, L, where:
    • W - output sequence length
    • B - batch size
    • L - confidence distribution across the alphanumeric symbols "#0123456789abcdefghijklmnopqrstuvwxyz", where # is the special blank character used by the CTC decoding algorithm.

The network output can be decoded with a CTC greedy decoder or a CTC beam search decoder.
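
As an illustration of the greedy option, the sketch below decodes a logits blob in the W, B, L layout described above. It takes the highest-scoring symbol at each step, collapses consecutive repeats, and drops the blank character. The function name ctc_greedy_decode is hypothetical, and the code assumes the blank is at index 0, matching the symbol string given above. It is not the demo's own implementation.

```python
ALPHABET = "#0123456789abcdefghijklmnopqrstuvwxyz"  # index 0 is the CTC blank


def ctc_greedy_decode(logits, alphabet=ALPHABET, blank=0):
    """Greedy CTC decoding for logits indexed as [W][B][L].

    Works on nested lists or any array-like object with the same indexing.
    Returns one decoded string per batch item.
    """
    batch_size = len(logits[0])
    texts = []
    for b in range(batch_size):
        chars = []
        previous = blank
        for step in logits:
            scores = step[b]
            # Index of the highest-scoring symbol at this step.
            best = max(range(len(scores)), key=lambda i: scores[i])
            # Keep a symbol only if it is not blank and not a repeat
            # of the symbol chosen at the previous step.
            if best != blank and best != previous:
                chars.append(alphabet[best])
            previous = best
        texts.append("".join(chars))
    return texts


def one_hot(index, size=len(ALPHABET)):
    """Build a score vector that puts all confidence on one symbol."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Example: per-step best symbols "a a # a b b #" for a batch of one.
steps = "aa#abb#"
example_logits = [[one_hot(ALPHABET.index(ch))] for ch in steps]
print(ctc_greedy_decode(example_logits))
```

The blank between the second and third "a" is what lets the decoder keep a repeated letter: without it, consecutive identical symbols are merged into one.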

Use text-detection demo

The model is supported by the text-detection C++ demo. To use this model in the demo, pass the following options:

-tr_o_blb_nm "logits"

For more information, see the demo documentation.

