This is a network for the text recognition scenario. It consists of a ResNeXt101-like backbone (stage-1-2) and a bidirectional LSTM encoder-decoder. The network recognizes case-insensitive alphanumeric text (36 unique symbols).
| Metric | Value |
|---|---|
| Accuracy on the alphanumeric subset of ICDAR13 | 0.8887 |
| Accuracy on the alphanumeric subset of ICDAR03 | 0.9077 |
| Accuracy on the alphanumeric subset of ICDAR15 | 0.6908 |
| Accuracy on the alphanumeric subset of SVT | 0.83 |
| Accuracy on the alphanumeric subset of IIIT5K | 0.8157 |
| Text location requirements | Tight aligned crop |
The net input is a blob with the shape 1, 1, 32, 128 in the format B, C, H, W, where:
B - batch size
C - number of channels
H - image height
W - image width
Note that the source image should be a tightly aligned crop of the detected text, converted to grayscale.
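
The input layout described above can be illustrated with a minimal, dependency-free sketch. It assumes the crop is given as rows of (R, G, B) tuples, uses the common ITU-R BT.601 luma weights for the grayscale conversion, and resizes with nearest-neighbour sampling. The function names are hypothetical, and pixel value scaling is left untouched because the description above does not specify it. In practice an image library would normally do this work.

```python
def to_grayscale(rgb_rows):
    """Convert an H x W image of (R, G, B) tuples to H x W luma values.

    Assumption: RGB channel order and BT.601 weights.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_rows
    ]


def resize_nearest(gray, out_h=32, out_w=128):
    """Resize an H x W grayscale image with nearest-neighbour sampling."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def make_input_blob(rgb_rows):
    """Build a nested list with shape 1, 1, 32, 128 (B, C, H, W)."""
    resized = resize_nearest(to_grayscale(rgb_rows))
    # Wrap once for the single channel (C=1) and once for the batch (B=1).
    return [[resized]]
```

The two outer list levels exist only to match the B, C, H, W layout; the crop itself is expected to already be tight around the text before this step.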
The net output is a blob with the name logits and the shape 16, 1, 37 in the format W, B, L, where:
W - output sequence length
B - batch size
L - confidence distribution across the alphanumeric symbols #0123456789abcdefghijklmnopqrstuvwxyz, where # is the special blank character for the CTC decoding algorithm.
The network output can be decoded by a CTC greedy decoder or a CTC beam search decoder.
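
As an illustration of the greedy option, here is a minimal sketch of CTC greedy decoding. It assumes the logits are available as a nested list in the W, B, L layout described above and that index 0 of the symbol string is the blank. The function name and the `batch_index` parameter are hypothetical, not part of the model or the demo.

```python
# Symbol table from the output description; '#' at index 0 is the CTC blank.
ALPHABET = "#0123456789abcdefghijklmnopqrstuvwxyz"
BLANK = 0


def ctc_greedy_decode(logits, batch_index=0):
    """Decode a W x B x L nested list of scores into a string.

    Greedy CTC: take the best symbol at each time step, collapse
    consecutive repeats, then drop blanks.
    """
    chars = []
    prev = None
    for step in logits:
        scores = step[batch_index]
        # Index of the highest-scoring symbol at this time step.
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != prev and best != BLANK:
            chars.append(ALPHABET[best])
        prev = best
    return "".join(chars)
```

Because repeats are collapsed before blanks are removed, a doubled letter in the text is only recovered when the network emits a blank between the two occurrences.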
The model is supported by the text-detection C++ demo. To use this model in the demo, pass the following options:
For more information, see the documentation of the demo.
[*] Other names and brands may be claimed as the property of others.