This is a network for recognizing handwritten simplified Chinese text. It consists of a VGG16-like backbone, a reshape layer, and a fully connected layer. The network can recognize simplified Chinese text composed of the characters that appear in the SCUT-EPT dataset.
| Metric | Value |
|---|---|
| Accuracy on SCUT-EPT test subset (excluding images wider than 2000 px after resizing to a height of 96 px with the aspect ratio preserved) | 75.31% |
This model adopts label error rate as the metric for accuracy.
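One common definition of the label error rate is the total edit (Levenshtein) distance between predicted and reference label sequences, divided by the total number of reference labels. The sketch below computes it under that assumption. It also assumes that the accuracy in the table is reported as 1 - LER, which this card does not state explicitly.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def label_error_rate(predictions, references):
    """Corpus-level LER: summed edit distance / summed reference length."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_labels = sum(len(r) for r in references)
    if total_labels == 0:
        raise ValueError("references contain no labels")
    total_errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_errors / total_labels


# Assumption: accuracy = 1 - LER.
preds, refs = ["中文字"], ["中文"]
print(1 - label_error_rate(preds, refs))
```

Each Chinese character counts as one label here, so Python strings can be compared directly.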
Grayscale image, name - actual_input, shape - [1x1x96x2000], format is [BxCxHxW], where:

- B - batch size
- C - number of channels
- H - image height
- W - image width
NOTE: the source image should be resized to the input height of 96 while keeping its aspect ratio. The width after resizing must be no larger than 2000. The image is then padded at the right and bottom with edge values to a width of 2000.
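The resize-and-pad step from the note above can be sketched with NumPy alone. This is a minimal sketch. The nearest-neighbour `resize_nearest` helper is a hypothetical stand-in for a library resize such as OpenCV's `cv2.resize`. The float32 output dtype is also an assumption. Any binarization or normalization the original pipeline may apply is not shown.

```python
import numpy as np

TARGET_H = 96    # input height from the [1x1x96x2000] shape
TARGET_W = 2000  # input width from the [1x1x96x2000] shape


def resize_nearest(img, new_h, new_w):
    """Hypothetical nearest-neighbour resize of a 2-D array (stand-in for a library resize)."""
    h, w = img.shape
    rows = np.minimum((np.arange(new_h) * h / new_h).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * w / new_w).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]


def preprocess(gray):
    """Resize an H x W grayscale image to height 96 (aspect ratio kept),
    edge-pad it at the right/bottom to 96 x 2000, and add B and C axes."""
    h, w = gray.shape
    new_w = max(1, int(round(w * TARGET_H / h)))
    if new_w > TARGET_W:
        raise ValueError(f"resized width {new_w} exceeds {TARGET_W}")
    resized = resize_nearest(gray, TARGET_H, new_w)
    # Bottom padding is zero here because the height is already 96;
    # the right side is filled by replicating the last column ("edge" mode).
    padded = np.pad(resized,
                    ((0, TARGET_H - resized.shape[0]), (0, TARGET_W - new_w)),
                    mode="edge")
    return padded.astype(np.float32)[None, None, :, :]  # [B, C, H, W]
```

For example, a 48 x 100 image is resized to 96 x 200. Columns 200 to 1999 then repeat column 199, which gives a tensor of shape (1, 1, 96, 2000).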
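Name - output, shape - [125x1x4059], format is [WxBxL], where:

- W - output sequence length
- B - batch size
- L - confidence distribution across the supported symbols of the SCUT-EPT dataset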
The network output can be decoded with a CTC greedy decoder.
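CTC greedy (best-path) decoding takes the most likely class at each of the W time steps. It then merges consecutive repeats and drops the blank symbol. The sketch below applies this to the [WxBxL] output. It assumes that the blank is class 0 and that `charset` is a hypothetical list mapping each class index to its character, with a placeholder at index 0. Check both assumptions against the character list that ships with the model.

```python
import numpy as np

BLANK_INDEX = 0  # assumption: the CTC blank is class 0


def ctc_greedy_decode(output, charset, blank=BLANK_INDEX):
    """Best-path CTC decoding of a [W, B, L] score tensor.

    Returns one decoded string per batch element.
    """
    if output.ndim != 3:
        raise ValueError("expected a [W, B, L] tensor")
    best = output.argmax(axis=2)  # [W, B]: most likely class per time step
    results = []
    for b in range(best.shape[1]):
        chars = []
        prev = None
        for idx in best[:, b]:
            idx = int(idx)
            # Emit a symbol only if it differs from the previous step and is not blank.
            if idx != prev and idx != blank:
                chars.append(charset[idx])
            prev = idx
        results.append("".join(chars))
    return results


# Toy example with L = 4: the blank between the two 1s keeps them as separate characters.
charset = ["", "中", "文", "字"]
path = [1, 1, 0, 1, 2, 0]
scores = np.eye(4, dtype=np.float32)[path][:, None, :]  # shape [6, 1, 4]
print(ctc_greedy_decode(scores, charset))
```

For the real output, W = 125 steps cover the 2000-pixel input width. If the backbone downsamples uniformly (an assumption), each step corresponds to about 2000 / 125 = 16 input columns.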
[*] Other names and brands may be claimed as the property of others.