This is a lightweight landmarks regressor for the Smart Classroom scenario. It has a classic convolutional design: stacked 3x3 convolutions, batch normalization, PReLU activations, and pooling layers. The final regression is done by a global depthwise pooling head followed by FullyConnected layers. The model predicts five facial landmarks: two eyes, nose, and two lip corners.
| Metric | Value |
|---|---|
| Mean Normed Error (on VGGFace2) | 0.0705 |
| Face location requirements | Tight crop |
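The Normed Error (NE) for the i-th sample has the following form:

$$
e_i = \frac{1}{N} \sum_{k=1}^{N} \frac{\lVert \hat{p}_{i,k} - p_{i,k} \rVert_2}{d_i}
$$

where $N$ is the number of landmarks, $\hat{p}_{i,k}$ and $p_{i,k}$ are, respectively, the prediction and ground-truth vectors of the k-th landmark of the i-th sample, and $d_i$ is the interocular distance for the i-th sample.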
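The metric described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the evaluation code behind the reported number: it assumes the per-landmark error is the Euclidean distance, averaged over the N landmarks and divided by the interocular distance. The function name `normed_error` and the sample coordinates are hypothetical, and the two eyes are assumed to be landmarks 0 and 1.

```python
import numpy as np


def normed_error(pred, gt, interocular):
    """Mean Euclidean landmark error divided by the interocular distance."""
    pred = np.asarray(pred, dtype=np.float64).reshape(-1, 2)
    gt = np.asarray(gt, dtype=np.float64).reshape(-1, 2)
    # One Euclidean distance per landmark, then the mean over all N landmarks.
    per_landmark = np.linalg.norm(pred - gt, axis=1)
    return float(per_landmark.mean() / interocular)


# Hypothetical ground truth: two eyes, nose, two lip corners.
gt = np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.60],
               [0.35, 0.80], [0.65, 0.80]])
# Shift every landmark by (0.03, 0.04), i.e. a Euclidean error of 0.05 each.
pred = gt + np.array([0.03, 0.04])
# Assumption: landmarks 0 and 1 are the eyes, so d is their distance (0.4).
d = np.linalg.norm(gt[1] - gt[0])
ne = normed_error(pred, gt, d)
print(round(ne, 3))
```

Averaging this per-sample value over a dataset would give a mean normed error comparable in kind to the figure in the table.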
The net expects an input image blob with the shape `1, 3, 48, 48` in the format `B, C, H, W`, where:
- `B` - batch size
- `C` - number of channels
- `H` - image height
- `W` - image width
The expected color order is `BGR`.
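As a rough illustration of the input layout, the sketch below turns a face crop stored as an `H x W x C` array into a `1, 3, 48, 48` blob. It assumes the crop has already been tightly cropped and resized to 48x48 by an image library, that its channels are already in the order the model expects, and that no mean or scale normalization is applied; all three are assumptions to check against the model configuration. The helper name `to_input_blob` is hypothetical.

```python
import numpy as np


def to_input_blob(face_crop):
    """Convert a 48 x 48 x 3 (H, W, C) crop into a 1 x 3 x 48 x 48 (B, C, H, W) blob."""
    if face_crop.shape != (48, 48, 3):
        raise ValueError(f"expected a 48x48x3 crop, got {face_crop.shape}")
    # Move channels first: (H, W, C) -> (C, H, W).
    chw = face_crop.transpose(2, 0, 1)
    # Add the batch dimension and convert to float32 (assumed input precision).
    return chw[np.newaxis].astype(np.float32)


# Dummy crop with one marked pixel at row 10, column 20.
crop = np.zeros((48, 48, 3), dtype=np.uint8)
crop[10, 20] = (1, 2, 3)
blob = to_input_blob(crop)
print(blob.shape)
```

The marked pixel ends up at `blob[0, :, 10, 20]`, which is a quick way to confirm that height and width were not swapped during the transpose.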
The net outputs a blob with the shape `1, 10`, containing a row-vector of 10 floating-point values for five landmark coordinates in the form (x0, y0, x1, y1, ..., x4, y4). All coordinates are normalized to the range [0, 1].
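The output vector can be mapped back to pixel positions with a small helper. This is a minimal sketch that assumes the normalized coordinates are relative to the width and height of the face crop passed to the network; the function name `decode_landmarks`, the crop rectangle, and the sample output values are hypothetical.

```python
import numpy as np


def decode_landmarks(output, crop_x, crop_y, crop_w, crop_h):
    """Map a (1, 10) normalized output to five (x, y) points in frame pixels."""
    # (x0, y0, ..., x4, y4) -> five rows of (x, y).
    pts = np.asarray(output, dtype=np.float32).reshape(5, 2)
    scale = np.array([crop_w, crop_h], dtype=np.float32)
    offset = np.array([crop_x, crop_y], dtype=np.float32)
    # Assumption: [0, 1] spans the crop, so scale by its size and shift by its origin.
    return pts * scale + offset


# Hypothetical network output for one face.
out = np.array([[0.25, 0.5, 0.75, 0.5, 0.5, 0.625, 0.25, 0.75, 0.75, 0.75]],
               dtype=np.float32)
points = decode_landmarks(out, crop_x=100, crop_y=50, crop_w=80, crop_h=80)
print(points)
```

Keeping the crop origin and size as explicit arguments makes it straightforward to draw the landmarks on the full frame rather than on the 48x48 input.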
[*] Other names and brands may be claimed as the property of others.