This is a lightweight landmarks regressor for the Smart Classroom scenario. It has a classic convolutional design: stacked 3x3 convolutions, batch normalizations, PReLU activations, and poolings. The final regression is performed by a global depthwise pooling head and FullyConnected layers. The model predicts five facial landmarks: two eyes, the nose, and two lip corners.
| Metric | Value |
|---|---|
| Mean Normed Error (on VGGFace2) | 0.0705 |
| Face location requirements | Tight crop |
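Normed Error (NE) for the i-th sample has the following form:

$$ NE_i = \frac{1}{N} \sum_{k=1}^{N} \frac{\lVert \hat{p}_{i,k} - p_{i,k} \rVert_2}{d_i} $$

where $N$ is the number of landmarks, $\hat{p}_{i,k}$ and $p_{i,k}$ are, respectively, the predicted and ground truth vectors of the k-th landmark of the i-th sample, and $d_i$ is the interocular distance for the i-th sample.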
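The normed error described above can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code used to produce the reported metric. It assumes a Euclidean norm, landmarks given as (x, y) pairs, and an interocular distance computed from the two ground truth eye landmarks, which are assumed to sit at indices 0 and 1. The function names `normed_error` and `mean_normed_error` are hypothetical.

```python
import math


def normed_error(pred, truth, left_eye_idx=0, right_eye_idx=1):
    """Normed Error for one sample.

    pred, truth: sequences of (x, y) landmark coordinates of equal length.
    The eye indices are an assumption about landmark ordering.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and the same length")

    # Interocular distance d_i, taken here from the ground truth eye landmarks.
    d = math.dist(truth[left_eye_idx], truth[right_eye_idx])
    if d == 0:
        raise ValueError("interocular distance must be non-zero")

    # Sum of Euclidean distances between predicted and ground truth landmarks.
    total = sum(math.dist(p_hat, p) for p_hat, p in zip(pred, truth))

    # Average over the N landmarks and normalize by d_i.
    return total / (len(pred) * d)


def mean_normed_error(preds, truths):
    """Mean of the per-sample Normed Error over a dataset."""
    errors = [normed_error(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

For example, if every one of the five predicted landmarks is off by one pixel and the eyes are ten pixels apart, `normed_error` returns 0.1.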
Name: "data", shape: [1x3x48x48] - An input image in the format [BxCxHxW], where B is the batch size, C is the number of channels, H is the image height, and W is the image width.
The expected color order is BGR.
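The layout and color order described above can be illustrated with a small standard-library sketch. It assumes the face crop has already been resized to 48x48 and is stored as nested lists in height x width x channel order. The helper name `to_nchw_bgr` and its `channel_order` parameter are hypothetical, and a real pipeline would more likely use an array library for this step.

```python
def to_nchw_bgr(image_hwc, channel_order="RGB"):
    """Convert a 48x48x3 nested-list image into a [1x3x48x48] BGR blob.

    image_hwc: image[y][x] is a 3-element pixel in `channel_order`.
    channel_order: "RGB" or "BGR", the channel order of the input pixels.
    """
    height = len(image_hwc)
    width = len(image_hwc[0]) if height else 0
    if (height, width) != (48, 48):
        raise ValueError("expected a 48x48 image, got %dx%d" % (height, width))

    # Source channel indices that yield B, G, R planes in that order.
    if channel_order == "RGB":
        source_channels = (2, 1, 0)
    elif channel_order == "BGR":
        source_channels = (0, 1, 2)
    else:
        raise ValueError("channel_order must be 'RGB' or 'BGR'")

    # Build one H x W plane per channel (C x H x W).
    planes = [
        [[image_hwc[y][x][c] for x in range(width)] for y in range(height)]
        for c in source_channels
    ]

    # Add the leading batch dimension (B = 1).
    return [planes]
```

The returned value is indexed as `blob[b][c][y][x]`, so `blob[0][0]` is the blue plane and `blob[0][2]` is the red plane.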
[*] Other names and brands may be claimed as the property of others.