Research/Blog
Face Recognition with MTCNN and FaceNet
- December 4, 2019
- Posted by: vsinghal
- Category: Auto and Manufacturing Computer Vision Deep Learning Security
#CellStratAILab #disrupt4.0 #WeCreateAISuperstars
Last Saturday, CellStrat AI Lab Team Lead Niraj Kale presented an intuitive hands-on workshop on Face Recognition with MTCNN and FaceNet algorithms. The session included a theory presentation along with an extensive hands-on code workshop.
![](http://www.cellstrat.com/wp-content/uploads/2019/12/Collage-1024x768.jpg)
This model has two networks at play. First, MTCNN localizes the face by creating a bounding box around it. Next, FaceNet identifies the face within that bounding box.
![](http://www.cellstrat.com/wp-content/uploads/2019/12/Face-Recognition-with-MTCNN-and-FaceNet-1024x307.png)
MTCNN has three convolutional networks (P-Net, R-Net, and O-Net) and is able to outperform many face-detection benchmarks while retaining real-time performance. This gives a bounding box around a face which is used as input to the FaceNet algorithm.
The first network in MTCNN, P-Net (Proposal Network), is an FCN (fully convolutional network). The input image is first resized to multiple scales (an image pyramid), and P-Net proposes candidate windows (which may contain a face) along with bounding box regression coordinates for these windows. Non-Maximum Suppression (NMS) is used to merge highly overlapping candidates.
![](http://www.cellstrat.com/wp-content/uploads/2019/12/NMS.png)
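The NMS step can be illustrated with a minimal greedy sketch (a generic IoU-based implementation in NumPy, not MTCNN's exact code):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the boxes that survive suppression.
    """
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the winning box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Keep only remaining boxes that do not overlap the winner too much
        order = order[1:][iou <= iou_threshold]
    return keep
```

Two heavily overlapping candidates collapse to the higher-scoring one, while a distant box survives.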
Then all candidate windows from P-Net are fed to another CNN – the R-Net (Refine Network), which further rejects a large number of false candidates, calibrates bounding boxes and performs NMS again.
Finally, O-Net (Output Network) rejects the remaining bounding boxes with low confidence scores, locates five facial landmarks, calibrates the bounding boxes and landmarks further, and then performs NMS one last time.
The learning objective of MTCNN is a multi-task loss: a binary cross-entropy loss for face classification (the probability that a box contains a face), a Euclidean distance loss for bounding box regression (prediction vs. ground truth), and a Euclidean loss for facial landmark regression. The three losses are weighted and summed into a cumulative multi-task objective.
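A sketch of that weighted sum for one candidate window follows; the weights and the squared-L2 form are illustrative assumptions for this post, not the exact training code:

```python
import numpy as np

def mtcnn_multitask_loss(p_face, y_face, box_pred, box_true,
                         lm_pred, lm_true, w=(1.0, 0.5, 0.5)):
    """Weighted sum of MTCNN's three task losses (illustrative weights).

    p_face: predicted probability that the window contains a face.
    y_face: ground-truth label (1 = face, 0 = not a face).
    box_*:  4-d bounding box regression targets; lm_*: 10-d landmark
            targets (x, y for each of the five landmarks).
    """
    eps = 1e-12
    # 1) Binary cross-entropy for face / non-face classification
    cls = -(y_face * np.log(p_face + eps) +
            (1 - y_face) * np.log(1 - p_face + eps))
    # 2) Euclidean (squared L2) loss for bounding box regression
    box = np.sum((box_pred - box_true) ** 2)
    # 3) Euclidean loss for the facial landmark regression
    lm = np.sum((lm_pred - lm_true) ** 2)
    return w[0] * cls + w[1] * box + w[2] * lm
```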
The output of MTCNN is fed to FaceNet for face recognition in the bounding box.
FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets. The FaceNet system can be used to extract high-quality features from faces, called face embeddings, that can then be used to train a face identification system. Finally, an SVM classifier is used to identify the face in the last stage.
The face embeddings are lower dimensional feature vectors that directly correspond to a measure of face similarity. Embeddings with dimension 128 perform better than other sizes. FaceNet embeddings are useful for face verification, face recognition/classification and clustering.
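With embeddings in hand, face verification reduces to a distance threshold. A minimal sketch, assuming 128-d L2-normalized embeddings as FaceNet produces; the threshold value here is illustrative and would be tuned on a validation set:

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.0):
    """Verify two faces via the Euclidean distance of their embeddings."""
    return float(np.linalg.norm(emb_a - emb_b)) < threshold

# Toy example: two nearby embeddings vs. a distant one
rng = np.random.default_rng(0)
a = rng.normal(size=128); a /= np.linalg.norm(a)
b = a + 0.01 * rng.normal(size=128); b /= np.linalg.norm(b)  # near-duplicate
c = rng.normal(size=128); c /= np.linalg.norm(c)             # unrelated
```

The same distances drive clustering, and for closed-set recognition a classifier such as an SVM is trained on the embeddings instead.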
![](http://www.cellstrat.com/wp-content/uploads/2019/12/FaceNet.png)
The triplet loss involves comparing face embeddings for three images: an anchor (reference) image, a positive image (matching the anchor), and a negative image (not matching the anchor).
The embeddings are learnt by a deep CNN such that the positive embedding is closer to the anchor embedding than the negative embedding is, by at least a margin, i.e. ||F(A)−F(P)||² + margin < ||F(A)−F(N)||². This inequality forms the basis of the triplet loss function that FaceNet uses.
![](http://www.cellstrat.com/wp-content/uploads/2019/12/Triplet-1-5.jpg)
![](http://www.cellstrat.com/wp-content/uploads/2019/12/Triplet-2-3.png)
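In code, the triplet loss for a single triplet can be sketched as follows (a minimal NumPy version; the margin of 0.2 is the value reported in the FaceNet paper, but treat it as a tunable hyperparameter):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss for one triplet of embeddings.

    Penalizes triplets where the anchor-positive distance is not at
    least `margin` smaller than the anchor-negative distance:
        max(||F(A)-F(P)||^2 - ||F(A)-F(N)||^2 + margin, 0)
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2)  # squared anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)
```

An easy triplet (negative far away) contributes zero loss; a violating triplet contributes a positive penalty.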
The triplet selection follows an interesting logic. Simply picking triplets that already satisfy the above constraint would lead to insignificant learning, so one instead trains on triplets that violate the constraint; this leads to faster convergence.
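Selecting those violating triplets from a batch can be sketched as a mask over precomputed squared distances (a simplified illustration of the idea, not the paper's exact online mining procedure):

```python
import numpy as np

def violating_negatives(d_ap, d_an, margin=0.2):
    """Mask of negatives that violate the triplet constraint.

    d_ap: (N,) squared anchor-positive distances.
    d_an: (N, M) squared anchor-negative distances for M candidates.
    A negative is informative when d_an < d_ap + margin; the stricter
    "semi-hard" choice additionally requires d_an > d_ap.
    """
    return d_an < d_ap[:, None] + margin
```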
Niraj also showed how to make this model compatible with a mobile app. A TFLite converter achieves the requisite model compression, and the converted model is then run by a TFLite Interpreter on the mobile device. This process involves best practices such as quantization of weights and activations (to integer values, which take less memory), tuning the number of threads, using device-specific hardware accelerators, etc.
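That conversion flow can be sketched roughly as below, using a tiny stand-in Keras model so the snippet is self-contained; a real deployment would load the trained FaceNet weights instead:

```python
import tensorflow as tf

# Stand-in model with a 128-d embedding head (a real FaceNet goes here)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(160, 160, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128),
])

# Convert to TFLite with post-training quantization for compression
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# On device, the TFLite Interpreter runs the compressed model; the
# number of threads is one of the knobs worth tuning
interpreter = tf.lite.Interpreter(model_content=tflite_model, num_threads=4)
interpreter.allocate_tensors()
```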
CellStrat AI Lab is India’s largest Open AI Lab and we are growing rapidly. I invite you to check out our AI Lab and AI/ML skilling program this Saturday in BLR / Gurugram :-
Bellandur BLR AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/vcqljryzqbkb/
Topic : Pose Estimation with OpenPose, Generative Modelling with CycleGAN
Presenters : Shreeyash Pawar, Gouthaman Asokan, Jani Basha
Hebbal BLR AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/265386342/
Topic : Recommender Systems, Intro to RL, Intro to GANs
Presenters : Gurumoorthy Loganathan, Salim Ansari, Abdul Azeez
Gurugram AI Lab meetup :-
Register : https://www.meetup.com/Disrupt-4-0/events/266826388/
Topic : Image Descriptors, BERT NLP
Presenters : Sonal Kukreja, Avni Gupta
See you this weekend for the AI Lab Workshop ! Let’s disrupt the world with AI, together !
Questions ? Call me at +91-9742800566 !
Best Regards,
Vivek Singhal
Co-Founder & Chief Data Scientist, CellStrat
+91-9742800566