How to Implement TensorFlow Facial Recognition From Scratch

Karthik Shiraly
December 8, 2022

Face recognition has acquired a rather negative image over time, thanks to government and corporate surveillance. But did you know that it can also be used positively in personal spaces to improve quality of life for you and your family while avoiding the privacy concerns? Today's open-source software and consumer hardware enable anyone to use face recognition privately for their own benefit.

In this article, learn about the positive uses of face recognition and understand how to implement TensorFlow facial recognition from scratch.

What Is Face Recognition? What Can You Use It For?

Face recognition is the machine learning task of identifying a person from their face. It's essentially a specialized type of image classification that answers the question "who is this person and what's their name" based on their facial features.

You can use face recognition in your personal spaces to improve your quality of life in many ways:

  • At home: Voice assistants, home automation systems, and other such personal digital assistants can use face recognition & artificial intelligence to personalize their interactions and behaviors for each individual.
  • Health care: Devices can offer customized health advice or personalized care based on the person they're interacting with.
  • Assisted living: People with memory problems caused by dementia or Alzheimer's can use face recognition devices to help them recall the names of their kin without having to awkwardly ask.
  • Education: Educational voice assistants at home or school can identify a child using face recognition and interact differently based on the child's learning style.
  • Discovering ancestry: Facial recognition can be used on old photographs to discover family trees and long-lost relatives. All facial recognition systems are capable of face matching and facial similarity assessment even when the identity of the person is unknown.

That said, face recognition is not essential for identification by any means. While people rely on faces for identification, computers are capable of doing so from other subtle aspects like a person's gait, hair texture, body shape, patterns in their behavior or schedule, speech patterns, and more.

In many ways, face recognition can be less reliable compared to other methods because of the complexities of real-world conditions. So before implementing it, analyze alternative approaches that may be easier, friendlier, and more reliable.

Concepts of Face Recognition Using Deep Learning

Before learning to implement face recognition, you must become familiar with some concepts and terminology:

  • Face recognition: The task of associating a name or other label with a face.
  • Face verification: The task of determining if two face images are of the same person or not without necessarily knowing the identity of either person.
  • Closed-set face recognition: Almost all the faces the system will see are already known to it from its training.
  • Open-set face recognition: Most of the faces the system will see are not known to it from its training. It can only do face verification and facial similarity assessment between the face it's seeing and the faces it's already seen.
  • Face detection: Determining where one or more faces are present in an image.
  • Convolutional neural network (CNN): A family of deep learning architectures for neural networks that specialize in computer vision tasks.
  • Siamese network: A special neural network that consists of two identical and conjoined networks to assess images in pairs. They're useful for image-matching tasks.
  • Contrastive loss: A loss function that's used when the only type of training data available is pair similarities. Every training sample consists of a pair of images where the only information given is whether they're similar or not. From that data, it clusters similar faces close to each other in an embedding space and maximizes their distances from dissimilar faces. It's very useful for reliable face recognition when there is a very large number of faces to identify.
  • Triplet loss: A loss function whose training samples are triplets of images: an anchor, a positive image that shares the anchor's identity, and a negative image with a different identity. It minimizes the distance between the anchor and the positive image while maximizing the distance between the anchor and the negative image.
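To make the two loss functions more concrete, here is a minimal sketch in plain Python. It uses one common formulation of each loss with Euclidean distances. The margin values and the toy two-dimensional embeddings are illustrative assumptions, not values taken from any particular model.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Pairwise loss: pull matching pairs together, push others past the margin."""
    distance = euclidean(emb_a, emb_b)
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """The anchor should be closer to the positive than to the negative by a margin."""
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (illustrative values only).
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
print(contrastive_loss(anchor, positive, same_person=True))
print(contrastive_loss(anchor, negative, same_person=False))
print(triplet_loss(anchor, positive, negative))
```

Both losses drop to zero once dissimilar faces are far enough apart, so training effort concentrates on the pairs or triplets the model still confuses.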

Tutorial Roadmap | Facial Recognition System

We explore face recognition using the TensorFlow deep learning framework systematically in three steps, with a focus on using it for positive ends in our personal spaces:

  1. Get a simple TensorFlow face recognition model up and running quickly
  2. Fine-tune it on a custom dataset for closed-set personal face recognition
  3. Port it to TensorFlow Lite for smartphone usage

We'll use TensorFlow 2.0's Keras high-level application programming interfaces (APIs) and Python for all these experiments.

Getting Started With TensorFlow Face Recognition

The easiest way to get started is with a pre-trained face recognition model. These are models that are already trained on large face datasets and published for use by others.

Some good examples of ready-to-use pre-trained models include:

  • Keras-vggface: It publishes pre-trained Keras models based on three different base architectures: the venerable VGG-16, the versatile ResNet, and the efficient SENet. The VGG-16 model implements the 2015 paper on deep face recognition and is pre-trained on the VGGFace dataset, while the ResNet and SENet models are pre-trained on the larger VGGFace2 dataset. The code is a bit outdated and unmaintained now but is simple enough to understand and upgrade to new TensorFlow versions without much hassle.
  • FaceNet: This popular project implements Google's 2015 FaceNet paper and provides two pre-trained models, one based on the capable Inception-ResNet-v2 architecture and the other on the efficient SqueezeNet architecture.

In this tutorial, we'll start with keras-vggface because it's simple and good enough for the small-scale closed-set face recognition we want to implement in our homes or other private spaces.

1. Setup

Install the keras-vggface package from GitHub. Install MTCNN to detect faces. Import everything we'll be using.

[Image: code example for setting up keras-vggface]
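As a rough sketch of the setup step, the snippet below checks whether the packages used in this tutorial can be found before any of them is imported. The import names tensorflow, keras_vggface, and mtcnn are our assumptions about how the packages are exposed once installed, and the helper name missing_packages is hypothetical.

```python
import importlib.util

# Assumed import names for this tutorial's dependencies. keras-vggface is
# installed from its GitHub repository and MTCNN from the Python package index.
REQUIRED = ("tensorflow", "keras_vggface", "mtcnn")

def missing_packages(names=REQUIRED):
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All dependencies are available.")
```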

2. Load the Pre-Trained Models

Load the three pre-trained models. The pre-trained weights are automatically downloaded from the project's release page and cached on your system. You need just one model, but comparing their architectures can be illuminating.

[Image: setting up VGGFace]
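Loading and inspecting the models might look roughly like the sketch below. It assumes keras-vggface imports cleanly under your TensorFlow version, which may require patching the package. The helper names load_models and describe are hypothetical, and the import is placed inside the function so that the file can be read without the package installed.

```python
MODEL_NAMES = ("vgg16", "resnet50", "senet50")

def load_models(names=MODEL_NAMES):
    """Load each pre-trained classifier; weights are downloaded on first use."""
    from keras_vggface.vggface import VGGFace  # lazy import, needs keras-vggface
    return {name: VGGFace(model=name) for name in names}

def describe(name, model):
    """One-line summary built from standard Keras model attributes."""
    return (f"{name}: {model.count_params():,} parameters, "
            f"input {model.input_shape}, output {model.output_shape}")

# Usage, once the dependencies are installed:
#   for name, model in load_models().items():
#       print(describe(name, model))
#       model.summary()  # full layer-by-layer listing
```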

On examining each model's summary, downloaded weights, input, and output, we find the following:

[Image: printing out the layers of the model]
  • VGG-16: It's a hefty 145 million parameters with a 500MB model file and is trained on a dataset of 2,622 people.
  • ResNet50: It's 3x lighter at 41 million parameters with a 160MB model but can identify 4x the number of people at 8,631.
  • SENet50: It's comparable to ResNet50 at 43 million parameters with a 170MB model and the same number of people, 8,631.

So what are these models capable of? They are full-fledged face classifiers but that ability to classify is just an outcome of their real strength — the ability to extract every little visual feature of the human face and encode it in mathematical form.

Their convolutional and fully connected layers enable them to detect visual features of faces like shapes, colors, contours, textures, spatial arrangements of facial parts, and more. Each model is really a powerful feature extractor with a classification unit tacked on.

3. Can the Models Recognize a Face in an Unseen Photo?

We can check their ability to recognize a face in a photo that's not part of their training set. Since they're all closed-set recognizers as of now, we must first find a person they already know from their training and then see if they can recognize that person's face in a photo they haven't trained on.

[Image: printing the models]

We download such a photo and run the MTCNN face detector on it. The detector returns a bounding box, which we use to crop out everything outside the face. MTCNN generally performs better than traditional face detectors.

[Image: face detection example (Source)]
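A minimal sketch of the detection and cropping step follows. It assumes the image is an RGB array such as a NumPy array and that each MTCNN detection is a dictionary whose "box" entry holds x, y, width, and height. The helper names clamp_box and crop_largest_face are hypothetical. The clamping is there because a detected box can extend past the image border.

```python
def clamp_box(box, image_width, image_height):
    """Convert an (x, y, width, height) box to clamped (left, top, right, bottom)."""
    x, y, w, h = box
    left, top = max(0, x), max(0, y)
    right, bottom = min(image_width, x + w), min(image_height, y + h)
    return left, top, right, bottom

def crop_largest_face(image):
    """Detect faces in an RGB image array and return the largest crop, or None."""
    from mtcnn import MTCNN  # lazy import, needs the mtcnn package
    detections = MTCNN().detect_faces(image)
    if not detections:
        return None
    # Pick the detection with the largest box area.
    best = max(detections, key=lambda d: d["box"][2] * d["box"][3])
    height, width = image.shape[:2]
    left, top, right, bottom = clamp_box(best["box"], width, height)
    return image[top:bottom, left:right]
```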

After a bit of image resizing to conform to the model, we run inference on the face. 

In less than two seconds, it correctly identifies a face in a photo it has never seen before, and it does so with 95%+ confidence without getting confused at all by 8,630 other faces!

[Image: results of the above model]
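The inference step described above could be sketched as follows. We assume the face crop has already been resized to 224x224 RGB, and that the preprocessing version for each model follows the keras-vggface README (version 1 for VGG-16, version 2 for ResNet50 and SENet50). The function name identify is hypothetical.

```python
# Preprocessing variant per model, as we understand the keras-vggface README.
PREPROCESSING_VERSION = {"vgg16": 1, "resnet50": 2, "senet50": 2}

def identify(model, model_name, face):
    """Classify one 224x224 RGB face crop and return the decoded top matches."""
    import numpy as np                 # lazy imports so the file loads without
    from keras_vggface import utils    # the heavy dependencies installed
    batch = np.expand_dims(np.asarray(face, dtype="float32"), axis=0)
    batch = utils.preprocess_input(batch, version=PREPROCESSING_VERSION[model_name])
    predictions = model.predict(batch)
    return utils.decode_predictions(predictions)

# Usage, with a model loaded earlier and a cropped face array:
#   print(identify(resnet_model, "resnet50", face))
```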

Face Recognition at Home With Your Personal Photos

Most personal use cases for face recognition have simple goals. Perhaps you want your Raspberry Pi-based home automation system to customize the room temperature or ambient lighting based on who's in the room. For such a task, the model must identify just about 3-4 people at home, perhaps a dozen at max.

Our main reason for starting with pre-trained models is that you can train them quickly to recognize faces of personal interest to you at home, in a health care setting, or in some other personal space.

The face recognizer models are already powerful facial feature extractors. All we have to do is coax their last few classifier stages to associate those features with a new set of faces and names. The two tricks to do that: transfer learning and fine-tuning.

What Are Transfer Learning and Fine-Tuning?

The pre-trained models we saw are trained to recognize between 2,600-8,600 faces of public figures. But in reality, only about 20% of all the information in those models is related to those 2,600-8,600 faces. The remaining 80% of the information packed in those models is for general facial feature extraction that works equally well on all 8 billion human faces.

Transfer learning's goal is to retain that 80% general information and replace just the last 20% with the information of the new faces. More specifically, it retains the entire stack of convolutional and residual blocks that power general feature extraction and replaces just the pre-trained classification layers at the top with a new set of classification layers trained on your custom face dataset.

Fine-tuning is very similar to transfer learning. Like transfer learning, it too replaces the pre-trained classification layers with a new set and trains them. But instead of leaving the entire feature extraction stack untouched, it includes some of the top layers of that stack too in the training.
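The difference between the two approaches comes down to which layers keep their "trainable" flag. Here is a small sketch with a hypothetical helper, set_trainable_top_layers, that works on any Keras-style model exposing a layers list. Passing 0 freezes the whole feature extractor (transfer learning), while a small positive number leaves the top few layers trainable (fine-tuning). How many layers to unfreeze is a judgment call we leave open.

```python
def set_trainable_top_layers(base_model, trainable_top_layers):
    """Freeze every layer of base_model except the last `trainable_top_layers`.

    Returns the names of the layers that remain trainable.
    """
    base_model.trainable = True  # let the per-layer flags below take effect
    layers = base_model.layers
    cutoff = len(layers) - trainable_top_layers
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return [layer.name for layer in layers if layer.trainable]

# Transfer learning: set_trainable_top_layers(base_model, 0)
# Fine-tuning:       set_trainable_top_layers(base_model, 10)  # illustrative count
```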

In the sections below, we'll demonstrate how to implement them.

Prepare Your Dataset

The first step is to prepare a custom dataset of your personal photos. We'll simulate it here by using a dataset of public figures called PubFig but using the photos of just two public figures for the training. Our model just has to learn to identify those two people correctly and identify everyone else, including the faces in its pre-training, as unknown.

You just need to gather about 20-30 images per person of interest. Plus, include about 100 images of random people and random non-faces (e.g., vehicles, scenery, cartoon faces, or animal faces) to represent unknown faces.

It's essential to organize them all in a directory structure like this:

[Image: training data directory breakdown]

  • Create a top-level training directory.
  • Each person's name/label should be a subdirectory.
  • The images of each person's face should be under their respective subdirectory. All the images should match the dimensions of "model.inputs."
  • One subdirectory should have all the random photos to serve as the unknown or invalid image set.
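The layout described in the list above can be sketched with the standard library. The label names person_a, person_b, and unknown are placeholders for your own labels. The sorted order of the subdirectory names matters because, to our understanding, Keras assigns class indices to inferred labels in alphanumeric order.

```python
import tempfile
from pathlib import Path

def create_layout(root, labels):
    """Create one subdirectory per label under root/training."""
    training = Path(root) / "training"
    for label in labels:
        (training / label).mkdir(parents=True, exist_ok=True)
    return training

def inferred_class_names(training_dir):
    """Subdirectory names in sorted order, mirroring how labels get indexed."""
    return sorted(p.name for p in Path(training_dir).iterdir() if p.is_dir())

# Demonstrate in a throwaway directory; use a real path for your photos.
with tempfile.TemporaryDirectory() as root:
    training = create_layout(root, ["person_b", "unknown", "person_a"])
    class_names = inferred_class_names(training)
print(class_names)
```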

Set up test and validation directories too if you want to be thorough about the accuracy, but for most practical personal uses, they're quite unnecessary.

Next, tell TensorFlow about your custom dataset along with batch size, shuffling options, and image size. It automatically infers the labels.

[Image: loading the PubFig training dataset]
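A sketch of this step using the Keras dataset utility is shown below. The batch size and seed are illustrative assumptions, and the 224x224 image size assumes the input size of the models used earlier. On older TensorFlow 2 releases the same utility may live under tf.keras.preprocessing instead.

```python
DATASET_OPTIONS = {
    "labels": "inferred",      # label comes from the subdirectory name
    "label_mode": "int",       # integer class indices
    "image_size": (224, 224),  # images are resized to the model's input size
    "batch_size": 8,           # illustrative value for a small dataset
    "shuffle": True,
    "seed": 42,                # illustrative, makes shuffling repeatable
}

def load_training_set(training_dir, options=DATASET_OPTIONS):
    """Build a tf.data dataset of (image batch, label batch) pairs."""
    import tensorflow as tf  # lazy import, needs TensorFlow installed
    return tf.keras.utils.image_dataset_from_directory(training_dir, **options)

# Usage:
#   train_ds = load_training_set("training")
#   print(train_ds.class_names)
```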

Set Up Data Augmentation

Most photos of faces tend to have the same orientation, with the faces somewhere near the top of the photo while the person is standing up. But in informal personal spaces, a camera may encounter faces with many other orientations like sleeping, bending, exercising face down, or with a disheveled look.

Data augmentation is a technique to automatically augment a custom dataset with random variations in orientation, rotation, colors, brightness levels, motion blurs, and other such real-world conditions. We can tell TensorFlow to automatically apply such random variations to images while training. They make your face recognizer more robust.

[Image: data augmentation with a Keras Sequential model]
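To illustrate the idea, the sketch below first applies two random variations to a tiny grayscale image stored as nested lists, then shows how a comparable stack could be declared with Keras preprocessing layers. The specific layers and their strengths are our assumptions, not necessarily the ones used for the results in this article.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2-D grayscale image."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Shift every pixel by delta, clipped to the 0-255 range."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in image]

def augment(image, rng):
    """Apply a random flip and a random brightness shift."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    return adjust_brightness(image, rng.randint(-30, 30))

def keras_augmentation():
    """A comparable augmentation stack built from Keras preprocessing layers."""
    import tensorflow as tf  # lazy import, needs TensorFlow installed
    return tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomContrast(0.2),
    ])

print(augment([[10, 200], [50, 250]], random.Random(0)))
```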

Load the Pre-Trained Base Model

The base model is the core feature extractor portion of a model without its top classifier units. For your convenience, keras-vggface provides a separate set of pre-trained base models without classifier layers.

[Image: loading the VGGFace ResNet base model]

Set Up the Custom Model for Transfer Learning

The custom model we're going to train using transfer learning is just a pile of all the units we've already seen:

[Image: custom Keras model with a Flatten layer]
  • Image inputs: First is the image input as in the original model.
  • Data augmentation unit: The inputs are followed by the data augmentation stack to create image variations while training.
  • Pre-trained base model: Then comes the pre-trained base model which is nothing but the core feature extractor portions of the original model. Note that its "trainable" flag is set to false for transfer learning. This is called "freezing" the base model. We're essentially telling TensorFlow to leave that entire set of layers untouched while training on the new data.
  • Flatten layer: It just flattens all the three-dimensional convolutional maps coming from the base model into a long one-dimensional vector because the next layer expects that format.
  • Classification layer: The final layer is our replacement classification layer. Its size is no longer the 2,622 or 8,631 of the original model. Instead, we set its size to the number of people we want to identify plus one more for the unknown and invalid images.
[Image: summary of the custom VGGFace ResNet model with multiple layers]

Notice how this model just has to train 6,147 network weights for the new set of faces while leaving 23.5 million of the base model untouched. That's not even 20% but just about 0.026% of the information. That still turns out to be sufficient.
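The arithmetic behind these numbers, together with a rough sketch of how such a model could be assembled, is shown below. We assume the frozen ResNet50 base emits 2,048 features per image, which is consistent with 2,048 x 3 + 3 = 6,147 weights for three classes. The function name build_transfer_model is hypothetical, input preprocessing is omitted for brevity, and keras-vggface may need patching to combine cleanly with tf.keras layers.

```python
def dense_layer_params(input_features, units):
    """Weights plus biases of a fully connected layer."""
    return input_features * units + units

def build_transfer_model(num_classes, augmentation=None):
    """Stack: inputs -> augmentation -> frozen base -> flatten -> classifier."""
    import tensorflow as tf                     # lazy imports, need TensorFlow
    from keras_vggface.vggface import VGGFace   # and keras-vggface installed
    base = VGGFace(model="resnet50", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze the feature extractor
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = augmentation(inputs) if augmentation is not None else inputs
    x = base(x, training=False)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

new_weights = dense_layer_params(2048, 3)  # two people plus the unknown class
share = round(100 * new_weights / 23_500_000, 3)
print(new_weights, share)
```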

Train the Custom Model Using Transfer Learning

The custom model is trained for several epochs like any other classifier model.

[Image: training the custom VGGFace model]
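A minimal training sketch follows. The optimizer, loss, and epoch count are our assumptions. The sparse categorical cross-entropy loss is chosen because it matches integer labels inferred from directory names. The function name train is hypothetical and works with any Keras-style model.

```python
def train(model, dataset, epochs=10):
    """Compile and fit a classifier on batches of (image, integer label)."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # matches integer class labels
        metrics=["accuracy"],
    )
    return model.fit(dataset, epochs=epochs)

# Usage, with a custom model and a training dataset built earlier:
#   history = train(custom_model, train_ds, epochs=10)
```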

Recognize Faces Using the Custom Model

The accuracy seems encouraging even with just 25-30 training photos. But how well does our custom model actually identify the new faces? We simulate this by downloading unseen photos of the two public figures and identifying them with our model.

[Image: Obama face ROI]

The custom model correctly identifies the public figure with high probability (they're the second name out of three in our custom dataset).

The other public figure is also identified correctly with high probability (they're the first name in our custom dataset).

[Image: recognition example for the other public figure]
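Mapping the model's output back to a name is a small step worth spelling out. In this sketch, the class names and probabilities are made-up placeholders, and the ordering of class_names is assumed to match the sorted subdirectory names of the training set.

```python
def predicted_name(probabilities, class_names):
    """Return the most likely label and its probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]

# Placeholder labels and scores, for illustration only.
class_names = ["person_a", "person_b", "unknown"]
print(predicted_name([0.02, 0.95, 0.03], class_names))
```

Because unknown faces are a class of their own in this setup, a stranger's photo should ideally land on the "unknown" label rather than on one of the known people.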

Face Recognition on Your Smartphone, Tablet, or IoT Device Using TensorFlow Lite

Many use cases become possible only if face recognition can run on the portable devices that we carry around with us. You can set up a tablet or Raspberry Pi in every room to identify the person and personalize their home automation experience.

The trick is to make the model small and lightweight to run on a resource-constrained device like a tablet or a Pi.

TensorFlow provides a framework called TensorFlow Lite (TFLite) for this. It uses several tricks like quantizing real numbers to integers and optimizing neural network layers to create a lightweight version of a model that can function like the original, albeit with reduced accuracy. TFLite can compress a model by a factor of roughly 3-4; a 150MB ResNet face recognizer model reduces to about 40MB.
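To show why quantization shrinks a model at a small cost in accuracy, here is a simplified sketch of affine 8-bit quantization in plain Python. It illustrates the general idea only and is not TFLite's exact scheme. The weight values are made up. Storing each weight in one byte instead of a four-byte float is where the size reduction comes from.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers; returns (ints, scale, zero_point)."""
    qmax = 2 ** num_bits - 1
    low, high = min(values), max(values)
    scale = (high - low) / qmax or 1.0  # fall back to 1.0 for constant input
    zero_point = round(-low / scale)
    ints = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return ints, scale, zero_point

def dequantize(ints, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in ints]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # made-up weights for illustration
ints, scale, zero_point = quantize(weights)
restored = dequantize(ints, scale, zero_point)
print(ints)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The restored values differ slightly from the originals, which is the reduced accuracy mentioned above.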

Convert TensorFlow and Keras Models to TensorFlow Lite

TFLite uses a different storage format than TensorFlow core. The TensorFlow framework provides both programmatic access and command-line tools to convert a mainstream model into an equivalent TFLite model with optional optimizations.

[Image: converting the VGGFace ResNet model to a quantized TFLite model]
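The programmatic route could be sketched as follows. The function names convert_to_tflite and save_model_bytes and the output file name are hypothetical. Enabling the default optimization set is one way to request quantization, and other settings may suit your model better.

```python
import tempfile
from pathlib import Path

def convert_to_tflite(keras_model, quantize=True):
    """Convert a Keras model to TFLite flatbuffer bytes."""
    import tensorflow as tf  # lazy import, needs TensorFlow installed
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()

def save_model_bytes(model_bytes, path):
    """Write the converted model to disk and return its size in bytes."""
    path = Path(path)
    path.write_bytes(model_bytes)
    return path.stat().st_size

# Demonstrate the save step with placeholder bytes instead of a real model.
with tempfile.TemporaryDirectory() as folder:
    size = save_model_bytes(b"\x00" * 16, Path(folder) / "placeholder.tflite")
print(size)

# With a trained model:
#   save_model_bytes(convert_to_tflite(custom_model), "face_recognizer.tflite")
```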

Use the Lite Model From an Android or iOS App

[Image: TFLite model on Android (Source)]

Bundle the converted TFLite model as part of your mobile app and deploy it on a smartphone or tablet. You can integrate the device's camera and video capabilities for real-time face recognition.

Use the Same Lite Model in Any Web Application

TensorFlow provides TensorFlow.js and tfjs-tflite for both client-side and server-side JavaScript components to load and run TensorFlow and TensorFlow Lite models. Since a web application can run on any mobile device, this is an excellent approach to deploying face recognition models without implementing native apps.

Use the Same Lite Model on a Raspberry Pi

Any single-board computer that can run Python in a Linux environment can use a TFLite model. The framework supports both learning and inference on resource-constrained CPUs and provides an interpreter API to use the model.

[Image: running the model with tf.lite.Interpreter]
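A sketch of the interpreter workflow is shown below. The function names run_tflite and load_interpreter are hypothetical. The input batch is assumed to be a NumPy array whose shape and data type match what the converted model expects. On a Pi, the lighter tflite_runtime package offers a similar Interpreter class and may be a better fit than the full TensorFlow install.

```python
def load_interpreter(model_path):
    """Create a TFLite interpreter for a converted model file."""
    import tensorflow as tf  # lazy import, needs TensorFlow installed
    return tf.lite.Interpreter(model_path=model_path)

def run_tflite(interpreter, input_batch):
    """Run one inference pass and return the first output tensor."""
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    interpreter.set_tensor(input_index, input_batch)
    interpreter.invoke()
    return interpreter.get_tensor(output_index)

# Usage, with a converted model file and a preprocessed face batch:
#   probabilities = run_tflite(load_interpreter("face_recognizer.tflite"), batch)
```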

TensorFlow Facial Recognition Is Just a Subset of Its Powerful Computer Vision Capabilities

What you learned here is just a glimpse of TensorFlow's capabilities. New state-of-the-art architectures like vision transformers are being applied to the facial recognition domain too and may uncover new ways of thinking about the domain. TensorFlow remains the machine learning (ML) framework of choice for ML engineers, including us, because of its deployment features like TensorFlow Serving. Contact us if you're looking for robust face recognition systems in your business or home!
