Implement TensorFlow Facial Recognition From Scratch - Our Workflow
We dive deep into the positive, privacy-friendly uses of face recognition in personal spaces and explain the models, training techniques, and deployment options we use to implement it.
Face recognition has acquired a rather negative image over time, thanks to government and corporate surveillance. But did you know that it can also be used positively in personal spaces to improve your and your family's quality of life while avoiding all the privacy concerns? Today's open-source software and consumer hardware enable anyone to use face recognition privately for their benefit.
In this article, learn about the positive uses of face recognition and understand how to implement TensorFlow facial recognition from scratch.
Face recognition is the machine learning task of identifying a person from their face. It's essentially a specialized type of image classification that answers the question "who is this?" based on a person's facial features.
You can use face recognition in your personal spaces to improve your quality of life in many ways, for example, by having a home automation system adjust a room's temperature or lighting based on who's in it.
That said, face recognition is by no means the only way to identify a person. While people rely on faces for identification, computers can also identify someone from subtler cues like gait, hair texture, body shape, patterns in behavior or schedule, speech patterns, and more.
In many ways, face recognition can be less reliable compared to other methods because of the complexities of real-world conditions. So before implementing it, analyze alternative approaches that may be easier, friendlier, and more reliable.
Before learning to implement face recognition, you must become familiar with some concepts and terminology: face detection (locating a face in an image) versus face recognition (putting a name to it), and closed-set recognition (identifying only the people seen during training) versus open-set recognition (also flagging unfamiliar faces as unknown).
We explore face recognition using the TensorFlow machine learning framework systematically in three steps, with a focus on using it for positive ends in our personal spaces:
1. Run inference with a ready-made, pre-trained face recognition model.
2. Retrain that model on faces of personal interest using transfer learning and fine-tuning.
3. Deploy the result on portable and embedded devices using TensorFlow Lite.
We'll use TensorFlow 2's high-level Keras application programming interface (API) and Python for all these experiments.
The easiest way to get started is with a pre-trained face recognition model. These are models that are already trained on large face datasets and published for use by others.
Some good examples of ready-to-use pre-trained models include keras-vggface (VGG16, ResNet50, and SENet50 architectures trained on the VGGFace and VGGFace2 celebrity datasets), FaceNet, and DeepFace.
In this tutorial, we'll start with keras-vggface because it's simple and good enough for the small-scale closed-set face recognition we want to implement in our homes or other private spaces.
Install the keras-vggface machine learning library from GitHub and the MTCNN library to detect faces, then import everything we'll be using.
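A typical setup, assuming the rcmalli/keras-vggface repository and the mtcnn package from PyPI, might look like this (keras-vggface predates TensorFlow 2, so an older TensorFlow/Keras combination or a small compatibility shim may be needed):

```python
# Shell commands (run in your environment):
#   pip install git+https://github.com/rcmalli/keras-vggface.git
#   pip install mtcnn

import numpy as np
import tensorflow as tf
from mtcnn import MTCNN                     # face detection
from keras_vggface.vggface import VGGFace   # pre-trained face recognizers
from keras_vggface import utils             # preprocessing and label decoding
```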
Load the three pre-trained models. The pre-trained weights are automatically downloaded from the project's release page and cached on your system. You need just one model, but comparing their architectures can be illuminating.
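Loading all three architectures that keras-vggface offers might look like this:

```python
# Weights are downloaded and cached automatically on first use.
vgg16_model = VGGFace(model='vgg16')        # trained on VGGFace (2,622 identities)
resnet50_model = VGGFace(model='resnet50')  # trained on VGGFace2 (8,631 identities)
senet50_model = VGGFace(model='senet50')    # trained on VGGFace2 (8,631 identities)
```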
On examining each model's summary, downloaded weights, input, and output, we find that all three expect 224×224 RGB face images and end in a softmax classifier over their training identities: 2,622 for VGG16 and 8,631 for ResNet50 and SENet50.
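A quick way to inspect them:

```python
for name, m in [('vgg16', vgg16_model), ('resnet50', resnet50_model),
                ('senet50', senet50_model)]:
    # m.summary() prints the full layer-by-layer architecture
    print(name, m.input_shape, m.output_shape, f'{m.count_params():,} params')
```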
So what are these models capable of? They are full-fledged face classifiers but that ability to classify is just an outcome of their real strength — the ability to extract every little visual feature of the human face and encode it in mathematical form.
Their convolutional layers enable them to detect visual features of faces like shapes, colors, contours, textures, spatial arrangements of facial parts, and more. Each model is really a powerful feature extractor with a classification unit of fully connected layers tacked on.
We can check their ability to recognize a face in a photo that's not part of their training set. Since they're all closed-set recognizers as of now, we must first find a person they already know from their training and then see if they can recognize that person's face in a photo they haven't trained on.
We download such a photo and apply the MTCNN face detector to it, which returns a bounding box we use to crop out everything outside the face. MTCNN performs exceptionally well compared to traditional face detection methods like Haar cascades.
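A sketch of the detection step, assuming a downloaded file named test_photo.jpg:

```python
from PIL import Image

detector = MTCNN()
pixels = np.asarray(Image.open('test_photo.jpg').convert('RGB'))

# Each detection is a dict with a bounding box, confidence, and landmarks.
detections = detector.detect_faces(pixels)
x, y, w, h = detections[0]['box']
face = pixels[y:y + h, x:x + w]   # crop out everything outside the face
```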
After a bit of image resizing to conform to the model, we run inference on the face.
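Resizing and inference with the ResNet50 variant might look like this (version=2 selects the preprocessing used for the VGGFace2-trained models):

```python
# Resize to the 224x224 input the model expects, then apply
# the model-specific preprocessing.
face = tf.image.resize(face.astype('float32'), (224, 224)).numpy()
sample = utils.preprocess_input(np.expand_dims(face, axis=0), version=2)

# Predict and decode the most likely identities among the 8,631 known faces.
preds = resnet50_model.predict(sample)
print(utils.decode_predictions(preds))
```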
In less than two seconds, it correctly identifies a face in a photo it has never seen before, and it does so with 95%+ confidence without being confused by the 8,630 other identities it knows!
Most personal use cases for face recognition have simple goals. Perhaps you want your Raspberry Pi-based home automation system to customize the room temperature or ambient lighting based on who's in the room. For such a task, the model must identify just about 3-4 people at home, perhaps a dozen at max.
Our main reason for starting with pre-trained models is that you can train them quickly to recognize faces of personal interest to you at home, in a health care setting, or in some other personal space.
The face recognizer models are already powerful facial feature extractors. All we have to do is coax their last few classifier stages to associate those features with a new set of faces and names. The two tricks to do that: transfer learning and fine-tuning.
The pre-trained models we saw are trained to recognize between 2,600-8,600 faces of public figures. But in reality, only about 20% of all the information in those models is related to those 2,600-8,600 faces. The remaining 80% of the information packed in those models is for general facial feature extraction that works equally well on all 8 billion human faces.
Transfer learning's goal is to retain that 80% general information and replace just the last 20% with the information of the new faces. More specifically, it retains the entire stack of convolutional and residual blocks that power general feature extraction and replaces just the pre-trained classification layers at the top with a new set of classification layers trained on your custom face dataset.
Fine-tuning is very similar to transfer learning. Like transfer learning, it too replaces the pre-trained classification layers with a new set and trains them. But instead of leaving the entire feature extraction stack untouched, it also includes some of the top layers of that stack in the training, as the sketch below shows.
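As a minimal sketch of the difference, assuming a frozen base_model like the one we build below:

```python
# Transfer learning: freeze the entire feature extraction stack.
base_model.trainable = False

# Fine-tuning: unfreeze just the top few layers of the stack as well,
# and train with a low learning rate so the pre-trained weights shift gently.
base_model.trainable = True
for layer in base_model.layers[:-4]:
    layer.trainable = False   # everything except the top 4 layers stays frozen
```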
In the sections below, we'll demonstrate how to implement them.
The first step is to prepare a custom dataset of your personal photos. We'll simulate it here by using a dataset of public figures called PubFig but using the photos of just two public figures for the training. Our model just has to learn to identify those two people correctly and identify everyone else, including the faces in its pre-training, as unknown.
You just need to gather about 20-30 images per person of interest. Plus, include about 100 images of random people and random non-faces (e.g., vehicles, scenery, cartoon faces, or animal faces) to represent unknown faces.
It's essential to organize them all in a directory structure like this:
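A minimal layout, with illustrative directory names, might look like this:

```
custom_faces/
├── person_a/     # 20-30 photos of the first person of interest
├── person_b/     # 20-30 photos of the second person of interest
└── unknown/      # ~100 photos of random people and non-faces
```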
Set up test and validation directories too if you want to be thorough about the accuracy, but for most practical personal uses, they're quite unnecessary.
Next, tell TensorFlow about your custom dataset along with batch size, shuffling options, and image size. It automatically infers the labels.
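Using Keras's dataset utility in recent TensorFlow versions, and assuming the custom_faces/ layout above:

```python
train_ds = tf.keras.utils.image_dataset_from_directory(
    'custom_faces',        # labels are inferred from the subdirectory names
    image_size=(224, 224),
    batch_size=32,
    shuffle=True,
)
class_names = train_ds.class_names   # e.g. ['person_a', 'person_b', 'unknown']
```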
Most photos of faces tend to have the same orientation, with the faces somewhere near the top of the photo while the person is standing up. But in informal personal spaces, a camera may encounter faces with many other orientations like sleeping, bending, exercising face down, or with a disheveled look.
Data augmentation is a technique that expands a custom dataset with random variations in orientation, rotation, color, brightness, motion blur, and other real-world conditions. We can tell TensorFlow to apply such random variations to images on the fly during training, making your face recognizer more robust.
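A typical augmentation pipeline built from Keras preprocessing layers (layer availability varies slightly across TensorFlow versions):

```python
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),   # mirror images left-right
    tf.keras.layers.RandomRotation(0.2),        # rotate up to ±20% of a full turn
    tf.keras.layers.RandomContrast(0.2),        # vary contrast
    tf.keras.layers.RandomBrightness(0.2),      # vary brightness (TF 2.9+)
])
```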
The base model is the core feature extractor portion of a model without its top classifier units. For your convenience, keras-vggface provides a separate set of pre-trained base models without classifier layers.
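Loading the ResNet50 base model without its classifier head might look like this:

```python
base_model = VGGFace(
    model='resnet50',
    include_top=False,           # drop the pre-trained classifier layers
    input_shape=(224, 224, 3),
    pooling='avg',               # global average pooling -> 2,048-long feature vector
)
base_model.trainable = False     # freeze the feature extractor for transfer learning
```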
The custom model we're going to train using transfer learning is just a pile of the units we've already seen: the data augmentation layers, the frozen base model, and a new classification head.
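Assembled with the Keras functional API, a sketch might look like this (model-specific preprocessing is omitted for brevity):

```python
NUM_CLASSES = 3   # person_a, person_b, unknown

inputs = tf.keras.Input(shape=(224, 224, 3))
x = data_augmentation(inputs)        # random variations, active only in training
x = base_model(x, training=False)    # frozen extractor -> 2,048 features per image
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.summary()   # ~6,147 trainable vs ~23.5M non-trainable parameters
```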
Notice how this model just has to train 6,147 network weights for the new set of faces while leaving 23.5 million of the base model untouched. That's not even 20% but just about 0.026% of the information. That still turns out to be sufficient.
The custom model is trained for several epochs like any other classifier model.
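Compilation and training, with hyperparameters shown as illustrative defaults:

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',   # integer labels from the dataset
    metrics=['accuracy'],
)
history = model.fit(train_ds, epochs=10)
```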
The accuracy seems encouraging even with just 20-30 training photos per person. But how well does our custom model actually identify the new faces? We simulate this by downloading unseen photos of the two public figures and identifying them with our model.
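The check itself reuses the MTCNN detection step, assuming pixels holds a freshly downloaded, unseen photo:

```python
x, y, w, h = detector.detect_faces(pixels)[0]['box']
face = tf.image.resize(pixels[y:y + h, x:x + w].astype('float32'), (224, 224))

probs = model.predict(tf.expand_dims(face, axis=0))[0]
for name, p in zip(class_names, probs):
    print(f'{name}: {p:.2%}')
```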
The custom model correctly identifies the public figure with high probability (they're the second name out of three in our custom dataset).
The other public figure is also identified correctly with high probability (they're the first name in our custom dataset).
Many use cases become possible only if face recognition can run on the portable and embedded devices around us. You can set up a tablet or Raspberry Pi in every room to identify the person and personalize their home automation experience.
The trick is to make the model small and lightweight to run on a resource-constrained device like a tablet or a Pi.
TensorFlow provides a framework called TensorFlow Lite (TFLite) for this. It uses several tricks, like quantizing real numbers to integers and optimizing neural network layers, to create a lightweight version of a model that functions like the original, albeit with somewhat reduced accuracy. TFLite can compress a model by a factor of three to four; a 150MB ResNet face recognizer, for example, reduces to about 40MB.
TFLite uses a different storage format than TensorFlow core. The TensorFlow framework provides both programmatic access and command-line tools to convert a mainstream model into an equivalent TFLite model with optional optimizations.
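The programmatic conversion, with default optimizations enabled, looks roughly like this:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable quantization
tflite_model = converter.convert()

with open('face_recognizer.tflite', 'wb') as f:
    f.write(tflite_model)
```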
Bundle the converted TFLite model as part of your mobile app and deploy it on a smartphone or tablet. You can integrate the device's camera and video capabilities for real-time face recognition.
TensorFlow provides TensorFlow.js and tfjs-tflite, JavaScript libraries that load and run TensorFlow and TensorFlow Lite models in both client-side and server-side code. Since a web application can run on any mobile device, this is an excellent approach to deploying face recognition models without implementing native apps.
Any single-board computer that can run Python in a Linux environment can use a TFLite model. The framework is optimized for inference on resource-constrained CPUs and provides an interpreter API to use the model.
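A minimal inference loop with the interpreter API (on a Pi, you'd typically install the lighter tflite-runtime package and import Interpreter from tflite_runtime.interpreter instead):

```python
interpreter = tf.lite.Interpreter(model_path='face_recognizer.tflite')
interpreter.allocate_tensors()

input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']

# 'sample' is a (1, 224, 224, 3) float32 array prepared as before.
interpreter.set_tensor(input_index, sample)
interpreter.invoke()
probs = interpreter.get_tensor(output_index)[0]
```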
What you learned here is just a glimpse of TensorFlow's capabilities. New state-of-the-art architectures like vision transformers are being applied to the facial recognition domain too and may uncover new ways of thinking about it. TensorFlow remains the machine learning (ML) framework of choice for ML engineers, including us, because of its deployment features like TensorFlow Serving. Contact us if you're looking for robust face recognition systems for your business or home!