Face Recognition as a biometric authentication method is set to make its debut on the iOS platform starting with iPhone X.

We wanted to take a more in-depth look back at the technological advancements that allowed this technology to grow from being a niche project in research labs to a mainstream offering in smartphones.

In this blog, we focus on progress being made in three areas: Sensors, Algorithms, and Hardware.


1 - Sensors

Face Recognition as implemented on the iPhone X relies on building a 3-D template of the user’s face. This template is then compared against the user’s enrolled face template, which is stored in the Secure Enclave on the phone. So the first challenge is to reliably estimate depth information for each pixel.

Since FaceID needs to work in poor lighting conditions, simply relying on RGB values of image pixels will not work. Excess light outdoors can easily saturate the camera sensors whereas poor lighting results in significant loss of pixel information.

The stereo cameras in the iPhone 7 Plus are capable of estimating depth by computing a *disparity map* from the images captured by the left and right cameras. However, this technique does not work well in adverse lighting conditions. Furthermore, it is difficult to keep the two cameras calibrated precisely enough to determine depth reliably.
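The geometry behind a disparity map is simple: for two parallel cameras, depth is inversely proportional to the horizontal pixel shift of a point between the left and right images. The sketch below illustrates the relationship; the focal length and baseline values are purely illustrative, not actual iPhone 7 Plus parameters.

```python
# Depth from stereo disparity: for two parallel, calibrated cameras,
#   Z = f * B / d
# where f is the focal length (in pixels), B is the baseline between the
# two cameras (in meters), and d is the disparity (pixel shift) of a point
# between the left and right images.

def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in meters; disparity of zero would mean a point at infinity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point with 10 px of disparity, f = 1000 px, B = 1 cm, sits 1 m away:
print(depth_from_disparity(10.0, focal_px=1000.0, baseline_m=0.01))  # -> 1.0
```

The formula also shows why calibration is hard: at larger distances the disparity shrinks toward zero, so a sub-pixel measurement error translates into a large depth error.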

Structured Light Depth Cameras

The iPhone X’s structured-light camera works by projecting a grid or pattern of infrared laser light and then measuring the light reflected back from the target surface. Based on the distortion of the projected pattern, the angle of incidence, or time-of-flight measurements, it is possible to reliably determine depth information even under adverse lighting conditions.
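Of the techniques mentioned above, time-of-flight has the simplest math: the emitter's light travels to the surface and back, so the round-trip time directly encodes distance. A minimal sketch, with illustrative numbers:

```python
# Time-of-flight depth: emitted light covers the camera-to-surface distance
# twice (out and back), so
#   depth = c * t_round_trip / 2

C = 299_792_458.0  # speed of light in m/s

def tof_depth_m(round_trip_s: float) -> float:
    """Depth in meters from a measured round-trip time in seconds."""
    return C * round_trip_s / 2.0

# A surface ~30 cm away (typical face-unlock distance) returns light in
# roughly 2 nanoseconds:
print(tof_depth_m(2e-9))  # ~0.2998 m
```

The nanosecond-scale timing this requires is one reason miniaturizing depth sensors into a phone, at consumer cost and power budgets, took years after the Kinect.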

Though the first consumer-grade structured light depth camera products hit the mass market in 2010 with the introduction of the Microsoft Kinect, it was still a major engineering challenge to miniaturize them into the form factor that fits into a smartphone while keeping down cost and power consumption.

Both Apple and Samsung have been actively involved in incorporating and developing these technologies further. In 2013, Apple acquired the Israeli company PrimeSense, which developed the first generation of depth cameras incorporated in the Kinect. Samsung, on the other hand, has placed its bet on iris-scanning technologies developed by DeltaID.

CMOS sensor camera prices have continued to fall in the last few years. That, coupled with an increase in the resolution and sensitivity of these cameras, has enabled the incorporation of these devices into smartphones.


2 - Algorithms and Advances in Neural Networks

Neural Networks have existed since the 80s; however, there has been a resurgence of interest in their applications and techniques in recent years. This is fueled mainly by the vast quantities of labeled data now available from the internet, along with improvements in neural network training techniques.

Starting in 2012, with the publication of the AlexNet architecture, improvements in image classification accuracy have continued to progress, driven by the increasing depth of Convolutional Neural Networks and their novel architectures.

Source: Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.


Specifically, when it comes to face recognition, two seminal papers established that machines can recognize faces with near-human accuracy:



1) Google’s FaceNet[1] architecture relies on a novel *Triplet Loss* function to train its deep CNN.

2) Facebook’s DeepFace[2] uses a constructed 3-D model of the face to generate a frontal or normalized view, which is then used to train a Deep Neural Network.

Source: Taigman, Yang, Ranzato, & Wolf (Facebook, Tel Aviv), CVPR 2014
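The triplet loss at the heart of FaceNet[1] is easy to state: given embeddings of an anchor image, a positive (same identity), and a negative (different identity), push the anchor-positive distance to be smaller than the anchor-negative distance by at least a margin. A minimal sketch on plain Python lists, with an illustrative margin value:

```python
# Triplet loss over face embeddings:
#   L = max(0, ||a - p||^2 - ||a - n||^2 + alpha)
# where a = anchor, p = positive (same person), n = negative (different
# person), and alpha is the enforced margin.

def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, alpha=0.2):
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + alpha)

# Toy 2-D embeddings: the same identity maps close together, a different
# identity maps far away, so this triplet is already satisfied (loss = 0):
anchor   = [0.10, 0.90]
positive = [0.12, 0.88]
negative = [0.90, 0.10]
print(triplet_loss(anchor, positive, negative))  # -> 0.0
```

During training, the network's weights are updated to drive this loss toward zero across many such triplets, so that distances in embedding space directly measure face similarity.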


3 - Hardware

Whether it is Augmented Reality or Face Recognition, convolutions are the workhorse under the hood that powers Deep Neural Networks. NVIDIA realized this early and has successfully transformed its GPU products (traditionally used to power graphics for video games) into computing engines for DNNs.
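To see why convolutions dominate the compute budget, consider a naive implementation: every output pixel costs K×K multiply-adds, repeated across every input/output channel pair and every layer. A minimal sketch (this computes cross-correlation, as deep learning frameworks conventionally do):

```python
# Naive "valid" 2-D convolution: K*K multiply-adds per output pixel.
# Millions of these per layer is exactly the embarrassingly parallel
# workload that GPUs excel at.

def conv2d_valid(image, kernel):
    H, W = len(image), len(image[0])
    K = len(kernel)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            acc = 0.0
            for u in range(K):
                for v in range(K):
                    acc += image[i + u][j + v] * kernel[u][v]
            out[i][j] = acc
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ker = [[1, 0],
       [0, 1]]  # adds each pixel to its lower-right neighbor
print(conv2d_valid(img, ker))  # -> [[6.0, 8.0], [12.0, 14.0]]
```

Every output pixel here is independent of the others, which is why the work maps so naturally onto thousands of GPU threads.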

NVIDIA, Intel and AMD are all in the race to develop better hardware solutions to accommodate the increasing complexity of these neural networks.

In addition to developing more powerful hardware, improvements in numerical computation techniques are allowing applications to squeeze more performance out of the hardware.

For example, the Winograd minimal filtering technique outlined in Lavin and Gray[4] resulted in a 3-4x speedup over conventional convolution implementations.
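The core idea of Winograd's minimal filtering algorithms is trading multiplications for cheaper additions. The smallest case, F(2,3), computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct method needs; the 1-D version is sketched below for clarity (Lavin and Gray[4] tile the 2-D case across whole feature maps):

```python
# Winograd F(2,3): two outputs of a 3-tap filter in 4 multiplications.

def winograd_f23(d, g):
    """d: 4 consecutive input values, g: 3 filter taps -> 2 outputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2   # the filter transforms (g0+g1+g2)/2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2   # etc. are precomputed once per filter
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: the same two outputs via 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, -0.25]
print(winograd_f23(d, g))  # -> [1.75, 3.0]
print(direct_f23(d, g))    # -> [1.75, 3.0], same result
```

Since the filter-side transforms are computed once per filter and reused across the whole image, the saved multiplications dominate at scale.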

The iPhone X will incorporate a custom Apple-designed GPU, optimized for the size and power constraints of a smartphone.


In conclusion

Improvements in sensors, hardware and algorithms will lead to more powerful applications of Deep Neural Networks reaching the hands of the consumer.

Object detection in Augmented Reality and Face Recognition are two such applications that are powered by the recent improvements in this technology.

We see face recognition being incorporated in other Apple devices in the near future. Hands-free applications that can authenticate based on a user’s face alone stand to benefit tremendously from this technology.

Some examples:

  • A dentist or doctor needs to unlock a medical record system while attending to a patient
  • A mechanic or craftsman with greasy hands needs access to data on a mobile device

Since Apple has exposed FaceID as an authentication mechanism in the LocalAuthentication framework[5], app developers will be able to seamlessly integrate this mechanism into their apps.



  1. FaceNet: A Unified Embedding for Face Recognition and Clustering
  2. DeepFace: Closing the Gap to Human-Level Performance in Face Verification
  3. "Not So Fast, FFT": Winograd
  4. Fast Algorithms for Convolutional Neural Networks
  5. Local Authentication Framework