Matching an infrared image of a face to its visible light counterpart is
a difficult task, but one that deep neural networks are now coming to
grips with.
One problem with infrared surveillance videos or infrared CCTV images is that it is hard to recognize the people in them. Faces look different in the infrared, and matching these images to a person’s ordinary visible light appearance is a significant unsolved challenge.
The problem is that the link between the way people look in infrared and in visible light is highly nonlinear. This is particularly tricky for footage taken in the mid- and far-infrared, which is captured by passive sensors that detect emitted radiation rather than reflected light.
Today, Saquib Sarfraz and Rainer Stiefelhagen at the Karlsruhe
Institute of Technology in Germany say they’ve worked out how to connect
a mid- or far-infrared image of a face with its visible light
counterpart for the first time. The trick they’ve perfected is to teach a
neural network to do all the work.
The way a face emits infrared light is entirely different from the
way it reflects it. These emissions vary according to the temperature of
the air and the temperature of the skin, which in turn depends on the
person’s activity levels, whether he or she has a fever and so on.
There’s another problem that makes comparisons difficult. Visible light images tend to have a high resolution, while far-infrared pictures tend to have a much lower one because of the nature of the cameras that take them. Together, these factors make it hard to match an infrared face with its visible light counterpart.
But the recent success of deep neural networks in tackling all kinds of complex problems gave Sarfraz and Stiefelhagen an idea. Why not train a network to recognize visible light faces by looking at infrared versions?
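The article doesn’t describe the network itself, but the underlying idea, learning a function that maps features computed from a thermal face patch onto the features of the matching visible light patch, can be sketched in a few lines. Below is a minimal illustration in PyTorch; the feature dimension, layer sizes, loss and optimizer are illustrative assumptions rather than the authors’ actual settings.

```python
import torch
import torch.nn as nn

# Minimal sketch of a thermal-to-visible mapping network. The sizes below
# are placeholders, not the architecture used by Sarfraz and Stiefelhagen.
FEAT_DIM = 128  # hypothetical length of a per-patch descriptor

mapper = nn.Sequential(
    nn.Linear(FEAT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, FEAT_DIM),
)

loss_fn = nn.MSELoss()  # regress the visible-domain descriptor
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

def train_step(thermal_feats, visible_feats):
    """One gradient step on a batch of (thermal, visible) descriptor pairs."""
    opt.zero_grad()
    pred = mapper(thermal_feats)
    loss = loss_fn(pred, visible_feats)
    loss.backward()
    opt.step()
    return loss.item()

# Stand-in data: 64 corresponding patch descriptors from each domain.
thermal = torch.randn(64, FEAT_DIM)
visible = torch.randn(64, FEAT_DIM)
print(train_step(thermal, visible))
```

Once trained, such a network could be applied to every patch of a new thermal image, so that the whole face is described in the visible light feature space and can be compared directly.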
There are two important factors that have combined in recent years to
make neural networks much more powerful. The first is a better
understanding of how to build and tweak the networks to perform their
task, a technique that has led to the creation of so-called deep neural
nets. That’s something Sarfraz and Stiefelhagen could learn from other
work.
The second is the availability of huge annotated datasets that can be
used to train these networks. For example, accurate automated face
recognition has only become possible because of the creation of vast
banks of images in which people’s faces have been isolated and
identified by human observers thanks to crowdsourcing services such as
Amazon’s Mechanical Turk.
These data sets are much harder to come by for infrared/visible light
comparisons. However, Sarfraz and Stiefelhagen found one they thought
could do the trick. This was created at the University of Notre Dame and
consists of 4,585 images of 82 people taken either in visible light at a
resolution of 1600 x 1200 pixels or in the far infrared at 312 x 239
pixels.
The data set contains images of people smiling, laughing and with a
neutral expression taken in different sessions to capture the way
people’s appearance changes from day to day, and in two different
lighting conditions.
The researchers then divided each image into a set of overlapping patches, 20 x 20 pixels in size, dramatically increasing the number of training samples.
Finally, Sarfraz and Stiefelhagen used the images of the first 41
people to train their neural net and the images of the other 41 people
to test it.
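A rough sketch of that preprocessing is below, assuming a simple sliding window for the overlapping patches and a subject-disjoint split into the first and last 41 identities; the stride value is an assumption, since the article only gives the patch size.

```python
import numpy as np

def extract_patches(img, size=20, stride=10):
    # Slide a size x size window across the image, collecting overlapping
    # patches. The stride (amount of overlap) is assumed, not from the paper.
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)

# Toy example on a fake far-infrared frame at the dataset's 312 x 239 resolution.
thermal_img = np.random.rand(239, 312)
print(extract_patches(thermal_img).shape)  # (number_of_patches, 20, 20)

# Subject-disjoint split: first 41 identities for training, the other 41 for testing.
subject_ids = np.arange(82)
train_ids, test_ids = subject_ids[:41], subject_ids[41:]
```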
The results make for interesting reading. “The presented approach
improves the state-of-the-art by more than 10 percent,” say Sarfraz and
Stiefelhagen.
What’s more, the net can match a thermal image to its visible
counterpart in just 35 milliseconds. “This is therefore, very fast and
capable of running in real-time at ∼ 28 fps,” they say.
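The article gives only the timing, not how the comparison itself is carried out. One plausible reading, assumed here, is that a mapped thermal descriptor is scored against a gallery of visible light descriptors with a cosine-similarity nearest-neighbor search, which would be cheap enough to explain the reported speed.

```python
import numpy as np

def cosine_match(probe, gallery):
    # Return the index and score of the gallery descriptor most similar to
    # the probe. Cosine-similarity nearest neighbor is an assumed matching
    # rule; the article does not name the comparison metric.
    probe = probe / np.linalg.norm(probe)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = gallery @ probe
    best = int(np.argmax(scores))
    return best, float(scores[best])

gallery = np.random.rand(41, 128)  # one visible-light descriptor per identity
probe = np.random.rand(128)        # mapped descriptor of a thermal probe face
print(cosine_match(probe, gallery))
```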
It is by no means perfect, however. At best, its accuracy is just over 80 percent when it has a wide gallery of visible light images to compare the thermal image against; in a straight one-to-one comparison, the accuracy drops to just 55 percent.
Better accuracy is clearly possible with bigger datasets and a more
powerful network. Of these, the creation of a data set that is bigger by
orders of magnitude will be by far the harder of the two tasks.
But it’s not difficult to imagine such a database being created relatively quickly, given that the interested customers are likely to be the military, law enforcement agencies and governments, which generally have deeper pockets when it comes to security-related technology.
Ref: arxiv.org/abs/1507.02879: Deep Perceptual Mapping for Thermal to Visible Face Recognition