Wednesday, 30 September 2015

Part 1 : Facial Recognition : Facts, Techniques, Required Background and Tools.

2 A.M. | NIT Trichy | 1-October-2015 | Learning more about AC-DC-AC PWM Converters. | Reading "Animal Farm" (again) | Listening to Manna Dey.

Hello Guys,

I hope all of you are doing great. Yesterday I was browsing YouTube and came across a DEF CON talk by Joshua Marpet. He rocked the show and the talk was fabulous. You can find the video here on YouTube | Defcon | Joshua Marpet.

Also, I have recently started writing research papers, so the typography will differ significantly from what I have written in my earlier blogs. However, it's still me :p. 


1. FACIAL RECOGNITION:

Face: The one we all have, the one we see in the mirror. Importantly, the frame that holds our most vital sensory organs and the smile :D. 

Recognize : To pick out something from a given set or group of data or items. To look into a toy store and instantly say, "Hey!! That is Buzz Lightyear."

Facial Recognition : It is the process of automatically identifying a person (or anything with a face, for that matter :p ) and then verifying the result against a pre-fed database of images or video.


I shall now quote Wikipedia and explain some of the most common facial recognition techniques. 

So, Wikipedia has put three techniques on its page, namely:
            1. Traditional Facial Recognition system.
            2. 3 Dimensional Facial Recognition system.
            3. Skin Texture Analysis.

1.1. Traditional Facial Recognition system:
  
From Wikipedia:
Some facial recognition algorithms identify facial features by extracting landmarks, or features, from an image of the subject's face. For example, an algorithm may analyze the relative position, size, and/or shape of the eyes, nose, cheekbones, and jaw. These features are then used to search for other images with matching features. Other algorithms normalize a gallery of face images and then compress the face data, only saving the data in the image that is useful for face recognition. A probe image is then compared with the face data. One of the earliest successful systems is based on template matching techniques applied to a set of salient facial features, providing a sort of compressed face representation.
Recognition algorithms can be divided into two main approaches, geometric, which looks at distinguishing features, or photometric, which is a statistical approach that distills an image into values and compares the values with templates to eliminate variances.
Popular recognition algorithms include Principal Component Analysis using eigenfaces, Linear Discriminant Analysis, Elastic Bunch Graph Matching using the Fisherface algorithm, the Hidden Markov model, the Multilinear Subspace Learning using tensor representation, and the neuronal motivated dynamic link matching.
Okay, if we have a look at the above article, most of us would be comfortable until it reaches the third paragraph. Even there, it's no biggie. What they have done is simply put forward the mathematics that goes behind facial recognition.

As far as the traditional technique of facial recognition is concerned, we can conclude that the analysis is more or less based on the overall structure of the face. The positioning of the nose, the eyes, the hairline, the eyebrows, etc. all play a role. On the basis of that, a code will be generated (I am assuming this, as I would have done the same if I were doing some facial recognition work). Now, if we want to find someone, we simply read the face, generate its code and then match it with the one we have. I am not sure if this method is already in play, but if it isn't, I am on my way to the next research paper :p. The best advantage of this method is that matching short codes instead of whole images brings the matching time down considerably.

The article also mentions that the earlier systems checked only some salient features of the face instead of the entire face, which makes the information required for facial recognition significantly smaller (obviously, the entire face is not stored in the database; it is not even required). However, now that we have more computing power, we do not hesitate to calculate or store more data.

I am sure most of you got a little tense looking at terms in the article like Principal Component Analysis using Eigenfaces, Linear Discriminant Analysis, Elastic Bunch Graph Matching using the Fisherface Algorithm, the Hidden Markov Model, and the rest. I will give you a brief idea about all of them now. Keep in mind that this explanation will be useful in the future blogs that I write.

We will handle one topic at a time, and I will try to keep it short and simple. Moreover, I will provide links to certain sites which you might dabble with if you want a broader and better insight.


Principal Component Analysis :

Commonly known as PCA (we will be calling it PCA), this is a useful statistical tool that finds its origins in elementary linear algebra. PCA is a way of finding patterns in data in such a way that we highlight their similarities and differences [1].

The basic algorithm for principal component analysis is as follows :
  1. Get the data.
  2. Subtract the mean of the data set from each of the data values. This gives us a new data set whose mean is zero.
  3. Calculate the covariance matrix (no biggie, use MATLAB; refer to this link).
  4. Calculate the eigenvalues and eigenvectors of the covariance matrix. Again, use MATLAB; follow this link.
  5. Choose components and form the feature vector. This is the step where the concept of data compression comes into play. This link was helpful for me in understanding it properly. This is done to reduce the data size and to remove the noisy boundaries [2].
  6. This is the final step and also the easiest step in PCA [1]. After we have decided the feature vector, we take its transpose and multiply it on the left of the original data set (matrix multiplication is not commutative, hence the sense of right-and-left being used :p).
N.B. The eigenvectors of this covariance matrix are the eigenfaces. So, chill :p.
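The six steps above can be sketched in a few lines of NumPy. This is a toy example of my own (made-up data, variable names mine), not a full eigenface pipeline:

```python
import numpy as np

# 1. Get the data: 5 samples with 3 features each (rows = samples).
X = np.array([[2.5, 2.4, 1.1],
              [0.5, 0.7, 0.9],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 1.2],
              [3.1, 3.0, 0.8]])

# 2. Subtract the mean of each feature: the new data set has zero mean.
Xc = X - X.mean(axis=0)

# 3. Covariance matrix of the centred data.
C = np.cov(Xc, rowvar=False)

# 4. Eigenvalues and eigenvectors of the covariance matrix
#    (eigh, because C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# 5. Choose components: keep the eigenvectors with the largest
#    eigenvalues (here, the top 2) to form the feature vector.
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]       # shape (3, 2)

# 6. Project: transpose the feature vector and multiply it on the
#    left of the (transposed) mean-adjusted data.
projected = feature_vector.T @ Xc.T          # shape (2, 5)
print(projected.shape)
```

Each column of `projected` is now a compressed, 2-number description of one original 3-number sample; for faces, the samples would be whole images instead of toy rows.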

Now, we are done with PCA. If you are a fan of Python, I suggest you have a look at the third link provided in the reference section.
We shall now discuss Linear Discriminant Analysis.


[TIRED FOR TODAY]
[WILL BE BACK TOMORROW]

[GOOD NIGHT NERDS]

Linear Discriminant Analysis :

Pretalk : Before I dive into this topic, I would like to recommend that every one of you get a good command over mathematical and scientific languages and packages like R, Python and MATLAB. These tools make computation significantly easier. One important thing to realize at this early stage is that there are certain things a human mind cannot do, like finding the determinant of a 255 × 255 matrix, and in image processing this comes up on a regular basis. 

I was, as usual, going through Google when I came across this site. Here, the concept of LDA (Linear Discriminant Analysis is commonly called LDA) is beautifully explained. 
We use LDA to classify objects into two or more groups on the basis of some predetermined set of features that describe the object [4].
For example, notebooks can be classified into a number of groups on the basis of pages, cost, paper quality, cover design, etc. Here, the notebook is the object. Suppose we are looking for the class of notebooks with the best paper quality; then we assign "Paper Quality" as the dependent variable. The other measurements, which are of no significance to us at present (because we want to look at "Paper Quality" only) but may be useful in the future, are called independent variables or features.
What I just explained was a class-dependent approach. We can also classify the objects without actually specifying any class or category; in that case, each and every variation is treated as a class, thus maximizing the data accuracy.
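To make the notebook example concrete, here is a minimal sketch of the classic two-class Fisher discriminant in NumPy. The numbers are entirely made up, and this is only the simplest form of LDA, not a full implementation:

```python
import numpy as np

# Two classes of "notebooks", each described by two made-up features
# (say, page count and a paper-quality score).
A = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
B = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

# Fisher's idea: find the direction w that maximizes between-class
# separation relative to within-class scatter.
mA, mB = A.mean(axis=0), B.mean(axis=0)
Sw = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)  # within-class scatter
w = np.linalg.solve(Sw, mB - mA)                        # discriminant direction

# Classify a new notebook by which projected class mean it lands closer to.
x = np.array([8.0, 6.0])
closer_to_B = abs(x @ w - mB @ w) < abs(x @ w - mA @ w)
print("class B" if closer_to_B else "class A")   # class B
```

The same machinery, applied to pixel data instead of notebook features, is what yields the Fisherfaces we will meet a bit later.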

One thing to keep in mind is that this topic finds applications everywhere from the military to biostatistics. So, there are multiple approaches. 

As you might have noticed, I have not gone into the mathematics at all. That is because if I started getting into the math, this blog post would never end. If you face any trouble understanding anything, please drop a comment. I will get back to you for sure, and ASAP. 

[HOT AFTERNOON IN TRICHY]
[GOT A FREE AFTERNOON AFTER A LONG TIME]
[SLEEPY ME]
[I WILL GET BACK IN THE EVENING]

[BACK AFTER TWO DAYS, SORRY FOR THE DELAY]

Elastic Bunch Graph matching using Fisherface Algorithm : 
***I hope all of you are following the blog; if not, I am always happy to address your doubts.
Okay, before I start the explanation, I will give you a link to my favourite article on EBG (the three-letter acronym for Elastic Bunch Graph). I found it on Scholarpedia [5].

I will be breaking this part of the tutorial into two parts: first we will discuss EBG, and then the Fisherface algorithm. 

EBG is an algorithm that finds its roots in biology. There are two sources from which EBG draws its inspiration [5].

  • The features are found using the wavelets from a Gabor Filter (linear filter). This has proved to be an ideal model for early visual processing.
  • The matching algorithm is a subset of DLM (Dynamic Link Matching; very little information about it is available on Wikipedia).
I believe talking about DLM in some detail is worth it. I have done some reading on it on the internet, came across some research papers, and to be honest, it is really very cool. 

Okay, so, for our traditional facial recognition system to work properly, we need to train it extensively. Moreover, this has to be done repeatedly for each specific class or feature we want to recognize. This causes a significant delay in the results of the system, extra unnecessary memory usage, and loss of data. We take care of this problem by creating a one-to-one relation between the image and our fitting model. This is called Dynamic Link Matching. If you are interested in reading more about it, please go to the official DLM paper [6].

Getting back to EBG, we already spoke about it. Now, we will see how it is used for matching, thus EBGM (EBG Matching). 
In this method, we initially obtain a model graph (say Gm) and then an image (say I). Now, we find a graph on the image (seems like a neural network, doesn't it? Yes! We got it right :D. That is the exact reason why DLM is said to be a "neuronal model" [7]), and we fit it onto the model graph according to a given function (an empirical formula which gives a high output if the two graphs are identical and a lower value as the differences between the two graphs increase). There you go, you now know about it. As far as the implementation is concerned, I suggest you go here [8] to see the required math; with some good hands-on programming, you can implement it.
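To get a feel for the Gabor wavelets that EBGM uses to extract features at every graph node, here is a minimal sketch that builds one Gabor kernel by hand in NumPy. The parameter values are my own, chosen only for illustration:

```python
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, lam=10.0, gamma=0.5):
    """A real-valued Gabor kernel: a Gaussian envelope multiplied by
    a cosine wave oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinates by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

# A small bank of kernels at 4 orientations. Convolving a face image
# with a bank like this gives the "jet" of filter responses that is
# stored at each node of the bunch graph.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)   # 4 (21, 21)
```

A real EBGM system uses complex-valued kernels at several scales as well as orientations; this sketch only shows the shape of the idea.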
We are done with EBG and EBGM now. I shall now try to put forward the Fisherface algorithm. 

While explaining PCA, I told you that PCA will yield eigenfaces. Similarly, LDA will yield Fisherfaces. We prefer Fisherfaces when the task is classification rather than representation [9].

I have always been a very big fan of Python, hence I shall quote a brilliant and comprehensive explanation from the OpenCV tutorials [10] site.
The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn’t consider any classes and so a lot of discriminative information may be lost when throwing components away. Imagine a situation where the variance in your data is generated by an external source, let it be the light. The components identified by a PCA do not necessarily contain any discriminative information at all, so the projected samples are smeared together and a classification becomes impossible (see http://www.bytefish.de/wiki/pca_lda_with_gnu_octave for an example).
The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented by the great statistician Sir R. A. Fisher. He successfully used it for classifying flowers in his 1936 paper The use of multiple measurements in taxonomic problems [Fisher36]. In order to find the combination of features that separates best between classes the Linear Discriminant Analysis maximizes the ratio of between-classes to within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes should cluster tightly together, while different classes are as far away as possible from each other in the lower-dimensional representation. This was also recognized by Belhumeur, Hespanha, and Kriegman and so they applied a Discriminant Analysis to face recognition in [BHK97].
(copy-paste the links from the above article to see what's inside) 

I don't think the Fisherface algorithm is anything new to you, now that we know it is nothing but LDA :D. This is pretty cool. Different names, but when broken down, it is the same old, simple mathematics. 

[MY EXAMS ARE UP]
[THE BLOG WILL BE UPDATED ON A REGULAR BASIS, THOUGH NOT SO FREQUENTLY]
[LIFE IS SHOWING ME NEW FACES NOW, I WAS ONCE AN ELECTRONICS ENTHUSIAST, NOW FINDING HAVEN IN DATA ANALYSIS]
[HAVE FUN LEARNING]
[WILL BE BACK WITH THE HIDDEN MARKOV MODEL LATER.]


The hidden Markov Model :

I was surfing the internet for some cool explanations of the application of the Hidden Markov Model in face recognition. I came across this site [11].

I would like to start this with a simple introduction. No, that is not possible. Instead, I will build a proper background and finally arrive at the HMM (Hidden Markov Model). 

What is a DAG? It expands to Directed Acyclic Graph. It has two conditions. First, it has to be directed: every edge points from one node to another, so an edge from a.node to b.node is not the same thing as an edge from b.node to a.node. Second, it has to be acyclic: in no way can a path starting from, say, a.node travel across the graph and reach a.node again. 
From Wikipedia : 
it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again.
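Both conditions are easy to check in code. Here is a tiny sketch (my own toy example) that tests whether a directed graph is acyclic by attempting a topological sort, Kahn's algorithm:

```python
from collections import deque

def is_dag(nodes, edges):
    """Kahn's algorithm: repeatedly remove nodes with no incoming
    edges; the graph is acyclic iff every node gets removed."""
    indegree = {n: 0 for n in nodes}
    for a, b in edges:
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for a, b in edges:
            if a == n:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(nodes)

print(is_dag("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))  # True
print(is_dag("ab", [("a", "b"), ("b", "a")]))                            # False
```

The second call fails because following a → b → a loops back to the start, which is exactly the situation the Wikipedia definition forbids.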
I found a nice illustration on Wikipedia that gives a good idea of a DAG: [image: example DAG, from Wikipedia].
What is a Bayesian Network? A Bayesian network uses a directed acyclic graph to represent a set of random variables and their conditional dependencies [12]. The nodes of the DAG used here are random variables. We use this approach to model events that are not directly observable (hence the word "hidden"; no pun intended). 

Wikipedia has provided a very simple and nice example, that is as follows :
 For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

What is a Dynamic Bayesian Network? Commonly known as DBNs, these are basically Bayesian networks used for dynamic processes [13]. 
I will now attach the Wikipedia article for further information :
A Dynamic Bayesian Network (DBN) is a Bayesian Network which relates variables to each other over adjacent time steps. This is often called a Two-Timeslice BN (2TBN) because it says that at any point in time T, the value of a variable can be calculated from the internal regressors and the immediate prior value (time T-1). DBNs were developed by Paul Dagum in the early 1990s when he led research funded by two National Science Foundation grants at Stanford University's Section on Medical Informatics. Dagum developed DBNs to unify and extend traditional linear state-space models such as Kalman filters, linear and normal forecasting models such as ARMA and simple dependency models such as hidden Markov models into a general probabilistic representation and inference mechanism for arbitrary nonlinear and non-normal time-dependent domains.

What is the Markov property? What is a Markov process/model? It is a kind of process whose future values depend solely upon its present value. A process/model which satisfies this property is called a Markov process/model. In other words, such processes are memoryless.

Now, let us have a look at the Wikipedia article on the Hidden Markov Model:
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be presented as the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. It is closely related to an earlier work on the optimal nonlinear filtering problem by Ruslan L. Stratonovich, who was the first to describe the forward-backward procedure.
Easy isn't it?  
N.B. HMMs are generally used for the statistical modelling of non-stationary vector time series. By considering the facial configuration information as a time-varying sequence, HMMs can be applied to face recognition. The most significant facial features of a frontal face image, including the hair, forehead, eyes, nose and mouth, occur in a natural order from top to bottom, even if the image has small rotations in the image plane and/or rotations in the plane perpendicular to the image plane. Based on this observation, the image of a face may be modeled using a one-dimensional HMM by assigning each of these regions a state [15].
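The forward procedure mentioned in the quote above can be sketched in a few lines of NumPy. The two-state model below is entirely made up, just to show the mechanics of computing the likelihood of an observation sequence under an HMM:

```python
import numpy as np

# Toy HMM (all numbers invented for illustration).
A = np.array([[0.7, 0.3],        # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # emission probabilities per state
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial state distribution

def forward(obs):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]                # initialise with first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]        # propagate and emit
    return alpha.sum()

likelihood = forward([0, 1, 2])
print(round(likelihood, 4))                   # 0.0363
```

In face recognition, one HMM per person is trained, and a probe image's region sequence (hair, forehead, eyes, nose, mouth) is scored against each model exactly like this; the highest likelihood wins.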

[VERY TIRED]
[GOTTA SLEEP]
[I WILL BE BACK TOMORROW WITH "Multilinear Subspace Learning using tensor representation"]
[EXAMS UP. GOTTA STUDY THE FREQUENCY DOMAIN STABILITY ANALYSIS.]
[GOOD NIGHT NERDS]

Multilinear Subspace Learning using tensor representation:

As we have done for all the previous topics, we shall again break the heading into smaller parts, then understand them and put them back together to get a holistic point of understanding.

What is Multilinear Subspace Learning? Commonly known by its TLA (three-letter acronym), MSL. It is a way of looking at a smaller part of a big problem (hence the name "subspace"). Each subspace deals with a particular trait or quality for the learning. Here is what Wikipedia [16] says about MSL:
Multilinear subspace learning (MSL) aims to learn a specific small part of a large space of multidimensional objects having a particular desired property. It is a dimensionality reduction approach for finding a low-dimensional representation with certain preferred characteristics of high-dimensional tensor data through direct mapping, without going through vectorization. The term tensor in MSL refers to multidimensional arrays.
No need to worry about some of the confusing terms used. For now, take "high-dimensional tensor data" as just "data". :D 
Dimensionality reduction is a method of reducing the number of random variables using some conditions or considerations (these considerations make the other random variables irrelevant to our particular analysis).

Vectorization is a method of transforming any matrix into a column vector by stacking its columns. It is a linear transformation.
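A quick NumPy illustration of vectorization, stacking the columns of a matrix into a single column vector:

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

# vec(M): stack the columns of M on top of each other.
v = M.reshape(-1, 1, order="F")   # column-major ("Fortran") order
print(v.ravel())                   # [1 3 2 4]
```

MSL's point is precisely that it avoids this step: it works on the multidimensional array directly instead of flattening it first.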

High-Dimensional Tensor Data means a very large and complex set of multidimensional arrays.
(I am not going into the depth of it, otherwise this topic will take another separate blog to finish :p. Do the reading on your own (if you have doubts, you are free to contact me). And Google is your friend.)

What is Tensor Representation? It is a sub-topic in the field of Representation Theory [17], which is a way of studying abstract algebraic structures using matrices. A tensor, as such, is an abstract algebraic structure. We use tensors for our representation because tensor representation makes our data easier to analyse, as tensors are mostly shown as multidimensional arrays. I will quote Wikipedia here:
a representation makes an abstract algebraic object more concrete by describing its elements by matrices and the algebraic operations in terms of matrix addition and matrix multiplication. 
Well, in our case, it is not only a representation. We are using tensor representation, thus we will see multidimensional arrays in place of matrices (both are essentially the same, a fact most programmers will know :) ).

Now, putting all of it together: in Multilinear Subspace Learning using tensor representation, we take a particular feature of a given face (to reduce the number of random variables) and train our recognition system on that. We might even have multiple subspaces for training our recognition system on multiple facial traits. The results are represented using tensors, for easier analysis and visualization.

Today, we are finally done with the "Traditional Facial Recognition system."

Next topic is : "3 Dimensional Facial Recognition system."
Time for some small talk :p.
This is by far the longest blog I have written, so I am gonna pat myself on the back and be proud of myself. 
The other 60% of the blog will draw a lot of content from the topics I have covered so far. Please read through all of them before starting 3D recognition.

[GOOD NIGHT FOR NOW]
[HAPPY LEARNING]
[WILL BE BACK TOMORROW]



2. 3D Facial Recognition:

Wikipedia Says:
    A newly emerging trend, claimed to achieve improved accuracies, is three-dimensional face recognition. This technique uses 3D sensors to capture information about the shape of a face. This information is then used to identify distinctive features on the surface of a face, such as the contour of the eye sockets, nose, and chin. [18][18.1]
However, there are some problems too. One of the most common is the lack of large, widely accepted databases for objectively testing the performance of 3D face recognition systems [19]. Also, a 3D dataset is usually pretty large, so more memory is required.

Tools and Softwares Used in Facial Recognition:

There are a lot of APIs and tools available for facial recognition. For a quick start, we can easily do some primary facial recognition using MATLAB. For starters, here is a video by Angry Pawn | Facial Recognition Using MATLAB; it is a cool video if you just want to get going. 

Here is a piece of literature from the official Mathworks website[20]:
Face recognition leverages computer vision to extract discriminative information from facial images, and pattern recognition or machine learning techniques to model the appearance of faces and to classify them.
You can use computer vision techniques to perform feature extraction to encode the discriminative information required for face recognition as a compact feature vector using techniques and algorithms such as:
  1. Dense local feature extraction with SURF, BRISK or FREAK descriptors
  2. Histogram of oriented gradients
  3. Distance between detected facial landmarks such as eyes, noses, and lips.
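Of the three feature-extraction options above, the landmark-distance one is the easiest to sketch. Assuming we already have (x, y) coordinates for a few detected landmarks (the points below are made up), the feature vector is simply the set of pairwise distances:

```python
import numpy as np

# Made-up landmark coordinates: left eye, right eye, nose tip, mouth.
landmarks = np.array([[30.0, 40.0],
                      [70.0, 40.0],
                      [50.0, 60.0],
                      [50.0, 80.0]])

# Pairwise Euclidean distances between all landmarks: a compact
# feature vector that is invariant to translation of the face.
n = len(landmarks)
features = np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i in range(n) for j in range(i + 1, n)])
print(features)   # 6 distances for 4 landmarks
```

A vector like this can then be fed straight into any of the classifiers in the next list (SVMs, decision trees, and so on).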

Machine learning techniques can be applied to the extracted features to perform face recognition or classification using:
  1. Supervised learning techniques such as support vector machines (SVM) and decision trees
  2. Ensemble learning methods
  3. Deep neural networks
I suggest you read up on the SURF, BRISK and FREAK descriptors on your own. Here are some good papers on SURF (Speeded Up Robust Features), BRISK (BRISK: Binary Robust Invariant Scalable Keypoints) and FREAK (FREAK: Fast Retina Keypoint).

There are other software tools that play a very prominent role in computer vision and facial recognition. There is everyone's favorite, OpenCV, and there is every beginner's favorite, SimpleCV.

The official SimpleCV website says:
SimpleCV is an open source framework for building computer vision applications. With it, you get access to several high-powered computer vision libraries such as OpenCV – without having to first learn about bit depths, file formats, color spaces, buffer management, eigenvalues, or matrix versus bitmap storage. This is computer vision made easy.
Here is a freely available book for SimpleCV: Book.
I need not provide any extra introduction to OpenCV, I guess. Still, I will just drop a cool book for starters and developers here: Book.

Needless to say, both SimpleCV and OpenCV can be driven from Python, preferably Python 2.x (find out why).

I haven't worked much with APIs, but I am sure they are a good way of implementing facial recognition at a large scale. Here is a list of 10 face detection APIs: List.  


[THE BLOG IS FINALLY OVER, IT TOOK A LITTLE OVER 5 MONTHS TO COMPLETE AND ALMOST 552 HOURS/23 DAYS OF WORK].
[4300 Words/25632 Characters/308 Sentences/ 175 Paragraphs.]

[I hope I made some positive impact on your mind.]
[Positive Criticism is welcome.]

I am Pragyaditya Das.
Signing off.

References :

[2]. Feature Vector Extraction or Formation Of Feature Vector.
[3]. This was very helpful for me, as I like python a lot too.
[4]. Awesome place to learn LDA in detail, in a short time.
[5]. Use of Elastic Bunch Graph for Facial Recognition.
[6]. Dynamic Link Matching.
[7]. DLM is a Neuronal Model.
[8]. Research paper on EBG. Required mathematics for implementation.
[9]. Fisherfaces, detailed explanation.
[10]. Explanation on Open-Cv site. Comprehensive, with relevant links.
[11]. The Hidden Markov Model and Facial Recognition.
[12]. Bayesian Network.
[13]. Dynamic Bayesian Network.
[14]. Detailed Math on Hidden Markov Model.
[15]. Eye detection using wavelets.
[16]. Multilinear Subspace Learning (MSL) at Wikipedia.
[17]. Representation Theory at Wikipedia.
[18]. How Stuff Works--3D Facial Recognition.
[18.1]. 3D face recognition at Wikipedia.
[19]. Paper on 3D Face Recognition .
[20]. Official MATLAB guide to Face Recognition.


As you must know by now, I am no Superman :p. I have made my fair share of grammatical as well as lexical errors, but first things first: I have tried my best not to make any factual errors.