How to track multiple people in a single frame using KNN?

8 min read · Jul 26, 2020

How to track multiple people in a single frame using KNN? Can we track this with high accuracy and minimum time?

This paper presents a solution that uses the KNN algorithm to track every person in a frame independently by mapping each face to the name it was trained with. Using this approach, faces can be analysed in real time. The method can identify a person within 0.01 seconds.

In this globalized era of cutthroat competition, some people take shortcuts to a comfortable life, which gives rise to criminal activity. Face tracking at a crime scene makes it easier to catch the criminal. The method introduced in this paper also takes into account the time required to recognize a person, so even if the person runs away quickly after the act, the system still captures and recognizes them within a fraction of a second. The time taken was therefore the major parameter considered for the proposed solution. The method can also be useful in advertising campaigns, to recognize the brand ambassador who generated the most profit, and in healthcare, where face biometrics matter for patient identity in a large population. Furthermore, face recognition is already common for payments at ATMs, and extending it to online payments could be the next step towards better security. The major aim is for the proposed method to serve as a generic approach across all of the applications listed.

The proposed solution works for four people in a frame in real time. It also targets efficiency, i.e. people should be recognized by their first names in minimum time. Several approaches were tried before arriving at the proposed one. As the computational cost was high in all of the earlier approaches, this approach was adopted to give the most accurate results.

The first step of the solution is to collect the data in the form of NumPy arrays. This was done in PyCharm. The code captures each face and saves it in .png format in the same folder, named after the person's first name. NumPy is the core library for scientific computing in Python and provides a high-performance multidimensional array. A NumPy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. The HAAR cascade frontal face classifier is used to capture the face. The HAAR cascade algorithm has four stages.

1. Haar Feature Selection

2. Creating Integral Images

3. Adaboost Training

4. Cascading Classifiers

The first step looks for features. It considers several adjacent rectangles in the detection window, sums the pixel intensities in each region and then calculates the difference between these sums. A concept called 'Adaboost' is useful here, as it selects the best features and trains the model to classify them. The algorithm produces a strong classifier as a linearly weighted combination of weak classifiers. Training the classifier requires many positive and many negative images. Positive images contain the object to be detected and nothing else. Negative images are all the images that should not be recognised, because they do not contain what the solution is meant to detect. Based on these inputs, the classifier labels the images in the testing set.
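The first two stages can be illustrated with a minimal NumPy sketch. It is not the OpenCV implementation; the function names and the 4x4 test window are assumptions made for illustration only. It shows how an integral image lets the sum of any rectangle be read with four lookups, and how a two-rectangle feature is the difference between two such sums.

```python
import numpy as np


def integral_image(gray):
    """Return the summed-area table, padded with a leading zero row and column."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def two_rect_feature(ii, top, left, height, width):
    """Left-half sum minus right-half sum of a window (an edge-like Haar feature)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical 4x4 window: a dark left half (0) next to a bright right half (255).
window = np.array([[0, 0, 255, 255]] * 4, dtype=np.uint8)
ii = integral_image(window)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

A large absolute value of `feature` signals a strong vertical edge in the window, which is the kind of response Adaboost then weighs against other candidate features.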

The face is detected by code run in Jupyter notebook, which was used for high productivity and easy collaboration. The KNN algorithm was used to classify the faces against the listed names. The value of K in KNN is taken as five in this paper. The experiment was performed with k = 1, 2, 3, 5 and 7. The accuracy was very low with small values of K. k = 5 and k = 7 gave nearly the same results, so the smaller value, k = 5, was chosen. K is balanced so that neither bias nor variance dominates the error. This is the bias-variance trade-off.
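As a rough illustration of the classification step, here is a minimal KNN sketch with k = 5 and Euclidean distance. The names `asha` and `ben` and the 2-D points are hypothetical; in the real pipeline each row would be a flattened face image rather than a 2-D point.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_labels, query, k=5):
    """Label a query vector by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training row.
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: six samples per person.
asha = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [0.1, 0.1], [0.2, 0.1], [0.1, 0.2]])
ben = asha + 5.0
train_x = np.vstack([asha, ben])
train_labels = ["asha"] * 6 + ["ben"] * 6

predicted = knn_predict(train_x, train_labels, np.array([4.8, 5.1]), k=5)
```

With one sample per person, k = 5 would force votes from other people's faces, which is one reason several samples per name are needed.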

To check the final performance, matplotlib was used to visualize a distplot of the time taken to recognize a person. Matplotlib is a Python library for visualizing data and the results of mathematical operations. Regression was also used for this purpose. Measures such as R square, Multiple R and the standard error of the regression are useful as accuracy parameters.

The standard error is the estimated standard deviation of the error term in the regression. In it, xi and yi are the number of people in the frame and the time taken to detect them in seconds, respectively, and alpha is the constant value.

Multiple R is the correlation coefficient, which tells how strong the relationship is between the two variables of the proposed model. A value close to 1 means a strong correlation, and a value close to 0 means there is no linear relationship between the data points.

R square is the coefficient of determination, which tells how much of the variation in y, the dependent variable, is explained by the variation in x, the independent or predictor variable. Here x is the number of people in the frame and y is the time taken to detect the people in the frame. The graph is obtained from different values of x and y, and a normal probability plot is formed. This is a graphical technique for checking whether the data is normally distributed. One advantage of the normal probability plot is that the intercept and slope of the fitted line also estimate the location and scale of the distribution.
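The three regression measures can be computed with a short NumPy sketch. It assumes a simple linear regression with one predictor and uses n - 2 degrees of freedom for the standard error; the data points are hypothetical and are not the measurements from this experiment.

```python
import numpy as np


def regression_summary(x, y):
    """Fit y = intercept + slope * x and return the usual summary measures."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    residuals = y - (intercept + slope * x)
    ss_res = float((residuals ** 2).sum())        # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total variation
    r_square = 1.0 - ss_res / ss_tot
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_square": r_square,
        "multiple_r": float(np.sqrt(r_square)),
        "std_error": float(np.sqrt(ss_res / (len(x) - 2))),
    }


# Hypothetical data: people in the frame vs. seconds taken to detect them.
people = [1, 2, 3, 4]
seconds = [0.010, 0.014, 0.015, 0.019]
summary = regression_summary(people, seconds)
```

For a single predictor, Multiple R is simply the square root of R square, so the two always move together.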

The solution provided in this paper is done in Python. The open source library used is OpenCV.

A. Parameters Used For Detection Of Face

Euclidean distance: Manhattan distance was tried first to measure the distance between the captured image and the real-time image.

Manhattan distance is given by:

$$d = |x_1 - x_2| + |y_1 - y_2|$$

where (x1, y1) and (x2, y2) are the coordinates of the two data points.

Since accuracy was much higher with Euclidean distance, it was used in the proposed solution.

Euclidean distance is the length of the straight line between two data points in the plane.

If (x1, y1) and (x2, y2) are two data points, the Euclidean distance between them is given by:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

which is also the Pythagorean theorem.
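To make the difference between the two distances concrete, here is a small sketch that computes both for the same pair of points. The function names and the sample points are illustrative assumptions. The same functions work for vectors of any length, such as flattened face images.

```python
import numpy as np


def manhattan_distance(p, q):
    """Sum of the absolute coordinate differences."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.abs(p - q).sum())


def euclidean_distance(p, q):
    """Length of the straight line between the two points."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(((p - q) ** 2).sum()))


# Hypothetical points forming a 3-4-5 right triangle.
p, q = (1, 2), (4, 6)
d_manhattan = manhattan_distance(p, q)
d_euclidean = euclidean_distance(p, q)
```

Here the Manhattan distance is 7 (3 + 4) while the Euclidean distance is 5, the hypotenuse of the same triangle.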

B. Face ROI

The face ROI (region of interest) step draws a rectangle around the face so that it can be captured and detected.

The colour chosen for the rectangle is yellow and the colour chosen for the first name is red.
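The sketch below draws a yellow outline around a face region using NumPy only, as a stand-in for the drawing step. OpenCV stores colours in BGR order, so yellow is (0, 255, 255) and red is (0, 0, 255). The `draw_box` helper, the frame size and the ROI coordinates are hypothetical; with OpenCV the usual calls would be `cv2.rectangle` for the box and `cv2.putText` for the name.

```python
import numpy as np

YELLOW_BGR = (0, 255, 255)  # rectangle colour in BGR order
RED_BGR = (0, 0, 255)       # colour that would be used for the name text


def draw_box(image, x, y, w, h, colour, thickness=2):
    """Paint a rectangle outline in place; (x, y) is the top-left corner."""
    t = thickness
    image[y:y + t, x:x + w] = colour          # top edge
    image[y + h - t:y + h, x:x + w] = colour  # bottom edge
    image[y:y + h, x:x + t] = colour          # left edge
    image[y:y + h, x + w - t:x + w] = colour  # right edge
    return image


# Hypothetical black frame and face region given as (x, y, width, height).
frame = np.zeros((120, 160, 3), dtype=np.uint8)
face_roi = (40, 30, 60, 60)
draw_box(frame, *face_roi, YELLOW_BGR)
```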

C. Parameter Used For Calculating Efficiency

The time taken to recognize the person(s) was used as the parameter for measuring the performance of the model. For many applications, time may be the most important criterion: recording employee attendance at a firm, checking which students attended a university lecture, or finding an escaped prisoner hiding in a supermarket, among others. When there are one or two people in a frame, the time taken to detect a person is less than 0.02 seconds.

The graph is created with the help of the matplotlib and seaborn libraries in Jupyter notebook. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

%matplotlib inline is used so that the graph is rendered inside the notebook, which makes it easier to analyze. A distplot plots a univariate distribution of observations. The distplot() function combines the matplotlib hist function with the seaborn kdeplot() and rugplot() functions.
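The recognition times that feed such a plot could be collected with a small timing helper like the one below. It is only a sketch: `time_call` and `fake_recognise` are hypothetical names, and the stand-in function does no real face recognition.

```python
import time


def time_call(func, *args, repeats=20):
    """Return func's result and the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args)
    elapsed = (time.perf_counter() - start) / repeats
    return result, elapsed


def fake_recognise(n_faces):
    """Hypothetical stand-in for the step that names every face in a frame."""
    return ["person_%d" % i for i in range(n_faces)]


names, seconds = time_call(fake_recognise, 4)
```

Averaging over several repeats smooths out timer noise, which matters when a single call takes only hundredths of a second.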

When there are more than two people in a frame, the model gets confused more often as the number of people increases, so it takes more time to recognize them. The time taken to recognize in this case is 0.017 seconds.

To check its accuracy and performance further, the number of people was increased to four.

When the maximum number of people is in the frame, the time taken to detect increases to 0.019 seconds. With many people, the model tries to find the correct names but mismatches several times along the way, so it takes longer than with fewer people in the frame. However, even for four people in the frame, the maximum time taken to detect is no more than 0.019 seconds. This is very fast and suits many applications.

RESULTS

The experiment was performed with different numbers of people in a frame: one, two, three and four. The time taken to detect the faces varied in each case.

Running the regression gives an R square of 0.89 (to 2 decimal places), taking the number of people as the input range and the time to detect as the output range. This means that about 89% of the variation in detection time is explained by the number of people in the frame. The value of Multiple R is 0.94 (to 2 decimal places), which indicates a strong correlation between the two. The standard error in this case is 0.03 (to 2 decimal places).

The following conclusions can be obtained from the procedure performed:

1. The time taken to detect people in the frame is no more than 0.019 seconds for the maximum of four people in the frame. For three people it is 0.017 seconds, and for one and two people it is 0.01 and 0.02 seconds, respectively.

2. The standard error of the model is 0.035 (to 3 decimal places).

3. The number of people in the frame explains nearly 89% of the variation in the time taken to detect them (R square of 0.89).

4. The number of people and the detection time are strongly correlated, with a Multiple R of 0.94.

The results led to the conclusion that the model works well and that its performance can be measured, unlike prior research, where there was no parameter for judging the efficiency of the model.


Written by Aki Kapoor
