Drowsiness (Sleep) Detection Using Machine Learning – Part 3 of 5

This approach is based on Haar-like features (named after Haar wavelets), which describe an image through differences between the pixel sums of adjacent rectangular regions. It uses machine learning to reach a high degree of accuracy from what is called “training data”, and it uses the “integral image” concept to compute the “features” quickly. For object detection in an image or a video, the Haar cascade is one of the best-known classical algorithms. It was introduced by Paul Viola and Michael Jones in their 2001 paper on rapid object detection using a boosted cascade of simple features. It is a machine learning approach in which a cascade function is trained on positive and negative images and is then used to detect objects in other images.

The algorithm has four stages:

  • Haar Feature Selection
  • Creating Integral Images
  • Adaboost Training
  • Cascading Classifiers

It is best known for detecting faces and body parts in an image, but it can be trained to identify almost any object.

For face detection, the algorithm first needs a large number of positive images (containing faces) and negative images (without faces) to train the classifier. Features are then extracted from them. The first step is to collect Haar features. A Haar feature considers adjacent rectangular regions at a specific location in a detection window, sums the pixel intensities in each region, and takes the difference between these sums as its value.
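As a rough illustration of that difference-of-sums idea, here is a minimal sketch of a two-rectangle (edge-type) feature using NumPy. The function name `two_rect_haar_feature`, the vertical stacking of the two rectangles, and the "lower minus upper" sign convention are all assumptions made for this example, not something taken from the project code.

```python
import numpy as np

def two_rect_haar_feature(window, top, left, height, width):
    """Hypothetical helper: value of a two-rectangle Haar-like feature.

    Two rectangles of equal size are stacked vertically. The feature value
    is the pixel sum of the lower rectangle minus that of the upper one
    (the sign convention is an assumption of this sketch).
    """
    upper = window[top:top + height, left:left + width]
    lower = window[top + height:top + 2 * height, left:left + width]
    # Convert to Python ints so the subtraction cannot wrap around
    return int(lower.sum()) - int(upper.sum())

# Toy 4x4 window: a dark band (like the eyes) above a bright band (like the cheeks)
window = np.array(
    [[10, 10, 10, 10],
     [10, 10, 10, 10],
     [200, 200, 200, 200],
     [200, 200, 200, 200]],
    dtype=np.uint8,
)

value = two_rect_haar_feature(window, top=0, left=0, height=2, width=4)
print(value)  # 1520: lower sum 1600 minus upper sum 80
```

A large value means the window shows a strong dark-over-bright pattern at that location, while a value near zero means the two regions look alike.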

To make this fast, you can use integral images, which reduce the sum over any rectangle to four table lookups. Most of the features calculated this way are irrelevant, however. Consider the image below as an example. Two good features are shown in the top two rows. The first feature focuses on the property that the region of the eyes is often darker than the nose and cheeks. The second feature relies on the property that the eyes are darker than the bridge of the nose. The same windows applied to the cheeks or any other region are irrelevant.
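To show why integral images help, the following sketch builds one with NumPy and reads a rectangle sum from four entries of the table. The helper names `integral_image` and `rect_sum`, and the extra row and column of zeros used for padding, are choices made for this example only.

```python
import numpy as np

def integral_image(img):
    """Return a table ii where ii[y, x] is the sum of img[:y, :x].

    A leading row and column of zeros are added so that rectangles
    touching the top or left border need no special handling.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the table."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])

# Toy 4x4 image with pixel values 0..15
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
ii = integral_image(img)

# The 2x2 block in the middle holds 5, 6, 9 and 10
print(rect_sum(ii, top=1, left=1, height=2, width=2))  # 30
```

The table is built once per image, after which the cost of a rectangle sum no longer depends on the rectangle's size.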

So how do we select the best features out of 160,000+ features? This is done using a technique called AdaBoost, a method that selects the best features and trains the classifiers that use them. It constructs a strong classifier as a weighted combination of weak classifiers. During detection, the process is as follows. A window of the target size is moved over the input image, and Haar features are calculated for each subsection of the image. Each feature value is then compared with a learned threshold that separates non-objects from objects. Since each Haar feature is only a ‘weak classifier’ (its detection quality is only marginally better than random guessing), a large number of Haar features are required to classify an item with adequate accuracy, and they are therefore grouped into cascade classifiers to form a strong classifier.
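The idea of combining thresholded features into a weighted vote can be sketched with a generic discrete AdaBoost over decision stumps. This is a simplified illustration, not the exact Viola-Jones training procedure or the code used in this project. The function names, the toy feature table, and the number of rounds are all assumptions.

```python
import numpy as np

def stump_predict(features, j, thr, polarity):
    """Weak classifier: threshold a single feature, returning +1 or -1."""
    return np.where(polarity * features[:, j] < polarity * thr, 1, -1)

def train_stump(features, labels, weights):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(features.shape[1]):
        for thr in np.unique(features[:, j]):
            for polarity in (1, -1):
                pred = stump_predict(features, j, thr, polarity)
                err = weights[pred != labels].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(features, labels, rounds):
    """Return a list of (alpha, feature index, threshold, polarity) tuples."""
    weights = np.full(len(labels), 1.0 / len(labels))
    ensemble = []
    for _ in range(rounds):
        err, j, thr, polarity = train_stump(features, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(features, j, thr, polarity)
        # Misclassified samples gain weight, correct ones lose weight
        weights = weights * np.exp(-alpha * labels * pred)
        weights = weights / weights.sum()
        ensemble.append((alpha, j, thr, polarity))
    return ensemble

def predict(ensemble, features):
    """Strong classifier: sign of the weighted vote of all stumps."""
    score = np.zeros(len(features))
    for alpha, j, thr, polarity in ensemble:
        score += alpha * stump_predict(features, j, thr, polarity)
    return np.where(score >= 0, 1, -1)

# Toy data: column 0 separates the classes, column 1 is noise
features = np.array([[5.0, 1.0], [6.0, 9.0], [7.0, 2.0],
                     [1.0, 8.0], [2.0, 3.0], [0.5, 7.0]])
labels = np.array([1, 1, 1, -1, -1, -1])  # +1 = object, -1 = non-object

ensemble = adaboost(features, labels, rounds=3)
print([j for _, j, _, _ in ensemble])  # indices of the selected features
print(predict(ensemble, features))
```

On this toy table the informative column is the one that gets selected, which mirrors how boosting narrows a very large feature pool down to a small useful subset.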

The training data used in this project are XML files called:

In this project, the detectMultiScale method of OpenCV's CascadeClassifier is used. It returns a rectangle with coordinates (x, y, w, h) around each face detected in the image. The most important parameters to consider are:

This is the third out of a series of five articles. See you tomorrow with more! Stay tuned!!!

Full code available at:

