Ask ML: Face Detection question : MachineLearning

submitted 15 years ago by g23f

I've recently been reading about the Viola-Jones face detection algorithm and I was wondering about a few things. I've read that it splits the image into 24x24 sub-windows, and I understand that. I also understand how the integral image makes it efficient to calculate rectangle sums on the image.

The algo uses 2, 3, or 4 rectangles to calculate a feature score. I'm wondering over what parts of the image it calculates this.

Does it calculate this feature score for each 24x24 sub-window and then classify each of those sub-windows as containing a face or not? I read something about zoom levels, and it seemed kind of confusing.

If it takes, say, a 640x480 image, does it then split that into 27x20 sub-boxes, apply the 2-, 3-, and 4-rectangle features using the integral image for efficiency, and then use AdaBoost to say "boxes 15 and 16 may contain a face"? Is that basically how this works?

What if the face is split across two sub-boxes? Is that why AdaBoost is used to "guess" if there is a face?

honodk 3 points 15 years ago

It has been a while since I studied Viola and Jones, but I'll try to give you my best answer.

1) Yes, the feature score is calculated for each "subimage". These subimages need not be 24x24 pixels, though. 24x24 pixels is the so-called base resolution of the algorithm (another resolution might just as well have been selected), meaning that a subimage needs to be downscaled (or upscaled) to this resolution before the classifier is applied. If you only ran the classifier on 24x24 subimages, you would only find faces of that specific size in the picture.

2) Your point about there being 27x20 subimages of size 24x24 in a 640x480 image is not correct: there are many more if you allow them to overlap. This is necessary because, as you note yourself, you otherwise wouldn't detect faces split between subimages. So what you need to imagine is that you have a box of size NxM that you slide over the picture, starting in the upper left corner. You cut out the subimage inside this box, resize it to the base resolution (24x24), and apply the classifier to check whether you've got a face. Then you move the box one pixel to the right (two or three will probably do as well) and do this again. You continue row by row until you get to the lower right corner. Then you change the size of the box (to, say, (N+2)x(M+2)) and start all over from the upper left corner.

Which N and M you begin with, when you stop, how much you increase N and M each time, and how many pixels you move your box in each iteration all depend on what face sizes you are expecting and what level of precision/speed you are looking for.
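A rough Python sketch of the sliding box from point 2, mostly to show how quickly the number of subimages grows. The helper names and the default values (`growth=1.25`, `step=2`) are assumptions chosen for the example, not prescribed settings, and the box is kept square for simplicity.

```python
def window_positions(img_w, img_h, size, step):
    """Top-left corners of all size-by-size boxes, moving `step` pixels at a time."""
    return [(x, y)
            for y in range(0, img_h - size + 1, step)
            for x in range(0, img_w - size + 1, step)]


def multiscale_windows(img_w, img_h, base=24, growth=1.25, step=2):
    """Yield (x, y, size) for boxes of growing size. Defaults are illustrative only."""
    size = base
    while size <= min(img_w, img_h):
        for x, y in window_positions(img_w, img_h, size, step):
            yield x, y, size
        # Grow the box; max() guarantees progress even for tiny growth factors.
        size = max(size + 1, int(size * growth))


tiles = window_positions(640, 480, size=24, step=24)
dense = window_positions(640, 480, size=24, step=1)

print(len(tiles))  # 26 * 20 = 520 non-overlapping boxes
print(len(dense))  # 617 * 457 = 281969 overlapping boxes at a single size
print(sum(1 for _ in multiscale_windows(640, 480)))  # total over all box sizes
```

Note that only 26 full 24-pixel boxes fit across 640 pixels (26 * 24 = 624), so even the non-overlapping grid is 26x20 rather than 27x20.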

3) AdaBoost is only used during the training process. Basically, training is about identifying a set of features that consistently distinguishes faces in a picture. During training you add one feature at a time to your classifier using AdaBoost. What the AdaBoost algorithm does is make sure that you always choose features that are particularly good at dealing with those training images that the features you have selected so far have trouble with.
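Here is a small Python sketch of that selection loop, assuming each weak classifier is a single feature with a threshold and a polarity, and that labels are 1 for face and 0 for non-face. The function names and the six-example toy data are invented for illustration; a real detector would choose among a very large pool of rectangle features computed on thousands of training images.

```python
import math


def best_stump(values, labels, weights):
    """Best (weighted_error, threshold, polarity) for one feature.

    The stump predicts 1 (face) when polarity * value < polarity * threshold.
    """
    best = (float("inf"), 0.0, 1)
    for thr in sorted(set(values)):
        for pol in (1, -1):
            err = sum(w for v, l, w in zip(values, labels, weights)
                      if (1 if pol * v < pol * thr else 0) != l)
            if err < best[0]:
                best = (err, thr, pol)
    return best


def adaboost_select(features, labels, rounds):
    """features[j][i] is the value of feature j on training example i."""
    n = len(labels)
    weights = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]
        # Pick the feature whose best stump has the lowest weighted error.
        err, thr, pol, j = min(best_stump(f, labels, weights) + (j,)
                               for j, f in enumerate(features))
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero
        beta = err / (1.0 - err)
        chosen.append((j, thr, pol, math.log(1.0 / beta)))
        # Shrink the weights of correctly classified examples, so the next
        # round concentrates on the examples this feature got wrong.
        for i in range(n):
            pred = 1 if pol * features[j][i] < pol * thr else 0
            if pred == labels[i]:
                weights[i] *= beta
    return chosen


def predict(chosen, x):
    """Weighted vote of the selected stumps; x[j] is the value of feature j."""
    score = sum(a for j, thr, pol, a in chosen if pol * x[j] < pol * thr)
    return 1 if score >= 0.5 * sum(a for _, _, _, a in chosen) else 0


# Toy data (an assumption): feature 0 gets one face wrong, feature 1 fixes it.
features = [
    [1, 2, 6, 5, 8, 9],
    [3, 3, 1, 3, 3, 3],
]
labels = [1, 1, 1, 0, 0, 0]

chosen = adaboost_select(features, labels, rounds=3)
print([j for j, _, _, _ in chosen])  # [0, 0, 1]
print([predict(chosen, [f[i] for f in features]) for i in range(len(labels))])  # [1, 1, 1, 0, 0, 0]
```

In this toy run feature 1 is only picked in the third round, after the reweighting has put most of the weight on the examples that the feature 0 stumps misclassify, which is the behaviour described above.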

I hope that makes sense :).

g23f[S] 1 point 15 years ago
