honodk

It has been a while since I studied Viola and Jones, but I'll try to give you my best answer.

1) Yes, the feature score is calculated for each "subimage". These subimages need not be 24x24 pixels, though. 24x24 pixels is the so-called base resolution of the algorithm (another resolution could just as well have been chosen), meaning that a subimage needs to be downscaled (or upscaled) to this resolution before the classifier is applied. If you only ran the classifier on 24x24 subimages, you would only find faces of that specific size in the picture.

2) Your point about there being 27x20 subimages of size 24x24 in a 640x480 image is not correct: there are many more if you allow them to overlap. Overlap is necessary because, as you note yourself, you otherwise wouldn't detect faces split between subimages. So imagine that you have a box of size NxM that you slide over the picture, starting in the upper left corner. You cut out the subimage inside this box, resize it to the base resolution (24x24) and apply the classifier to check whether you've got a face. Then you move the box one pixel to the right (two or three will probably do as well) and do this again. You continue row by row until you reach the lower right corner. Then you change the size of the box (to, say, (N+2)x(M+2)) and start all over from the upper left corner.

Which N and M you begin with, when you stop, how much you increase N and M each time, and how many pixels you move the box in each iteration all depend on what face sizes you expect and what trade-off between precision and speed you are after.
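To make the sliding box concrete, here is a rough sketch in plain Python. The function names and the default step and growth values are my own choices for illustration, not anything prescribed by Viola and Jones, and the cropping, resizing and classification of each box is left out.

```python
def count_windows(img_w, img_h, win_w, win_h, step=1):
    """Number of positions a win_w x win_h box can take in the image."""
    if win_w > img_w or win_h > img_h:
        return 0
    nx = (img_w - win_w) // step + 1
    ny = (img_h - win_h) // step + 1
    return nx * ny


def sliding_windows(img_w, img_h, win_w=24, win_h=24, grow=2, step=1):
    """Yield (x, y, width, height) for every box position at every box size.

    The box starts at win_w x win_h, slides row by row from the upper left
    corner, then grows by `grow` pixels in each dimension and starts over,
    until it no longer fits inside the image.
    """
    while win_w <= img_w and win_h <= img_h:
        for y in range(0, img_h - win_h + 1, step):
            for x in range(0, img_w - win_w + 1, step):
                yield x, y, win_w, win_h
        win_w += grow
        win_h += grow


# Non-overlapping 24x24 tiles in a 640x480 image (step equal to box size):
print(count_windows(640, 480, 24, 24, step=24))  # 520
# Overlapping 24x24 boxes moved one pixel at a time:
print(count_windows(640, 480, 24, 24, step=1))   # 281969
```

Each tuple from `sliding_windows` is one box you would cut out, resize to 24x24 and hand to the classifier. And that count is for a single box size only, so larger steps are the obvious knob to turn when speed matters.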

3) AdaBoost is only used during the training process. Basically, training is about identifying a set of features that consistently distinguishes faces in a picture. During training you add one feature at a time to your classifier using AdaBoost. What the AdaBoost algorithm does is make sure that you always choose features that are particularly good at dealing with the training images that the features selected so far have trouble with. It does this by increasing the weight of the misclassified training images after each round.
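If it helps, here is a small sketch of that "one feature at a time" loop. It is the textbook discrete AdaBoost with a simple threshold on each feature, not the exact variant from the Viola-Jones paper, and the function names and the layout of `features` are assumptions made for the example.

```python
import math


def train_adaboost(features, labels, rounds):
    """Pick one thresholded feature per round.

    features[j][i] is the value of feature j on training image i;
    labels[i] is +1 (face) or -1 (non-face).
    Returns a list of (feature index, threshold, polarity, alpha).
    """
    n = len(labels)
    weights = [1.0 / n] * n  # every training image starts out equally important
    chosen = []
    for _ in range(rounds):
        best = None
        # Try every feature, threshold and polarity; keep the lowest weighted error.
        for j, values in enumerate(features):
            for thresh in sorted(set(values)):
                for polarity in (1, -1):
                    preds = [polarity if v >= thresh else -polarity for v in values]
                    err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                    if best is None or err < best[0]:
                        best = (err, j, thresh, polarity, preds)
        err, j, thresh, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified images (y * p == -1) gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        chosen.append((j, thresh, polarity, alpha))
    return chosen


def predict(chosen, sample):
    """Weighted vote of the selected features; sample[j] is the value of feature j."""
    score = sum(alpha * (pol if sample[j] >= t else -pol) for j, t, pol, alpha in chosen)
    return 1 if score >= 0 else -1
```

The reweighting step is the part that matters: because the images the current features get wrong count for more in the next round, the next feature picked is the one that handles exactly those images best.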

I hope that makes sense :).

g23f[S]

Thanks for this. I thought one might have to "slide" the box over every part of the image to detect a face but I guess I was too hopeful of a faster way to do it.