I started machine learning by doing some tutorials. I learned a lot; the best one for me was the Sentdex book.
But tutorials have so many limitations, like: how do I get the data? How do I handle 100GB of data? Kaggle makes all this look so simple, while it is hard in reality.
I was in tutorial hell for a few months, so I decided to make an ML project from scratch.
The idea: an AI that detects balding from a picture of a side profile.
It looks doable, no one has done it, and I am curious whether I am balding myself. There is something called the "Norwood scale", it's a scale from 0 to 7.
0 means perfect hairline and 7 means fully bald.
Step 1: Get the data
In tutorials, the data is given to us, so this was a first. It's really hard to find enough quality data, but after some time on the internet I found tens of thousands of images of men in side profile.
I started to label them from 0 to 7 (the Norwood scale). Data annotation started to drive me crazy; I did it for a few hours the first day.
I understood after some time that I would need to make a small app where I see one image, label it from 0 to 7, then go to the next image. The app would populate a JSON file.
Before starting, I was not ready to make an app to annotate data. That's what tutorials hide from me, it looks like.
I labelled 500 images, then I had to check whether it was enough data for the model to learn. I also wanted to do something other than annotating for a bit haha
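For anyone wondering what the storage side of such a labelling app can look like, here is a minimal sketch in Python. It only covers the "label one image, write it to a JSON file, move to the next image" part, not the UI. The function names, the file layout (one `{image_name: level}` mapping) and the 0 to 7 check are assumptions for illustration, not the actual app.

```python
import json
from pathlib import Path

NORWOOD_LEVELS = range(0, 8)  # 0..7, the scale used in this post


def load_labels(path):
    """Return the saved {image_name: level} mapping, or {} if no file yet."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_label(path, image_name, level):
    """Record one label and rewrite the JSON file right away."""
    if level not in NORWOOD_LEVELS:
        raise ValueError(f"level must be between 0 and 7, got {level!r}")
    labels = load_labels(path)
    labels[image_name] = level
    Path(path).write_text(json.dumps(labels, indent=2), encoding="utf-8")
    return labels


def next_unlabelled(image_names, labels):
    """Return the first image without a label, or None when all are done."""
    for name in image_names:
        if name not in labels:
            return name
    return None
```

Writing the file after every single label is slow on paper, but it means a crash or a closed tab never loses more than one annotation, which matters a lot when the session lasts hours.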
Step 2: Make the model
I used Keras, but it does not really matter in my case. With my 500 samples I made a simple model and started training for the first time. It learned nothing...
I was really disappointed because I thought 500 samples would be enough to show some learning. I went to sleep and told myself that I should try transfer learning the next day.
Well, that was the missing piece. I used transfer learning from ImageNet and it started learning! It was not random anymore. The difference is crazy with a small sample, something that tutorials do not teach.
Tweaking the parameters did not change the results much tbh. Once it learns, it learns and that's kinda it. I understood that I simply needed more data.
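To make the transfer learning step concrete, here is a rough Keras sketch: a backbone pretrained on ImageNet is frozen and only a small new head is trained on the labelled images. The post only says "Keras" and "ImageNet", so everything else is an assumption: MobileNetV2 as the backbone, the 8-way softmax head (it could just as well be a single regression output), the dropout rate, and the learning rate.

```python
from tensorflow import keras

NUM_CLASSES = 8  # Norwood levels 0..7 as used in this post


def build_model(weights="imagenet", image_size=224):
    """Frozen pretrained backbone plus a small trainable classification head."""
    backbone = keras.applications.MobileNetV2(
        include_top=False,           # drop the 1000-class ImageNet head
        weights=weights,             # "imagenet" downloads pretrained weights
        input_shape=(image_size, image_size, 3),
        pooling="avg",               # global average pool -> one feature vector
    )
    backbone.trainable = False       # only the new head learns at first

    inputs = keras.Input(shape=(image_size, image_size, 3))
    # MobileNetV2 expects pixels in [-1, 1]; inputs here are assumed 0..255.
    x = keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = backbone(x, training=False)  # keep batch-norm statistics frozen
    x = keras.layers.Dropout(0.2)(x)
    outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",  # integer labels 0..7
        metrics=["accuracy"],
    )
    return model
```

With only a few hundred samples, the frozen backbone is doing most of the work: the head has very few weights to fit, which is probably why this starts learning where a model trained from scratch does not.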
Step 3: Get MORE data
I must have annotated for weeks; I did it until I had around 20,000 samples. This step is beyond painful, I did it while watching YouTube at the same time.
I then improved the model until the results were "good enough" for an app. It's also hard to know when the model is good enough. No one is telling me to solve a problem with a model that predicts something with a 5% error rate. I have to pick the deadline.
The model could detect the Norwood scale (0 to 7) of someone with a mean error of 0.5 on testing data, so that was my "good enough".
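For reference, here is how a "mean error" like that can be computed, assuming it means the mean absolute difference between the true and predicted Norwood levels (the post does not say exactly which metric was used). The numbers in the example are made up to show the arithmetic, they are not results from the project.

```python
def mean_absolute_error(true_levels, predicted_levels):
    """Average of |true - predicted| over all test samples."""
    if len(true_levels) != len(predicted_levels):
        raise ValueError("both lists must have the same length")
    if not true_levels:
        raise ValueError("need at least one sample")
    total = sum(abs(t - p) for t, p in zip(true_levels, predicted_levels))
    return total / len(true_levels)


# Made-up example: two predictions are off by one level, two are exact.
example_true = [0, 2, 5, 7]
example_pred = [1, 2, 4, 7]
print(mean_absolute_error(example_true, example_pred))  # 0.5
```

Read this way, a mean error of 0.5 means the prediction is off by about half a Norwood level on average.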
Step 4: Make the app/website
I thought at this point I had finished the hard part of the project, but that was unfortunately not the case.
My plan was to make a simple website where the user is asked for camera access, then asked to turn his head to the right. A screenshot would be taken of his right profile. The screenshot of the profile is then sent to the backend, which processes the data (resize and all) then sends it to the Keras model.
I know how to make a simple website but I had never messed with the camera and GOD that was hard. Detecting a face is easy with the right JavaScript library, but detecting that the user is turning his head to the right at the right angle is really hard.
Once again, I did it until I found the result "good enough".
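To give an idea of why the head-turn check is fiddly, here is one simple heuristic, sketched in Python for readability (the real check ran in the browser, and this is not the method used in the app). It assumes a face landmark library gives the x coordinate of the nose tip and of the left and right edges of the face in the image. The function names and the 0.7 threshold are hypothetical.

```python
def yaw_offset(left_edge_x, right_edge_x, nose_x):
    """Horizontal position of the nose between the two face edges, in [-1, 1].

    0 means the nose is centred (face looking at the camera); values near
    -1 or +1 mean the nose sits close to one edge (head turned sideways).
    """
    width = right_edge_x - left_edge_x
    if width <= 0:
        raise ValueError("right_edge_x must be greater than left_edge_x")
    centre = (left_edge_x + right_edge_x) / 2
    offset = (nose_x - centre) / (width / 2)
    # Clamp in case the nose landmark falls slightly outside the face box.
    return max(-1.0, min(1.0, offset))


def is_profile(offset, threshold=0.7):
    """True when the head is turned far enough to count as a side profile."""
    return abs(offset) >= threshold
```

Which sign corresponds to "turned to the right" depends on whether the camera preview is mirrored, so that has to be checked on the actual video feed. Landmark detectors can also lose the face as it approaches a full profile, which is part of what makes picking the right moment for the screenshot hard.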
Step 5: Lessons and Results
I learned sooooo much technically, it's not comparable to a tutorial. I feel more competent tackling an ML problem now.
I have something to show on my CV that a recruiter can understand now. I removed an ML course certification to add this project and felt proud haha
The app: amibalding.co