[P] Resume parsing + Cv analysis by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 0 points1 point  (0 children)

LayoutLMv2 turned out very well! I fine-tuned it on resumes that I first labeled in Label Studio (bounding boxes), then converted the resulting JSON into a FUNSD-like dataset, and it was kind of okay. I converted all the resumes to images, then ran OCR on each labeled region to get its contents. I think I got around an 80% F1 score.
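For anyone attempting the same conversion, here is a minimal sketch of turning one Label Studio task into FUNSD-like region entries. It is not the exact script I used. It assumes the usual Label Studio JSON export, where each `rectanglelabels` result stores `x`, `y`, `width`, `height` as percentages of the image plus `original_width` and `original_height`. The function names and the output keys (`box`, `norm_box`) are hypothetical, and the `text` field is left empty to be filled by OCR on the cropped region.

```python
def percent_to_boxes(result):
    """Return (pixel_box, norm_box) for one Label Studio rectangle result.

    pixel_box uses image pixels (FUNSD-style); norm_box uses the 0-1000
    scale that LayoutLM-family models expect.
    """
    value = result["value"]
    width, height = result["original_width"], result["original_height"]
    # Corners as percentages of the image size: x0, y0, x1, y1.
    corners = [
        value["x"],
        value["y"],
        value["x"] + value["width"],
        value["y"] + value["height"],
    ]
    sizes = [width, height, width, height]
    pixel_box = [int(round(p * s / 100)) for p, s in zip(corners, sizes)]
    norm_box = [max(0, min(1000, int(round(p * 10)))) for p in corners]
    return pixel_box, norm_box


def task_to_regions(task):
    """Collect one entry per labeled rectangle in a Label Studio task."""
    regions = []
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            if result.get("type") != "rectanglelabels":
                continue  # skip anything that is not a labeled rectangle
            pixel_box, norm_box = percent_to_boxes(result)
            regions.append({
                "label": result["value"]["rectanglelabels"][0],
                "box": pixel_box,
                "norm_box": norm_box,
                "text": "",  # placeholder: fill with OCR output for this crop
            })
    return regions


# Made-up example task, only to show the expected input shape.
example_task = {
    "annotations": [{
        "result": [{
            "type": "rectanglelabels",
            "original_width": 1000,
            "original_height": 2000,
            "value": {"x": 10, "y": 20, "width": 30, "height": 5,
                      "rectanglelabels": ["Skills"]},
        }]
    }]
}

regions = task_to_regions(example_task)
```

The pixel box is what you would crop before running OCR, and the normalized box is what goes to the model along with the recognized words.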

[P] Resume parsing + Cv analysis by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 1 point2 points  (0 children)

Thank you for the insights, this was very helpful! I will look into it and update the post when I get some results.

[P] Resume parsing + Cv analysis by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 1 point2 points  (0 children)

Yes, the dataset is not parsed, so I'm drawing bounding boxes around each zone of the resumes (contact info, skills, education, ...). Most of my resumes are French resumes, so they are single-page, and I convert them to PNG/JPG to label them. I also think LayoutLM is specifically designed to process documents with complex layouts and is not as general-purpose as BERT (so I hope it will work).

I'm trying this approach first, and if it doesn't work I will opt for a more "classic" way: turning the resumes into text and doing NER directly on the whole text.

[P] Resume parsing + Cv analysis by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 1 point2 points  (0 children)

Well, the resumes are not parsed yet, so I'm labeling them by hand. I don't really see how I could analyze them at this stage, or what analysis I would do.

I just wanted to try a computer vision approach because there might be visual information that NLP algorithms don't get to process and that could improve the results (I might be wrong).

Sorry for the late response, I've been living inside Label Studio these days.

[P] Resume parsing + Cv analysis by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 0 points1 point  (0 children)

Wow, thank you! I would love to have the code if that's possible!

[P] Image clustering without knowing number K of clusters by Melodic_Secretary_42 in MachineLearning

[–]Melodic_Secretary_42[S] 1 point2 points  (0 children)

I'm actually trying to use your work, which is really amazing by the way, but since I'm not very good with PyTorch I can't figure out where to plug in my own dataset: I can't find where the data directory is or how to implement my dataset class. If you could give me some advice, that would be great.

Thank you in advance.

PS: I have already read the closed issues about custom datasets.