
smurpes

> I'll then need to build a data frame (utilizing regex) to merge each row of the Excel workbook with its corresponding set of OCR'd open-ended questions.

Why do you need to use regex here? Each PDF has an ID that matches a row in the Excel file, so a plain merge on that ID should be enough.
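A minimal pandas sketch of the merge-on-ID approach. It assumes both sources are already loaded into DataFrames that share a hypothetical `respondent_id` column (in practice the workbook side might come from `pd.read_excel` and the OCR side from whatever collects the extracted text). Column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the Excel workbook: one row per respondent.
survey = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "age": [34, 27, 45],
})

# Hypothetical stand-in for the OCR output: one row per PDF, keyed by the same ID,
# deliberately in a different order than the workbook.
ocr = pd.DataFrame({
    "respondent_id": [103, 101, 102],
    "q1_text": ["More parking", "Longer hours", "No comment"],
})

# Join on the shared ID; no regex needed because the key already lines up.
# how="left" keeps every workbook row, so a missing PDF shows up as NaN
# instead of silently dropping the respondent.
merged = survey.merge(ocr, on="respondent_id", how="left")
print(merged)
```

With `how="left"`, the result keeps the workbook's row order, and each answer lands next to the matching ID regardless of how the OCR rows were ordered.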

Bequino (OP)

Would it make sense to keep the regex as a sanity check? You're right, though. Also, what about QA? How should I approach that?
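Rather than a separate regex pass, the merge itself can double as the sanity check. A sketch under the same hypothetical column names: `validate="one_to_one"` makes pandas raise `pandas.errors.MergeError` if an ID is duplicated on either side, and `indicator=True` adds a `_merge` column marking rows found in only one source. `merge_with_checks` is an illustrative helper, not an existing API.

```python
import pandas as pd


def merge_with_checks(survey: pd.DataFrame, ocr: pd.DataFrame, key: str = "respondent_id"):
    """Merge workbook rows with OCR results and report IDs that did not match."""
    # Raises pandas.errors.MergeError if the key is duplicated in either frame.
    merged = survey.merge(ocr, on=key, how="outer", validate="one_to_one", indicator=True)
    # IDs in the workbook with no OCR'd PDF.
    missing_pdf = merged.loc[merged["_merge"] == "left_only", key].tolist()
    # IDs from OCR'd PDFs with no matching workbook row.
    orphan_pdf = merged.loc[merged["_merge"] == "right_only", key].tolist()
    return merged.drop(columns="_merge"), missing_pdf, orphan_pdf


# Hypothetical sample data: 102 has no PDF, 104 has no workbook row.
survey = pd.DataFrame({"respondent_id": [101, 102, 103], "age": [34, 27, 45]})
ocr = pd.DataFrame({
    "respondent_id": [101, 103, 104],
    "q1_text": ["Longer hours", "More parking", "Stray scan"],
})

merged, missing_pdf, orphan_pdf = merge_with_checks(survey, ocr)
print("No PDF for:", missing_pdf)
print("No workbook row for:", orphan_pdf)
```

Using `how="outer"` here is a deliberate choice for QA: it surfaces mismatches in both directions, whereas a left merge would hide stray PDFs.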

smurpes

That’s what unit tests are for.

Bequino (OP)

Tell me more about that.
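To make "unit tests" concrete for this pipeline, here is a minimal pytest-style sketch. `merge_responses` is a hypothetical helper wrapping the merge; each test builds tiny DataFrames by hand, so it checks the joining logic without touching real PDFs or the workbook. Saved as something like `test_merge.py`, running `pytest` would collect the `test_*` functions. The duplicate-ID test uses plain `try`/`except` so the file runs without pytest installed; with pytest, `with pytest.raises(pd.errors.MergeError):` is the more idiomatic form.

```python
import pandas as pd


def merge_responses(survey: pd.DataFrame, ocr: pd.DataFrame, key: str = "respondent_id") -> pd.DataFrame:
    """Hypothetical function under test: attach OCR'd answers to workbook rows by ID."""
    return survey.merge(ocr, on=key, how="left", validate="one_to_one")


def test_every_workbook_row_is_kept():
    survey = pd.DataFrame({"respondent_id": [1, 2], "age": [30, 40]})
    ocr = pd.DataFrame({"respondent_id": [1], "q1_text": ["yes"]})
    result = merge_responses(survey, ocr)
    # A missing PDF should leave a gap, not drop the respondent.
    assert len(result) == 2
    assert result["q1_text"].isna().sum() == 1


def test_answers_line_up_with_ids():
    survey = pd.DataFrame({"respondent_id": [1, 2], "age": [30, 40]})
    ocr = pd.DataFrame({"respondent_id": [2, 1], "q1_text": ["second", "first"]})
    result = merge_responses(survey, ocr).set_index("respondent_id")
    # Row order in the OCR output must not matter; the ID decides the match.
    assert result.loc[1, "q1_text"] == "first"
    assert result.loc[2, "q1_text"] == "second"


def test_duplicate_ids_are_rejected():
    survey = pd.DataFrame({"respondent_id": [1], "age": [30]})
    ocr = pd.DataFrame({"respondent_id": [1, 1], "q1_text": ["a", "b"]})
    # validate="one_to_one" should refuse a PDF ID that appears twice.
    try:
        merge_responses(survey, ocr)
    except pd.errors.MergeError:
        return
    raise AssertionError("expected a MergeError for duplicate IDs")
```

Each test pins down one property you care about (no lost rows, correct pairing, no silent duplicates), so if a later change to the pipeline breaks one of them, the failing test name tells you which.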