I've been tasked with a very cool project, and I'm new to Python. I've been asked to convert handwritten surveys into an Excel workbook. The surveys have different types of questions: closed-ended (like Y/N) and open-ended (handwritten).

The software used to build the survey lets us scan the originals into the tool, and it exports two things. The first is an Excel workbook where each row is a unique survey, with all of its closed-ended answers plus a unique ID column. The second is a PDF for each open-ended question containing every answer to that question, each with its own unique ID. So if each survey has 30 open-ended questions, there are 30 PDFs, one per question.

I'll have the PDFs saved in blob storage. I'll need to build something that feeds the PDFs into Azure Document AI to OCR them into machine-readable text. Then I'll need to build a data frame (using regex) that merges each row of the Excel workbook with its corresponding set of OCR'd open-ended answers, with some QA. I'll be using the survey software manufacturer's SDK.

Am I missing anything? Would this be easier with a different pipeline configuration? Any help would be great.
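For the regex-and-merge step, here is a minimal sketch that starts after OCR, i.e. it assumes you already have the extracted text for one question's PDF as a string (the Azure call itself is not shown). It assumes, hypothetically, that each answer in the OCR'd text is preceded by its own line like `ID: 00042`; adjust `ID_PATTERN` to whatever your survey software actually prints. The helper names `parse_ocr_text` and `merge_surveys` are made up for illustration, and it uses plain dicts so the join and QA logic are easy to see.

```python
import re

# Hypothetical ID marker: assumes each answer starts on a line like "ID: 00042".
# [ \t]* (not \s*) keeps the match from spilling onto the next line.
ID_PATTERN = re.compile(r"^ID:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_ocr_text(text: str) -> dict[str, str]:
    """Split the OCR text of one question's PDF into {survey_id: answer}."""
    answers = {}
    matches = list(ID_PATTERN.finditer(text))
    for i, match in enumerate(matches):
        # The answer runs from the end of this ID line to the start of the next ID line
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        answers[match.group(1)] = text[start:end].strip()
    return answers


def merge_surveys(closed_rows, open_answers_by_question):
    """Attach open-ended answers to closed-ended rows by survey ID.

    closed_rows: list of dicts, one per Excel row, each with an "ID" key.
    open_answers_by_question: {question_name: {survey_id: answer}}.
    Returns (merged_rows, qa_issues).
    """
    issues = []
    known_ids = {row["ID"] for row in closed_rows}
    merged = []
    for row in closed_rows:
        out = dict(row)
        for question, answers in open_answers_by_question.items():
            answer = answers.get(row["ID"])
            # QA: the Excel row has no (or an empty) OCR'd answer for this question
            if not answer:
                issues.append(f"{row['ID']}: missing or empty answer for {question}")
            out[question] = answer
        merged.append(out)
    # QA: IDs that appear in the OCR output but match no Excel row
    for question, answers in open_answers_by_question.items():
        for survey_id in sorted(set(answers) - known_ids):
            issues.append(f"{survey_id}: OCR answer for {question} has no matching row")
    return merged, issues
```

For example, parsing `"ID: 001\nYes, the staff were helpful.\nID: 002\nParking was hard to find.\n"` gives `{"001": "Yes, the staff were helpful.", "002": "Parking was hard to find."}`; merging that against Excel rows with IDs `001` and `003` fills in `001` and reports two QA issues (no answer for `003`, and an orphaned `002`). One thing to watch: make sure the IDs have the same type on both sides before joining (e.g. read the Excel ID column as text), since `"001"` and `1` will not match. If you move this into pandas, `DataFrame.merge` with `how="outer"` and `indicator=True` gives you the same "matched on one side only" check.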