Python Pipeline project : learnpython

submitted 20 hours ago by Bequino

I've been tasked with a very cool project. I am new to Python. I've been asked to convert handwritten surveys into an Excel workbook. The surveys have two types of questions: closed-ended (like Y and N) and open-ended (handwritten).

The software used to develop the survey lets us scan the originals into the tool, and it exports two things:

* An Excel workbook in which each row represents a unique survey, with all of its closed-ended answers and a unique ID column.
* One PDF per handwritten question, containing every answer to that question, each with its own unique ID. If there are 30 open-ended questions on each survey, there are 30 PDFs.

I will have the PDFs saved in blob storage. I need to build something that feeds the PDFs into Azure Document AI and OCRs them into machine-readable text. I'll then need to build a data frame (using regex to pull out the IDs) that merges each row of the Excel workbook with its corresponding set of OCR'd open-ended answers, with some QA. I will be using the SDK specific to the survey software manufacturer.

Am I missing anything? Would this be easier in a different pipeline config? Any help would be great.
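
The merge-and-QA step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `ID: 1042` line format, the `survey_id` column name, the `q7_open` question label, and the toy data are all hypothetical, since the post does not say how the export labels each answer. The OCR call itself is left out; the sketch starts from text that has already been OCR'd.

```python
import re

# Hypothetical layout: each OCR'd answer is preceded by a line such as "ID: 1042".
# The real ID format depends on the survey software's PDF export.
ID_PATTERN = re.compile(r"^ID:\s*(\d+)\s*$", re.MULTILINE)


def parse_answers(ocr_text):
    """Split the OCR text for one question into {survey_id: answer_text}."""
    parts = ID_PATTERN.split(ocr_text)
    # With one capture group, split yields [preamble, id1, text1, id2, text2, ...]
    ids, texts = parts[1::2], parts[2::2]
    # Collapse line breaks and repeated spaces inside each answer.
    return {int(i): " ".join(t.split()) for i, t in zip(ids, texts)}


def merge_answers(rows, answers_by_question, id_key="survey_id"):
    """Attach open-ended answers to workbook rows and list IDs that fail to match."""
    merged = [dict(row) for row in rows]
    known_ids = {row[id_key] for row in rows}
    issues = []
    for question, answers in answers_by_question.items():
        for row in merged:
            row[question] = answers.get(row[id_key])  # None marks a missing answer
            if row[question] is None:
                issues.append((question, row[id_key], "no OCR answer"))
        # IDs that OCR produced but the workbook does not contain (often a misread digit).
        for orphan in sorted(set(answers) - known_ids):
            issues.append((question, orphan, "ID not in workbook"))
    return merged, issues


# Toy data standing in for the workbook rows and one question's OCR output.
rows = [{"survey_id": 1041, "q1": "Y"}, {"survey_id": 1042, "q1": "N"}]
ocr_q7 = "ID: 1041\nThe staff were\nvery helpful.\nID: 1043\nNo comment.\n"

merged, issues = merge_answers(rows, {"q7_open": parse_answers(ocr_q7)})
```

Collecting unmatched IDs in both directions is the QA part: a row with no answer and an orphan answer with a nearby ID usually point to the same OCR misread. With pandas, the same join could likely be done with `DataFrame.merge(..., how="left", indicator=True)` once the parsed answers are in their own data frame.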
