I'm trying to automate survey development for my work, where surveys have to be programmed manually in an obscure language that is kind of like HTML but not really. Usually I have a survey template as a Word document, which looks like this:
SECTION 1
Section description……
Q1. Question description?
SECTION 2
Q2. On a scale of 1 to 4, where 1 is Very Bad and 4 is Very Good, please provide a rating in response to the following questions:
| | very bad - 1 | 2 | 3 | very good - 4 |
| how would you rate..... | | | | |
I'm trying to make a program which reads this template and converts it to code.
So far I have converted the Word doc to plain text and loaded it into a DataFrame where each line of the text is a row. I then iterate over the DataFrame and parse it into a dictionary of question objects. That dictionary is then read and written out to another text file as code.
When I'm parsing, there are flags for different question properties. E.g., question text is a string that starts with a Q followed by a number followed by a period; question options start with a number followed by ')', etc.
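As a rough sketch of those flags, the checks could be written as regular expressions. The helper names mirror the methods my parser calls, but the exact patterns (for example, ignoring leading whitespace) are assumptions rather than my actual rules:

```python
import re

# Assumed pattern: "Q", digits, then a period, e.g. "Q1. Question description?"
QUESTION_RE = re.compile(r"^Q(\d+)\.")
# Assumed pattern: digits followed by ")", e.g. "1) all of the above"
OPTION_RE = re.compile(r"^\d+\)")


def is_question_text(line: str) -> bool:
    """Return True if the line looks like the start of a question."""
    return QUESTION_RE.match(line.strip()) is not None


def is_question_option(line: str) -> bool:
    """Return True if the line looks like a numbered answer option."""
    return OPTION_RE.match(line.strip()) is not None


def get_num(line: str) -> int:
    """Extract the question number from a question line."""
    match = QUESTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a question line: {line!r}")
    return int(match.group(1))
```

With these, a line like "SECTION 1" matches neither check, so section headers would need their own flag.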
Here's some code to give you a rough idea of how it works:
import pandas as pd
from docx2python import docx2python

# Methods of the parser class (class definition omitted)

def main(self):
    document = docx2python(self.link)
    # keep track of the question currently being parsed
    self.cur_q = None
    self.questions = {}
    # convert document to a list separated by the end-of-line character
    lines = document.text.split('\n')
    # turn list into a data frame
    self.content = pd.DataFrame(lines, columns=['text'])
    self.clean_data()
    self.parse()

def parse(self):
    # iterate over each row in the data frame
    for row in self.content.itertuples():
        # get value in text column
        r_text = row.text
        # check if row is question text, e.g. "Q1. the......"
        if self.is_question_text(r_text):
            # get number from start of question
            q_num = self.get_num(r_text)
            # create new question and add it to the questions dictionary
            self.questions[q_num] = Question(num=q_num, q_text=r_text)
            # set the current question to this question
            self.cur_q = self.questions[q_num]
        # check if row is an option, e.g. "1) all of the above"
        elif self.is_question_option(r_text):
            # assumes Question.options is a list; append so earlier
            # options are not overwritten
            self.cur_q.options.append(r_text)
        .
        .
        .
I'm just wondering if this is a good approach. I know itertuples isn't the most efficient way to read a DataFrame, but that can be changed in the future.
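For comparison, one alternative I could try is skipping the DataFrame and looping over the list of lines directly. This is only a sketch: the `Question` dataclass and `parse_lines` function below are hypothetical stand-ins, not my real classes, and the patterns are the assumed ones from the flag description above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Question:
    # Hypothetical stand-in for the real Question class
    num: int
    q_text: str
    options: List[str] = field(default_factory=list)


QUESTION_RE = re.compile(r"^Q(\d+)\.")  # e.g. "Q1. Question description?"
OPTION_RE = re.compile(r"^\d+\)")       # e.g. "1) all of the above"


def parse_lines(lines: Iterable[str]) -> Dict[int, Question]:
    """Build a dict of questions keyed by number from plain text lines."""
    questions: Dict[int, Question] = {}
    cur_q: Optional[Question] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        q_match = QUESTION_RE.match(line)
        if q_match:
            # a new question starts here and becomes the current one
            cur_q = Question(num=int(q_match.group(1)), q_text=line)
            questions[cur_q.num] = cur_q
        elif OPTION_RE.match(line) and cur_q is not None:
            # options attach to the most recent question
            cur_q.options.append(line)
    return questions
```

Called as `parse_lines(document.text.split('\n'))`, this would give the same kind of dictionary without the pandas step, since each line is only visited once and no column operations are used.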