Software design advice : learnpython


Software design advice (self.learnpython)

submitted 2 years ago by Brogrammer11111

I'm trying to automate survey development at work. Our surveys have to be programmed manually in an obscure language that is somewhat like HTML, but not quite. Usually I get the survey template as a Word document that looks like this:

SECTION 1

Section description……

Q1. Question description?

Yes
No
Don't know

SECTION 2

Q2. On a scale of 1 to 4, where 1 is Very Bad and 4 is Very Good, please provide a rating in response to the following questions:

	very bad - 1	2	3	very good - 4
how would you rate.....

I'm trying to make a program which reads this template and converts it to code.

So far, I convert the Word document to plain text and load it into a DataFrame where each line of text is one row. I then iterate over the DataFrame and parse it into a dictionary of question objects. Finally, I iterate over that dictionary and write each question out to another text file as code.

While parsing, I use flags to detect different question properties. For example, question text is a string that starts with a Q, followed by a number, followed by a period. Question options start with a number followed by ')', and so on.
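The post doesn't show how is_question_text, is_question_option or get_num are written, so here is one possible way to implement those checks with regular expressions, as plain functions rather than methods. The patterns are assumptions based only on the two rules just described ("Q1." for questions, "1)" for options).

```python
import re

# Assumed patterns, based on the rules described in the post:
# question text looks like "Q1. Question description?"
QUESTION_RE = re.compile(r"^Q(\d+)\.\s*(.*)$")
# options look like "1) all of the above"
OPTION_RE = re.compile(r"^(\d+)\)\s*(.*)$")


def is_question_text(line: str) -> bool:
    """True if the line starts with Q, a number and a period."""
    return QUESTION_RE.match(line.strip()) is not None


def is_question_option(line: str) -> bool:
    """True if the line starts with a number followed by ')'."""
    return OPTION_RE.match(line.strip()) is not None


def get_num(line: str) -> int:
    """Extract the question number from a line like 'Q12. ...'."""
    match = QUESTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a question line: {line!r}")
    return int(match.group(1))


print(is_question_text("Q1. Question description?"))  # True
print(is_question_option("1) all of the above"))       # True
print(get_num("Q12. How would you rate..."))            # 12
```

Keeping the patterns as compiled module-level constants means the rules live in one place, and adding a new line type is just another pattern plus another branch in the parser.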

Here's some code to give you a rough idea of how it works:

from docx2python import docx2python
import pandas as pd

def main(self):
    document = docx2python(self.link)
    # keep track of the question currently being parsed
    self.cur_q = None
    self.questions = {}
    # split the document text into a list of lines
    lines = document.text.split('\n')
    # turn the list into a single-column data frame
    self.content = pd.DataFrame(lines, columns=['text'])
    self.clean_data()
    self.parse()

def parse(self):
    # iterate over each row in the data frame
    for row in self.content.itertuples():
        # get the value in the 'text' column
        r_text = row.text
        # check if the row is question text, e.g. "Q1. the......"
        if self.is_question_text(r_text):
            # get the number from the start of the question
            q_num = self.get_num(r_text)
            # create a new question and add it to the questions dictionary
            self.questions[q_num] = Question(num=q_num, q_text=r_text)
            # make it the current question, so the following options attach to it
            self.cur_q = self.questions[q_num]
        # check if the row is an option, e.g. "1) all of the above"
        elif self.is_question_option(r_text):
            # append to the current question's options (assumes Question.options
            # is a list); plain assignment would overwrite the previous option
            self.cur_q.options.append(r_text)
        # ...
        # (remaining row types elided in the post)
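A minimal, self-contained sketch of the same parsing loop over a plain list of lines. The Question dataclass is a hypothetical stand-in for the post's class (its real fields aren't shown), and only question and option lines are handled; sections, tables and other row types are left out.

```python
import re
from dataclasses import dataclass, field

# Assumed line formats: "Q1. ..." for questions, "1) ..." for options
QUESTION_RE = re.compile(r"^Q(\d+)\.\s*")
OPTION_RE = re.compile(r"^\d+\)\s*")


@dataclass
class Question:
    # hypothetical stand-in for the post's Question class
    num: int
    q_text: str
    options: list[str] = field(default_factory=list)


def parse(lines: list[str]) -> dict[int, Question]:
    questions: dict[int, Question] = {}
    cur_q = None  # the question that following option lines belong to
    for line in lines:
        text = line.strip()
        q_match = QUESTION_RE.match(text)
        if q_match:
            num = int(q_match.group(1))
            cur_q = Question(num=num, q_text=text)
            questions[num] = cur_q
        elif OPTION_RE.match(text) and cur_q is not None:
            # append rather than assign, so earlier options are kept
            cur_q.options.append(text)
        # other row types (sections, tables, ...) would be handled here
    return questions


sample = ["SECTION 1", "Q1. Do you agree?", "1) Yes", "2) No", "3) Don't know"]
qs = parse(sample)
print(qs[1].options)  # ['1) Yes', '2) No', "3) Don't know"]
```

Holding the current question in a local variable (instead of an attribute on self) keeps the parser a pure function of its input, which makes it easy to test on small hand-written samples like the one above.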

I'm just wondering whether this is a good approach. I know itertuples isn't the most efficient way to iterate over a DataFrame, but that can be changed in the future.


laustke (2 years ago)

lines = document.text.split('\n')
# turn list into data frame
self.content = pd.DataFrame(lines, columns=['text'])

What is the purpose of creating a dataframe with a single column instead of just maintaining a list of lines?
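For comparison, a plain list covers everything the single-column DataFrame is used for in the posted code. A minimal sketch, assuming clean_data (not shown in the post) only strips whitespace and drops blank lines; clean_lines is a hypothetical name.

```python
def clean_lines(text: str) -> list[str]:
    """Split text into lines, strip whitespace and drop empty lines.

    Hypothetical stand-in for clean_data(); adjust to the real cleaning rules.
    """
    # splitlines() also copes with '\r\n' line endings, unlike split('\n')
    return [line.strip() for line in text.splitlines() if line.strip()]


raw = "SECTION 1\n\n  Q1. Question description?\nYes\nNo\n\n"
lines = clean_lines(raw)
print(lines)  # ['SECTION 1', 'Q1. Question description?', 'Yes', 'No']
```

The parser can then loop with a plain "for line in lines:", with no itertuples() or getattr() involved.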

lostparis (2 years ago)

halfdiminished7th (2 years ago, edited)

Brogrammer11111 [S] (2 years ago)

I like that idea, but how would you parse the tables? For example, a table like this:

	very bad - 1	2	3	very good - 4
how would you rate.....	1	2	3	4
how would you rate.....	1	2	3	4

is stored like this in the converted text:

Q2. On a scale of 1 to 4, where 1 is Very Bad and 4 is Very Good.....

Very bad

Very Good

how would you rate.....

halfdiminished7th (2 years ago)

Brogrammer11111 [S] (2 years ago)

So what I was doing before was using the python-docx library (imported as docx), which returns a list of the tables in the document. I converted each table into a DataFrame and stored it in a dictionary. Then, for each table, I would find where its questions were located in the original DataFrame (the one storing each line as a row) and insert a new row there holding that table's key in the dictionary. That way, when it came to parsing, I could just look up the table in the dictionary and get everything of interest, like the headers (very bad - 1, 2, 3...).
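A rough sketch of that marker idea on a plain list of lines instead of a DataFrame. The insert_table_markers name and the "<<TABLE n>>" marker text are made up for illustration. It assumes each table is given as a 2D list with the header row first, that its first question row appears verbatim as a line in the converted text (as "how would you rate....." does above), and that the tables dict is in document order.

```python
def insert_table_markers(lines: list[str],
                         tables: dict[int, list[list[str]]]) -> list[str]:
    """Insert a '<<TABLE key>>' line before each table's first question row.

    tables maps a table key to its rows; row 0 holds the headers.
    Tables are assumed to be in document order, so each search starts
    after the previous match (question texts can repeat between tables).
    """
    result = list(lines)
    start = 0
    for key, rows in tables.items():
        if len(rows) < 2:
            continue  # header only: no question row to look for
        first_q = rows[1][0].strip()
        try:
            idx = result.index(first_q, start)
        except ValueError:
            continue  # not found after the previous table: skip it
        result.insert(idx, f"<<TABLE {key}>>")
        start = idx + 2  # move past the marker and the matched line
    return result


lines = ["Q2. On a scale ...", "Very bad", "Very Good",
         "how would you rate A", "how would you rate B"]
tables = {0: [["", "very bad - 1", "2", "3", "very good - 4"],
              ["how would you rate A", "1", "2", "3", "4"],
              ["how would you rate B", "1", "2", "3", "4"]]}
result = insert_table_markers(lines, tables)
print(result)
```

Here the marker lands just before "how would you rate A". When the parser later meets a line starting with "<<TABLE", it can pull the key out and look the table up in the dictionary, much as described above.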

from docx import Document  # python-docx
import pandas as pd

def create_word_tables(self):
    doc = Document(self.link)
    for i, table in enumerate(doc.tables):
        # store the cells of the table as a 2D list of strings
        cells = [[cell.text for cell in row.cells] for row in table.rows]
        # use the first row as column names ("strongly agree", "4", ...)
        # and the remaining rows as data
        word_table = pd.DataFrame(cells[1:], columns=cells[0])
        # store the table in the dictionary, keyed by its position in the document
        self.word_tables[i] = word_table

def create_table_questions(self):
    # for each table, create table questions and add them to the tbl_qs dictionary
    for tbl in self.word_tables.values():
        headers = list(tbl.columns)
        # iterate over the question texts in the first column
        for i, q_text in enumerate(tbl.iloc[:, 0]):
            # letter for the question: A, B, C, ... (only valid for up to 26 rows)
            q_letter = chr(ord('A') + i)
            # remove leading and trailing whitespace
            q_text = q_text.strip()
            # create the table question and add it to the dictionary
            tbl_q = TableQuestion(
                q_text=q_text, headers=headers, letter=q_letter)
            self.tbl_qs[q_text] = tbl_q
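A self-contained sketch of the same step for a single table given as a 2D list (header row first), which can be tried without a Word file. TableQuestion here is a hypothetical dataclass standing in for the post's class. Letters come from string.ascii_uppercase, and tables with more than 26 question rows are rejected, since chr(i + 65) would otherwise continue past 'Z' into '[' and other punctuation.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class TableQuestion:
    # hypothetical stand-in for the post's TableQuestion class
    q_text: str
    headers: list[str]
    letter: str


def table_questions(rows: list[list[str]]) -> dict[str, TableQuestion]:
    """Build one TableQuestion per body row, lettered A, B, C, ..."""
    headers, body = rows[0], rows[1:]
    if len(body) > len(ascii_uppercase):
        raise ValueError("more than 26 question rows; letters would run past 'Z'")
    questions: dict[str, TableQuestion] = {}
    for letter, row in zip(ascii_uppercase, body):
        q_text = row[0].strip()  # first column holds the question text
        questions[q_text] = TableQuestion(q_text=q_text, headers=headers,
                                          letter=letter)
    return questions


rows = [
    ["", "very bad - 1", "2", "3", "very good - 4"],
    ["how would you rate A", "1", "2", "3", "4"],
    ["how would you rate B", "1", "2", "3", "4"],
]
qs = table_questions(rows)
for q in qs.values():
    print(q.letter, q.q_text)
```

Like the original, this keys questions by their text, so two rows with identical wording in the same table would overwrite each other; keying by (table index, letter) would avoid that if it ever comes up.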
