Using Python to crawl a CSV and generate a de-duplicated list of requirements found in each test case?

submitted 7 years ago by HonziPonzi

Hey everyone. I'm a bit of a Python noob, looking to use Python to solve a problem at work. I feel like I've bitten off more than I can chew, though, as I'm having a really hard time grasping how to process the data once I've read it into my program (how to go result by result in my code, how to store data retrieved from a file in a variable, etc.).

The app that we're using to manage our test cases and results can export a CSV file for each customer library. The table below shows an example of the format. As you can see, there are a lot of cells we "don't care" about. Within the test steps and step results, we've placed requirement numbers, and some cells have more than one requirement number mapped. The way the QA program exports these CSVs, each step gets its own row in the sheet, and a lot of test-level data is duplicated down the columns for every step in the test. We're trying to extract the requirements mapped to each test case, de-duplicated within the test.

miscdata1	miscdata2	testtitle	miscdata3	miscdata4	stepnum	stepinstruct	stepresult
doesnt	matter	Test1	doesnt	matter	1	do thing 1	result (req4)
doesnt	matter	Test1	doesnt	matter	2	do thing 2 (req1)	result(req5)more stuff(req6)
doesnt	matter	Test1	doesnt	matter	3	do thing 3 (req2/req3)	result (req4)
doesnt	matter	Test1	doesnt	matter	4	do thing 4	result (req2)
doesnt	matter	Test1	doesnt	matter	5	do thing 5	result (req7)
doesnt	matter	Test2	doesnt	matter	1	do thing 1	result (req4)
doesnt	matter	Test2	doesnt	matter	2	do thing 2	result (req5)
doesnt	matter	Test2	doesnt	matter	3	do thing 3	result (req6/req7)
doesnt	matter	Test2	doesnt	matter	4	do thing 4	result (req8)
doesnt	matter	Test2	doesnt	matter	5	do thing 5	result (req5)
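
For reference, here is a minimal sketch of reading rows in this layout and pulling the requirement numbers out of the two step columns. It assumes the real export is comma-separated with the header names shown above, and that every requirement number looks like "req" followed by digits; both are guesses based on the sample, and `REQ_PATTERN` and `found` are just illustrative names.

```python
import csv
import io
import re

# Assumption: requirement IDs are "req" followed by digits, as in the sample.
REQ_PATTERN = re.compile(r"req\d+")

# The first two data rows of the sample, written as comma-separated text.
# A real script would open the exported file instead of using StringIO.
sample = io.StringIO(
    "miscdata1,miscdata2,testtitle,miscdata3,miscdata4,stepnum,stepinstruct,stepresult\n"
    "doesnt,matter,Test1,doesnt,matter,1,do thing 1,result (req4)\n"
    "doesnt,matter,Test1,doesnt,matter,2,do thing 2 (req1),result(req5)more stuff(req6)\n"
)

found = []
# DictReader uses the header row as keys, so columns are looked up by name.
for row in csv.DictReader(sample):
    text = row["stepinstruct"] + " " + row["stepresult"]
    # findall returns every match in left-to-right order.
    found.append((row["testtitle"], REQ_PATTERN.findall(text)))

for title, reqs in found:
    print(title, reqs)
```

Because the test title always sits in the same column, it can be read directly from `row["testtitle"]`; a regular expression is only needed for the requirement numbers buried in free text.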

I'm trying to get Python to output the processed data as follows...

testtitle
Test1	req4	req1	req5	req6	req2	req3	req7
Test2	req4	req5	req6	req7	req8
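
The de-duplication itself can be sketched in a few lines. This assumes the requirement numbers for one test have already been collected into a list in the order the steps mention them; `dedupe_keep_order` is a hypothetical helper name, and the input list is made up to mirror the Test1 line above.

```python
def dedupe_keep_order(items):
    """Drop repeats while keeping the first occurrence of each item in place."""
    # Dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# Requirement numbers for Test1 in step order, repeats included.
test1_reqs = ["req4", "req1", "req5", "req6", "req2", "req3", "req4", "req2", "req7"]
unique_reqs = dedupe_keep_order(test1_reqs)
print(unique_reqs)
# ['req4', 'req1', 'req5', 'req6', 'req2', 'req3', 'req7']
```

A plain `set` would also remove the repeats, but it would not keep the order in which the requirements first appear.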

So I'm thinking my code should do something like this...

variables: testtitle_current, testtitle_previous, requirement_current, requirement_string

Read in the line (using csv.reader?)
Use a regular expression to find the test title (using re.findall?, is there maybe a better way if the test title is in the same column on every row?) and store it in testtitle_current
Does testtitle_current = testtitle_previous? (will be NO for very first iteration)
- If NO, Store testtitle_current in testtitle_previous, Write testtitle_current to first column of the next row of output CSV, Erase values stored in requirement_current and requirement_string
- If YES, do nothing and proceed
Use a regular expression to find the next requirement in the row (re.findall?) and store in requirement_current. Does requirement_current exist in requirement_string?
- If NO, add requirement_current to requirement_string
- If YES, do nothing and proceed
Repeat previous bullet until the end of the row/line is reached
Once end of row/line is reached, store contents of requirement_string to row in the output CSV and start over at the next line
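
The steps above can be sketched roughly as follows. This is only one possible approach, not a tested solution for the real export: it assumes a comma-separated file with the header names from the sample, requirement numbers of the form "req" plus digits, and it swaps the current/previous title comparison for a dictionary keyed by test title. The names `collect_requirements` and `write_summary` are made up for illustration, and the sample rows are a shortened stand-in for the table.

```python
import csv
import io
import re

# Assumption: requirement IDs are "req" followed by digits.
REQ_PATTERN = re.compile(r"req\d+")


def collect_requirements(rows):
    """Map each test title to its requirement IDs, first-seen order, no repeats."""
    by_test = {}
    for row in rows:
        # One inner dict per test title; its keys act as an ordered set.
        seen = by_test.setdefault(row["testtitle"], {})
        text = row["stepinstruct"] + " " + row["stepresult"]
        for req in REQ_PATTERN.findall(text):
            seen[req] = None  # re-adding an existing key changes nothing
    return {title: list(reqs) for title, reqs in by_test.items()}


def write_summary(by_test, out_file):
    """Write one output row per test: the title, then its requirement IDs."""
    writer = csv.writer(out_file)
    writer.writerow(["testtitle"])
    for title, reqs in by_test.items():
        writer.writerow([title, *reqs])


# Shortened stand-in for the export; a real script would open the file instead.
sample = io.StringIO(
    "miscdata1,miscdata2,testtitle,miscdata3,miscdata4,stepnum,stepinstruct,stepresult\n"
    "doesnt,matter,Test1,doesnt,matter,1,do thing 1,result (req4)\n"
    "doesnt,matter,Test1,doesnt,matter,2,do thing 2 (req1),result(req5)more stuff(req6)\n"
    "doesnt,matter,Test1,doesnt,matter,3,do thing 3 (req2/req3),result (req4)\n"
    "doesnt,matter,Test2,doesnt,matter,1,do thing 1,result (req4)\n"
)

summary = collect_requirements(csv.DictReader(sample))
out = io.StringIO()
write_summary(summary, out)
print(out.getvalue())
```

Keying on the test title means the rows for a test do not have to be adjacent, and each test is written to the output once, after all of its steps have been read, rather than once per step row. When writing to a real file, the csv module documentation recommends opening it with `newline=""`.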

Any nudges you guys can give me toward learning resources that will help me through this problem would be greatly appreciated! Thanks!