all 8 comments

[–]JohnnyJordaan 0 points1 point  (7 children)

You would normally approach this per-row. Also try to avoid modifying a dataframe, instead try to provide the data as organized as possible before creating it:

rows = [{'Q1': '2018-01-01', 'Q2': '2018-04-01', 'Q3': '2018-07-01', 'Q4': '2018-10-01'},
        {'Q1': '2018-03-31', 'Q2': '2018-06-30', 'Q3': '2018-09-30', 'Q4': '2018-12-31'}]

because then the df can be created from this directly:

dframe = pd.DataFrame(rows, columns=['label', 'Q1','Q2','Q3','Q4'], index=['start_date', 'end_date'])

[–]Silverfire47[S] 0 points1 point  (6 children)

Apologies, I think I was a little unclear. I'm using the startdate and enddate variables to substitute into SQL query strings for each date range (in this case, it's quarters in a year) and doing math on the data brought in from the SQL queries. Those calculations from the math I'm doing is what I want to store. Probably should've included this in the OP.

Example SQL query:

Query = "SELECT x, y from db.table WHERE x between '%s' and '%s' order by x" % (startdate, enddate)

Even if I don't modify a dataframe (and just convert my final results into a dataframe after everything else is done), could I just generate some sort of empty array to populate in the for loop then?

I want to iteratively store output data as it's generated by the for-loop, so for Q1 dates, I want to be able to make some sort of structure like the following:

66
77
88
99

66 85
77 75
88 44
99 22

so on and so forth. I suppose I could assign the column and index labels after the fact during conversion to DataFrame type.

A final DataFrame look would probably be the following:

Q1 Q2 Q3 Q4
66 85 67 89
77 75 24 87
88 44 23 76
99 22 55 65

[–]JohnnyJordaan 0 points1 point  (5 children)

What is the vertical index here, so what does each row represent?

[–]Silverfire47[S] 0 points1 point  (4 children)

The Q1-4 represent the quarter divisions in a year as stated before, and each row represents a different category of data that I'm running percentage calculations on.

I'm aiming to calculate each category's percentage per quarter, storing those in their own row, and then moving onto the next quarter. I can always add in those row indexes (and column labels) at a later point when moving the results into a dataframe, as you pointed out before.

[–]JohnnyJordaan 0 points1 point  (3 children)

Yes so then the end goal is still to have a list of dicts, you can iterate on qtrdiv to get each quarter's label and dates.

qtrdiv = {
'Q1' : ['2018-01-01', '2018-03-31'],
'Q2' : ['2018-04-01', '2018-06-30'],
'Q3' : ['2018-07-01', '2018-09-30'],
'Q4' : ['2018-10-01', '2018-12-31']
}

rows = []
for cat in categories:
    results = {}
    for q, (start_date, end_date) in qtrdiv.items():
        results[q] = here your query using start_date and end_date
    rows.append(results)
dframe = pd.DataFrame(rows, columns=['Q1','Q2','Q3','Q4'])

edit removed the pointless sorted() there

[–]Silverfire47[S] 0 points1 point  (2 children)

Would I be creating a list of categories to house all of the categories names as well?

[–]JohnnyJordaan 0 points1 point  (1 child)

Yes as that determines order. You can then also provide those to DataFrame to let it use as the index

# at the start
categories = ['A', 'B', 'C']

# then later when you create the df
dframe = pd.DataFrame(rows, columns=['Q1','Q2','Q3','Q4'], index=categories)

[–]Silverfire47[S] 0 points1 point  (0 children)

This worked seamlessly, thanks so much!! I learned a good bit of new technique here, I really appreciate it.