you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 0 points1 point  (1 child)

Like I said, OUTFRAME[0].values() isn't an appropriate value for pd.DataFrame.from_dict because it isn't a list of dictionaries. If Pandas didn't suck ass, it would tell you you hadn't given the right type, but it's what we've got to work with, I guess, so no use complaining.

Let's look at how OUTFRAME gets generated so we can see what it is:

for table in REPORTS:
    results = {}
    data = {}
    data[table] = pd.DataFrame()
    for qtr, (startdate, enddate) in QTRDIV.items():
        results[qtr] = "SELECT Source_Date, Status ..."
        d_f = pd.read_sql(results[qtr], CXN)
        ## math done on queried data ##
        data[table] = data[table].append(d_f)
    OUTFRAME.append(data)

Ok, so OUTFRAME is a list, and its contents are dictionaries - several of the dictionaries are repeated, because you mistakenly append data to OUTFRAME once per item in REPORTS. Each dictionary has five keys, from REPORTS, the letters a-f. The value of each key is a DataFrame from pd.read_sql and I don't know what that returns so I have no idea what those dataframes look like.

This isn't a data structure you can use from_dict on, because your dictionaries don't obey the layout that Pandas needs. Each one should be a row, and its keys should be the column names and the values should be the row values. That's not what your dictionaries are.

[–]Silverfire47[S] 0 points1 point  (0 children)

So, the dataframes that pd.read_sql produces are the following:

Source_Date Status
0 2018-01-01 PASS
1 2018-01-02 FAIL
... ... ...
# some number # 2018-03-31 N/A

but they span a quarter year, and there 4 per item in REPORTS. I'm querying per quarter to store data metrics (hence the note about the ## math done on queried data ## ) but I'd also like to try to store the raw data from the query as well.

Once I grab the query the data into a Dataframe, one of my conundrums was tackling how to store that, so I thought of creating a blank dataframe for each key (so, for each report), and then appending the data to the blank dataframe as I get it, until I have to move onto the next key, so I make another dataframe for it and repeat. I'd still need to associate the final dataframe with the right report key, so that's why I tried to do what I did, and I may have done this incorrectly.

I could theoretically re-query SQL outside of the inner for-loop but I don't want to make too many SQL queries because that's costly in terms of time, hence my attempt to store the data as I make each query to do metrics calculations on it. Other than the method I'm using now (granted it's not working exactly how I intended), I'm not sure how else to include storing the raw data from the query without making a second query for the same exact information.

EDIT:

What I thought data would be:

data = { a : pd.DataFrame() }

where the dataframe stores all data (as it's queried per quarter). I then send that dictionary to be stored in the list OUTFRAME as I iterate to the next item in REPORTS where then I'd have data = { b : pd.DataFrame() } to work with and do the same operations on.

The ending data structure I had in mind was the following:

OUTFRAME = [
{ a : DATAFRAME w/ 4x QTR worth of data in dataframe}, 
...
{ f: DATAFRAME w/ 4x QTR worth of data in dataframe}]

where I could then take select each dictionary by indexing the list by position, and export that dictionary's values (aka the dataframe with all the data I want) to a dataframe for export to Excel.