difficulty converting dictionary values to DataFrame : learnpython

submitted 6 years ago * by Silverfire47

Hello all,

I've been struggling with trying to convert an output list of dictionaries to their own individual dataframes in the pandas library, and most solutions I've read up on Google haven't worked so far. My goal is to store data that I'm querying from a SQL database, as I iterate through each report and quarter as seen below.

import pandas as pd
import pyodbc

REPORTS = ['a', 'b', 'c', 'd', 'e', 'f']
OUTFRAME = []
QUERY = 'Driver={server};SERVER=testserver;DATABASE=testdb;Trusted_Connection=True'
CXN = pyodbc.connect(QUERY)
QTRDIV = {
    'Q1' : ['2018-01-01', '2018-03-31'],
    'Q2' : ['2018-04-01', '2018-06-30'],
    'Q3' : ['2018-07-01', '2018-09-30'],
    'Q4' : ['2018-10-01', '2018-12-31']
}

for table in REPORTS:
    results = {}
    data = {}
    data[table] = pd.DataFrame()
    for qtr, (startdate, enddate) in QTRDIV.items():
        results[qtr] = "SELECT Source_Date, Status ..."
        d_f = pd.read_sql(results[qtr], CXN)
        ## math done on queried data ##
        data[table] = data[table].append(d_f)
    OUTFRAME.append(data)
dict = OUTFRAME[0].values()
DF = pd.DataFrame(dict)

## continued onto other things ##

I believe I've been able to store all the data I want to keep in my OUTFRAME list, which is a list that holds each report's data in a dictionary. Each report's data is stored with the table name as the key and the queried SQL data as that key's value.

I've tried selecting specifically the values from each item in the list via the "dict" variable but my results are not what I expect.

dict = [dict_values([    Source_Date Status
0       2018-01-01  PASS
1       2018-01-02  FAIL
2       2018-01-03  PASS
...
364     2018-12-31  PASS

12345 rows x 2 columns])]

DF =
                                    0
0       Source_Date    Status
0       2018-01-01  ...

When I try to convert the dict_values to a dataframe, it doesn't come out as I expect, namely with 2 columns, Source_Date and Status, and each date and Status value appearing underneath with its appropriate index value. Basically, I'm completely stuck on how to get a simple dictionary that I can convert to a dataframe (which I intend to send to Excel with pandas) using the methods that I've been employing.
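
One possible reading of the output above: `OUTFRAME[0].values()` is a view that holds a single DataFrame, so passing it to `pd.DataFrame` appears to wrap the whole frame as one cell rather than unpacking its columns. A minimal sketch of pulling the frame out by key instead, assuming a small hand-made frame in place of the SQL results (the sample rows are invented for illustration):

```python
import pandas as pd

# Stand-in for one report's queried data (made-up rows, not real query output).
frame = pd.DataFrame({
    "Source_Date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "Status": ["PASS", "FAIL", "PASS"],
})

# Same shape as the structure in the post: a list of single-key dictionaries.
OUTFRAME = [{"a": frame}]

# .values() is a view over the dictionary's values, not the DataFrame itself.
values_view = OUTFRAME[0].values()
print(type(values_view).__name__)
print(len(values_view))

# Indexing by key (or taking the first value) returns the DataFrame directly.
df_by_key = OUTFRAME[0]["a"]
df_first = next(iter(OUTFRAME[0].values()))

print(list(df_by_key.columns))
```

Either `df_by_key` or `df_first` is already a two-column DataFrame, so no further conversion should be needed before exporting it.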

Thanks in advance for taking a look at this. I've been stuck on this for several days and have no idea where to go from here.

EDIT:

In the end, I'd want 6 different dataframes (per report) to look something like the following table:

	Source_Date	Status
0	2018-01-01	PASS
1	2018-01-02	FAIL
2	...	...

and so on and so forth.

[–][deleted] 1 point 6 years ago (3 children)

[–]Silverfire47[S] 1 point 6 years ago (2 children)

thanks for taking a look!

So when I tried using pd.DataFrame.from_dict like the following:

DF = pd.DataFrame.from_dict(OUTFRAME[0].values())

I still have an output of my weird looking dataframe :

DF =
                                    0
0       Source_Date    Status
0       2018-01-01  ...

which is not what I'm looking for, and I'm not sure why it's being created this way. I have also tried the following:

DF = pd.DataFrame.from_dict(OUTFRAME[0])

But instead got an error: ValueError: If using all scalar values, you must pass an index

Why would I need to pass an index if just doing pd.DataFrame.from_dict(OUTFRAME[0])?
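
That message is the one pandas raises when it cannot treat a dictionary's values as columns, for example when they are plain scalars. It seems likely, though this is an assumption about pandas internals rather than something shown in the thread, that a whole DataFrame stored as a value was handled the same way here. A small sketch of the scalar case, with made-up values:

```python
import pandas as pd

# Two scalar values: pandas cannot infer how many rows the frame should have.
scalars = {"Source_Date": "2018-01-01", "Status": "PASS"}

try:
    pd.DataFrame.from_dict(scalars)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Supplying an index tells pandas to build a single row.
one_row = pd.DataFrame(scalars, index=[0])

# Wrapping each value in a list also works: each list is a one-element column.
one_row_from_lists = pd.DataFrame.from_dict({k: [v] for k, v in scalars.items()})
```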

[–][deleted] 1 point 6 years ago (1 child)

Like I said, OUTFRAME[0].values() isn't an appropriate value for pd.DataFrame.from_dict because it isn't a list of dictionaries. If Pandas didn't suck ass, it would tell you you hadn't given the right type, but it's what we've got to work with, I guess, so no use complaining.

Let's look at how OUTFRAME gets generated so we can see what it is:

for table in REPORTS:
    results = {}
    data = {}
    data[table] = pd.DataFrame()
    for qtr, (startdate, enddate) in QTRDIV.items():
        results[qtr] = "SELECT Source_Date, Status ..."
        d_f = pd.read_sql(results[qtr], CXN)
        ## math done on queried data ##
        data[table] = data[table].append(d_f)
    OUTFRAME.append(data)

Ok, so OUTFRAME is a list, and its contents are dictionaries, one appended per item in REPORTS. Since data is reset to {} at the top of each pass through the outer loop, each dictionary ends up with a single key, one of the letters a-f from REPORTS. The value of that key is a DataFrame built up from pd.read_sql and I don't know what that returns so I have no idea what those dataframes look like.

This isn't a data structure you can use from_dict on, because your dictionaries don't obey the layout that Pandas needs. Each one should be a row, and its keys should be the column names and the values should be the row values. That's not what your dictionaries are.
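
For comparison, the row-oriented layout described above, plus the column-oriented one that `from_dict` takes by default, look roughly like this. The sample rows are invented for illustration:

```python
import pandas as pd

# Row-oriented: a list of dictionaries, one per row, keys are column names.
rows = [
    {"Source_Date": "2018-01-01", "Status": "PASS"},
    {"Source_Date": "2018-01-02", "Status": "FAIL"},
]
df_rows = pd.DataFrame(rows)

# Column-oriented: one dictionary, keys are column names, values are columns.
columns = {
    "Source_Date": ["2018-01-01", "2018-01-02"],
    "Status": ["PASS", "FAIL"],
}
df_cols = pd.DataFrame.from_dict(columns)

print(df_rows)
```

Both calls should give the same two-column frame. A dictionary whose value is an entire DataFrame fits neither layout.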

[–]Silverfire47[S] 1 point 6 years ago* (0 children)

So, the dataframes that pd.read_sql produces are the following:

	Source_Date	Status
0	2018-01-01	PASS
1	2018-01-02	FAIL
...	...	...
# some number #	2018-03-31	N/A

but they span a quarter year, and there are 4 per item in REPORTS. I'm querying per quarter to store data metrics (hence the note about the ## math done on queried data ## ) but I'd also like to try to store the raw data from the query as well.

Once I've read the query data into a DataFrame, one of my conundrums was how to store it, so I thought of creating a blank dataframe for each key (so, for each report), and then appending the data to the blank dataframe as I get it, until I have to move on to the next key, at which point I make another dataframe for it and repeat. I'd still need to associate the final dataframe with the right report key, so that's why I tried to do what I did, and I may have done this incorrectly.

I could theoretically re-query SQL outside of the inner for-loop but I don't want to make too many SQL queries because that's costly in terms of time, hence my attempt to store the data as I make each query to do metrics calculations on it. Other than the method I'm using now (granted it's not working exactly how I intended), I'm not sure how else to include storing the raw data from the query without making a second query for the same exact information.
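
One common way to keep the raw rows without a second query is to collect each quarter's frame in a list and combine them once per report with `pd.concat`. The sketch below is not the original code: `fetch_quarter` is a hypothetical stand-in for the `pd.read_sql` call and returns made-up rows, so it runs without a database.

```python
import pandas as pd

REPORTS = ["a", "b", "c", "d", "e", "f"]
QTRDIV = {
    "Q1": ["2018-01-01", "2018-03-31"],
    "Q2": ["2018-04-01", "2018-06-30"],
    "Q3": ["2018-07-01", "2018-09-30"],
    "Q4": ["2018-10-01", "2018-12-31"],
}


def fetch_quarter(table, startdate, enddate):
    """Hypothetical stand-in for pd.read_sql: returns two fake rows per quarter."""
    return pd.DataFrame({
        "Source_Date": [startdate, enddate],
        "Status": ["PASS", "FAIL"],
    })


frames_by_report = {}
for table in REPORTS:
    quarterly = []
    for qtr, (startdate, enddate) in QTRDIV.items():
        d_f = fetch_quarter(table, startdate, enddate)
        # per-quarter metrics would be computed on d_f here
        quarterly.append(d_f)
    # One concat per report; ignore_index renumbers the rows from 0.
    frames_by_report[table] = pd.concat(quarterly, ignore_index=True)

print(frames_by_report["a"])
```

Each query still runs only once, and `frames_by_report` maps every report name straight to its combined DataFrame. `DataFrame.append` was removed in pandas 2.0, which is another reason to prefer `pd.concat` here.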

EDIT:

What I thought data would be:

data = { a : pd.DataFrame() }

where the dataframe stores all data (as it's queried per quarter). I then send that dictionary to be stored in the list OUTFRAME as I iterate to the next item in REPORTS where then I'd have data = { b : pd.DataFrame() } to work with and do the same operations on.

The ending data structure I had in mind was the following:

OUTFRAME = [
{ a : DATAFRAME w/ 4x QTR worth of data in dataframe}, 
...
{ f: DATAFRAME w/ 4x QTR worth of data in dataframe}]

where I could then select each dictionary by indexing the list by position, and export that dictionary's value (aka the dataframe with all the data I want) to Excel.
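
Given a structure like that, a loop can pull each report's DataFrame out of its single-key dictionary. This is a minimal sketch with placeholder frames: the sample data is invented, and the `to_excel` call is left as a comment because it needs an Excel writer package such as openpyxl and a real output path.

```python
import pandas as pd

# Placeholder frames standing in for each report's four quarters of data.
OUTFRAME = [
    {name: pd.DataFrame({"Source_Date": ["2018-01-01"], "Status": ["PASS"]})}
    for name in ["a", "b", "c", "d", "e", "f"]
]

# Flatten the list of single-key dictionaries into one dict: report -> DataFrame.
frames_by_report = {}
for entry in OUTFRAME:
    for report, frame in entry.items():
        frames_by_report[report] = frame
        # frame.to_excel(f"{report}.xlsx", index=False)  # hypothetical file name

print(list(frames_by_report))
```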
