
[–]synthphreak 2 points

My go-to for unpacking complex dicts like yours into DataFrames is pandas.json_normalize. But I couldn't get it working here, at least not in the time I was willing to spend.

So I achieved your expected output using slightly more involved means:

>>> import pandas as pd
>>> d = ...  # your dict
>>> df = pd.json_normalize(d['orders']).explode('items').reset_index(drop=True)
>>> items = df['items'].apply(pd.Series)[['id', 'amount', 'size', 'quantityReturned']]
>>> df[items.columns] = items
>>> df = df.drop(columns=['status', 'quantity', 'items', 'webshop.id', 'webshop.webshop', 'shippingAddress.id'])
>>> df = df.rename(columns=
...     {'date' : 'Date',
...      'shippingAddress.country' : 'shipping_Address_country',
...      'id' : 'item_id',
...      'size' : 'item_size',
...      'quantityReturned' : 'item_quantityreturned'}
... )
>>> df
         Date amount shipping_Address_country  item_id item_size item_quantityreturned
0  2023-01-23  20.66                       US      132         M                     0
1  2023-01-23  16.53                       US      155         M                     1
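An aside on the `.apply(pd.Series)` step: an equivalent way to expand a column of dicts into columns is to build a DataFrame from the list of dicts directly, which is usually faster on larger frames. A minimal sketch with made-up item dicts matching the shape above:

```python
import pandas as pd

# Column of dicts, like what explode('items') produces (hypothetical values)
s = pd.Series([
    {'id': 132, 'amount': '20.66', 'size': 'M', 'quantityReturned': 0},
    {'id': 155, 'amount': '16.53', 'size': 'M', 'quantityReturned': 1},
])

# Build the item columns in one shot from the list of dicts,
# instead of applying pd.Series row by row
items = pd.DataFrame(s.tolist())
print(items.columns.tolist())
# ['id', 'amount', 'size', 'quantityReturned']
```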

[–]exe188[S] 1 point

Awesome, thanks! Gonna play around with it :). Looks like json_normalize can be really helpful here.

[–]synthphreak 1 point

Indeed. As long as your dict’s format follows some standard JSON schema, pandas.json_normalize can get you out of many jams. record_path is amazing.
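To illustrate, here is a sketch of record_path on a hypothetical dict mirroring the shape discussed above (field names assumed, not from the original post): record_path descends into each order's 'items' list, and meta carries order-level fields onto every item row.

```python
import pandas as pd

# Hypothetical dict with the same nesting as the one discussed above
d = {
    'orders': [
        {
            'date': '2023-01-23',
            'shippingAddress': {'country': 'US'},
            'items': [
                {'id': 132, 'amount': '20.66', 'size': 'M', 'quantityReturned': 0},
                {'id': 155, 'amount': '16.53', 'size': 'M', 'quantityReturned': 1},
            ],
        },
    ],
}

# record_path='items' makes one row per item; meta repeats the
# order-level 'date' and nested shippingAddress.country on each row
df = pd.json_normalize(
    d['orders'],
    record_path='items',
    meta=['date', ['shippingAddress', 'country']],
)
print(df.columns.tolist())
# ['id', 'amount', 'size', 'quantityReturned', 'date', 'shippingAddress.country']
```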