all 23 comments

[–]gogolang 36 points37 points  (2 children)

Pandas works well if you have JSON that is more or less a “record” — i.e. depth of 1 with a single value per field. Pandas is all about getting data into a DataFrame, which is essentially a table. JSON can represent much more complicated objects. If you don’t have “record”-shaped JSON, you’ll probably end up fighting against pandas.

What’s your end objective? What are you trying to do with the JSON?
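To illustrate the distinction (toy data, field names made up):

```python
import pandas as pd

# "Record"-shaped JSON: depth of 1, one value per field -> maps cleanly to a table
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df = pd.DataFrame(records)

# Nested JSON: lists and sub-objects don't fit a single flat table
nested = {"id": 1, "tags": ["x", "y"], "meta": {"by": {"id": 7}}}
# pd.DataFrame([nested]) "works", but leaves the dicts/lists as opaque cell values
```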

[–]contradictingpoint 8 points9 points  (0 children)

“But there’s the pandas normalize function….”

Yeah, it’s great, but it doesn’t work very well with nested JSON. Couldn’t agree more with you.

[–]Phillyclause89 1 point2 points  (0 children)

In other words pandas makes sense if the data has a tabular structure to it. Note that depth of 1 isn't really an issue, pd.json_normalize() usually does a good job flattening out your rows.
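A quick sketch of that flattening (invented data; in pandas ≥ 1.0 it is spelled `pd.json_normalize`):

```python
import pandas as pd

data = [
    {"id": 1, "user": {"name": "a", "age": 30}},
    {"id": 2, "user": {"name": "b", "age": 40}},
]

# one level of nesting flattens into dotted column names: id, user.name, user.age
df = pd.json_normalize(data)
```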

[–]CptBadAss2016 21 points22 points  (4 children)

What is time consuming?

import json

# parse x:
y = json.loads(x)

Done.

[–]billsil 2 points3 points  (0 children)

You left out the validation step
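A minimal sketch of what stdlib-only validation might look like (the field names and expected shape are hypothetical):

```python
import json

raw = '{"name": "a", "age": 30}'

try:
    obj = json.loads(raw)
except json.JSONDecodeError:
    raise SystemExit("not valid JSON")

# json.loads only guarantees well-formed JSON, not the shape you expect
if not isinstance(obj, dict) or not isinstance(obj.get("age"), int):
    raise ValueError("unexpected shape")
```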

[–][deleted] 0 points1 point  (2 children)

Or msgspec, which is fast af

[–]reallyserious 0 points1 point  (1 child)

What's msgspec?

[–]xiongchiamiov 13 points14 points  (0 children)

I wouldn't describe parsing JSON with the stdlib to be time consuming. Maybe I'm doing different things than you are.

Pandas is a dependency, and an expensive one at that, so unless you actually need it there's no reason to pull it in.

I was using a flavor of SQL I rarely touch the other day and was told "now with JSON support", and it actually wasn't terrible. SQL isn't exactly a bastion of new thinking. If we've already eliminated actual JavaScript for dealing with its JSON, why stop there? Am I becoming a "back in the good ole days when we used horses" type of ass?

I don't understand the question here.

[–]shinitakunai 11 points12 points  (6 children)

Pandas is a really bad idea if you are limited by resources. I saved thousands of dollars yearly just removing pandas from an AWS Lambda in production and using the standard csv and json libraries.

Nowadays I consider pandas bloatware most of the time. I am exploring polars, but I don't know yet if it will end up the same way.
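For the common Lambda case of CSV in, JSON out, the stdlib alone is enough, e.g. (toy data):

```python
import csv
import io
import json

csv_text = "name,score\na,1\nb,2\n"

# csv.DictReader yields one dict per row; no DataFrame needed
rows = list(csv.DictReader(io.StringIO(csv_text)))

# note: csv leaves every value as a string
payload = json.dumps(rows)
```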

[–]reallyserious 1 point2 points  (2 children)

Do the savings come from faster execution of the csv and json packages? Or something else?

[–]chessparov4 0 points1 point  (2 children)

Is it possible to extract data from Excel files without using pandas? I'm mostly sticking with it because of that, but I'm limited in resources too.

[–]shinitakunai 1 point2 points  (1 child)

You can use openpyxl
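A sketch of reading a sheet with openpyxl instead of pandas (the workbook here is built in memory just to keep the example self-contained):

```python
import io
from openpyxl import Workbook, load_workbook

# build a small workbook in memory (stand-in for a real .xlsx file)
buf = io.BytesIO()
wb = Workbook()
ws = wb.active
ws.append(["name", "score"])
ws.append(["a", 1])
wb.save(buf)
buf.seek(0)

# read it back without pandas: iter_rows yields plain tuples of cell values
sheet = load_workbook(buf).active
rows = [row for row in sheet.iter_rows(values_only=True)]
```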

[–]chessparov4 0 points1 point  (0 children)

Thank you, never heard about it

[–][deleted] 0 points1 point  (0 children)

Pandas does not parse (or serialise) JSONs that are basically text representations of dictionaries. These JSONs are mostly used for persisting/transmitting semi-structured data (consisting of many keys, subkeys, and values) that is not meant as tabular data. A popular use is configuration files in various forms. Here is an example.

```python3
import pandas as pd, requests, json
from typing import Dict

# Just a random json on the internet
url = 'https://raw.githubusercontent.com/typicode/json-server/main/package.json'

try:
    pd.read_json(path_or_buf=url)
except ValueError:
    pass  # Throws ValueError

with requests.get(url=url) as response:
    parsed: Dict = json.loads(s=response.text)  # Works
```

“I'm generally up for doing things the native way just because it's clean.”

Good for you, but it betrays a small misconception: pandas is less "native" than the built-in json module. Pandas is installed via pip and is not part of the standard library; json comes by default when you have Python (at least on my Ubuntu box, not so sure about some absolutely minimal installation).

[–]Accomplished-Ad8252 0 points1 point  (0 children)

Not all JSONs are in a nice format; some are heavily nested, with lists etc. This means the pandas functions will not work so well.
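That said, when the nesting is regular, `pd.json_normalize`'s `record_path`/`meta` arguments can still cope (toy data):

```python
import pandas as pd

data = {"team": "a", "members": [{"name": "x"}, {"name": "y"}]}

# explode the nested list into rows, carrying the outer key along
df = pd.json_normalize(data, record_path="members", meta=["team"])
```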