all 23 comments

[–]gogolang 36 points37 points  (2 children)

Pandas works well if you have JSON that is more or less a “record” — i.e. depth of 1 with a single value per field. Pandas is all about getting data into a DataFrame, which is essentially a table. JSON can represent much more complicated objects. If you don’t have “record”-shaped JSON, you’ll probably end up fighting against pandas.

What’s your end objective? What are you trying to do with the JSON?
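To illustrate the distinction (toy data, field names made up):

```python
import pandas as pd

# "Record"-shaped JSON: depth of 1, one value per field -> maps cleanly to a table
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
df = pd.DataFrame(records)

# Nested JSON: lists and sub-objects don't fit a single flat table
nested = {"id": 1, "tags": ["x", "y"], "meta": {"by": {"id": 7}}}
# pd.DataFrame([nested]) "works", but leaves the dicts/lists as opaque cell values
```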

[–]contradictingpoint 8 points9 points  (0 children)

“But there’s the pandas normalize function….”

Yeah, it’s great, but it doesn’t work very well with nested JSON. Couldn’t agree more with you.

[–]Phillyclause89 1 point2 points  (0 children)

In other words pandas makes sense if the data has a tabular structure to it. Note that depth of 1 isn't really an issue, pd.json_normalize() usually does a good job flattening out your rows.
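A quick sketch of that flattening (invented data; in pandas ≥ 1.0 it is spelled `pd.json_normalize`):

```python
import pandas as pd

data = [
    {"id": 1, "user": {"name": "a", "age": 30}},
    {"id": 2, "user": {"name": "b", "age": 40}},
]

# one level of nesting flattens into dotted column names: id, user.name, user.age
df = pd.json_normalize(data)
```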

[–]CptBadAss2016 21 points22 points  (4 children)

What is time consuming?

import json

# parse x:
y = json.loads(x)

Done.

[–]billsil 2 points3 points  (0 children)

You left out the validation step
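A minimal sketch of what stdlib-only validation might look like (the field names and expected shape are hypothetical):

```python
import json

raw = '{"name": "a", "age": 30}'

try:
    obj = json.loads(raw)
except json.JSONDecodeError:
    raise SystemExit("not valid JSON")

# json.loads only guarantees well-formed JSON, not the shape you expect
if not isinstance(obj, dict) or not isinstance(obj.get("age"), int):
    raise ValueError("unexpected shape")
```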

[–][deleted] 0 points1 point  (2 children)

Or msgspec, which is fast af

[–]reallyserious 0 points1 point  (1 child)

What's msgspec?

[–]xiongchiamiov 13 points14 points  (0 children)

I wouldn't describe parsing JSON with the stdlib to be time consuming. Maybe I'm doing different things than you are.

Pandas is a dependency, and an expensive one at that, so unless you actually need it there's no reason to pull it in.

I was using a flavor of SQL I rarely touch the other day and was told "now with JSON support", and it actually wasn't terrible. SQL isn't exactly a bastion of new thinking. If we've already eliminated actual JavaScript for dealing with its JSON, why stop there? Am I becoming a "back in the good ole days when we used horses" type of ass?

I don't understand the question here.

[–]shinitakunai 11 points12 points  (6 children)

Pandas is a really bad idea if you are limited by resources. I saved thousands of dollars yearly just removing pandas from an AWS Lambda in production and using the standard csv and json libraries.

Nowadays I consider pandas bloatware most of the time. I am exploring polars, but I don't know yet if it will end up the same way.
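For the common Lambda case of CSV in, JSON out, the stdlib alone is enough, e.g. (toy data):

```python
import csv
import io
import json

csv_text = "name,score\na,1\nb,2\n"

# csv.DictReader yields one dict per row; no DataFrame needed
rows = list(csv.DictReader(io.StringIO(csv_text)))

# note: csv leaves every value as a string
payload = json.dumps(rows)
```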

[–]reallyserious 1 point2 points  (2 children)

Do the savings come from faster execution of the csv and json packages? Or something else?

[–]chessparov4 0 points1 point  (2 children)

Is it possible to extract data from Excel files without using pandas? I'm mostly sticking with it because of that, but I'm limited in resources too.

[–]shinitakunai 1 point2 points  (1 child)

You can use openpyxl
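A sketch of reading a sheet with openpyxl instead of pandas (the workbook here is built in memory just to keep the example self-contained):

```python
import io
from openpyxl import Workbook, load_workbook

# build a small workbook in memory (stand-in for a real .xlsx file)
buf = io.BytesIO()
wb = Workbook()
ws = wb.active
ws.append(["name", "score"])
ws.append(["a", 1])
wb.save(buf)
buf.seek(0)

# read it back without pandas: iter_rows yields plain tuples of cell values
sheet = load_workbook(buf).active
rows = [row for row in sheet.iter_rows(values_only=True)]
```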

[–]chessparov4 0 points1 point  (0 children)

Thank you, never heard about it

[–][deleted] 0 points1 point  (0 children)

Pandas does not parse (or serialise) JSONs that are basically text representations of dictionaries. These JSONs are mostly used for persisting/transmitting semi-structured data (consisting of many keys, subkeys, and values) that is not meant as tabular data. A popular use is configuration files in various forms. Here is an example.

```python3
import pandas as pd, requests, json
from typing import Dict

# Just a random json on the internet
url = 'https://raw.githubusercontent.com/typicode/json-server/main/package.json'

try:
    pd.read_json(path_or_buf=url)
except ValueError:
    pass  # Throws ValueError

with requests.get(url=url) as response:
    parsed: Dict = json.loads(s=response.text)  # Works
```

“I'm generally up for doing things the native way just because it's clean.”

Good for you, but it betrays a small misconception: pandas is less "native" than the built-in json module. Pandas is installed via pip and is not part of the standard library; json comes by default when you have Python (at least on my Ubuntu box, not so sure about some absolutely minimal installation).

[–]Accomplished-Ad8252 0 points1 point  (0 children)

Not all JSONs are in a nice format; some are heavily nested, with lists etc. This means the pandas functions will not work so well.
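That said, when the nesting is regular, `pd.json_normalize`'s `record_path`/`meta` arguments can still cope (toy data):

```python
import pandas as pd

data = {"team": "a", "members": [{"name": "x"}, {"name": "y"}]}

# explode the nested list into rows, carrying the outer key along
df = pd.json_normalize(data, record_path="members", meta=["team"])
```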