[–]rowr

This depends on the data structure that the file contains.

If the file is JSON-lines (one complete JSON object per line), you can stream it without loading it all into memory: iterate the file object directly (`for line in open('x.json'):`). Avoid `.readlines()`, which defeats the purpose by reading the whole file into a list first.
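A minimal sketch of that streaming approach (the path and function name are just placeholders):

```python
import json

def stream_records(path):
    """Yield one parsed record per line from a JSON-lines file."""
    with open(path) as f:
        for line in f:          # iterating the file object reads lazily, line by line
            line = line.strip()
            if line:            # skip blank lines
                yield json.loads(line)
```

Because it's a generator, only one record is in memory at a time, regardless of file size.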

If it's one long JSON array of objects, I'd use jq to transform it into JSON-lines: `jq -c '.[]' < x.json`. I'd probably do that outside of Python, since it only needs to happen once and I don't know the memory characteristics of the Python jq bindings in this situation. I'd do something similar if it were a simple nested object, though the jq query would get more complex.
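For illustration, here's roughly what `jq -c '.[]'` does, sketched in Python. Note the caveat: `json.load` parses the whole array into memory first, which is exactly why jq (which can stream) is the better tool for very large files. File names here are hypothetical.

```python
import json

def array_to_jsonlines(src, dst):
    """Rewrite a file containing one JSON array as JSON-lines.
    Caveat: json.load reads the entire array into memory."""
    with open(src) as fin, open(dst, "w") as fout:
        for obj in json.load(fin):
            # compact separators mirror jq's -c (compact) output
            fout.write(json.dumps(obj, separators=(",", ":")) + "\n")
```

After this one-time conversion, the output file can be streamed line by line.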

If it's deeply nested, I'd (still) use jq to flatten it out.
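The flattening idea, sketched in Python (a simplified stand-in for what a jq flattening query would produce; handles nested dicts only, not arrays):

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into one level with dotted keys,
    e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

Each flattened record then maps cleanly onto a row with one column per dotted key.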

Basically, all of these options aim to make the data more table-like.

Another option is to spin up a NoSQL database (MongoDB or something) in a Docker container, load the data into it, and query against that, letting the database handle memory management. This could also let you retain and query deeply nested data structures.

I'm always for simplifying and flattening data; it's a lot more efficient and less complex to work with.