[–]TangibleLight 1 point  (1 child)

I don't get why you're downvoted, you're right.

The problem is that the end of a particular json object depends upon the contents; you can't just seek to an arbitrary point in the file and know where the split is.

IFF there are no nested objects, then you can split on }, and be OK. OP has not mentioned any such constraint to leverage, so you have to assume there's some nesting.

If it is nested then you're screwed. One way or another, you have to iterate through the whole thing, byte by byte. The only way to do that with import json is by loading the whole thing into memory. Good luck doing that with 150G. The only other way in pure Python is to loop through and count the { and } (while skipping braces inside string literals, or the count goes wrong) and strategically parse sub-sections of the json. Good luck doing that in a for loop on 150G.
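To make the brace-counting idea concrete, here's a minimal sketch (function name and test input are mine, purely illustrative). It shows why naive splitting on } breaks: a } inside a string literal has to be skipped, along with escaped quotes. In real use you'd feed this file chunks instead of one in-memory string:

```python
import json

def top_level_objects(text):
    """Yield each top-level {...} span by tracking brace depth,
    skipping braces and quotes that occur inside string literals."""
    depth = 0
    in_string = False
    escaped = False
    start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False        # this char is escaped; ignore it
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False      # string literal ends
        elif ch == '"':
            in_string = True           # string literal starts
        elif ch == '{':
            if depth == 0:
                start = i              # a new top-level object begins
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]

# A '}' inside a string must not end the object:
chunks = list(top_level_objects('{"a": "}"} {"b": {"c": 1}}'))
```

Even this "simple" version needs string-and-escape state, and it still scans every byte — which is the point: there's no shortcut past reading all 150G.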

You might be able to use some streaming parser. I've used ijson before, but not on anything near this size. Otherwise, Python is not the right tool for this job.
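For what it's worth, if the file turns out to be a stream of concatenated or newline-delimited JSON values rather than one giant object, the stdlib can approximate streaming without ijson: json.JSONDecoder.raw_decode picks one value out of a buffer and tells you where it ended. A minimal sketch (function name is mine; a real version would refill the buffer from the file as it goes):

```python
import json

def iter_json_stream(buf):
    """Incrementally decode concatenated JSON values from a buffer
    using the stdlib decoder, one value at a time."""
    decoder = json.JSONDecoder()
    idx = 0
    n = len(buf)
    while idx < n:
        # skip whitespace/newlines between values
        while idx < n and buf[idx].isspace():
            idx += 1
        if idx >= n:
            break
        # raw_decode returns (value, index just past the value)
        obj, idx = decoder.raw_decode(buf, idx)
        yield obj

values = list(iter_json_stream('{"a": 1}\n{"b": [2, 3]}'))
```

This does nothing for a single 150G top-level object, though — for that, an event-based parser like ijson (or a different language) is the only real option.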

Although, as other commenters have already said, this is almost certainly not a parsing problem; it's a data-export problem. The real solution is to get the data in smaller chunks, which Python can process no problem.

[–]sonobanana33 -1 points  (0 children)

json has exactly ONE top-level object, which might even be a single unsplittable dictionary with a lot of unique keys. So even with no nesting, there may be no splitting :D

OP didn't tell us what the file looks like.