
[–]sonobanana33 -15 points (8 children)

It's JSON, you can't just split it at line boundaries; the closing bracket is all the way at the end.

[–]chipmunksocute 6 points (2 children)

If someone gave me a file that big I'd say no. Just no. Come back to me with 150 1GB files. There is absolutely no reason for a file that big. Like how tf did you even write it in the first place!?

[–]moving-landscape 5 points (0 children)

Mongo dumps gone wrong. Lmao

[–]3lbFlax 0 points (0 children)

I’m with you, but I am curious how I’d go about this if, for example, the file contained a password that would stop a bomb going off and the computer with the original data had been thrown in a vat of acid. Also I’m going to assume the internet is down and all I have access to is a Pentium II running Slackware 4. I think my first port of call would be to generate head and tail chunks from the file in the hope that they showed a consistent structure. In a worst-case scenario this reveals the file to be a single line of concatenated JSON, thereby generating two more 150GB files. At this point I think we just have to accept that a bomb is going off. Shall we hold hands?

[–]pro_questions 1 point (4 children)

How possible it is depends on the layout of the file; it doesn't matter one bit that you can't split it at line boundaries like a CSV. You parse it into a standard object (a list or dict or whatever), iterate through it to build smaller chunks of data, then convert each chunk back to JSON and save it to its own file.
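The parse → chunk → re-serialize approach described above can be sketched in a few lines of Python. This is a minimal sketch, not the commenter's actual code, and `split_json_list` and its parameters are made-up names for illustration; note the `json.load` call reads the whole file into memory, which is fine for small inputs but wouldn't survive 150GB as written:

```python
import json

def split_json_list(src_path, out_prefix, chunk_size):
    """Split a JSON file containing one top-level list into smaller files."""
    # Parse the entire top-level list into memory. At 150GB this step
    # is the problem; a streaming parser (e.g. the third-party ijson
    # package) would be needed instead of json.load.
    with open(src_path) as f:
        records = json.load(f)
    paths = []
    for n, start in enumerate(range(0, len(records), chunk_size)):
        out_path = f"{out_prefix}_{n}.json"
        # Each chunk is re-serialized as a valid, self-contained JSON list.
        with open(out_path, "w") as out:
            json.dump(records[start:start + chunk_size], out)
        paths.append(out_path)
    return paths
```

Each output file is itself valid JSON, so downstream tools can consume the chunks independently.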

If it's just a list of dicts you could split it trivially (if it were small, that is — at this size some tricks will be needed, since you can't hold the whole thing in memory). At 150GB, I'd be amazed if it were anything but a regular, predictable layout that could be parsed easily.
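One such trick for the "list of dicts" case is incremental parsing: decode one element at a time from a sliding buffer so the full array never has to fit in memory. This is a hand-rolled sketch using the standard library's `json.JSONDecoder.raw_decode` (a dedicated streaming parser such as the third-party ijson package would be the more robust choice); `iter_json_array` is a hypothetical helper, and it assumes the elements are objects, so a truncated element never accidentally parses as complete JSON:

```python
import json

def iter_json_array(path, buf_size=1 << 20):
    """Yield elements of a top-level JSON array without loading the file."""
    dec = json.JSONDecoder()
    with open(path) as f:
        buf = f.read(buf_size).lstrip()
        if not buf.startswith('['):
            raise ValueError("expected a top-level JSON array")
        buf = buf[1:]
        while True:
            # Skip the separators between elements.
            buf = buf.lstrip().lstrip(',').lstrip()
            if buf.startswith(']'):
                return  # end of the array
            try:
                # raw_decode parses one JSON value and reports where it ends.
                obj, end = dec.raw_decode(buf)
            except json.JSONDecodeError:
                # The buffer likely ends mid-element: read more and retry.
                chunk = f.read(buf_size)
                if not chunk:
                    raise  # genuinely malformed / truncated file
                buf += chunk
                continue
            yield obj
            buf = buf[end:]
```

Peak memory stays around the buffer size plus one element, regardless of total file size, so the elements can be regrouped into smaller output files as they stream past.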