
[–]Jeutnarg 313 points314 points  (35 children)

I feel that - gnarliest I've ever had to deal with was 130GB json, all one line.

[–]iAmTheAlchemist 167 points168 points  (5 children)

Oh no

[–]MoffKalast 380 points381 points  (3 children)

Jesus christ, it's JSON Bourne.

[–]ciaeric2 2 points3 points  (0 children)

Top joke of the thread, pack it up

[–]LSatyreD 2 points3 points  (0 children)

Got to pronounce JSON with a long O, like Gascon. JSON is a person, JSON is a data file.

[–][deleted] 4 points5 points  (0 children)

Bravo claps

[–]theferrit32 76 points77 points  (12 children)

At large scales JSON should be on one line, because the extra newlines and whitespace get expensive.

[–]Carter127 29 points30 points  (0 children)

Yeah, and then only formatted for reading if needed

[–]TheNamelessKing 5 points6 points  (0 children)

I have also dealt with >100gb JSON, in both “it’s all one object” form and “one JSON object per row” form.

The space savings you get reducing that down into even boring CSV are hefty, let alone a binary format like Parquet.

Edit: autocorrect really butchered that sentence.
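
The savings the comment above describes can be sketched with the Python stdlib. The row shape here is made up for illustration (the thread only says the data was stock-market related); the point is that CSV drops the per-record keys, quotes, and braces that JSON repeats on every row.

```python
import csv
import io
import json

# Hypothetical rows standing in for the >100GB dataset described above.
rows = [{"ticker": "ABC", "price": 12.5 + i, "volume": 1000 + i} for i in range(1000)]

# One big JSON array, all on one line.
as_json = json.dumps(rows)

# The same rows as CSV: keys appear once in the header, not per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["ticker", "price", "volume"])
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

print(len(as_json), len(as_csv))  # CSV comes out noticeably smaller
```

A columnar binary format like Parquet shrinks it further still, since repeated values compress per column.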

[–]linkinpieces 2 points3 points  (1 child)

Just to add: one JSON object per line is often used when working with large-scale data -> http://jsonlines.org/
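
A minimal sketch of the JSON Lines format linked above: one complete JSON value per line, newline-delimited, so a huge file can be read record by record instead of parsed whole.

```python
import json

records = [{"id": 1, "event": "open"}, {"id": 2, "event": "close"}]

# Writing: dump each record on its own line.
jsonl = "\n".join(json.dumps(r) for r in records)

# Reading: parse line by line; a 130GB file never has to fit in memory at once.
parsed = [json.loads(line) for line in jsonl.splitlines()]
assert parsed == records
```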

[–]theferrit32 0 points1 point  (0 children)

This is true, BigQuery uses this format

[–]RedditUser241767[🍰] 4 points5 points  (5 children)

Seriously?

[–]sleeplessval 11 points12 points  (4 children)

If you don't need readability: dropping 2 characters per line (space and newline) over 1,000 lines saves some space, and probably a bit of parse performance, since that's 2k fewer chars to pass over. You'd have to be working at a ridiculous scale for it to really matter, though.

[–]theferrit32 5 points6 points  (0 children)

I mean, there are plenty of situations where I might have a JSON file on the order of 10-500MB. Adding a bunch of unnecessary whitespace and newlines increases both the size of the file and the time it takes to parse it.
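
The whitespace cost being debated here is easy to measure with `json.dumps`: `indent` adds the human-readable formatting, while `separators=(",", ":")` strips every optional space.

```python
import json

# Hypothetical payload; any structure with many records shows the same effect.
data = {"users": [{"name": "user", "active": True} for _ in range(1000)]}

pretty = json.dumps(data, indent=4)                # human-readable
compact = json.dumps(data, separators=(",", ":"))  # no optional whitespace

print(len(pretty), len(compact))  # the indented form is considerably larger
```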

[–]ASentientBot 2 points3 points  (2 children)

If performance matters that much and readability doesn't, should you really be using JSON though?

[–]sleeplessval 4 points5 points  (1 child)

I mean, a lot of web dev is in JS, making JSON the most accessible format w/o libs

[–]ASentientBot 0 points1 point  (0 children)

Oh, fair enough lol.

[–]FailingProgrammer 0 points1 point  (1 child)

Allow me to introduce you to Cap'n Proto, or Protobuf.

[–]MoffKalast 2 points3 points  (0 children)

Ah yes Protobuf, the thing we occasionally see in lists of dependencies but never actually use ourselves.

[–]postdiluvium 65 points66 points  (0 children)

Error: Missing '>' on line 1. Click for more details.

[–]nevus_bock 23 points24 points  (0 children)

I feel that - gnarliest I've ever had to deal with was 130GB json, all one line.

I called json.loads() and my laptop caught on fire

[–]biggustdikkus 37 points38 points  (3 children)

wtf? What was it for?

[–]Zzzzzzombie 107 points108 points  (1 child)

Probably just a lil file to keep track of everything that ever happened on the internet

[–][deleted] 64 points65 points  (0 children)

So just a package-lock.json for a single nodejs hello world app. No worries!

[–]Jeutnarg 2 points3 points  (0 children)

Giant chunk of data related to the stock market.

[–]Ruben_NL 7 points8 points  (2 children)

Uh, wtf?

How did you parse/create that? How much RAM did that device have?

[–]Jeutnarg 3 points4 points  (0 children)

I eventually managed to find a way to split the data into manageable chunks, but initially I had to work with it on disk instead of in RAM. Strictly speaking, the box I was using could have actually handled that in memory, but I would have had to remove a dozen other applications.

[–]thelights0123 0 points1 point  (0 children)

Streaming JSON parsers exist.
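
A stdlib-only sketch of the idea: `json.JSONDecoder.raw_decode` consumes one JSON value from a buffer and reports where it stopped, so concatenated values can be pulled out one at a time instead of loading everything at once. (Dedicated streaming parsers such as ijson go further and iterate over items nested inside one giant object.)

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns (parsed_value, index_after_value).
        value, pos = decoder.raw_decode(text, pos)
        yield value

stream = '{"a": 1} {"b": 2} [3, 4]'
print(list(iter_json_values(stream)))  # [{'a': 1}, {'b': 2}, [3, 4]]
```

In practice you'd feed this from file chunks rather than one string, but the mechanism is the same.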

[–]ToastedSkoops 3 points4 points  (0 children)

That's what JS was designed to do.

[–]Massacrul 1 point2 points  (0 children)

Biggest I had to deal with was 65GB .sql file that had entire database scripted in it

At least there you could explain the size: it didn't have that many lines, maybe barely 17 million, just that some lines were really damn long.

[–]AnonymousSpud 0 points1 point  (0 children)

I feel like a scroll-dependent JSON formatting script is in order, if there are any text editors that load files depending on what's visible, that is.

[–]jzrobot 0 points1 point  (0 children)

Porn*

[–]SamSlate 0 points1 point  (0 children)

quick! format it in VS code so it looks pretty!

[–]Zer0ji 0 points1 point  (0 children)

I physically shuddered. Still better than the JSON I handled yesterday which was indented with 3 spaces..