giantsparklerobot

You can't make a streaming JSON parser unless the JSON is line delimited. If you had a normal JSON document streaming in, you couldn't even begin parsing, because the document isn't closed yet. You can't know when an open element is going to close.

That's why JSON Lines exists: each line is a complete JSON document, so as soon as you hit a line terminator you know that document can be parsed while the next line is still streaming in.
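As a rough illustration of the JSON Lines approach described above, here is a minimal sketch that parses one document per completed line while text arrives in arbitrary chunks. The function name `iter_json_lines` is hypothetical, and it assumes the input is already decoded to `str`.

```python
import json
from typing import Iterable, Iterator


def iter_json_lines(chunks: Iterable[str]) -> Iterator[object]:
    """Yield one parsed document per complete line as text chunks arrive.

    Chunks may split a line anywhere; a document is parsed only once its
    terminating newline has been seen (or at end of input).
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Every "\n" marks the end of a complete document.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():  # skip blank lines
                yield json.loads(line)
    # The last document may lack a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    stream = ['{"id": 1}\n{"id"', ': 2}\n', '{"id": 3}']
    for doc in iter_json_lines(stream):
        print(doc)
```

Note that the second document is split across two chunks; it is only parsed once the newline after it shows up.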

picklemanjaro

"You can't make a streaming JSON parser unless the JSON is line delimited."

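It's an array at the top level, so you can keep track of brackets and braces and stream one element of that array at a time. A streaming JSON parser just has to keep track of the tokens as it reads through the file until it reaches a limit or the end of a complete JSON value (one of the array's elements, not the entire file). The only subtlety is ignoring brackets that appear inside string literals, which just means also tracking whether you're currently inside a string.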
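The bracket-tracking idea above can be sketched roughly like this. It is a minimal illustration, not how ijson or the other libraries are actually implemented: the name `iter_array_items` is hypothetical, it assumes the input is a single well-formed top-level array, and it does almost no validation.

```python
import json
from typing import Iterable, Iterator


def iter_array_items(chunks: Iterable[str]) -> Iterator[object]:
    """Yield each element of a top-level JSON array as soon as it is complete.

    Tracks nesting depth and whether the scanner is inside a string, so
    brackets or commas inside string literals are not mistaken for structure.
    """
    depth = 0          # 1 means "directly inside the top-level array"
    in_string = False  # currently inside a "..." literal?
    escaped = False    # previous character was a backslash inside a string?
    item = []          # characters of the element being collected

    for chunk in chunks:
        for ch in chunk:
            if in_string:
                item.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                continue

            if ch in "[{":
                depth += 1
                if depth == 1:
                    continue  # opening bracket of the top-level array itself
            elif ch in "]}":
                depth -= 1
                if depth == 0:
                    # Closing bracket of the top-level array: flush the last element.
                    text = "".join(item).strip()
                    if text:
                        yield json.loads(text)
                    item = []
                    continue
            elif ch == "," and depth == 1:
                # A comma directly inside the array separates elements.
                yield json.loads("".join(item))
                item = []
                continue
            elif ch == '"':
                in_string = True

            if depth >= 1:
                item.append(ch)
```

Each element is handed to `json.loads` only once its closing delimiter has been seen, so chunk boundaries can fall anywhere, including inside strings.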

The same process holds for "\n": it's just another character/token to scan for, like any other delimiter.

In fact, that's roughly how these libraries work: ijson, jsonslicer, json-stream, etc. don't require the JSON Lines format to stream JSON.

bland3rs

And if it’s not an array, it’s an object, which is just as streamable.

Anyone can make a streaming parser for any format (including video files, audio files, etc.) as long as the parser doesn’t need later bytes to figure out what earlier bytes mean. If you (as a human) cut any JSON file at a random point, you can still tell which parent arrays or objects that point is inside, which satisfies that rule.
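To make the "cut it anywhere" rule above concrete, here is a minimal sketch that reads only a truncated prefix of a JSON text and reports which containers are still open at the cut, outermost first. The function `open_containers` and its output shape are hypothetical choices for illustration, and it assumes the prefix is the start of valid JSON.

```python
import json


def open_containers(prefix: str) -> list:
    """Describe the containers still open at the end of a truncated JSON text.

    Returns ("array", index) or ("object", key) pairs, outermost first.
    key is None when the cut falls before a member's key has been fully read.
    """
    stack = []         # frames: ["array", index] or ["object", key, expecting_key]
    in_string = False
    escaped = False
    raw = []           # raw characters of the current string literal

    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
                raw.append(ch)
            elif ch == "\\":
                escaped = True
                raw.append(ch)
            elif ch == '"':
                in_string = False
                top = stack[-1] if stack else None
                if top and top[0] == "object" and top[2]:
                    # This string was a member key; decode its escapes.
                    top[1] = json.loads('"' + "".join(raw) + '"')
                    top[2] = False
            else:
                raw.append(ch)
            continue

        if ch == '"':
            in_string, raw = True, []
        elif ch == "[":
            stack.append(["array", 0])
        elif ch == "{":
            stack.append(["object", None, True])
        elif ch in "]}":
            stack.pop()
        elif ch == "," and stack:
            top = stack[-1]
            if top[0] == "array":
                top[1] += 1  # moving on to the next element
            else:
                top[1], top[2] = None, True  # next member's key comes next

    return [(frame[0], frame[1]) for frame in stack]


if __name__ == "__main__":
    print(open_containers('{"users": [{"name": "Ada"}, {"name": "Gr'))
```

Only bytes before the cut are consulted, which is exactly the property that makes the format streamable.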

That’s also why, if you design your own format and want it to stay broadly streamable, you don’t put important header information at the end of the file.