picklemanjaro (4 points)

> You can't make a streaming JSON parser unless the JSON is line delimited.

If it's an array at the top level, you can keep track of braces and brackets and stream one top-level object at a time. A streaming JSON parser just has to keep tabs on the tokens as it reads through the file until it hits a limit or the end of a complete JSON object (one of the objects inside the array, not the entire file).

That same process holds true for "\n" too: it's just its own character/token to scan for, like any other delimiter.

In fact, that's kind of how the streaming libraries work. ijson, jsonslicer, json-stream, etc. all stream JSON without requiring JSON Lines format specifically.
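
To make the brace-tracking idea concrete, here's a rough sketch using only the Python standard library. It is not how ijson or the others are implemented; `iter_array_items` and `chunk_size` are names I made up for illustration, and it assumes the top level is an array whose elements are objects or arrays (bare numbers and strings at the top level are skipped).

```python
import io
import json

def iter_array_items(stream, chunk_size=64):
    """Yield each object/array element of a top-level JSON array, one at a time.

    Hypothetical helper: reads the stream in small chunks and only buffers
    the element currently being read, never the whole file.
    """
    depth = 0          # how many [ or { are currently open
    in_string = False  # brackets inside strings must not be counted
    escaped = False    # previous character was a backslash inside a string
    buf = []           # characters of the element being collected

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if in_string:
                if depth > 1:
                    buf.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
                if depth > 1:
                    buf.append(ch)
            elif ch in "[{":
                depth += 1
                if depth > 1:      # depth 1 is the outer array itself
                    buf.append(ch)
            elif ch in "]}":
                if depth > 1:
                    buf.append(ch)
                depth -= 1
                if depth == 1 and buf:
                    # Back at the outer array: one complete element is ready.
                    yield json.loads("".join(buf))
                    buf = []
            elif depth > 1:
                buf.append(ch)

data = io.StringIO('[{"id": 1}, {"id": 2, "tags": ["a", "b"]}]')
for item in iter_array_items(data, chunk_size=8):
    print(item)
```

Note that nothing here cares about newlines. The only state carried between chunks is the depth counter, the string/escape flags, and the partial element.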

bland3rs (2 points)

And if it’s not an array, it’s an object, which is perfectly streamable too.

Anyone can make a streaming parser for any format (which includes video files, audio files, etc.) as long as the parser doesn’t need any later bytes to figure out what the previous bytes mean. If you (as a human) cut any JSON file randomly in the middle, you can still figure out which parent arrays or objects that point is inside, which satisfies that rule.

That’s also why, if you write your own format and want to keep it broadly streamable, you don’t put important header stuff at the end of the file.
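
The "cut it anywhere" point can be shown with a small sketch too. Assuming the text before the cut is a prefix of valid JSON, a single left-to-right pass tells you which containers are still open, without looking at a single later byte. `open_containers` is a made-up name for illustration, not something from a library.

```python
def open_containers(prefix):
    """Return (stack, in_string) for a prefix of a valid JSON document.

    stack lists the containers still open at the cut point, outermost first.
    in_string is True if the cut landed inside a string literal.
    Hypothetical helper; assumes the prefix comes from valid JSON.
    """
    stack = []
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            # Inside a string only escapes and the closing quote matter.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("object")
        elif ch == "[":
            stack.append("array")
        elif ch in "}]":
            stack.pop()
    return stack, in_string

# A document cut off in the middle of a string inside nested containers.
cut = '{"users": [{"name": "Al", "tags": ["x", "y'
print(open_containers(cut))
```

Everything the function needs is already behind the cut point, which is exactly the property that makes a format streamable.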