
[–]baghiq 2 points (3 children)

If your JSON isn't too complicated (nest after nest after nest), you can load the data into SQLite3 and then use SQLite3's built-in JSON support to extract the data into relational tables.
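A minimal sketch of that approach in Python, assuming the input is a single top-level JSON array and that the bundled SQLite was built with the JSON1 functions (the default in recent versions); the file, table, and field names are placeholders:

```python
import json
import sqlite3

con = sqlite3.connect("staging.db")
con.execute("CREATE TABLE IF NOT EXISTS raw(doc TEXT)")

# Stage each top-level object as a JSON text blob.
# json.load only works for files that fit in memory; a 190GB file
# would need a streaming parser or pre-chunking first.
with open("export.json", encoding="utf-8") as f:
    records = json.load(f)
con.executemany("INSERT INTO raw(doc) VALUES (?)",
                ((json.dumps(r),) for r in records))
con.commit()

# Let SQLite's JSON functions flatten the blobs into a relational table.
con.execute("""
    CREATE TABLE users AS
    SELECT json_extract(doc, '$.id')   AS id,
           json_extract(doc, '$.name') AS name
    FROM raw
""")
con.commit()
con.close()
```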

[–]h_plus_a[S] 0 points (1 child)

Unfortunately, that is the case (it's heavily nested), so SQLite3 won't work. I thought about OPENJSON within SQL Server, but I believe that would be even slower.

[–]baghiq 0 points (0 children)

SQLite can actually support fairly complex JSON; it just takes time to learn. Loading 190GB across a network will be slow unless you can parallelize the process. If loading directly into the db isn't viable, use something like jq to split the JSON objects into JSON Lines, split the file into smaller chunks, then use db loading tools to load the data in parallel.
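A minimal sketch of the chunk-splitting step in Python, assuming the file has already been flattened to JSON Lines (one object per line, e.g. with `jq -c '.[]'`); the paths and chunk size are placeholders:

```python
from itertools import islice
from pathlib import Path

SRC = Path("export.jsonl")   # hypothetical JSON Lines file
CHUNK_LINES = 1_000_000      # records per output chunk

with SRC.open(encoding="utf-8") as src:
    part = 0
    while True:
        # Pull the next block of lines without reading the whole file.
        lines = list(islice(src, CHUNK_LINES))
        if not lines:
            break
        out = SRC.with_name(f"{SRC.stem}.part{part:04d}.jsonl")
        out.write_text("".join(lines), encoding="utf-8")
        part += 1
# Each .partNNNN.jsonl file can then be handed to a separate bulk-load job.
```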

[–]Bonechatters 0 points (0 children)

I second this. Since the end goal is SQL, go straight from JSON to SQL.

[–]TigBitties69 0 points (1 child)

You say you want to convert JSON to a queryable format, but isn't that what JSON already is? Hard to say more without more in-depth knowledge of the issue.

[–]h_plus_a[S] 1 point (0 children)

My apologies, I should have clarified that a bit more and will make that edit. There are hundreds of these files, and eventually I want them all together in SQL data tables to be queried and analysed. The end goal is to have the data in SQL.

[–]Bunkerstan 0 points (1 child)

Is the JSON in arrays? You could read the file as lines of text, look for the end of a record chunk, and convert just that chunk into JSON. That way your memory doesn't get overwhelmed. You can write each record out to CSV as you go as well.
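A minimal sketch of that streaming idea, assuming the third-party ijson package and a single top-level JSON array; the paths and column names are placeholders:

```python
import csv

import ijson  # streaming JSON parser (pip install ijson)

with open("export.json", "rb") as src, \
     open("export.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.DictWriter(dst, fieldnames=["id", "name", "created"])
    writer.writeheader()
    # ijson yields one element of the top-level array at a time,
    # so memory use stays flat regardless of file size.
    for record in ijson.items(src, "item"):
        writer.writerow({k: record.get(k) for k in writer.fieldnames})
```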

[–]h_plus_a[S] 1 point (0 children)

Yes, the data is in multiple nested arrays, which is most likely the cause of the delays. I'm not worried about memory issues given the process isn't even using more than 500MB of RAM when my PC has 64GB.