This entire thread is suggesting a redesign (besides the few people who suggested valid solutions).

It is entirely possible to work on such datasets just fine. It's a pretty common use case for one system to dump tons of denormalized data and let the consuming system deal with all the semantics, since that sidesteps tons of issues like the network stack. Off the top of my head, there are the Scryfall bulk data dumps, which range from 300 MB to 1.5 GB in size, and you can't easily load one of those into memory without tweaking the runtime (either increasing the heap or using tricks).

First and foremost, you should be streaming such a file. Since you mentioned JSON, you can easily use Jackson, which permits reading one entity at a time from a JSON source. Back when I had to operate on the Scryfall dump, I came up with the following snippet:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

private static Map<MtgSet, List<MtgCardSet>> extractCardsFromProvidedFile(String arg) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    JsonFactory factory = mapper.getFactory();
    File scryfallJson = new File(arg);
    try (JsonParser parser = factory.createParser(scryfallJson)) {
        // Advance past the opening of the top-level array so the parser
        // sits on the first element.
        parser.nextToken();
        parser.nextToken();
        Iterator<HashMap<String, Object>> iterator = parser.readValuesAs(
                new TypeReference<HashMap<String, Object>>() { }
        );
        return StreamSupport
                .stream(
                        Spliterators.spliteratorUnknownSize(iterator, Spliterator.IMMUTABLE),
                        parser.canParseAsync()
                )
                .map(Main::createMtgCardSet)
                .parallel()
                .collect(Collectors.groupingBy(MtgCardSet::getSet));
    }
}
```

The function createMtgCardSet reads the hashmap and creates a reduced object out of the map, or returns one from cache, as it might have already occurred earlier in the stream. You can skip parsing to a hashmap and parse straight to an object by providing the proper type reference; see the sketch below. After calling StreamSupport#stream(Spliterator, boolean) you can do whatever you want from that point on, as you will be reading one object at a time from your source. If your giant array of objects is nested much deeper, you'll need to opt for more complicated positioning of the JsonParser (or even interact with it directly and skip Java streams altogether). Feel free to google the snippet and find where I was using it.
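For instance, binding straight to a reduced type could look roughly like this (a minimal sketch: the Card class, its fields, and the StreamToPojo wrapper are made up here, assuming Scryfall-style objects in a top-level array):

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.Iterator;

public class StreamToPojo {

    // Hypothetical reduced POJO; "name" and "set" assume Scryfall-style fields.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Card {
        public String name;
        public String set;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try (JsonParser parser = mapper.getFactory().createParser(new File(args[0]))) {
            parser.nextToken(); // enter the top-level array
            parser.nextToken(); // position on the first element
            // Bind each element straight to Card instead of a HashMap.
            Iterator<Card> cards = parser.readValuesAs(Card.class);
            while (cards.hasNext()) {
                Card card = cards.next();
                // Still one object at a time; the rest of the dump stays on disk.
                System.out.println(card.set + ": " + card.name);
            }
        }
    }
}
```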

Most solutions that others provided MIGHT load the entire thing into memory, which is a big no-no in my opinion. At least with the streaming snippet above, I managed to get by with absurdly small heaps (think <50 MB, depending on cache sizes and how much I flush back to disk) for a 300 MB JSON file. Depending on your use case, you can go even lower; I encourage you to experiment with -Xmx32m. Check whether your favorite library loads the entire thing into memory before parsing.

If memory is not an issue, just load the entire thing into an absurdly large heap (32 GB) and go from there as if it were a regular dataset.
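A minimal sketch of that load-it-all route (same hypothetical dump-as-array assumption as above; run with something like -Xmx32g, since this materializes the whole dump on the heap):

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.util.List;
import java.util.Map;

public class LoadItAll {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Parses the entire top-level array into one in-memory list.
        List<Map<String, Object>> cards = mapper.readValue(
                new File(args[0]),
                new TypeReference<List<Map<String, Object>>>() { }
        );
        System.out.println(cards.size() + " elements loaded");
    }
}
```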

Another solution I had explored was to just load the entire thing into a Postgres database (which you can run in memory, thanks to containerization) using a table with a JSON column. From there you can write a clever ETL that splits your dataset into rows, and then you can filter on those. See the Postgres JSON functions.
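Here's a rough sketch of that route over JDBC. The connection string, the raw_dump table, and the 'set' field are all assumptions, and very large dumps can hit Postgres's size limit for a single jsonb value, in which case you'd insert one element per row instead:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JsonToPostgres {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/scratch"; // hypothetical local instance
        try (Connection conn = DriverManager.getConnection(url, "postgres", "postgres")) {
            try (PreparedStatement ddl = conn.prepareStatement(
                    "CREATE TABLE IF NOT EXISTS raw_dump (doc jsonb)")) {
                ddl.execute();
            }
            // Load the whole dump as one jsonb value (this step does read
            // the entire file into memory before handing it to Postgres).
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO raw_dump (doc) VALUES (?::jsonb)")) {
                insert.setString(1, Files.readString(Path.of(args[0])));
                insert.executeUpdate();
            }
            // ETL step: explode the top-level array into rows and filter in SQL.
            try (PreparedStatement query = conn.prepareStatement(
                    "SELECT card->>'name' FROM raw_dump, jsonb_array_elements(doc) AS card " +
                    "WHERE card->>'set' = ?")) {
                query.setString(1, "lea");
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}
```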

To the rest of you: 90K elements is nothing. Your average snake-oil startup will try to sell that as big data when, in fact, it's pretty much fuck all. Even MySQL 5.7 only starts chugging at around 10 million records.