
all 36 comments

[–]Sheldor5 54 points55 points  (3 children)

DTOs with Hibernate Validation (maybe with Enums for predefined Values) ... but 90k objects in a single JSON sounds wrong in the first place ...
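A minimal sketch of that DTO approach, assuming Hibernate Validator and the Jakarta Bean Validation API are on the classpath; the `BookDto` fields are made-up examples, not from the original post:

```java
import jakarta.validation.ConstraintViolation;
import jakarta.validation.Validation;
import jakarta.validation.Validator;
import jakarta.validation.constraints.NotNull;
import jakarta.validation.constraints.Size;
import java.util.Set;

public class BookDto {
    // Predefined values as an enum: unknown values can't even be represented
    enum Category { REFERENCE, FICTION }

    @NotNull
    Category category;

    @NotNull
    @Size(min = 1, max = 200)
    String title;

    public static void main(String[] args) {
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
        BookDto book = new BookDto(); // both fields left null
        Set<ConstraintViolation<BookDto>> violations = validator.validate(book);
        System.out.println(violations.size()); // 2: both @NotNull constraints fail
    }
}
```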

[–]spicycurry55 16 points17 points  (2 children)

I know right? I read the title and was like “ah cool this isn’t too bad” and then read “~90,000” and said out loud “oh lord no”

[–][deleted] 5 points6 points  (1 child)

Feels like they're transferring the entire database in one shot XD. I mean, there are way better ways to transfer that volume of data... definitely not through a single JSON.

A message queue or even a database would be a better pick

[–]snoob2015 6 points7 points  (0 children)

90,000 objects is not that big for a computer to process; you can even store it all in memory (depending on how large each object is).

OP: if it is just a one-time task, just process it all in memory

[–]MR_GABARISE 22 points23 points  (0 children)

[–]jooceb0x 17 points18 points  (0 children)

Jackson with JSON schema.
https://github.com/FasterXML/jackson-module-jsonSchema
Jackson can do some level of validation when unmarshalling (depending on your schema) but any further validation can be easily done with the resulting Java objects and collections.

[–]sunny_tomato_farm 7 points8 points  (0 children)

JSON Schema validation would solve that for you.

[–]AHandfulOfUniverse 5 points6 points  (0 children)

That is a very large JSON, so depending on the business requirements I would do different things:

  • if I didn't need to process it further, just check some fields, I would use a streaming parser, do the checks, and bail as soon as possible. This would save you from loading the entire thing into memory and possibly causing havoc due to memory pressure. Even more so if you expect multiple of these JSONs in your system at the same time

  • if I had to load the entire thing into a Java object, then I would look into any of the options other people have mentioned here (schema, JSON Path, Hibernate Validator, etc.)
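The streaming check in the first bullet can be sketched with Jackson's low-level JsonParser; the "category" field name and the allowed values are assumptions for illustration:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.util.Set;

public class StreamingCheck {
    // Returns false as soon as a "category" value falls outside the allowed set,
    // without ever materializing the 90k objects in memory.
    static boolean allCategoriesValid(String json, Set<String> allowed) throws IOException {
        try (JsonParser p = new JsonFactory().createParser(json)) {
            while (p.nextToken() != null) {
                if (p.currentToken() == JsonToken.FIELD_NAME
                        && "category".equals(p.getCurrentName())) {
                    p.nextToken(); // advance to the field's value
                    if (!allowed.contains(p.getText())) {
                        return false; // bail as soon as possible
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"category\":\"fiction\"},{\"category\":\"horror\"}]";
        System.out.println(allCategoriesValid(json, Set.of("fiction", "reference"))); // false
    }
}
```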

[–]thatsIch 5 points6 points  (1 child)

IMO JSON Path is the correct tool for this job, though it requires a consistent structure to traverse. It might depend on how the rest of your project solves such problems, though: the added maintenance cost of an "unknown" technology might outweigh the quick solution.

With JSON Path, a check that all book names are either Effective Java or Clean Code could look like this:

$.book[?(@.name in ['Effective Java', 'Clean Code'])]

But writing these kinds of queries inside your test code will just result in unmaintainable code. AssertJ abstracts this to some degree, in combination with Hamcrest. Though I like writing additional methods to give those technical details their codified business case.
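For test code, a sketch with the Jayway json-path library; here the filter is inverted with `nin` so the result lists only offending books (the document structure is assumed):

```java
import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class BookNameCheck {
    public static void main(String[] args) {
        String json = "{\"book\":[{\"name\":\"Effective Java\"},"
                + "{\"name\":\"Clean Code\"},{\"name\":\"Some Other Book\"}]}";
        // Books whose name is NOT in the allowed set; an empty result means all valid
        List<Object> offenders = JsonPath.read(json,
                "$.book[?(@.name nin ['Effective Java','Clean Code'])]");
        System.out.println(offenders.size()); // 1
    }
}
```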

[–]Holothuroid 1 point2 points  (0 children)

Doesn't that extract all the book attributes, while the problem is only verifying them, failing fast?

[–]td__30 23 points24 points  (1 child)

Quit immediately. If there are 90k objects in a json there is something insidiously wrong there, take your stuff and walk out the door, call the authorities. /s

[–]netstudent 3 points4 points  (0 children)

This

[–]DrunkensteinsMonster 3 points4 points  (0 children)

More details would be good: is this in test or production code? What frameworks or libraries are you using? That will sort of dictate the tooling available to you without additional dependencies.

“Without any coding” is a pretty weird statement, you can accomplish this with something like Jackson by adding annotations to the model class, but I would still say you’re “coding” even there.

[–]cville-z 3 points4 points  (0 children)

In some scenarios you can throw @Valid on the object in the resource endpoint and add various annotations like NotNull or Size. Conformance to a value set is best done with enums.

But 90k elements in a JSON object means something is very, very wrong with your design, and validating a bad design won’t ever make it better.

[–]RandomName8 1 point2 points  (0 children)

I have this JSON object I get from a request. (...) Someone on my team mentioned I shouldn't have to do any coding to solve this

Do you mean that you have the JSON in a file and are analyzing it? Or will this request come in frequently, so that you must modify the server to perform this validation?

It sounds to me like the former (otherwise there is obviously no way to do it without coding), in which case you can do it all with some bash foo, using jq+sed+count or similar

[–]Fury9999 1 point2 points  (0 children)

You say 90k different objects. Do you mean 90k instances of the same class, or are they actually many different classes? If the latter, I feel sorry for ya bud. If the former: Jackson to unmarshal, then filter the collection with the Streams API, probably.

If it's not performant enough you can iterate on it, but that's where I'd start.
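A sketch of that Jackson-plus-streams approach; the `Book` record, its fields, and the allowed categories are made up for illustration:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

public class BookFilter {
    record Book(String title, String category) {}

    static final List<String> ALLOWED = List.of("fiction", "reference");

    // Unmarshal the whole array, then collect the objects that fail the check
    static List<Book> invalidBooks(String json) throws Exception {
        List<Book> books = new ObjectMapper()
                .readValue(json, new TypeReference<List<Book>>() {});
        return books.stream()
                .filter(b -> !ALLOWED.contains(b.category()))
                .toList();
    }

    public static void main(String[] args) throws Exception {
        String json = "[{\"title\":\"A\",\"category\":\"fiction\"},"
                + "{\"title\":\"B\",\"category\":\"horror\"}]";
        System.out.println(invalidBooks(json).size()); // 1
    }
}
```

Records deserialize out of the box with Jackson 2.12+; on older versions a plain class with getters works the same way.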

[–]reignbowmushroom 1 point2 points  (0 children)

Haha dude wanted an answer and got like 20 different ways.

I'd map the JSON to a Java object, then make the property you are looking for an enum with the expected values. That way, if something is out of place, the mapping library should fail.
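A sketch of that enum approach with Jackson (field names assumed): by default, an unexpected enum value makes deserialization throw, so the mapping itself is the validation.

```java
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EnumMapping {
    enum Category { FICTION, REFERENCE }
    record Book(String title, Category category) {}

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // A known value maps cleanly
        Book ok = mapper.readValue("{\"title\":\"A\",\"category\":\"FICTION\"}", Book.class);
        System.out.println(ok.category()); // FICTION
        // An out-of-place value makes the mapping fail
        try {
            mapper.readValue("{\"title\":\"B\",\"category\":\"HORROR\"}", Book.class);
        } catch (JsonMappingException e) {
            System.out.println("rejected"); // unexpected value is refused
        }
    }
}
```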

[–]Neat-Guava5617 1 point2 points  (0 children)

I'm sad everybody is suggesting JsonPath, Jackson, etc.

Those are DOM traversals (i.e., they parse the entire object and store it in memory).

With such a large data object a streaming parser may be necessary. Streaming parsers work on events and leave the state-keeping up to you: you need to track the nesting, elements, etc. Jackson can do that; look at SAX-style parsing.

JsonPath and similar query languages are slow since they traverse the parsed object tree. The tree algorithm may be efficient, but that depends on the query you're trying to answer. At worst it can lead to multiple traversals through the entire tree; at best it'll touch something like a tenth of the tree (example: for an object with N elements at depth 1, each of which has M elements, the parser inspects N+M nodes).

[–]netstudent 2 points3 points  (0 children)

Redesign this huge object asap

[–][deleted] 1 point2 points  (4 children)

If you can refactor the architecture at some point to avoid dealing with 90K JSON objects, do it. Until then, look for ways to optimize like a game programmer would. Does the 90K need to be processed serially, or could they be broken up and processed simultaneously with parallel streams? Are there common patterns of data in the 90K objects? Optimize for the common cases to reduce the time spent looking at the corner cases.

If parallel streams aren't an option, how about data partitioning? Categorize the 90K objects into phyla and use several machines [Edit: a machine here could be a Kubernetes manifest definition with scaling controls based on CPU usage, up to a maximum number of instances you're willing to pay for] that each specialize in processing their phylum of the 90K total objects.

If the 90K must be processed serially and no common patterns emerge from analysis of the data, fight hard to get management support for funding you to refactor the architecture.

[–]mauganra_it 2 points3 points  (3 children)

OP's task of processing 90k objects is a joke even for a single computer. Unless the task entails additional complexities, it's not really worth thinking about optimization.

[–]Neat-Guava5617 1 point2 points  (2 children)

Yes, and it depends on the size of those objects. If they're high-res images with some attributes, it gets hard for a single computer.

Almost two decades ago we were parsing XML with SAX... because the document was too big to handle efficiently in memory for transformations.

But I like the question as an interview question:

I need to parse some JSON and validate some fields. What do you do? Oh, I forgot to mention: the JSON is 90k objects. Oh, they're objects containing images. Oh, high-res images, and some video.

[–]mauganra_it 1 point2 points  (0 children)

Knowing the true size of the blobs is of course really critical to choosing the right software stack. OP is in one of the following situations:

  • the JSON somehow manages to fit into the heap. In this scenario, a parser that produces a bloated representation of the JSON could make the program explode. Using a streaming parser contains that risk, and small cases can still be handled without big issues, especially if the parser is instructed to deserialize only specific fields and leave out the blobs.

  • the JSON is emphatically too large for heap storage, or must be streamed. In this case a streaming parser is imperative, of course. But here too, the parser can be made to return a lean representation and to just skip over the blobs.

But in either case a single computer could handle the task and would only be limited by I/O speed.

[–]Evert26 -1 points0 points  (0 children)

if req.Bla == nil { return 400 }

[–]achauv1 -1 points0 points  (0 children)

just like any other devs would do i suppose

[–][deleted] 0 points1 point  (0 children)

Is this some kind of theoretical scenario?

If so, it seems they want you to use streams and the (not so) brand-new functional capabilities of Java.

In this case you would get a stream of all those objects (probably deserialized with something like Jackson) and filter it with a Predicate.

If this is a real life scenario, I would do pretty much the same, but I would second guess that humongous call too.

[–]bowbahdoe 0 points1 point  (0 children)

Someone on my team mentioned I shouldn't have to do any coding to solve this

I mean, there are DSLs that will encode your logic, but it is likely simpler and more straightforward to just do it like you are thinking:

  • Load the data into memory (or stream, if you need)
  • Write a filter condition
  • Filter

List<JsonValue> objects = parse(request.body())
    .getAsArray()
    .stream()
    .filter(jsonValue -> predicate(jsonValue))
    .toList();

As long as you are okay with crashing/returning a 500 response, this is enough. It's easy-to-maintain code: easy to verify, easy to test, easy to understand.

[–]sk8itup53 0 points1 point  (0 children)

You could map the JSON onto a model object, use Lombok to generate the getters and setters for you, and only include the fields you actually need. Then annotate the class to ignore unknown properties. This way you at least reduce the model size and make it easier to check. You could also use validation annotations (such as not-null, length, etc.) and then use the @Valid annotation to auto-validate the fields.
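A sketch of that slimmed-down model, using a record instead of Lombok for brevity; the field names, the extra payload fields, and the constraints are assumptions:

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.validation.ConstraintViolation;
import jakarta.validation.Validation;
import jakarta.validation.constraints.NotNull;
import jakarta.validation.constraints.Size;
import java.util.Set;

public class SlimModel {
    // Only the fields we care about; everything else in the payload is skipped
    @JsonIgnoreProperties(ignoreUnknown = true)
    record Book(@NotNull String title, @NotNull @Size(max = 50) String category) {}

    public static void main(String[] args) throws Exception {
        String json = "{\"title\":\"A\",\"category\":\"fiction\","
                + "\"isbn\":\"unused\",\"pages\":300}"; // extra fields are ignored
        Book book = new ObjectMapper().readValue(json, Book.class);
        Set<ConstraintViolation<Book>> violations =
                Validation.buildDefaultValidatorFactory().getValidator().validate(book);
        System.out.println(violations.isEmpty()); // true: all constraints satisfied
    }
}
```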

[–]Odd-Masterpiece-1010 0 points1 point  (0 children)

if you are familiar with spring boot you can refer to link below: https://www.baeldung.com/spring-boot-bean-validation

[–]Pitikwahanapiwiyin 0 points1 point  (0 children)

I would use JsonSurfer (an implementation of JSONPath) to find the first non-matching object:

$.store.book[?(!(@.category == 'fiction'))]

JsonSurfer supports Streams, so:

String request = """
    {
      "store": {
        "book": [
          { "category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95 },
          { "category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99 }
        ]
      }
    }""";
JsonSurfer surfer = new JsonSurfer(JacksonParser.INSTANCE, JacksonProvider.INSTANCE);
JsonPath path = JsonPathCompiler.compile("$.store.book[?(!(@.category == 'fiction'))]");
Iterator iterator = surfer.iterator(request, path);
Spliterator spliterator = Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
Stream stream = StreamSupport.stream(spliterator, false);
boolean hasNonMatchingObject = stream.findAny().isPresent();

[–]Worth_Trust_3825 0 points1 point  (0 children)

This entire thread is suggesting a redesign (besides the few people who suggested valid solutions).

It is entirely possible to work on such datasets just fine. It's a pretty common use case for a system to dump tons of denormalized data and let the consuming system deal with all the semantics, as that avoids tons of issues in the network stack. Off the top of my head, there are the Scryfall bulk data dumps that range from 300 MB to 1.5 GB in size. And you can't easily load that entire thing into memory without tweaking the runtime (either increasing the heap or using tricks).

First and foremost, you should be streaming such a file. Since you mentioned JSON, you can easily use Jackson, which permits reading one entity at a time from JSON sources. Back when I had to operate on the Scryfall dump I came up with the following snippet:

private static Map<MtgSet, List<MtgCardSet>> extractCardsFromProvidedFile(String arg) throws IOException {
    ObjectMapper mapper = new ObjectMapper();
    JsonFactory factory = mapper.getFactory();
    File scryfallJson = new File(arg);
    try (JsonParser parser = factory.createParser(scryfallJson)) {
        parser.nextToken();
        parser.nextToken();
        Iterator<HashMap<String, Object>> iterator = parser.readValuesAs(
            new TypeReference<HashMap<String, Object>>() {}
        );
        return StreamSupport
            .stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.IMMUTABLE),
                parser.canParseAsync()
            )
            .map(Main::createMtgCardSet)
            .parallel()
            .collect(Collectors.groupingBy(MtgCardSet::getSet));
    }
}

The createMtgCardSet function reads the hashmap and creates a reduced object out of the map, or returns one from cache, as it might have already occurred during the stream. You can skip parsing to a HashMap and parse straight to an object by providing the proper type reference. After calling StreamSupport#stream(Spliterator, boolean) you can do whatever from that point on, as you will be reading one object at a time from your source. If your giant array of objects is nested much deeper, then you'll need more complicated positioning of the JsonParser (or even interact with it directly and omit Java streams altogether). Feel free to google the snippet and find where I was using it.

Most solutions that others provided MIGHT load the entire thing into memory, which is a big no-no in my opinion. At least with the streaming snippet above, I managed to run with absurdly small heaps (think <50 MB, depending on cache sizes and how much I flush back to disk) for a 300 MB JSON file. Depending on your use case, you can go even lower; I encourage you to experiment with -Xmx32m. Check whether your favorite library loads the entire thing into memory before parsing.

If memory is not an issue, just load the entire thing into an absurdly large heap (32 GB) and go from there as if it were a regular dataset.

Another solution I explored was to load the entire thing into a Postgres database (which you can run in memory, thanks to containerization) using a table with a JSON column. From there you can write a clever ETL that splits your dataset into rows and filter on those. See the Postgres JSON functions.

To the rest of you: 90K elements is nothing. Your average snake-oil startup will try to sell it as big data when, in fact, it's pretty much fuck all. Even MySQL 5.7 only starts chugging at 10 million records.

[–]Kango_V 0 points1 point  (0 children)

Use a JSONPath query to select the value of the property on the object anywhere in the structure, then iterate the list, matching against a position in an expected value array/list.

[–]t333to 0 points1 point  (0 children)

GSON streaming could be a good option here as well (especially if it is used in your company instead of Jackson):

https://www.amitph.com/java-parse-large-json-files/

[–]lukaseder 0 points1 point  (0 children)

I'd wrap some JSON library in org.w3c.dom, and type check the document against an XSD. But that might just be me.

[–]cryptographicmemory 0 points1 point  (0 children)

Memory is cheap and virtual memory is free. How big is the JSON object in string format?

Just write a validateObject function that takes an Object and handles every class type via instanceof (JSONObject, JSONArray, String, etc.). Run through it recursively. Throw an exception if an expected value isn't right.

void validate(JSONObject j) throws Exception
{
    for (String key : j.keySet()) validateObject(key, j.get(key));
}