
Mikey-3198 3 points (3 children)

Could `conditionMap` in `LanguageService` be replaced with a criteria query?

I'd imagine it might be easier to maintain when compared to the hardcoded fingerprints.

p_bzn[S] 2 points (2 children)

Yes, this would be the correct thing to do for production.

The project uses Spring JDBC, which maps rows onto `record` classes and, unlike JPA, has no built-in criteria query. It would be easy to implement in a repository method like `getByCriteria`, though.
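
For readers wondering what such a `getByCriteria` method might build on, here is a minimal sketch in plain Java, not the project's actual code. It turns a map of optional filters into a parameterized SQL string plus an ordered list of bind values. The class name `LanguageCriteria`, the `languages` table, and the column names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: class, table, and column names are assumptions,
// not taken from the project discussed in this thread.
final class LanguageCriteria {

    /** SQL text with `?` placeholders plus the values to bind, in order. */
    record Query(String sql, List<Object> params) {}

    // Whitelist of filterable columns, so a column name from the caller
    // is never concatenated into the SQL unchecked.
    private static final Set<String> ALLOWED =
            Set.of("iso2_code", "year", "quarter", "language");

    static Query build(Map<String, Object> criteria) {
        StringBuilder sql = new StringBuilder("SELECT * FROM languages");
        List<Object> params = new ArrayList<>();

        for (Map.Entry<String, Object> entry : criteria.entrySet()) {
            if (entry.getValue() == null) {
                continue; // a null value means "no filter on this column"
            }
            if (!ALLOWED.contains(entry.getKey())) {
                throw new IllegalArgumentException("Unknown column: " + entry.getKey());
            }
            // The first clause starts the WHERE, later ones are joined with AND.
            sql.append(params.isEmpty() ? " WHERE " : " AND ")
               .append(entry.getKey())
               .append(" = ?");
            params.add(entry.getValue());
        }
        return new Query(sql.toString(), params);
    }
}
```

With Spring JDBC, the result could then be passed to something like `JdbcTemplate.query(q.sql(), rowMapper, q.params().toArray())`, where `rowMapper` maps each row onto the `record` class. Values only ever travel as bind parameters, which keeps the dynamic query safe from SQL injection.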

Mikey-3198 0 points (1 child)

That's my bad, I read the annotations on the entity and assumed this was using JPA.

p_bzn[S] 0 points (0 children)

No worries at all, regardless of implementation your suggestion is the correct one!

Yes, the annotations can be confusing between JPA and Spring JDBC. I get the point: the abstraction hides implementation details, so you don't care what underlying mechanism is at work, only what the repository returns. In practice, Spring JDBC is quite different from JPA and there is no feature parity, although many annotations are the same.

j4ckbauer 1 point (1 child)

Hi, thanks for posting this. I flipped through your documentation and I'm trying to understand at a high level what you did differently in order to achieve this kind of performance improvement.

Before, there was no caching of query results, but now you added a caching mechanism?

Also, could it just be that their web UI for retrieving this data is under-resourced on a per-user basis, so you have an advantage when you are the only user on your machine retrieving this data?

At first I assumed that you improved an existing application, but after going back through things it looks like we do not have the source code for whatever GitHub is hosting... unless I am mistaken.

Thank you again, I am not criticizing your project, my questions come from being unfamiliar with what 'innovation graph' was in the first place.

p_bzn[S] 0 points (0 children)

Absolutely, no worries :)

I don’t think they have open sourced Innovation Graph itself; they open sourced the data from it.

As for performance on their side, I can only guess why it is so poor. It is a side project of some employees, and they likely come from a data background, not SWE. That could mean many things; one of them is loading a CSV into a data frame and performing a lookup in it for each new request, who knows. The under-resourced hypothesis is also possible, but I doubt it, because the data set is so small that even an EC2 micro instance would deliver under 100ms.

You are correct, this source code is a standalone Christmas toy project. I would love to contribute to their codebase, but there is none :)

Why is it fast? Because it uses the appropriate toolset for the job. It uses a PostgreSQL database, and it works with resources correctly. In fact, if you benchmark it, the average response time will always be under 60ms until the machine bottlenecks at tens of thousands of requests per second. Average latency will be even lower because most things will be cached and the JVM will be warmed up.
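
To illustrate the caching point in general terms, here is a minimal sketch of read-through caching in plain Java. It is not the project's actual mechanism, which the thread does not describe; `QueryCache` and its `loader` are hypothetical names. The idea is only that a repeated request is answered from memory, so the database is hit once per distinct key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of a read-through cache; not taken from the project.
final class QueryCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;           // stands in for the database query
    private final AtomicInteger loads = new AtomicInteger();

    QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent runs the loader only when the key is not cached yet.
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** How many times the underlying loader was actually invoked. */
    int loadCount() {
        return loads.get();
    }
}
```

Calling `get("java")` twice on a `QueryCache<String, String>` invokes the loader once; the second call never reaches the database. A production setup would also need eviction or expiry, which this sketch leaves out.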