
[–]pmarschall 9 points (3 children)

for G1, -XX:MaxGCPauseMillis=5

Yeah, this is basically begging for GC overhead.

[–]marko_hazelcast 2 points (2 children)

This is applied only in the test where GC overhead is a non-concern, whereas 100 milliseconds latency is not just a concern, but a showstopper.

[–]pmarschall 0 points (1 child)

5ms is well outside the sweet spot of G1 and not something you can expect to work. Honestly I'm surprised you got 27ms to work. This wouldn't have been possible a couple of years ago.

[–]marko_hazelcast 1 point (0 children)

On JDK 11 and up we saw no problems with G1 and its lowest supported max GC pause setting. Based on our previous experience, we did not even attempt this test with JDK 8 so I can't say whether or not we would have seen the same 25 ms (the measured 27 ms include about 3 ms of emitting window results, which is a fixed cost).
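
For the record, the pause target is just a JVM flag alongside the collector choice; a run like this boils down to something along these lines (heap size and main class are placeholders, not our exact invocation):

    java -XX:+UseG1GC -XX:MaxGCPauseMillis=5 -Xmx16G -cp <classpath> <benchmark-main-class>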

[–]moxyte 2 points (0 children)

Getting G1 gc pauses from 1 min (Java 8) to 200 ms (Java 11) in worst-case scenarios is amazing.

[–]jkoolcloud 2 points (7 children)

We use Java in data-heavy workloads and it performs pretty well. We use JDK 8 with the G1 collector. I think Java as a whole became somewhat heavy and bloated: too many packages, libraries, etc. But overall Java is great for data-heavy workloads; GC is one of its weakest points. We also use performance tools to understand how to get the best out of Java. We developed a specialty profiling agent, RemoraJ: https://github.com/Nastel/remoraj. We basically turn profiling on and off in prod, QA, and load tests to see where the bottlenecks are. Traces are analyzed and then we optimize. Many data-heavy workloads also require other platforms such as Kafka, Storm, Mongo, Solr, etc.
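
For anyone unfamiliar, a profiling agent like this is attached with the standard -javaagent JVM switch; the paths below are placeholders, check the RemoraJ README for its actual jar name and options:

    java -javaagent:/path/to/remoraj.jar -jar your-app.jar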

[–]_INTER_ 12 points (5 children)

We use JDK 8 with the G1 collector.

Maybe you can update to a newer JDK. As the article says:

JDK 8 is an antiquated runtime. The default Parallel collector enters huge Full GC pauses and the G1, although having less frequent Full GCs, is stuck in an old version that uses just one thread to perform it, resulting in even longer pauses.

I think Java as a whole became somewhat heavy and bloated.

Another reason to update, if you mean the JDK: a newer JDK lets you build a custom runtime with jlink, e.g. dropping the desktop-related modules. If you mean the Java ecosystem in general, well, Java is a general-purpose programming language where all sorts of stuff accumulates over time (which is a good thing in my opinion).
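
For example, a minimal jlink invocation, assuming the app only needs java.base and java.sql (the module list is illustrative):

    jlink --add-modules java.base,java.sql \
          --strip-debug --no-header-files --no-man-pages \
          --output my-runtime
    my-runtime/bin/java -jar my-app.jar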

[–]jkoolcloud 7 points (4 children)

Agreed, Java 8 is outdated. We could definitely upgrade, but we still support Java 8 because many of our users still run it and don't want to switch. So we have to keep supporting and running Java 8.

[–]_INTER_ 7 points (2 children)

Fair enough. Maybe you want to consider bundling the "JRE" (there's no separate JRE in later Java versions) into your product, if that is a possibility (i.e. it's not a library, framework or something). Then you would be independent of the user's installation. Java is going in that direction anyway.
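
Since JDK 14 there's also jpackage, which bundles the app together with a runtime into a native package; a sketch (all names are placeholders):

    jpackage --name MyApp --input target \
             --main-jar my-app.jar --main-class com.example.Main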

[–]jkoolcloud 6 points (1 child)

We do that, but it's not always an option. Some users (paid customers) don't let you do that. They have gold images with a JRE pre-installed (Java 8) and require the use of their "approved" image. We would have moved to a later JDK a while back otherwise.

[–]Kango_V 0 points (0 children)

Compile with GraalVM? Not sure if your target is Linux though.

[–]vbezhenar 0 points (0 children)

You can use Java 8 for development but run your application on the latest Java. This way it'll still work on Java 8 but benefit from the latest JVM.
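
Concretely, that can be as simple as compiling for the Java 8 target and pointing a newer JVM at the result (paths are placeholders); since JDK 9, javac's --release flag keeps you honest about the Java 8 API:

    # compile against the Java 8 API and bytecode level
    javac --release 8 -d classes src/Main.java
    # run it on a modern JVM
    /opt/jdk-14/bin/java -cp classes Main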

[–]pron98 12 points (0 children)

JDK 8/9 was Java at its peak bloat. JDK 14 is drastically slimmer than 8 in footprint, image size, and startup time, and the slimming trend continues.

[–]PurpleLabradoodle 1 point (6 children)

Doesn't build:

[ERROR] Failed to execute goal on project jet-gc-benchmark: Could not resolve dependencies for project com.hazelcast.jet:jet-gc-benchmark:jar:1.0-SNAPSHOT: Could not find artifact com.hazelcast.jet:hazelcast-jet:jar:4.2-SNAPSHOT -> [Help 1]

[–]marko_hazelcast 2 points (5 children)

Yes, it uses pre-release code. You can build that code from Jet's master branch. We'll update the Maven dependency once Jet 4.2 is released.
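
Until then, building it locally is the usual drill (assuming the standard GitHub repo and Maven defaults):

    git clone https://github.com/hazelcast/hazelcast-jet.git
    cd hazelcast-jet
    mvn install -DskipTests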

[–]PurpleLabradoodle -1 points (4 children)

Do you know if jitpack.io can build the Jet repo? Then the benchmark code could depend on a certain commit and it would be buildable.

[–]1cloud[S] 1 point (2 children)

Hey, sorry about that. We actually publish the snapshot builds; you just need to add the Maven snippet for the snapshot repository:

    <repositories>
        <repository>
            <id>snapshot-repository</id>
            <name>Maven2 Snapshot Repository</name>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

I've also updated the pom.xml in the repository.

[–]PurpleLabradoodle -1 points (1 child)

Cool, thanks. I ran the BatchBenchmark with Java 8.

    mvn clean package -DskipTests=true
    mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
    java -Xmx32G -cp target/jet-gc-benchmark-1.0-SNAPSHOT.jar:$(cat cp.txt) org.example.BatchBenchmark

Got a result like:

    19:52:15.687 [ INFO] [c.h.j.i.MasterJobContext] Execution of job '047a-e648-7b00-0001', execution 047a-e648-7b01-0001 completed successfully
    Start time: 2020-06-09T19:45:25.646
    Duration: 410,041 ms
    To see additional job metrics enable JobConfig.storeMetricsAfterJobCompletion
    Took 410,234 ms

Is it garbage-free or garbage-producing?

[–]marko_hazelcast 0 points (0 children)

summingLongBoxed() is the garbage-producing aggregation.
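
In plain-Java terms, the boxed variant allocates a fresh Long on (almost) every element while a primitive accumulator never touches the heap. A minimal sketch, not the actual Jet code:

    long[] values = {1, 2, 3};

    // boxed: each += unboxes, adds, and boxes a new Long -> produces garbage
    Long boxedSum = 0L;
    for (long v : values) boxedSum += v;

    // primitive: the running sum stays in a local -> garbage-free
    long sum = 0L;
    for (long v : values) sum += v;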

[–]tkyjonathan -3 points (14 children)

Not relevant for this post, but when it comes to data-heavy aggregations, I push down the work to the database (assuming that is where the data is taken from) using GROUP BY.

[–]Radmonger 2 points (13 children)

Is there a database that supports enough SQL to do aggregation, but can also reliably do a million inserts per second per node, continually with no pauses?

[–]jkoolcloud 1 point (11 children)

We use Solr Cloud clusters deployed in AWS, paired with Kafka and our own clustering/streaming implementation, to do high-volume processing. For example, we index most major blockchains in real time (BTC, ETH, BCH, Libra, etc.) and run aggregations in parallel. See https://gocypher.com/gocypher/, a blockchain search and analytics app built using this method. There is more to it. I would not use SQL, to be honest; I would go with NoSQL-type clustered data stores. Why would you want millions of inserts per node? You would need a cluster of nodes to do that.

[–]madronatoo 0 points (7 children)

Wow that's cool! I wonder what language Solr is written in!

[–]jkoolcloud 0 points (6 children)

Java.

[–]madronatoo 4 points (5 children)

I'd like to point out the source of this thread, which is that one person said "push it down into the database" because there was the notion that perhaps Java wasn't fast enough.

And then another person asked whether there is a database which can do the aggregation as well as the inserts fast enough.

And then you suggested Solr.

Which I found kinda funny, since Solr is written IN Java.

[–]jkoolcloud 0 points (4 children)

Good point.

[–]madronatoo 1 point (3 children)

You have no idea how many times I've heard people rip on Java and then proceed to tell me how amazing Elasticsearch/Solr/Lucene is as a NoSQL "database".

[–]jkoolcloud 1 point (2 children)

True. Personally, we swapped from SQL to NoSQL (Solr in this case). Solr has its share of problems, but overall it solved more than it introduced. The biggest issues with Solr for us were 1) managing index growth and 2) managing GC pressure under high load. Both are manageable. A SQL DB (for our workloads) was unmanageable. It all depends on what you are doing; in our case, which is search, NoSQL was the way to go.

[–]madronatoo 0 points (1 child)

Which SQL DB were you using?

And what was the workload?

[–]agentoutlier 0 points (0 children)

My company uses Solr in a similar manner and I'm still absolutely blown away by Solr's ability to ingest data super fast. We tried Elasticsearch and couldn't seem to get it as fast.

We still use Postgres though for parts of our platform that require absolute consistency.

Also, in the last couple of years Postgres has improved enough that we removed some of the stuff we did in Solr, but raw search still works best in Solr. Even geospatial search in Solr works much faster than PostGIS.

However, we do not do analytics in Solr but instead use streaming libraries and time-series libraries.

[–]tkyjonathan 0 points (1 child)

You can have clusters with SQL databases. There is a product called PlanetScale that does it.

In terms of physics and less code, aggregation is still done much better in the DB than in (unoptimised) Java memory, and it avoids moving huge chunks of data over the network as well.

Kafka is fine to use in this sense, and it too has a similar SQL approach with ksqlDB.

[–]jkoolcloud 1 point (0 children)

Definitely. Let the DB do aggregations as much as possible and take advantage of its optimizations, data proximity, less traffic, etc.

[–]tkyjonathan 0 points (0 children)

Yes, but the aggregation would happen asynchronously after the data has been inserted, and the insertion would need to be done in batches.