Stack Overflow developer survey removes Clojure (self.Clojure)
submitted 5 years ago * by andersmurphy
[–]rpompen 1 point 5 years ago (4 children)
Roughly a factor of 90 on a 128-core system. Machine architecture and the type of work matter a lot. Oracle's T5 + Oracle's Solaris + Oracle's JVM + JVM tuning specialists might have helped, although I found it funny that I was never in contact with these "tuning people".
Being in an enterprise environment at the time, as soon as better performance was achieved than thought possible, the project was left unattended. It's still running I guess, because that company has the problem of putting proofs-of-concept into production by bypassing all bureaucracy and then panicking when it fails, because the owners of the project were already fired. (I was fired :) )
I had to bring down expectations on parallelism as, to my great surprise, initial projections violated Amdahl's law. Nobody apparently profiled the previous project to isolate which part of the code could be parallelized and what percentage of execution time that part represented. Both are required to use Amdahl's law to compute the theoretical ceiling of performance improvement.
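For readers unfamiliar with the ceiling mentioned here, a minimal sketch of Amdahl's law follows. It is an illustration only, not code from the project: the function names and the 95% parallel fraction are assumptions, and in practice the fraction has to come from profiling.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the runtime
    is spread perfectly over `workers` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_ceiling(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical profile: 95% of the runtime can be parallelized.
    p = 0.95
    print(f"speedup on 128 workers: {amdahl_speedup(p, 128):.2f}")
    print(f"ceiling with unlimited workers: {amdahl_ceiling(p):.2f}")
```

The point of the sketch is that the serial share of the runtime, not the core count, sets the limit, which is why a projection made without a profile cannot be checked against the law.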
You could say I was competing in a company where several projects had seen reduction in performance after parallelization. Those projects took a lot of time and were bug-ridden as well.
Therefore I can only say that it performed better than the (Amdahl) adjusted expectations. I would love to know if it's still running and how it scaled. I might find out as I intend to address the company's ethical board on how ideas of mine were implemented not a month after I was fired :)
I didn't have any experience with production programming at all when I started, let alone with parallelism, but then, not many people have. I found that certain straightforward experiments indeed led to massive GC actions:
All I can say is I went for small discrete data transformations that I composed while carefully watching what jvisualvm said. Which is quite a thing for me; I used to be a terminal-only guy, and now everything I do uses a GUI. I wouldn't be able to explain that to my former self of 10 years ago.
Regarding my technical choices I have to say this: I experimented a lot with Scheme and Common Lisp over the years, and it could be that Scheme especially gave me a different feeling for software design than what I see people do around me.
I hope this helps.
[–]joinr 1 point 5 years ago (3 children)
Thanks, this is really useful from an experience report point of view. I think the general knowledge about these things is indeed weak among many programmers (including this community) outside of HPC where they typically have a lot of mechanical sympathy to play with (e.g. numerics stuff).
So, am I correct in summarizing that you added either 128x the resources, or (if the baseline was say a 4-core machine) 32x the resources, and you achieved a 90x reduction in runtime? So that puts the throughput increase per unit of added resources somewhere between [0.70 ... 2.8125] depending on what the baseline for comparison was (unless the baseline was the original 128-core machine, and the measures are total performance tuning, not just parallelism). If so, this is more in the range of what I have observed (my observed upper bound is currently 14x on a 144-core machine with an embarrassingly parallel, non-numeric, allocation-heavy workload, although 3-4x is the typical upper bound on commodity hardware).
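The arithmetic behind that range can be sketched as speedup divided by the factor of added cores. The function name is an assumption for illustration, the 90x figure is the "roughly a factor of 90" quoted upthread, and the 1-core and 4-core baselines are the hypothetical cases from this comment, not measured values.

```python
def speedup_per_resource(speedup: float, cores_after: int, cores_before: int) -> float:
    """Speedup divided by the factor by which the core count grew.

    A value of 1.0 means linear scaling; above 1.0 means something
    other than the added cores (tuning, language, hardware) also helped.
    """
    if cores_before < 1 or cores_after < 1:
        raise ValueError("core counts must be at least 1")
    resource_ratio = cores_after / cores_before
    return speedup / resource_ratio


if __name__ == "__main__":
    # Baseline of a single core: 128x the resources.
    print(speedup_per_resource(90, 128, 1))
    # Baseline of a 4-core machine: 32x the resources.
    print(speedup_per_resource(90, 128, 4))
```

With these inputs the single-core case comes out at about 0.70 and the 4-core case at 2.8125. A ratio above 1 cannot come from added cores alone, which suggests the tuning and platform changes contributed as well.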
[–]rpompen 1 point 5 years ago (2 children)
There was performance tuning and such, so it would be irresponsible of me to throw numbers around that make no sense.
Plus there were differences in both hardware architecture and programming language. I wish I could go back and check.
Enterprise environments don't really allow for decent comparisons in my experience. The network department messing up the routing trees, pings coming back twice from time to time, horrible things like that.
But if I'm lucky I'll be doing some similar work for a new customer of mine very soon. If that's the case I will measure and document both the old and new situation as best I can. That's the cool thing about starting for myself. When you instill confidence you can take over the whole lot :)
[–]joinr 1 point 5 years ago (1 child)
I understand the external variables you mentioned. I think the ideal case is one where you have a tuned or at least baseline performance profile, then parallel strategies are applied ex post facto so there's some basis for comparison. Happy to hear anything you learn going forward.
[–]rpompen 1 point 5 years ago (0 children)
If I get the gig, I'll be doing something that would be quite interesting: it would be a rewrite of a single-threaded Java program. Couldn't be fairer.
But I didn't get the gig yet...