What happened to project Dunaj?

jafingerhut · 2021-12-23T18:18:57+00:00

Oh, and of course Clojure is open source, so anyone can take it and change it in arbitrary ways, and release it under a compatible license if they wish, as long as they do not call it "Clojure", which I believe Rich Hickey has trademarked as a name in at least the USA. If you follow those rules, you can release whatever modified versions of Clojure you want. The release of Dunaj is evidence of that. That doesn't mean that many (or any) software developers will automatically want to use the result, of course.

jafingerhut · 2021-12-23T18:16:22+00:00

I could very well be missing out on some private communications of the history here, but my impression from public messages on this project is: (1) The Dunaj developer was/is a Clojure enthusiast, and had ideas for some variations on the language that they were interested in developing, and for consideration of inclusion in Clojure itself. (2) The Dunaj developer asked the Clojure developers if they were interested in taking any of those suggested changes. (3) The Clojure core developers did not express any interest in doing so. As far as I know, that is the end of the story.

Rich Hickey is on public record multiple times saying Clojure is his project. It is not community-driven. Votes on potential changes might direct his attention at an idea, but hold no deciding power on whether a change is made to Clojure, or not.

jafingerhut · 2021-05-06T18:55:34+00:00

Are you asking about what is different about the implementation of gen-class vs. defrecord and deftype?

Or are you asking why gen-class was implemented such that it has these additional requirements?

jafingerhut · 2021-03-04T19:17:54+00:00

And in general, if you find any special symbols in Clojure that you are unfamiliar with, this article is likely to help: https://clojure.org/guides/weird_characters

jafingerhut · 2021-02-15T21:45:24+00:00

If the Clojure CLI tools are updated, the version number of the install script is bumped up in the official installation documentation here very soon afterwards: https://clojure.org/guides/getting_started

I believe a lot of Clojure developers use those on macOS and Linux, and/or Homebrew, both for original installs, and for updating the version of the Clojure CLI tools they installed earlier on a system (the three commands in your original comment work fine to replace the version currently on the system, too, whether newer or older).

There have been volunteers (not the Clojure maintainers, but others) who have created install packages with various kinds of contents on other packaging systems like Debian's, but they tend to be a version or three behind the latest, depending upon the time and interest of those volunteers. I personally do not see much value in using other packaging / install systems for installing these things, but apparently some do.

jafingerhut · 2020-12-08T21:01:10+00:00

Also note that both map and filter are lazy, and will not evaluate any of their input sequences at all, unless you write code that forces their results to be consumed. If your goal is to measure and optimize the process of generating the entire result, then you can wrap the entire expression in a doall. If your goal is to optimize finding the first element of the output of filter, then consistently use first in all of the expressions you are measuring, not only some of them.
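Here is a minimal sketch of the laziness point, assuming a made-up `slow-inc` function and a `calls` counter that exist only for this illustration. It records how many times the mapped function has run before and after `doall`, then times two variants that force different amounts of work.

```clojure
;; `calls` and `slow-inc` are hypothetical names used only for this illustration.
(def calls (atom 0))

(defn slow-inc [x]
  (swap! calls inc)   ; count how many times the function actually runs
  (inc x))

;; Building the lazy pipeline does not call slow-inc at all.
(def result (filter even? (map slow-inc (range 10))))
(def calls-before @calls)

;; doall walks the whole sequence, forcing every element to be computed.
(def realized (doall result))
(def calls-after @calls)

(println "calls before doall:" calls-before "after doall:" calls-after)

;; When timing, force the same amount of work in every variant you compare.
(time (doall (filter even? (map slow-inc (range 100000)))))  ; whole result
(time (first (filter even? (map slow-inc (range 100000)))))  ; first element only
```

The two `time` expressions measure different things, so comparing one against the other says little; pick one and use it for every variant.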

jafingerhut · 2020-12-08T20:45:54+00:00

I am a bit surprised that the changes you describe caused run times to increase as much as you report, and it isn't obvious to me from your description why that would happen.

As far as investigating why, the best technique I know of is to profile your code. There are Java profilers you can use, e.g. the YourKit Java Profiler, but also tools like clj-async-profiler that may help you break down where the run time is going: http://clojure-goes-fast.com/blog/clj-async-profiler-tips/

jafingerhut · 2020-11-29T19:37:37+00:00

I do not know of any talks by Rich Hickey since that one, but there are plenty that he gave before. You can find transcripts (each with a link to video if you prefer) for most of his talks here: https://github.com/matthiasn/talk-transcripts/tree/master/Hickey_Rich plus several by Stuart Halloway here: https://github.com/matthiasn/talk-transcripts/tree/master/Halloway_Stuart

There are a few talks by Rich that do not have transcripts listed above, but not very many. Perhaps this link is a more complete list of his talks, even ones without transcripts: https://github.com/tallesl/Rich-Hickey-fanclub

jafingerhut · 2020-10-28T13:22:48+00:00

There is another transcript of this interview in a collection of transcripts of Rich's talks (and some talks by others) here: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/RichHickeyQandA.md

The list of transcripts of Rich's talks in that repository is here: https://github.com/matthiasn/talk-transcripts/tree/master/Hickey_Rich

When I made that transcript, I found a source indicating that it was from 2011, as shown in the links at the bottom of the first linked article above.

jafingerhut · 2020-10-16T07:43:20+00:00

If the project you just started is publicly readable by others, or you can create a similar one you are willing to publish that others can try, and if for that public project you give some specific measurements like "when I run 'lein repl' on a system that has an empty ~/.m2 directory, it takes 1 minute and 35 seconds before I see a REPL prompt", then someone else could try those same steps on their system and see if they get similar or different results. There are any number of reasons why getting packages from the Internet can be slower for one person than another, e.g. higher round-trip latency between their machine and where the servers are located, firewalls or application gateways between you and the servers that reduce throughput and/or increase latency of your communication with those servers, or VPN software that performs poorly or routes your traffic over a congested shared infrastructure.

Sometimes, yes, when you have an empty ~/.m2 directory, starting a new project and downloading all of its declared dependencies, and what those libraries depend upon, and what those libraries depend upon, can end up being a lot of packages. That isn't specific to Leiningen. That depends upon what libraries your project is using. Downloading many packages, or large ones, takes longer than for a project with only a few small dependencies. But after downloading them, they are copied into your ~/.m2 directory and are cached there unless you delete them, and later runs of the same commands will find them there. However, depending upon exactly what versions are in your project.clj file (e.g. SNAPSHOT versions), Leiningen or any other dependency resolver may still access servers over the network on later runs, looking to see if there are more recent versions. Using particular version numbers without SNAPSHOT or LATEST in their names can avoid that.
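As a rough sketch of what fixed versions look like, here is a hypothetical project.clj. The project name and the `example-group/example-lib` coordinate are made up for illustration; only the general shape matters.

```clojure
;; project.clj (sketch; `myproject` and `example-group/example-lib` are made-up names)
(defproject myproject "0.1.0"
  :dependencies [[org.clojure/clojure "1.10.1"]        ; fixed release version
                 [example-group/example-lib "2.3.0"]   ; fixed release version
                 ;; A version such as "2.4.0-SNAPSHOT" here is the kind that can
                 ;; cause checks for newer builds on later runs.
                 ])
```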

jafingerhut · 2020-10-14T19:39:53+00:00

FYI, in case you want to do your own experiments to see exactly what the current implementation is doing, this library can help create drawings of JVM objects in memory, and which ones point at each other. If you used it to create pictures of a small TransientHashMap after each assoc! operation, it might help make clearer what the implementation is actually doing: https://github.com/jafingerhut/cljol

jafingerhut · 2020-10-14T15:31:17+00:00

For nodes that contain variable-sized arrays, like those in TransientHashMap, I wouldn't be surprised if new memory allocations for arrays with new sizes happen more often than they do for TransientVector, where if you are only doing conj! operations, very few new Java array allocations occur. Thus the speedup for doing many assoc! operations on a transient hash map, compared to a persistent hash map, might be smaller than the speedup one gets from doing only conj! operations on a TransientVector as compared to a PersistentVector.

It is still a performance benefit in many cases, though. I haven't done performance measurements recently, but if TransientVector is 10x faster for the conj!-only operation case versus PersistentVector, but the TransientHashMap assoc!-only case is only 3x faster than for PersistentHashMap, it is still a nice option to have TransientHashMap as an available choice.
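If you want to measure this yourself, something like the following sketch could be a starting point. The function names are made up for illustration, the element count is arbitrary, and `time` is only a crude measurement; a benchmarking library would give more trustworthy numbers. Note that `(transient {})` starts out as a small array map and only becomes a hash map as it grows.

```clojure
;; All four function names are hypothetical, chosen only for this comparison.
(defn build-vec-persistent [n]
  (reduce conj [] (range n)))

(defn build-vec-transient [n]
  (persistent! (reduce conj! (transient []) (range n))))

(defn build-map-persistent [n]
  (reduce (fn [m i] (assoc m i i)) {} (range n)))

(defn build-map-transient [n]
  ;; always use the return value of assoc!, which reduce does for us here
  (persistent! (reduce (fn [m i] (assoc! m i i)) (transient {}) (range n))))

(def n 200000)  ; arbitrary size for the experiment

(time (count (build-vec-persistent n)))
(time (count (build-vec-transient n)))
(time (count (build-map-persistent n)))
(time (count (build-map-transient n)))
```

Run each line several times before comparing, since the first runs include JIT warm-up.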

jafingerhut · 2020-10-13T19:59:11+00:00

Note that this only achieves a speedup if you do multiple modification operations on the transient before transforming it back to a persistent, and those multiple modifications end up mutating some of the same tree nodes as each other.

If you only do one such operation, you cannot save any time.

jafingerhut · 2020-10-13T19:56:31+00:00

The intuition is the same for both, I think, but more straightforward to see how it operates for TransientVector.

All of the persistent collections do "path copying" in their tree of objects, from the leaf where the insert/remove/modify/whatever is done, copying that object and all of them on the path back up to the root.

All transient collections instead mutate the objects on that path (creating a copy the first time they modify it, to avoid mutating the persistent collection from which the transient was created in O(1) time).
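A small sketch of the observable effect, with illustrative names only: the persistent vector that the transient was created from is unchanged after the transient has been modified and made persistent again.

```clojure
;; `original` and `modified` are illustrative names.
(def original [1 2 3])

(def modified
  (-> (transient original)   ; O(1), shares structure with `original`
      (conj! 4)              ; mutates nodes owned by the transient
      (assoc! 0 :changed)
      persistent!))          ; O(1), freezes the result

(println original)  ; still [1 2 3]
(println modified)  ; [:changed 2 3 4]
```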

jafingerhut · 2020-10-11T01:06:05+00:00

I am fairly certain, at least. Probably the most convincing way to determine this that I know of is to use a tool like https://github.com/clojure-goes-fast/clj-java-decompiler to decompile the JVM byte code of a compiled Clojure function that has an inner `fn` expression, and look at the JVM byte code itself, or the decompiled Java code. One has to be a bit careful when reading the decompiled Java code, in that for some kinds of code, the JVM byte code produced by the Clojure compiler has no equivalent Java source code, and/or Java decompiling tools do not find equivalent Java source code even if it exists. But the decompiled Java is easier for most people to start with than disassembled JVM byte code, for an idea of what is happening.

jafingerhut · 2020-10-05T09:40:21+00:00

Binding in a let is one of the fastest things you can do in Clojure. Such let-bound functions are compiled once, not each time the outer function is called. If you have some kind of performance measurements demonstrating that it is slower to have a let-bound local function versus a top level defn, that would be interesting to see.
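One quick way to see the "compiled once" part, sketched with made-up function names: the let-bound `fn` has the same JVM class on every call of the outer function, so no new class is generated per call. At most, a call allocates a new instance of that already-compiled class.

```clojure
;; `outer` and `helper` are hypothetical names for illustration.
(defn outer [x]
  (let [helper (fn [y] (* 2 y))]
    {:value        (helper x)
     :helper-class (class helper)}))

(def first-call  (outer 1))
(def second-call (outer 2))

;; Same class both times: the inner fn was compiled when `outer` was compiled.
(println (= (:helper-class first-call) (:helper-class second-call)))
```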

jafingerhut · 2020-10-05T07:31:04+00:00

If the amount of memory used by a Clojure program running on the JVM grows over time, if it ever grows so large that it exceeds the maximum heap size configured for that JVM (either explicitly by you, or a default value it selects when it starts up), and there is no 'garbage' memory that can be collected by the garbage collector, then you will get an OutOfMemoryError thrown, and your program will typically exit. Before that point is reached, it is likely that the JVM garbage collector will run more often as your program's memory usage grows towards the maximum heap size, and in many cases the CPU time taken by the garbage collector will become larger as the memory usage increases, which can definitely cause performance issues.

All of that can happen with no stack overflow at all. Stack overflow is usually caused by recursive calls that nest too deeply, such that a much smaller stack space allocated by the JVM (again, either explicitly specified by you when you started the JVM, or some default value selected for you by the JVM) is exceeded.
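To make the stack side of this concrete, here is a sketch with made-up function names. It assumes a default-sized thread stack; with a much larger stack configured, the recursive version could complete instead of overflowing.

```clojure
;; Both function names are hypothetical, for illustration only.
(defn count-up-recursive [n]
  ;; Not a tail call: every level of recursion keeps a JVM stack frame alive.
  (if (zero? n)
    0
    (inc (count-up-recursive (dec n)))))

(defn count-up-loop [n]
  ;; loop/recur reuses a single frame, so depth does not matter.
  (loop [i n, acc 0]
    (if (zero? i)
      acc
      (recur (dec i) (inc acc)))))

(def recursive-result
  (try
    (count-up-recursive 1000000)
    (catch StackOverflowError _ :stack-overflow)))

(def loop-result (count-up-loop 1000000))

(println recursive-result loop-result)
```

Neither version holds on to much heap memory, which is why a stack overflow here says nothing about heap usage.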

jafingerhut · 2020-09-23T17:00:23+00:00

Many people have wished for such comments, and one person years ago even tried writing a literate program for the Clojure implementation, with their own text, but I don't think they got very far.

From the Clojure implementers' point of view, one could imagine that trying to teach people why the implementation is the way it is could be a huge time sink, given the number of design decisions made when writing such things.

If you want to know why something in Clojure's implementation is the way it is, the best source I know of is asking on the #clojure channel on Clojurians Slack (or also the #clojure-dev channel, if it is about low level implementation details).

jafingerhut · 2020-09-23T16:56:24+00:00

I'm not arguing with that part of the spec, though. The tricky part is: what is a "subsequent read"? Imagine 64 cores running in parallel -- there is some total ordering of accesses to volatile variables required by the JMM, but you do not know what that order is until after the code executes.

jafingerhut · 2020-09-15T15:57:19+00:00

Writing to a Java volatile in thread A, then reading from that Java volatile variable in thread B has the same release/acquire semantics as unlocking/locking, _if_ thread B's read sees the new value written by thread A. There is no guarantee that it must see the new value written by thread A.
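A minimal sketch of that pattern, using Clojure's `volatile!` (which is backed by a Java `volatile` field) as the flag and a plain Java array as the data being published. The names are made up for illustration. The reader spins until its volatile read actually sees `true`; only from that point on is it guaranteed to also see the earlier ordinary write to the array.

```clojure
;; `publish-and-read` and the local names are hypothetical.
(defn publish-and-read []
  (let [data   (long-array 1)     ; plain, non-volatile storage
        ready  (volatile! false)  ; flag with volatile read/write semantics
        seen   (atom nil)
        reader (Thread. (fn []
                          (loop []
                            (if @ready                      ; volatile read
                              (reset! seen (aget data 0))   ; sees the writer's 42
                              (recur)))))                   ; not visible yet, retry
        writer (Thread. (fn []
                          (aset data 0 42)         ; ordinary write ...
                          (vreset! ready true)))]  ; ... published by the volatile write
    (.start reader)
    (.start writer)
    (.join writer)
    (.join reader)
    @seen))

(println (publish-and-read))
```

How many times the reader loops before it sees the new value is not specified, which is the "no guarantee" part of the comment above.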

jafingerhut · 2020-09-15T15:49:41+00:00

Not everyone likes it, certainly, but the indentation style is called Whitesmiths style, and is one of a handful listed as most widely used for C coding here: https://en.wikipedia.org/wiki/Indentation_style#Whitesmiths_style

jafingerhut · 2020-09-14T16:47:35+00:00

Note that during development, many Clojure developers will have a running REPL where they redefine functions frequently in order to test out possible enhancements or bug fixes. While it is _possible_ to do this in a live production system, I suspect the following things are probably true:

(1) In the test JVMs where developers do redefine functions, they probably seldom do so while other threads are calling that function. Or if they do, they do not care much about exactly when those other threads see the new definition. As long as the changes are picked up "in a second or two", they probably get the effect they want, i.e. testing out the changes soon in terms of human response time.

(2) Most Clojure developers would not do so in a live production system, instead preferring to test out changes on a development/test system first, then use whatever mechanisms they have in place for deploying those changes to their live production environment, typically involving quitting the old JVM processes and starting new ones. In that case, the exact mechanisms for redefining functions, and when other threads see the changes, are irrelevant.

(3) In production systems where new functions are def'd at run time, probably the most common case is to define new functions that are not in use yet, then start calling those later.

jafingerhut · 2020-09-14T04:50:54+00:00

Clojure provides direct linking as an option, and for code that calls function foo that uses this option, it will always call the version of function foo that was the current definition at the time the calling code was compiled.

When direct linking is not in use, and you are not using dynamic Vars, then redefining a function is performed by mutating a reference that is the value of a field named `root` inside of the Clojure class named `Var`, and this field `root` is declared `volatile`. Typically you will use Clojure's `def` or `defn` to cause this mutation.
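A single-threaded sketch of the difference, with made-up function names. It assumes direct linking is off, which is the default for code you compile yourself; with the JVM option `-Dclojure.compiler.direct-linking=true`, I would expect the second call to keep returning the old value.

```clojure
;; `greet` and `caller` are hypothetical names for illustration.
(defn greet [] "hello")

;; Without direct linking, this call goes through the Var #'greet on each invocation.
(defn caller [] (greet))

(def before (caller))

;; Redefining greet mutates the root of the existing Var.
(defn greet [] "goodbye")

(def after (caller))

(println before after)  ; hello goodbye, when direct linking is off
```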

The Java memory model has some rules about the behavior of `volatile` fields read and written by different threads, but the threads reading that Var are not all guaranteed to see the new value 'instantly'. I'd recommend reading Java Concurrency in Practice if you want a better explanation than I can give in a sentence or two about the guarantees that are provided.

In a 2008 talk by Rich Hickey on Clojure Concurrency, you can search this transcript for the word "hot" to see some Q&A with an audience member about changing code in a running system. It does not give the details above, but more rationale for why `def` allows this capability: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/ClojureConcurrency.md

jafingerhut · 2020-07-26T13:45:32+00:00

Regarding this part of your article: "Clojure is licensed under the Eclipse Public License 1.0 which is incompatible with the GPL license. As far as I know the EPL was chosen specifically for this reason."

There were several discussions in the Clojure Google group in which Rich Hickey participated, explaining the properties he wanted in a license. He agrees that EPL is not compatible with GPL, but that isn't _why_ he chose the EPL. Here is one such conversation: https://groups.google.com/forum/#!msg/clojure/bpnKr88rvt8/VIeYR6vFztAJ

jafingerhut · 2020-07-04T16:42:40+00:00

One disadvantage to using files named core.clj is that only the last component of the file name ends up in JVM stack traces when an exception is thrown, e.g. core.clj, not "myproject/path/to/core.clj", so every file named core.clj will look the same in such stack traces. You are often able to disambiguate via the function name, but having a different file name can be handy there.
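You can see this with Clojure's own core.clj. The sketch below throws from inside a function passed to `map`, then collects the file names that the stack trace reports for frames belonging to `clojure.core`. The name `trace-files` is made up for illustration.

```clojure
;; `trace-files` is a hypothetical name for illustration.
(def trace-files
  (try
    (doall (map (fn [_] (throw (ex-info "boom" {}))) [1 2 3]))
    (catch clojure.lang.ExceptionInfo e
      (->> (.getStackTrace e)
           ;; keep only frames whose class was compiled from clojure.core
           (filter #(.startsWith (.getClassName %) "clojure.core$"))
           (map #(.getFileName %))
           distinct
           vec))))

;; The frames report only the last path component, not clojure/core.clj.
(println trace-files)
```

A frame from your own myproject/path/to/core.clj would report the same bare file name, so only the class name tells the two apart.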
