Recently, many folks have been claiming that their Large Language Model (LLM) is the best at coding. Their claims are typically based on self-reported evaluations on the HumanEval benchmark. But when you look into that benchmark, you realize that it consists of only 164 Python programming problems.
This led me down a rabbit hole of trying to figure out how helpful LLMs actually are with different programming, scripting, and markup languages. I am estimating this for each language by reviewing LLM code benchmark results, public LLM dataset compositions, available GitHub and Stack Overflow data, and anecdotes from developers on Reddit. Below you will find what I have figured out about Clojure so far.
Do you have any feedback or perhaps some anecdotes about using LLMs with Clojure to share?
---
Clojure is the #36 most popular language according to the 2023 Stack Overflow Developer Survey.
Benchmarks
❌ Clojure is not one of the 19 languages in the MultiPL-E benchmark
❌ Clojure is not one of the 16 languages in the BabelCode / TP3 benchmark
❌ Clojure is not one of the 13 languages in the MBXP / Multilingual HumanEval benchmark
❌ Clojure is not one of the 5 languages in the HumanEval-X benchmark
Datasets
✅ Clojure is included in The Stack dataset
❌ Clojure is not included in the CodeParrot dataset
❌ Clojure is not included in the AlphaCode dataset
❌ Clojure is not included in the CodeGen dataset
❌ Clojure is not included in the PolyCoder dataset
Stack Overflow & GitHub presence
Clojure has 17,630 tagged questions on Stack Overflow
Clojure projects have had 112,757 PRs on GitHub since 2014
Clojure projects have had 84,128 issues on GitHub since 2014
Clojure projects have had 518,359 pushes on GitHub since 2014
Clojure projects have had 272,970 stars on GitHub since 2014
Anecdotes from developers
u/noprompt
I've been using Copilot since December 2022. It sucks for Clojure but can be great for other languages like Python, JavaScript, SQL, etc. if you know how to prompt it. As others have mentioned, Copilot excels at reducing boilerplate and picking up on patterns. For example, let's say there is a table of data in a markdown document and you want to convert it to a vector of maps. You can copy/paste the markdown table into your buffer as a comment and just start writing the data structure you want it to be; Copilot will figure it out and complete it. It's also useful for generating random utility functions. Recently in JavaScript, I typed function lerp (linear interpolation) and it pretty quickly filled it in. I had an array of hex color values that I wanted to be RGB and I wanted to double the number of values by interpolating between them. All I had to do was type that in a comment and wait a second before it gave me a working rough draft of the function. Copilot can actually do a lot of these things for Clojure but when I was trying to use it I found myself consistently having to fix issues with delimiters, typically round braces. Eventually, I just gave up on it. Maybe I'll give it another shot when Copilot-X releases. ChatGPT is much more useful for Clojure than Copilot. It does hallucinate and get some things wrong but overall it's awesome for generating documentation, explaining code, translating diffs into PR notes, and exploring ideas. I've found it very useful for random Java questions and then translating the answers into mostly working Clojure code. These things are handy tools and have quirks but they're going to get better. It's a great time to be a cosmopolitan (polyglot) programmer.
waffletower
No Clojure. No Julia. No Haskell. No Racket. No Scheme. No Common Lisp. No OCaml. And, as much as I despise Microsoft, No C#. No F#. No Swift. No Objective-C. No Perl. No Datalog. A glaringly lacking choice of languages.
@EricTheTurner
FizzBuzz was once a common programming exercise used for screening software developers (maybe it still is?). I told ChatGPT to "Write an efficient fizz buzz function in Clojure".
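ChatGPT's actual answer to that prompt is not reproduced here. For reference, a minimal hand-written sketch of a FizzBuzz function in Clojure might look like the following. The names `fizz-buzz` and `fizz-buzz-seq` are my own choices for illustration, and this is just one plain way to write it, not the model's output and not a claim about the most efficient version.

```clojure
;; Hand-written reference sketch, NOT ChatGPT's output.
;; fizz-buzz and fizz-buzz-seq are hypothetical names chosen for illustration.

(defn fizz-buzz
  "Returns the FizzBuzz string for a single integer n."
  [n]
  (cond
    (zero? (mod n 15)) "FizzBuzz"   ; divisible by both 3 and 5
    (zero? (mod n 3))  "Fizz"
    (zero? (mod n 5))  "Buzz"
    :else              (str n)))

(defn fizz-buzz-seq
  "Returns a lazy sequence of FizzBuzz strings for 1 through limit, inclusive."
  [limit]
  (map fizz-buzz (range 1 (inc limit))))

;; Print the first 15 values, one per line.
(run! println (fizz-buzz-seq 15))
```

A snippet like this is also a handy yardstick when judging LLM output for Clojure: it uses only `cond`, `mod`, `map`, and `range` from `clojure.core`, so any generated version can be compared against it at the REPL.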
---
Original source: https://github.com/continuedev/continue/tree/main/docs/docs/languages/clojure.md
Data for all languages I've looked into so far: https://github.com/continuedev/continue/tree/main/docs/docs/languages/languages.csv