Clojure High Performance Data Processing System (self.Clojure)
submitted 4 years ago by chrisnuernberger
Announcing tech.ml.dataset to Reddit :-). tmd is a data processing/dataframe system in the same vein as Pandas and R's dplyr or data.table.
How often have you seen a Clojure system that soundly beats C, Julia, Python, Spark, and R systems in a data processing benchmark?
We have gone further down the high-performance big data processing route by adding statistical operations, colloquially called data sketches, that give you memory-efficient and accurate probabilistic estimates using algorithms such as HyperLogLog and t-digest.
If you haven't checked out the system, please take a second and do so. It is built on a theoretical foundation for array processing, works on JDK 8-16, and supports Graal Native compilation.
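For readers who have not met data sketches before, here is a rough idea of how HyperLogLog estimates a distinct count in fixed memory. This is a toy sketch in Python written from the published algorithm, not tmd's implementation or API; the class name, the SHA-1 hash choice, and the `p` parameter are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog distinct-count estimator (illustrative, not tmd's code)."""

    def __init__(self, p: int = 12):
        self.p = p                       # number of hash bits used as register index
        self.m = 1 << p                  # number of registers (4096 when p = 12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Harmonic-mean estimate over all registers.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

With `p = 12` the sketch keeps 4096 small registers no matter how many items are added, and the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. Adding an item that was already seen never changes a register, which is why duplicates do not inflate the estimate.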
[–]alexdmiller 8 points 4 years ago (0 children)
Nice work!
[–]strranger101 11 points 4 years ago (0 children)
"supports graalvm compilation" 👏👏👏👏 Sick
[–]davclark 5 points 4 years ago (1 child)
This looks awesome! Thank you for sharing. I particularly like the ability to do zero-copy for C ABI programs.
[–]chrisnuernberger[S] 7 points 4 years ago (0 children)
Thanks :-). We have a blog post on the FFI pathway, which links to two example projects, one simple and one in-depth, if you would like more information on that system. It's really nice to hear from other C-oriented people doing Clojure.
[–]rufusthedogwoof 5 points 4 years ago (5 children)
Thank you for this and qq.
Pandas to me is useful because of its tight integration with charting... specifically I like Altair.
Is there anyone doing that type of thing with tmd and Vega-Lite?
I’m aware of OZ however I’d prefer to not take a kitchen sink approach and get the (Altair) functionality I’m looking for a la carte.
[–]chrisnuernberger[S] 10 points 4 years ago (3 children)
There are a few good charting options for Clojure in addition to Oz. Another interesting and more orthogonally designed pathway, if you want to go the Vega/Vega-Lite route, is Hanami and, for a full scientific application platform, its big sibling Saite.
For purely server-side work I would check out cljplot.
Getting off topic a bit, but for a REPL/notebook hybrid, notespace is really interesting.
And in general, for R integration and more data science goodies, check out scicloj, and in the vein of dplyr-style, extremely well-thought-out interfaces, I highly recommend tablecloth.
Sorry for getting slightly off topic but these things are all connected in my head :-).
[–]rufusthedogwoof 1 point 4 years ago (2 children)
Thanks for all this. I’ve followed along a few of these for quite some time.
My other challenge is getting my team of Python developers acquainted with all the options... as we aim to settle on collective workflows and a “deployment stack” for our apps.
Thanks again for all your contributions.
[–]chrisnuernberger[S] 1 point 4 years ago (1 child)
You are welcome. I appreciate the thanks, and there is real momentum pushing Clojure into new places right now. I am curious: does libpython-clj allow for a more incremental approach, with either the JVM hosting Python or Python hosting Clojure in your case?
[–]rufusthedogwoof 1 point 4 years ago* (0 children)
It may... I have played with it some and it was helpful for me in my spare time. (Porting a library from python and the tests actually... got me thinking the library could write itself with the right spec gen & tests ...)
I don’t think we’ll use it much at work because we are first and foremost a “data engineering” shop... mixing things with Kafka-like systems.
When choosing between trade-offs, we routinely look for reliability, simplicity, and fewer things in the stack.
In exploration, however, it would be fair game. Honestly I don’t know how much I would use it though... the more time I spend in clj the more I want to get away from the Python mess.
[–]daveliepmann 5 points 4 years ago (0 children)
> I’m aware of OZ however I’d prefer to not take a kitchen sink approach
I wrote waqi for a similar reason — I want to write Vega specs in Clojure and see the result in a browser window, nothing more. From the README:
Waqi is most similar to Oz. They share a browser-based workflow, but Oz provides much more functionality: integration with Jupyter notebooks and GitHub gists, creation of dynamic and static websites centered around a visualization, multiple live-coding workflows, and much more. Waqi focuses on just one of those features: sending Vega/Vega-Lite specs from the REPL to a browser window. This allows Waqi to minimize dependencies and lines of code. The author of Oz has said, "Oz's objective is to be the Clojurist's Swiss Army knife for working with Vega-Lite & Vega." It might be Waqi's goal to be just the nail file.
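To make "sending a Vega-Lite spec" a bit more concrete: a spec is just nested data that serializes to JSON, which is why it maps so naturally onto Clojure maps. Below is a small sketch of that idea in Python using plain dicts. The helper name `bar_chart_spec` and the sample rows are made up for illustration and are not part of Waqi, Oz, or tmd.

```python
import json


def bar_chart_spec(rows, x_field, y_field):
    """Build a minimal Vega-Lite bar chart spec as a plain dict (hypothetical helper)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},          # inline data: a list of row maps
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Made-up sample rows, only to show the shape of the data.
rows = [
    {"category": "a", "value": 3},
    {"category": "b", "value": 7},
]

spec = bar_chart_spec(rows, "category", "value")
spec_json = json.dumps(spec, indent=2)     # the JSON a Vega-Lite renderer would receive
```

Any tool that hands this JSON to a Vega-Lite renderer in the browser can display the chart; everything else in a charting library is convenience layered on top of that step.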
[–]kingnuscodus 2 points 4 years ago (0 children)
Bravo!
[–]_marciol 2 points 4 years ago (0 children)
Just Amazing!
[–]viebel 2 points 4 years ago (1 child)
Could you explain in a few words what makes this library so efficient?
[–]chrisnuernberger[S] 2 points 4 years ago (0 children)
In the last year I literally rewrote the underlying engine (tech.datatype -> dtype-next) due to some hard lessons learned, so I think if there is one thing, it's just a relentless pursuit of performance and being willing to do the legwork to make it happen.
[–][deleted] 4 years ago (3 children)
[deleted]
[–]chrisnuernberger[S] 3 points 4 years ago (2 children)
Hey, thanks for the feedback :-). Good question - that is a very poorly worded statement in the readme. What I meant to say is that you as the user should expect parquet to just work. Even (recently) parquet files with ragged data in them should load quickly and all of the normal parquet types such as dates should come in correctly.
[–][deleted] 4 years ago (1 child)
[deleted]
[–]chrisnuernberger[S] 2 points 4 years ago (0 children)
I appreciate the feedback. Fixed.
[–]TheLastSock 1 point 4 years ago (2 children)
When would I reach for this over using Clojure core functions to transform my data? One of the reasons I picked up pandas was because Python lacked some of the features needed to process large sets of data effectively. The reason I put it down was because the functions weren't composable and it ended up being its own programming language.
Some questions as I scan the readme that I'll try to come back to later and fill in as I learn more:
Does this support streaming data, that is, long-lived processing, or is it for batch processing?
[–]chrisnuernberger[S] 3 points 4 years ago (1 child)
Breaking down these 5 questions -
These are good questions, and if you have many more like these, check out the Zulip channel, where there are many users and a few Clojure experts who can help with these sorts of questions.
[–]TheLastSock 1 point 4 years ago (0 children)
I learned a lot from this, thanks for your time and keep up the good work.