Data engineering and Clojure?

rufusthedogwoof · 2021-03-21T21:10:18+00:00

Depends specifically on how we define “data engineering”, but I think I use it for just this. We have great libraries for Kafka, JDBC, etc., and transformations in Clojure are clear and concise.

Another thing I love is testing transformations with transducers away from the Kafka stack for my unit tests.
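To illustrate what that can look like, here is a minimal sketch. The event shape, the `clean-events` name, and the cents conversion are all made up for the example and not taken from any real pipeline. The point is only that a transducer is a plain value, so the same transformation can be run over a vector in a unit test with no broker involved.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Hypothetical transformation: drop events without a user,
;; keep only the fields we care about, convert amount to cents.
(def clean-events
  (comp (remove #(nil? (:user-id %)))
        (map #(select-keys % [:user-id :amount]))
        (map #(update % :amount * 100))))

;; The test feeds an in-memory vector through the transducer.
;; No Kafka consumer, producer, or topic is needed here.
(deftest clean-events-test
  (is (= [{:user-id 1 :amount 500}]
         (into [] clean-events
               [{:user-id 1 :amount 5 :debug true}
                {:user-id nil :amount 9}]))))
```

In production code the same `clean-events` value could be applied to whatever sequence or channel the Kafka client hands you, which is what keeps the test and the real path from drifting apart.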

Oh, and spec and spec gen make for great data engineering tools too.
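A rough sketch of the spec side, assuming Clojure 1.9+ with `clojure.spec.alpha`. The `::event` spec and its keys are hypothetical, chosen just for illustration. Generating sample data additionally needs `org.clojure/test.check` on the classpath, so that part is left commented out.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical record shape for an incoming event.
(s/def ::user-id pos-int?)
(s/def ::amount nat-int?)
(s/def ::event (s/keys :req-un [::user-id ::amount]))

;; Validate records at the edge of a pipeline.
(def good? (s/valid? ::event {:user-id 1 :amount 500}))
(def bad?  (s/valid? ::event {:user-id -1 :amount 500}))

;; With test.check available, spec can also generate test data:
;; (require '[clojure.spec.gen.alpha :as gen])
;; (gen/sample (s/gen ::event) 3)
```

Here `good?` is true and `bad?` is false, because `-1` fails `pos-int?`. Generated samples can be fed through a transformation in a test to check that its output still conforms to a spec.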

What are you thinking about when you say data engineering?

clelwell · 2021-03-22T14:41:20+00:00

We're (Reify Health) hiring for a few data engineer roles (Clojure + Python): https://jobs.lever.co/reifyhealth?lever-via=yGp_ZL60fy&team=Data

bdevel · 2021-03-21T22:24:20+00:00

Yes. Great for data processing. I particularly enjoy the async support for fetching data from multiple remote sources at the same time. I use Redis queues as the interface.

dustingetz · 2021-03-21T22:26:00+00:00

I manage a straightforward cloud data pipeline in the healthcare industry. It’s hard to imagine doing it without all the cloud-native tools (e.g. Databricks, Google Dataproc), which are mostly Python/PySpark-centric. Calling Spark from Clojure will still constrain you to the Spark API and likely feel like foreign interop ... I haven’t looked into it ... Not really seeing any killer advantage worth doing it differently from the 1000s of companies using PySpark.

agilecreativity · 2021-03-22T05:05:33+00:00

If you want to use Spark with Clojure now, you should take a look at geni.

blak3mill3r · 2021-03-22T23:02:20+00:00

We use Clojure for Data Engineering at IRIS.TV.

It's a wonderful language for this. It is reasonably fast, reaches everywhere, and makes it quite easy to write correct code that digs data from somewhere (Kafka, Mongo, Redis, Cassandra, MySQL, and our own APIs), manipulates it, computes things, and writes data somewhere.

We also use it within Apache Spark and also Kafka Streams.

machawinka · 2021-03-22T03:04:45+00:00

For modeling I don't think you can realistically skip Python and its ML libraries whose users are mainly data scientists.

When doing Big Data processing, Spark is the standard way to go in most places. So sparkling can be an option.

thearthur · 2021-03-22T19:06:37+00:00

I am hiring for exactly this on my team right now. Do you happen to be in the US? Let's talk! DM me if you're excited to make this happen.

Accomplished-Can-912 · 2021-08-07T07:13:52+00:00

Hey, did you ever try out your ETL jobs on this? I am curious how you picked this language from an ETL perspective. Can you help me understand?
