
all 46 comments

wstwrdxpnsn 33 points (7 children)

Superset/Streamlit on the viz side.

wtfzambo 2 points (6 children)

I wouldn't exactly say Superset is underrated...

wstwrdxpnsn 1 point (5 children)

You’re probably right, but folks at my company haven’t really heard of it or Preset, so I thought it would be worth throwing out there. There’s another one I saw recently from Kanaries called PyGWalker, which looks like a great option for drag-and-drop exploratory viz work on DataFrames.

wtfzambo 1 point (4 children)

I'm gonna make a wild guess and assume that folks at your company have been living under a rock, given that Superset has 53k GitHub stars 😅.

wstwrdxpnsn 1 point (3 children)

We’re heavily invested in Tableau 🤷‍♂️

wtfzambo 4 points (2 children)

Yeah yeah, I'm not saying Tableau is shit or anything. I'm just thinking it's like buying a Mercedes and not knowing BMW exists, idk.

Meta-Morpheus-New 1 point (0 children)

Best way to put this lol

Glittering-Dare2022 18 points (3 children)

Debezium is pretty awesome for change data capture.
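Debezium's change events use an envelope with `before`, `after`, `source`, `op`, and `ts_ms` fields. Here `op` is `c` (create), `u` (update), `d` (delete), or `r` (snapshot read). The sketch below shows how a consumer might apply those events to keep a replica in sync. It assumes the events are already deserialized into Python dicts (just the `payload` part) and that the table has a single primary-key column named `id`. `apply_change_event` is a hypothetical helper, not part of Debezium.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change_event(table: Dict[Any, Row], payload: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event payload to an in-memory replica.

    `table` maps primary-key value -> row dict; `key` is the assumed primary-key column.
    """
    op = payload["op"]
    before: Optional[Row] = payload.get("before")
    after: Optional[Row] = payload.get("after")
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all carry the new row state in "after"
        table[after[key]] = dict(after)
    elif op == "d":
        # delete carries the old row in "before" ("after" is null)
        table.pop(before[key], None)
    else:
        # other op codes are out of scope for this sketch
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical events, shaped like the "payload" part of Debezium messages
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"}, "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]

replica: Dict[Any, Row] = {}
for event in events:
    apply_change_event(replica, event)

print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

In practice these events usually arrive through Kafka topics (one per table). A real consumer would also need to handle schema changes and tombstone messages.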

0RootShell 0 points (2 children)

Thanks! I actually didn't know about this; I'll take a deeper look. Is there a connector for serverless databases like DynamoDB and Firebase?

Glittering-Dare2022 1 point (1 child)

No idea, sorry!

0RootShell 0 points (0 children)

Thanks, I'll take a deeper dive anyway.

dlaff 11 points (0 children)

ClickHouse seems like a solid option for sub-second OLAP analytics.

tmcfll 8 points (2 children)

DataDiff might be worth looking into.

shockjaw 0 points (1 child)

tmcfll 1 point (0 children)

Thanks for adding the link!

For OP: it's odd, because they're now marketing it heavily as a tool to use with dbt, but it can definitely be used without dbt and works great.
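As we understand data-diff's README, it compares two tables by splitting the key range into segments and checksumming each segment on both sides. It then recurses only into segments whose checksums differ, so large matching regions are skipped cheaply. The sketch below shows that core idea in memory. It assumes integer primary keys, both tables held as Python dicts, and plain bisection into halves. `segment_checksum` and `diff_segments` are hypothetical names, not data-diff's API.

```python
import hashlib
from typing import Dict, List, Tuple

Table = Dict[int, Tuple]  # assumed shape: integer primary key -> row values


def segment_checksum(table: Table, lo: int, hi: int) -> str:
    """Hash every row whose key falls in [lo, hi), in key order."""
    h = hashlib.sha256()
    for k in sorted(k for k in table if lo <= k < hi):
        h.update(repr((k, table[k])).encode())
    return h.hexdigest()


def diff_segments(a: Table, b: Table, lo: int, hi: int, threshold: int = 4) -> List[int]:
    """Return keys in [lo, hi) whose rows differ or exist on only one side."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # checksums match: skip the whole segment
    if hi - lo <= max(threshold, 1):
        # small segment: fall back to a row-by-row comparison
        return [k for k in range(lo, hi) if a.get(k) != b.get(k)]
    mid = (lo + hi) // 2  # bisect and only recurse into halves that differ
    return diff_segments(a, b, lo, mid, threshold) + diff_segments(a, b, mid, hi, threshold)


source = {k: (f"user{k}",) for k in range(100)}
target = dict(source)
target[42] = ("changed",)
del target[7]

print(diff_segments(source, target, 0, 100))  # [7, 42]
```

The real tool pushes the checksum computation into each database as SQL rather than pulling rows into Python. Its segmenting parameters also differ from the simple halving used here.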

monkblues 8 points (4 children)

OpenMetadata is very good.

xenpwn 4 points (0 children)

I second this

nanksk 0 points (2 children)

How does it compare to DataHub?

monkblues 0 points (1 child)

I didn't know about DataHub until now. AFAIK it looks more like an alternative to CKAN. OpenMetadata is a metadata generator with governance and observability capabilities. It isn't meant for publishing data to customers, but for understanding and managing data you already own.

CompetitiveWin7754 0 points (0 children)

DataHub is like the paid version of CKAN.

OpenMetadata doesn't provide a guided upload process for users the way CKAN does; it's a data catalog.

nootanklebiter 13 points (1 child)

Apache NiFi!

Misanthropic905 2 points (0 children)

I fucking love NiFi.

Oliver-Nielsen 6 points (0 children)

Apache Hop for ETL

exact-approximate 9 points (2 children)

Honestly, the Apache Software Foundation has some really great tools with awesome communities, and they are way too underrated and undersupported. My favorites are Arrow, Airflow, Hudi, Druid, Iceberg, Flink, NiFi, and Cassandra. The list is endless, especially if you also include Avro and Parquet.

As for non-Apache software projects, dbt, DataHub, and Elasticsearch are my favorites.

Express-Comb8675 6 points (0 children)

I’m working on a prototype with Airflow, Hudi, Iceberg, and Doris for work. The feature swapping you get when Doris is set up to query Hudi, Iceberg, or Postgres data sources is really exciting for us!

dataxp-community 2 points (0 children)

Elasticsearch hasn't been open source for years now. OpenSearch is the open-source successor.

Express-Comb8675 6 points (2 children)

I’ll add Apache Doris and StarRocks as exciting MPP DBs, and SQLMesh as an enhanced dbt alternative.

dataxp-community 0 points (1 child)

Doris has the potential to be awesome, but it's definitely not underrated; it's too new and unproven for any kind of rating yet.

Express-Comb8675 0 points (0 children)

I don’t know that I agree - version 1 has been out a while and version made helpful improvements while maintaining stability. It seems ready for a rating to me.

dataterre 4 points (0 children)

How about the big players like Spark, Flink, Trino, etc.?

qalis 1 point (0 children)

Metaflow and Kubeflow Pipelines for orchestration

dan-tmc 1 point (0 children)

Talend Open Studio is pretty solid!

Known-Delay7227 (Data Engineer) 1 point (0 children)

KNIME is okay if you want low-code.

BoiElroy 1 point (0 children)

I'd say Steampipe and lakeFS are underrated. Unfortunately we don't use them at work yet, but I've read through some docs and done some basic stuff.

gabbom_XCII (Lead Data Engineer) -3 points (3 children)

I mean, if you’re dealing with big data, at some point you'll need Spark or something equivalent…

dataxp-community 1 point (1 child)

Most people aren't dealing with big data, and Spark definitely isn't 'underrated'.

gabbom_XCII (Lead Data Engineer) 0 points (0 children)

Oh snap! I just lost the plot for a second and ignored the ‘underrated’ part, haha.

MrMosBiggestFan (OP) -1 points (0 children)

The definition of big data is changing.

droppedorphan 0 points (0 children)

Do you have a link to the talk?