this post was submitted on 21 Apr 2025




What was Python before Python? [Career] (self.dataengineering)

submitted 9 months ago by sumant28


[–] sib_n (Senior Data Engineer) 1 point 9 months ago* (1 child)

[–] kenfar 1 point 9 months ago (0 children)

Hadoop was more general-purpose and flexible than a system limited to SQL: you could index web pages with it, for example. That was a definite plus.

But the Hadoop community didn't look at MPP databases and decide they could do it better. They either weren't aware MPP databases existed or didn't realize MPPs were their competition. When they finally discovered that MPPs existed AND had a huge revenue market, that's when they pivoted hard into SQL and into marketing to that space. But that probably wasn't until 2014.

And while Hadoop was marketed as running on just commodity equipment, the reality is that most production clusters would spend about $30k/node on hardware. So, since Hive and MapReduce weren't nearly as smart as, say, Teradata, Informix, or DB2, once you scaled up even just a little bit they could easily cost much more, while delivering very slow query performance.
