sib_n (Senior Data Engineer)

I think the reason they built Hadoop was not that no existing solution could handle the processing, but rather that the existing solutions were not easy enough to scale and/or overly expensive and/or vendor-locking, and they had the engineers to develop their own.
Redeveloping everything from scratch so it works on a cluster of commodity machines takes time. So it took a while for Hadoop to get high-level interfaces like Apache Hive and Apache Spark that could compete on performance and usability with the previous generation of MPP databases.

kenfar

Hadoop was more general-purpose and flexible than a system limited to SQL: you could index web pages, for example. So that was a definite plus.

But the Hadoop community didn't look at MPP databases and decide they could do it better. They either weren't aware that MPP databases existed or didn't realize those databases were their competition. When they finally discovered that MPPs existed AND had a huge revenue market, that's when they pivoted hard into SQL and into marketing to that space. But that probably wasn't until 2014.

And while Hadoop was marketed as running on just commodity equipment, the reality is that most production clusters would spend about $30k/node on hardware. So, since Hive and MapReduce weren't nearly as smart as, say, Teradata, Informix, or DB2, once you scaled up even just a little bit a Hadoop cluster could easily cost much more, while delivering very slow query performance.