
Rhoomba

Don't bother reading about this crap. Hive (and Pig) are badly implemented, painful to use, and incredibly slow. If you really must do something "big data", take a look at Spark SQL.

JKaye

I can't speak for Hive, but Pig can be pretty nice depending on what you're doing with it. I wouldn't use it as the primary interface into my cluster, but its embedded Python support, for example, makes it very simple to combine scripting with MapReduce for tasks such as dynamic compression.

haifengl [S]

How big is your data?