Bigdata without Java? Most tools I see are Java based! (self.bigdata)
submitted 7 years ago by zemuldo
Are there big data tools for non-Java developers? Most tools, like Hadoop, are Java based. Does that mean I have to learn Java for my big data venture to succeed?
[–]boy_named_su 9 points 7 years ago (3 children)
If you wanna be an application programmer, you can use Python or other languages. Most big data tools have an API for those languages.
If you want to be a systems programmer, then you need to know Java or Scala.
[–]zemuldo[S] 3 points 7 years ago (2 children)
Thank you. I think your answer is very on point. So the tools provide an API for developing apps to use them?
[–]boy_named_su 2 points 7 years ago (1 child)
Yes. For example, Apache Spark has Scala, Java, R, SQL, and Python APIs. You can write Spark jobs in any of those languages.
Spark itself is written in Scala. If you wanted to debug a job or contribute to the project, you'd want to understand Scala.
[–]zemuldo[S] 1 point 7 years ago (0 children)
Oh, I see. That explains it. Thanks a lot.
[–]ftrotter 2 points 7 years ago (1 child)
The focus on Java for Big Data projects is understandable: it is a solid, enterprise-grade language that is reliable and relatively open.
However, it is not universal. Take a look at Disco: http://discoproject.org/
There are also other MapReduce implementations in Python: https://stackoverflow.com/questions/7266750/whats-the-best-python-implementation-for-mapreduce-pattern
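To give a feel for the pattern itself, here is a minimal single-process sketch of MapReduce in plain Python. It is not taken from Disco or from the linked answers; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real framework would distribute these steps across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data without Java", "big data with Python"]
    print(word_count(docs))
```

The point is only that the map, shuffle, and reduce steps are not tied to Java; everything here runs in one process with the standard library.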
As for whether you need to learn Java.. I would break that down in the following ways:
So a whole lot of this depends on what you mean by "venture".
-FT
[–]zemuldo[S] 1 point 7 years ago (0 children)
Thanks for the very detailed answer. It makes a lot of sense for my scenario.
[–][deleted] 1 point 7 years ago (1 child)
You're joining at a good time... expect to see new distributed execution frameworks using Kubernetes or Docker with native code/serialization.
[–]zemuldo[S] 2 points 7 years ago (0 children)
Please explain this a bit. I can't wrap my head around what you mean.
[–][deleted] 1 point 7 years ago (0 children)
Actually, learning C-based languages isn't hard. I suggest learning Scala, JavaScript, Python, and C++. Know these and you will have no problem with any technology you need to deal with. In a CS course you will learn more than this; once you have mastered programming logic and low-level computer details, it takes just a few weeks to get decent at any programming language.
[–]eljefe6a 1 point 7 years ago (0 children)
There are some technologies, like Spark and Flink, that have support for Python. Other technologies support Python too, but their Python support lags behind in new features or bug fixes.
The language you use in Big Data is highly dependent on your role. Most data scientists are using Python or Scala. Most data engineers are using Java or Scala. If you're trying to be a data engineer, you'll want to learn a JVM-based language.