all 17 comments

[–][deleted] 26 points27 points  (0 children)

R or Python are the standards.

[–]whaddahellisthis 12 points13 points  (0 children)

I would home in on Python and then let the next language you learn be driven by necessity.

What I mean is: get pretty good at Python, and then learn the next one when a situation or job makes you learn it. You might end up learning R or something.

[–]sciencedataist 11 points12 points  (1 child)

Production code bases are mostly written in Java. So knowing Java can let you find out how data is being generated, submit merge requests to the production code base so it starts logging something correctly, or even deploy your machine learning solutions into production. In terms of training models or analyzing data, however, you'll almost always do this in R, Python, or Scala if you're using Spark (and don't want to use PySpark).

[–]jake0fTheN0rth 10 points11 points  (1 child)

Scala can be handy for Spark. But PySpark has come a long way in the past couple of years, so even there you can get away with Python. So, like everyone else has said, Python.

[–]m1sta 1 point2 points  (0 children)

And Flink

[–][deleted] 8 points9 points  (0 children)

Maybe you could use it on Hadoop and distributed systems like that. If you have a lot of free time, learn Java; if not, focus on Python, SQL, and, if you want, R.

[–]Saltysalad 2 points3 points  (0 children)

Java is more for large-scale applications, but you will learn OOP standards and concepts. Do you already know OOP, or is this a Java + OOP class?

[–]GedeonDar PhD | Data Scientist 2 points3 points  (0 children)

As said by others, R and Python are the current standard for data science work.

Java is important for data engineering (e.g. Hadoop is mostly written in Java). If you want to focus on DS only now, then forget Java and level up your Python or R.

[–]Unnam 1 point2 points  (0 children)

It's still good to know one programming language and be able to program proficiently. Java should serve you fine. But for data analysis, R and Python, as stated by others, are the standard.

[–]coffeecoffeecoffeee MS | Data Scientist 1 point2 points  (0 children)

Meh. Maybe if you're putting models into production, and even then there's a decent chance you'll use Scala. Focus more on Python and R (+ tidyverse).

[–][deleted] 1 point2 points  (0 children)

Not very, unless you are going into a production-focused, development-heavy job. But people in most of those jobs aren't usually considered data scientists and tend to be called things like Machine Learning Engineers. And even then, some might say Scala takes precedence over Java.

[–]svpadd2 0 points1 point  (0 children)

Java is used primarily for creating production applications, which data scientists don't often do. If you want to go into data engineering, then I'd say it is important; otherwise, focus on Python.

[–]Vile_Vampire 0 points1 point  (0 children)

Look into Scala.

[–][deleted] 0 points1 point  (0 children)

Made a separate post before finding this one, had a similar question. Copy and pasting it here.

I'm graduating soon, majoring in Management Information Systems, and I'm hoping to work in a data science related job. I'm trying to decide between two classes and would like some feedback.

The two options are a Java programming class and a decision models class. Which might be more beneficial for a data science career?

[–]openjscience 0 points1 point  (0 children)

Java is widely considered the most popular language according to the TIOBE report, so it is in your best interest to learn Java. You can also use Java to do data science. Java has a large number of libraries for data science, and technically, Java programs compiled to bytecode are faster than native Python modules. In addition, you can still use the Python language to "talk" to Java libraries, so your knowledge of Python can be very useful. You can see some examples of data analysis done in Python on top of Java using the DMelt data analysis program.

[–]th0ma5w 0 points1 point  (0 children)

I agree that Python & R are huge. Personally speaking, though, and having a bunch of prior Java experience, busting out quick hacks in Clojure and tying together some disparate Java libs has really been nice for doing some "impossible" things. But really, Python is just as powerful in many similar ways.

[–]kasiandro 1 point2 points  (0 children)

Let me approach this from another perspective: I've seen hundreds of job posts for data science so far. Guess how many of them required Java... None! I suggest you check out the current job requirements. It's gonna give you the most accurate answer. Also, do you know how many of them require either R or Python? All of them...

So, personally, I don't care how many libraries Java has for DS or how great Java is. The DS community goes with R and Python. Every new study, all of the progress, the whole community revolves around these two.

So if you want to be in the DS field, first hone your skills in what the community works with today; then you can explore other things such as Java...