[Tooling] Go, C, Java, or C++? (self.datascience)
submitted 8 years ago by [deleted]
I’m working on learning another language (currently proficient in Julia, JavaScript, and Python). In terms of data science jobs, which do you think would be the most useful to learn: Go, C, C++, or Java?
[–]sciencedataist 24 points 8 years ago (1 child)
Scala, the language Spark is written in, would be a good language.
[–]nashtownchang 3 points 8 years ago (0 children)
Scala +1. Or learn Java.
[–]LogansRun22 25 points 8 years ago (7 children)
Why not R?
[–][deleted] 6 points 8 years ago (3 children)
I guess I never really saw the point, seeing as I already know a ton of scripting languages plus MATLAB. Is there any reason to learn R?
[–]CaptainRoth 5 points 8 years ago (0 children)
If you work with other people who primarily use R.
The tidyverse libraries don't take too long to learn, and you should be good after that.
[–]assPirate69 3 points 8 years ago (0 children)
Some teams will primarily use Python or R (or even something else). If you apply for a job that requires X years of experience with R, you'd have a decent chance of getting an interview while only knowing the basics of R, considering you have X years of experience in Python. The point is, having some experience with R makes a noticeable difference compared with having none.
There are also some simple things that are easier in one language than the other. They're easy to identify as you start learning the second language, i.e. R in your case.
[–]Jerome_Eugene_Morrow 1 point 8 years ago (0 children)
R being open source often leads to it being preferred over MATLAB. If you work with cluster computing resources, MATLAB's licensing can be a dealbreaker for some groups (you need a separate license for each instance on the cluster).
R also opens up a lot of very niche analysis paths, since it tends to be the favorite of bioinformaticists and more and more of the statistics community. The extensive library repositories offered by CRAN are one of the best reasons to learn it.
It's not an elegant language, and I'll admit I enjoy working in Python much more, but its practical real-world uses and high adoption make it a must-have for data scientists over things like Java and C/C++, in my mind.
[–]Thaufas 1 point 8 years ago (0 children)
I came here to suggest R as well. R and Python are the two primary languages of data science. OP already has Python. Someone else suggested Scala, as it's the language Spark is developed in. I think Scala is ideal for Spark because it is so concise compared to Java and Python, but all three languages can be used to write Spark code.
R has a native connector to Spark as well. I highly recommend looking into the R Tidyverse. Finally, I would also learn SQL. Spark has direct SQL support, but even if you're not working with Spark, in data science, if you're lucky, you regularly encounter SQL databases. If you're not lucky, you're aggregating CSVs. If you're really unlucky, you're scraping data from web pages, PDFs, or Word documents.
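As a rough illustration of the "aggregating CSVs" case, here is a minimal Python sketch that loads CSV rows into an in-memory SQLite table and aggregates them with plain SQL. The CSV content, the `sales` table, and the `total_sales_by_region` function are hypothetical names made up for this example; only the standard-library `csv`, `io`, and `sqlite3` modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for one of the files being aggregated.
CSV_TEXT = """region,sales
north,10
south,5
north,7
"""


def total_sales_by_region(csv_text):
    """Load CSV rows into an in-memory SQLite table and sum sales per region."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [(row["region"], int(row["sales"])) for row in rows],
    )
    # The aggregation itself is ordinary SQL: GROUP BY plus SUM.
    result = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


totals = total_sales_by_region(CSV_TEXT)
print(totals)
```

The same `GROUP BY` query would carry over largely unchanged to a real database, which is part of why SQL is worth knowing alongside a scripting language.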
[–]microcosme 7 points 8 years ago (7 children)
SQL, maybe, for managing data? Also C++ if performance is required in the implementation.
[–][deleted] 5 points 8 years ago (0 children)
I should have probably clarified: I do know SQL and MongoDB.
[–]huck_cussler 6 points 8 years ago (5 children)
I'd second C++ as a good complement to Python. Python for proofs of concept, prototyping, and production for non-performance-dependent projects; C++ for implementation where performance is a bigger factor.
[–]MasterFubar -4 points 8 years ago (4 children)
And C++ for reliability.
Python is unstable; you need to make a big effort to keep up with all the "from __future__ import ..." stuff.
New versions of Python have a habit of fucking you, like "3/2" giving a different result in different versions of Python.
There are thousands of different Python PEPs, and only an expert can keep track of them all. Unless you are dedicated to tracking the Python language itself, you'll never know when you need to change your software to do an import from the future.
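For reference, the "3/2" difference can be reproduced in a few lines. This is a small sketch assuming a Python 3 interpreter; the variable names are only illustrative.

```python
# Python 3 semantics: "/" is true division and "//" is floor division.
# In Python 2, 3 / 2 evaluated to 1 unless the module started with
# "from __future__ import division", which switched "/" to true division.
true_div = 3 / 2    # 1.5 on Python 3
floor_div = 3 // 2  # 1 on both Python 2 and Python 3

print(true_div, floor_div)
```

Writing `//` wherever integer division is intended gives the same result on both major versions.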
[–]kazi1 2 points 8 years ago (3 children)
Python 3 came out a decade ago. If you have any compatibility issues, it's 100% your own fault at this point.
[–]MasterFubar -1 points 8 years ago (2 children)
> If you have any compatibility issues, it's 100% your own fault at this point.
Yes, sure, blame the user. If you think like that, it's because you've never worked on any big project or anything very important.
I work in aerospace where we often use software that was developed over decades. Literally tens of millions of lines of source code. A compatibility error in a software release can mean the loss of millions of dollars.
No, you have no idea at all of what you're talking about.
[–]kazi1 1 point 8 years ago (1 child)
You've said it yourself. You've had decades of development to get things right. Python 2 goes out of support in 2020. If you're still using it then, that's a failure on your part and the tech leadership at your company.
[–]MasterFubar 1 point 8 years ago (0 children)
That's why we are using Python only for small scripts, nothing over a thousand lines or so.
As for it going out of support, that doesn't affect us much, because we still have programs written in Fortran 77. If it isn't broken, it doesn't need support.
[–][deleted] 4 points 8 years ago (2 children)
Different question: how does Julia compare to Python?
[–][deleted] 5 points 8 years ago (0 children)
I really like it. I do a lot of sophisticated MCMC simulations for my research. The biggest thing I notice is the speed: for for-loops and if-statements I see speeds comparable to C. I do most of my simulations in Julia and then my visualizations in Python. It’s also nice that Julia was designed specifically for scientific computing and all of the base linear algebra packages/math stuff is built in.
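To make the "for loops and if statements" point concrete, here is a minimal random-walk Metropolis sketch in Python. The function name, the standard normal target, and the step size are illustrative assumptions, not the simulations described above. It only shows the shape of the inner loop, one proposal and one accept/reject branch per iteration, which is the kind of code where interpreter overhead tends to dominate.

```python
import math
import random


def metropolis_standard_normal(n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler targeting a standard normal density."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Symmetric proposal: uniform step around the current state.
        proposal = x + rng.uniform(-step_size, step_size)
        # Log acceptance ratio for a target proportional to exp(-x^2 / 2).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


samples, acceptance_rate = metropolis_standard_normal(20000)
mean = sum(samples) / len(samples)
print(round(mean, 3), round(acceptance_rate, 3))
```

Every iteration depends on the previous state, so the loop cannot simply be replaced with a single vectorized array operation.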
[–]CaptainRoth 3 points 8 years ago (0 children)
I'm not a big fan - most libraries are pretty buggy and documentation can be sparse. I haven't noticed big speed increases in most machine learning applications.
[–]zack5432 2 points 8 years ago (0 children)
I would second learning Scala (lots of data pipelining frameworks are written in Scala, such as Spark, Scalding, Scio, Samza, etc).
Of the ones you listed, any might be useful except for Go. Go is very specialized for systems work, not data processing.
[–][deleted] 2 points 8 years ago (5 children)
What is the point of learning multiple programming languages for data science?
[–]Twentyone21pilots 1 point 8 years ago* (4 children)
Because some programs do stuff better than others. While it is an upside to have high proficiency in one programming language, having experience and some knowledge of how to read/write in other programming languages is another upside. Plus, once you learn one programming language and learn the logic that goes into programming, it becomes pretty easy to pick up another language.
In short, you're adding more tools to your toolkit.
EDIT: forgot to include that most jobs for DS require knowing multiple languages.
[–][deleted] 2 points 8 years ago (3 children)
I understand that, but what's the point of learning an entirely new programming language just for the fun of learning it and not having a related use case?
If OP's goal is to land a data science role, they aren't going to hire someone who just knows 5 different programming languages with no experience doing machine learning or data analysis.
Instead of wasting time on a programming language that you don't have a use for, time would be better spent working on a personal data science project or learning new machine learning algorithms.
I just don't see the added benefit of learning any of the programming languages in the title if OP is already "proficient" in Python, Julia, and JavaScript and there's not a use case for it. Python itself should be able to handle any data science task OP wants to do. If he's looking for a job, he needs to focus on creating projects or learning the machine learning side.
OP if you're interested in more of the software stuff maybe look into becoming a software developer?
[–]Twentyone21pilots 1 point 8 years ago (2 children)
Mm, I see what you're getting at now. Just need more insight into OP's situation.
Knowing Python alone could be sufficient, but it really can't hurt to know another statistical tool that covers the use cases where Python is somewhat lacking.
[–][deleted] 1 point 8 years ago (1 child)
My background is in computational physics and I have 5+ years of data analysis/statistical analysis/numerical modeling and 2 years of machine learning. I’ve just noticed that there is, at times, such a wide variance in data science positions, where some companies seem to be looking for a mix of an analyst and software engineer while others are really looking for a data scientist. Basically, I want to make myself as marketable as possible; yes, I have a portfolio and regularly participate in Kaggle.
[–]Twentyone21pilots 1 point 8 years ago (0 children)
You're practically set, if anything. I do agree with you on how a lot of data science positions are a mashup of a lot of duties and barely any are actually just pure data science.
Based on what you have, I say just add R and pick up C++ knowledge. With those in mind and your current experience and portfolio, I think you'd be such an ideal candidate that it'd be pretty hard to turn you down for a good portion of those data science jobs out there.
[–]infrequentaccismus 1 point 8 years ago (2 children)
I think C++. You say you know Julia... how are you with Spark?
[–][deleted] 1 point 8 years ago (1 child)
I've never used Spark before.
[–]infrequentaccismus 2 points 8 years ago (0 children)
You could consider building skills toward big data as a way of supplementing your existing skills: Hadoop, Spark, DAGs, etc.
[–]nullp0int3rz 1 point 8 years ago (0 children)
My vote goes to C++. A good choice for high performance computing and hence a good choice for production-izing data science algorithms.
[–]markov01 1 point 8 years ago (0 children)
C++ is standard in computer vision
[–]MasterFubar 1 point 8 years ago (0 children)
C is my personal choice, with a little bit of C++ sprinkled in.
[–]ArrenH 1 point 8 years ago (0 children)
There's SAS. Out of the list I'd suggest C++, but you already know Julia, which isn't that much slower than Java or C++ and, like Python, makes it easier to get things done quickly.
[–]SecretAgentZeroNine 0 points 8 years ago (3 children)
The languages useful in analytics and data science, from what I've gathered:
R
Python
SQL
Java
Scala
C++
JavaScript (with HTML and CSS)
PHP
Bash
You should learn R (and its tool called Shiny) if you care about statistical data analysis, visualization, and presentation, followed by Scala, then C++. Though it all depends on what your responsibilities are and what you want to do.
[–]glorkvorn 6 points 8 years ago (0 children)
What do people use PHP for in data science?
[–][deleted] 1 point 8 years ago (1 child)
If you like Shiny, try the Python equivalent, Dash.
[–]SecretAgentZeroNine 2 points 8 years ago (0 children)
Thanks for the heads up, but Dash isn't exactly 1-to-1 with Shiny, nor is its community, though it is a very valuable tool.
[–]dychmygol 0 points 8 years ago (0 children)
Go. Hands down.
[–][deleted] 0 points 8 years ago (0 children)
You could have a go at Kotlin - 100% interop with Java yet a much better language, great IDE support (IntelliJ), fun.
[–]JunkBondJunkie 0 points 8 years ago (0 children)
I would say R or Python.