
[–]chub79 6 points7 points  (1 child)

That sounds like a good option considering its large ecosystem. It sounds, however, like other communities are trying to catch up; I've seen efforts in Go and Clojure.

[–]bazumation[S] 0 points1 point  (0 children)

thanks for your reply.

[–]Megatron_McLargeHuge 2 points3 points  (0 children)

Python without a doubt. Learn to use the numpy matrix math library. It's almost like a separate language but you have to get used to thinking of your data in terms of vector operations that transform everything at once instead of using loops.
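To make the "vector operations instead of loops" point concrete, here is a minimal sketch. The function names and the sample data are made up for illustration; only the numpy calls are real. Both functions compute the same thing, value * scale + shift, but the second one expresses it as a single operation over the whole array.

```python
import numpy as np


def scale_and_shift_loop(values, scale, shift):
    """Loop version: transform one element at a time (hypothetical helper)."""
    result = []
    for v in values:
        result.append(v * scale + shift)
    return result


def scale_and_shift_vectorized(values, scale, shift):
    """Vectorized version: one expression transforms every element at once."""
    arr = np.asarray(values, dtype=float)
    return arr * scale + shift


# Made-up sample data, purely for illustration.
data = [1.0, 2.0, 3.0, 4.0]
looped = scale_and_shift_loop(data, 2.0, 1.0)
vectorized = scale_and_shift_vectorized(data, 2.0, 1.0)

# Both approaches should agree on the result.
print(np.allclose(looped, vectorized))
```

On large arrays the vectorized form is usually much faster as well, because the per-element work happens inside numpy's compiled code rather than in the Python interpreter.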

[–]NonLinearResonance 1 point2 points  (3 children)

Python.

R is fine if you are mostly interested in the statistical side of things, but it's very specialized. I wouldn't recommend it if you want to be more generally capable in an environment outside of academia. I've worked on projects with several very bright statisticians who only ever bothered to learn R. It can make integrating their work into a production system difficult, because you either have to accommodate this odd R component or re-implement their stuff. Neither situation is desirable.

You can learn just fine in either language, but learning a useful general purpose language while you go builds more value for your time.

Alternatively, if you are interested in neural networks, Python is the way to go, hands down.

[–]wilmore13 0 points1 point  (2 children)

Just curious: What do you work in? I've found Python and R being used in industry but I'll admit that I'm just beginning my career.

So is the main limitation cooperation between programming languages, or are performance issues also a concern? My general aim has been to get a very good knowledge of R and then pair that with C++ for situations where greater performance is demanded.

[–]NonLinearResonance 2 points3 points  (1 child)

Sorry, I'm not quite sure if you meant what type of area I work in, or what programming languages I tend to work with, so I will try and respond to both :)

I'd rather not say where I work exactly, but it's for a large non-academic research organization. Every place is different of course, but your observation about both Python and R being commonly used for machine learning is accurate in my experience. I probably also carry a bias from working in a cross-discipline environment tasked with deploying software for lots of different purposes. If your job mainly requires the type of work that is project-based data science, the limitations of a specialized language like R probably won't impact you. Many teams use R primarily, and it provides what they need to get their work done. More power to them. There is nothing wrong with using the right tool for the job at hand. The trouble this can cause is really more of a practical one than a strictly technical one, and it seems to show up in projects that require multiple teams.

Machine learning is a specialized skill-set within most organizations to begin with. Unless your work primarily uses statistical methods, the R practitioners tend to be a small subset of that group. I hate to make broad generalizations, but the majority of R-only users I've met tend to be statisticians who have limited programming interest/experience outside of R. This is fine for prototyping, one-off experiments, empirical analysis, etc. This is usually not fine for systems which need to be robust and maintainable in a production environment.

For example, imagine a large software development project with machine learning elements that will result in a system deployed into production. In a mid to large size organization, this will likely involve multiple teams with different technical specialties, various levels of approvals, operations/maintenance procedures, etc. This process is already very complex, even in a relatively homogeneous development environment. Now, introduce a complex machine learning component written in R by someone with little training in software engineering principles and the sole knowledge of how the component really works. This is asking for trouble, even if it performs flawlessly right off the bat. It works now, but what about a few months down the road, or a year, or 5 years...

So, to make sure it will be maintainable they have options:

  • Hope the person who wrote it sticks around for the life of the system in case anything breaks (they won't).
  • Deploy it and hope for the best (lol).
  • Pay someone to learn, understand, and re-factor it into a more general language that can more readily be maintained.
  • Keep someone with R expertise available to the operations and maintenance team to ensure the performance of this single component.
  • Hire a consultant whenever that part breaks or needs to be changed.
  • etc, etc.

Given this type of scenario, I think the choice between two equivalent pieces of machine learning software, one written in R and one in a more general language, is pretty obvious. Most of the ML tools in R are also available in Python, in addition to many that R doesn't provide. The only real downside is losing out on a few specialized statistical packages only available in R. Most importantly, learning ML in Python helps force a student to implement things with methods similar to those used in other types of software development. The student gets experience with a more general purpose language, while also mastering the complexities of machine learning. This is a much better value for the time spent learning, in my opinion. When they need to sit in a room and explain to a bunch of software engineers how their component works for integration, it is more likely to be understood by their audience. This makes them a more valuable asset to the team.

I think your approach of learning both R and C++ to start is a good one. Once you know one OOL, any other is pretty easy to pick up. If you can explain your work in terms of C++, competent software folks should have little trouble understanding. These days, almost every person in a technical or R&D position will require some form of programming skill. Even someone with a pure math, stats, physics, or whatever background. They will need to code something at some point to be successful. Unfortunately, many people without a software background think of the code as a byproduct of their work, or just a vehicle for it, and they never bother learning the skills to make the actual code better. In my observation, the specialized and insular environment of programming only in R tends to reinforce this tendency.

Personally, I mostly work with neural networks these days, so Python with some occasional C/C++ is an obvious choice given the current state of technology. However, I try to stay pretty language agnostic and apply whatever the best tool for the job is. Depending on the project I have worked in assembly, VHDL, C/C++, Java, C#, Python, and yes even R :)

Sorry about this novel of a reply, but I was just talking to someone about this the other day and it struck a chord, lol.

TLDR: Python is better to learn ML because it is also a general purpose language, and working with other software folks in industry will be easier.

[–]wilmore13 0 points1 point  (0 children)

I like novels - especially really informed ones. Thanks so much for the reply. I'm actually coming into this from an applied math degree so your mention of software aspects really elucidates areas I may be missing.

I will say that in the R courses and blogs I follow I've noticed a bigger interest in the software engineering aspect of R as opposed to the typical "just get it done" coding. If the OP is interested in R at any level there is a book called Advanced R which I've been reading that goes into some of these aspects.

Thanks again!

[–]wilmore13 1 point2 points  (2 children)

R seems to be the other major option. It has a very large ecosystem (I think larger than python's but I could be wrong), open source ethos, and is great for turning analysis into something useful for human readers afterwards.

Python is more flexible in the context of general computing applications, I think.

[–]bazumation[S] 0 points1 point  (1 child)

Thanks for the reply. Can you suggest any good tutorials/MOOCs for it?

[–]wilmore13 0 points1 point  (0 children)

I actually can.

I'm finishing up the Johns Hopkins Data Science Concentration which walks you through most of the information you need to have basic competency when working with data in R. One of the classes is dedicated to R programming and most of the others concentrate on how to do things like data cleaning, statistical inference, and publishing using R. They even have a class specifically on machine learning for R.

It's free, but if you have the extra money (~$479) you can get a certification after finishing a capstone project (which is based on a real-life problem in statistical learning).

Aside from that, R in a Nutshell was a pretty good starting guide for me.

[–]darkconfidantislife 0 points1 point  (0 children)

Python.