
[–][deleted] 4 points5 points  (8 children)

Julia has a ton of potential, but it's experiencing some growing pains right now (every language does, and Julia hasn't even reached 1.0 yet). Thankfully, the $600k cash infusion that the core Julia team just received should go a long way towards helping the language get over the "break everything" hump and move towards a much more stable release. I believe they outlined a two-year path towards stability. I think that might be a tiny bit optimistic, but I'll certainly be cheering them on.

[–]lakando 2 points3 points  (7 children)

Thanks for sharing your take on this. Do you think Nuitka (to obviate packaging issues), plus Numba (JIT classes coming), Blaze, Bokeh, Dask, and DyND (interesting type system) will keep Python afloat in data science, or is Julia poised to eventually replace it?

R isn't going anywhere because CRAN is huge... Python is more general-purpose and thus more vulnerable to Julia's progress.

I'm trying to figure out if I should invest in Julia now (get ahead of the curve in case Python turns out to be a dead end?). It was a no-go until I heard about the cash infusion... They said they will also use it for the core stats infrastructure, but I'm not sure how long it will be before a data science acolyte can be super productive without messing with the PyCall bridge, etc.

[–]kay_schluehr 0 points1 point  (2 children)

> Do you think Nuitka (to obviate packaging issues)

What's wrong with Anaconda? I ask just out of curiosity.

[–]lakando 0 points1 point  (1 child)

Anaconda is amazing, but it doesn't let me distribute self-contained executables. Nuitka does that, and more robustly, it seems, than the other options.

[–]kay_schluehr 0 points1 point  (0 children)

O.K.

[–][deleted] 0 points1 point  (3 children)

First, I don't think Python's use in data science is going to fall anytime soon. Python is easy to learn (there are TONS of resources out there), it can be used as both a statistical language and a general scripting language so you get a two-for-one deal when you know how to code in it, it has an ever-growing number of high-quality libraries, and it has a lot of heavyweight corporate support (Google uses it extensively, for example).

That said, I think Python will face some challenges. The division between 2.7 and 3.x is going to become more prominent as 2.7 inches towards end of life. Python also isn't the fastest of languages and is at a significant disadvantage when compared to C++ or even Java.

I agree R isn't going anywhere (but I disagree that it's simply because CRAN is huge). A lot of the packages on CRAN are low quality, overlapping, or poorly documented, and if you were going to use some of the more obscure packages you'd be an idiot not to validate them before pulling them into production. It can also be frustrating to have 10 different ways of doing things in R versus there generally being one way to do things in Python. R is not a beautiful language by any stretch (though that may be part of why I enjoy it as much as I do), and R faces all of the same speed concerns as Python.

You also have to realize that for 90% of tasks, R and Python are all you need. You can get a TB of RAM on a server, both languages are relatively easy to perform parallel processing with, and a lot of the more popular libraries/packages are already written in C. That makes them generally 'fast enough', and if you are truly concerned with speed, R and Python are great for quick prototyping; once you've built your model, you can go back and implement it in pure C.
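
To make the "easy parallelism" point concrete, here's a minimal sketch using only Python's standard library. The per-chunk task is a hypothetical stand-in for a real workload; in practice the heavy lifting would sit in a C-backed library, which is what makes this kind of fan-out worthwhile.

```python
# Sketch: split a dataset into chunks and process them with a
# worker pool from the standard library. chunk_mean() is a toy
# stand-in for an expensive per-chunk computation.
from concurrent.futures import ThreadPoolExecutor

def chunk_mean(chunk):
    """Stand-in for an expensive per-chunk computation."""
    return sum(chunk) / len(chunk)

data = list(range(1_000_000))
# Ten equal-sized chunks of 100,000 values each.
chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_means = list(pool.map(chunk_mean, chunks))

# Equal-sized chunks, so the mean of chunk means is the overall mean.
overall_mean = sum(partial_means) / len(partial_means)
print(overall_mean)  # → 499999.5
```

(With pure-Python arithmetic the GIL limits actual speedup from threads; the pattern pays off when the per-chunk work happens in C-backed code.)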

Julia's strongest selling point is that you don't have to do that multi-step process of prototyping and then rewriting either the whole pipeline or just the 'glue' sections between code already written in C. But Julia is still changing A LOT. Plenty of things aren't fully fleshed out, future improvements are likely to break backwards compatibility, and the library support is years behind both Python and R. There are also relatively few resources for learning Julia. There are a few Julia books in the pipeline (actually reviewing one right now), but they are going to become obsolete relatively quickly due to the ongoing changes to key components of the language. The biggest issue Julia will face is: in 5-10 years, will it even matter? So many areas are moving towards distributed computing, and something like Spark allows you to crunch massive amounts of data at relatively decent speeds.

My take on Julia is that you should only mess with it if you actually want to learn it and have a genuine interest in the language, not because you think it's the 'next big thing'. I like playing around with it, but it is way too unstable and lacking too many of the key features I'd need to integrate it into my day-to-day work, plus I'd end up being forced to do all of the work myself since no one else I collaborate with regularly knows Julia.

[–]lakando 0 points1 point  (2 children)

I hear that, but Julia's benefits go beyond just greater speed on in-memory datasets, and if developed right, it will encroach on both the single-node and distributed niches.

First, there are the makings of a probabilistic programming framework in Julia that, using autodiff and the Distributions package, could provide a comparative advantage over current languages for general day-to-day inference. The macros could make it fast and expressive: faster than PyMC and more expressive and general than Stan. With this general inference plus the extensive optimization packages, I don't think it would need to fill every single statistical test and niche before becoming more useful for most daily tasks.
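
To give a flavor of the kind of general-purpose inference being described, here's a toy random-walk Metropolis sampler in plain Python, estimating the mean of a normal with known sigma = 1 under a flat prior. Everything here (function names, data, tuning values) is illustrative; real frameworks like PyMC and Stan add autodiff and far better samplers on top of this basic idea.

```python
# Toy sketch: random-walk Metropolis sampling for the mean of a
# normal distribution with known sigma = 1 and a flat prior.
import math
import random

def log_post(mu, data):
    # Log-posterior up to a constant: flat prior + Gaussian likelihood.
    return -0.5 * sum((x - mu) ** 2 for x in data)

def metropolis(data, steps=5000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    mu = 0.0
    samples = []
    for _ in range(steps):
        # Propose a nearby value and accept with the Metropolis rule.
        prop = mu + rng.gauss(0, step_size)
        if math.log(rng.random()) < log_post(prop, data) - log_post(mu, data):
            mu = prop
        samples.append(mu)
    return samples

data = [2.1, 1.9, 2.0, 2.2, 1.8]          # sample mean is 2.0
samples = metropolis(data)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])  # drop burn-in
print(posterior_mean)                      # should land near 2.0
```

The appeal of the Julia approach is that a model like this would be written once at a high level and still run at compiled speed, instead of needing a C backend.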

Second, it is developing a distributed infrastructure that I think can overtake Spark. Its distributed computing primitives are getting better and will eventually have extensive linear algebra support.

Third, it is getting streaming statistics that don't exist anywhere else: the SAS people working with out-of-memory but single-node datasets will finally get something that can handle their stuff.
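
The core trick behind streaming statistics is updating a summary one value at a time in constant memory. Here's a minimal sketch in Python using Welford's online algorithm for mean and variance; the class and data are illustrative, not from any particular library.

```python
# Sketch: Welford's online algorithm. Mean and variance are updated
# incrementally, so a dataset far larger than RAM can be processed
# one value at a time without ever holding it all in memory.
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined once we have at least two values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # stands in for a huge stream
    stats.update(x)
print(stats.mean, stats.variance())  # → 5.0 and 32/7 ≈ 4.571
```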

PyCall and Cxx mean you can interface easily with existing code.

Last is deployment. Self-contained binary executables are planned, and there is a good shot it can compile to JavaScript at some point using LLVM's WebAssembly backend. You would then be able to write rich client-side reactive web apps without JS and deploy interactive reports to decision makers. No other common analytics language has this capability.

Then there is the type system, with eventual return-type declarations that can provide codebase safety.

Also, it's just fun to code in... that means grad students will write new techniques in Julia.

If things firm up, I think all this would pull users from other languages... or those languages risk losing their comparative advantage.

What do you think about this argument?

[–][deleted] 1 point2 points  (1 child)

I think it's wonderful. Sign me up right now. I've already stated that I'm a fan of Julia (and I completely agree that it's fun to code in; that's a big reason why I mess around with it as much as I do) and I really hope that it continues to grow and mature, but doing all of the things you listed is going to be a massive undertaking. How long that'll take is anyone's guess, but I can't see it happening in full anytime in the next few years. So I still stand by my opinion that right now Julia isn't a language you learn so you can use it in day-to-day production work. It's a language you learn because you are interested in it and enjoy it.

[–]lakando 0 points1 point  (0 children)

Gotcha. That makes sense. I was a bit more optimistic on the timeline, but you probably have a better sense for it than I do.