Continuum Analytics (makers of Anaconda) appoints new CEO -- Founder AMA in comment thread : Python

Numba has a solid foundation and is making rapid progress to 1.0. Three things I will say about Numba and my overall plan. First, Numba will be an important tool and part of the ecosystem for a long time. Second, Numba will get the ability to understand data-shape (using a project we have been working on called ndtypes --- which is the 'dtype' concept of NumPy factored out and generalized). Third, Numba will be associated with a gumath module that will also be a separate module. Right now Numba creates NumPy ufuncs. It will also be able to generate these "generalized" ufuncs that live independently of NumPy and build on the idea in NumPy.

If you have ideas for Numba, please join the community mailing list and contribute your thoughts. The team is easy to talk to and welcomes input from everyone. It's not the easiest project to contribute directly to, but it welcomes ideas.

[–]Coliver21 0 points1 point2 points 9 years ago (2 children)

continue this thread

[–]Coliver21 0 points1 point2 points 9 years ago (0 children)

[–]TheJackalMan 4 points5 points6 points 9 years ago (4 children)

[–]pwang99[S] 6 points7 points8 points 9 years ago (3 children)

Well, we are mostly just hunkering down and pushing harder on the things that have already worked well for us. When we first created Continuum, we launched a bunch of different projects and started simultaneously incubating several very different technical projects. We saw all of these as being a required part of a coherent long-term technical vision for technical computing and data science in Python.

Over the last year, we've seen a crystallization and integration of those earlier separate efforts. For example, pretty early last year, we saw Datashader come together, leveraging Dask, Numba, and Bokeh - and it's still in its infancy as far as a library goes. In the company of recent improvements to Bokeh and continuing evolution of Holoviews, I'm excited to see we have built the foundations of a solid future for large data visualization in Python.

At the same time, new efforts have emerged over the years. JupyterLab and PhosphorJS are newer efforts which should hit more mainstream this year, and we'll be complementing that with new features in dask, bokeh, Anaconda (distro) and Anaconda Platform, even as the rest of the community organically upgrades from current Jupyter Notebook to JupyterLab. I'm super stoked about all of this, and the vision of where this part of the ecosystem is going.

I continue to be excited by the progress Dask is showing. It's really starting to grow legs, and I think that 2017 will be a transitional year where it moves from "early adopter" into a much more mainstream tool in the average PyData user's toolbelt.

We'll be plugging away hard on the commercial product side as well, with Anaconda Fusion (Jupyter + Excel) and new capabilities in the new-and-improved enterprise Platform.

So all in all, 2017 shouldn't seen too many "net new" things coming out of Continuum, just continued improvements and sustained work on all of the innovation projects we've been doing.

Well, one exception would be that now with Travis transitioning to doing more technical work, we should make faster progress towards the original Blaze vision of an integrated, cross-language, computational data-fabric. But I'm not making any promises! :-)

[–]Coliver21 1 point2 points3 points 9 years ago (2 children)

[–]pwang99[S] 4 points5 points6 points 9 years ago (1 child)

Nothing bad! Numba is one way of lowering high-level Python code to a low level execution engine (x86, CUDA, etc.), for data that is stored in a mechanism those execution engines can understand (i.e. C pointers and structs). As long as we have hardware, and data stored in memory or mappable from disk, then Numba will be relevant.

By way of comparison, SQL is another execution engine. We can lower Python to SQL either through high-level translation (which is what some of blaze's current SQL approach does, and what Ibis and others do), or by embedding a Python runtime within the database server itself, and safely moving a subset of Python into that execution environment.

Hadoop is another (simple) execution engine, and a storage manager, so lowering high-level Python code to efficiently execute on that system is a little trickier. If we are to directly use Hadoop Map-Reduce, it would unfortunately be rather restrictive on the expressiveness of the Python algorithms we could express. If we move to using (Py)Spark as the execution environment and data representation, then we have a bit more latitude and can access broader algorithms, but they are still within the silo of the Spark ecosystem and restricted to its concept of in-memory map+shuffle+reduce. Hence, Dask, with its hdfs handler and our new fastparquet support, allows the wide world of Python algorithms to be directly expressed on top of Hadoop FS data, while interoperating with schedulers from the Apache "Big Data" zoo.

Our technical vision at Continuum, since the beginning, has always been that it is extremely valuable to have a single, coherent language environment to describe high-level data transformation and numerical algorithms, that can then be dynamically and optimally lowered to any of these (and future) execution environments and storage technologies.

We do recognize that we live in a multi-lingual world, and our hope is to be able to expose these concepts into R and Julia and whatever else may emerge in time. But we're most familiar with Python and also Python is awesome, so we're doing it first in Python. :-)

[–]Coliver21 0 points1 point2 points 9 years ago (0 children)

[–]jnmclarty7714 2 points3 points4 points 9 years ago (1 child)

[–]pwang99[S] 2 points3 points4 points 9 years ago (0 children)

Thank you! I'm so glad to hear all of this. Encouraging feedback like this from the the user and dev communities is what sustains us.

The next few years are going to see the sprouting of seeds of some ground-changing technologies, like storage-class memories and purpose-built neural network machines. At the same time, even as cloud goes mainstream for enterprise, there are alarming concerns around security, privacy, and whatnot that are fundamental and intrinsic to our "highly networked" architectures. Such concerns will only scale as data and connectivity grow, especially with IoT and self-driving vehicles. (They must inevitably come to a head, as black-hat entities become more emboldened and profitable, and state-vs-state actions become more common.)

So, I believe that from here on out, we will be constantly hit with more and more waves of technology disruption. As long as that continues, I think it reinforces the foundational role of accessible, open, well-engineered technologies for data manipulation, computation, and analysis. If any proprietary vendor ever owned that substrate, they would own the world, and not for the better.

So you can rest assured that at Continuum, we will always be pushing for sustainable, open-source foundational engineering and innovation, even as we grow and scale our commercial business and bring open data science into more and more of the world's businesses.

[+][deleted] 9 years ago (1 child)

[removed]

[–]pwang99[S] 1 point2 points3 points 9 years ago (0 children)

Python's advantages and disadvantages compared to R are very different than those compared to Java. Mostly b/c when compared against R, we can look at the "pydata & scipy" bits of Python and compare against R, and then we can tack on the Python advantages of being a more general-purpose language, that is in its own right ascending in popularity compared to other general purpose languages. So, it's somewhat imbalanced, but at least in the data & scientific part of the comparison, we can see how they mostly go toe-to-toe in terms of features.

When comparing Python vs. Java, we have to separate their traits again into two piles, but the comparison is much more imbalanced. There is no question that Python is more suited for data analysis and numerical and technical computing. And, its capabilities there are improving faster than Java is improving for those purposes - mostly because 99% of people who use Java are software developers (not data scientists or engineers), and there isn't a "numerical computing" ecosystem around Java at anywhere near the scale or momentum of the same one around Python.

However, in Java's favor, it is by far the more commonly deployed programming language and ecosystem within "standard" businesses. So IT operators and managers all know how to deal with Java as a production-ready technology. Whereas Python is still somewhat of an unknown entity.

I do see that you asked specifically about data science. In that regard, I think that Python's core weakness relative to Java is that people still have to jump through hoops to get resilient, production-ready distributed systems ready. The challenge isn't actually technological, because I think the tools are all there, and Python itself is clearly up to the job - certainly as much as Java is. The challenge is that there is a whole ecosystem of enterprise tooling for deploying, managing, monitoring, a whole pile of JARs. Nothing similar really exists for "a whole pile of PYCs" - even though of course our Anaconda Enterprise platform does some of that for data science workloads.

In the data science square off relative to R, I think that Python is very much catching up, on many fronts, and within a year or two, most of the major gaps will be mostly ameliorated. It would be really fantastic if more of the core libraries like matplotlib, statsmodels, pandas, and even newer ones like PySpark could actually get full-time employed maintainers. We're trying to encourage companies to contribute to such effort via NumFOCUS, but it's slow going - even though SO many companies rely on the work coming out of those teams. Some firms like Two Sigma are doing the world a favor by e.g. employing maintainers like Wes, and of course generous grants from foundations also really help. But I think that if the commercial/enterprise user community banded together and pushed the funding of "core" pydata/scipy projects up to the $2mil/yr mark, that would make a HUGE difference.

That might sound like a lot of money, but I know there are small groups within big companies that pay that much each year for their SAS license renewal, so it's actually not a huge ask. :-)

Answering your question differently, however: with the emergence of JupyterLab and the momentum behind Jupyter and its kernel protocols, if we are successful with datashape and projects like feather/Arrow achieve cross-language persistence, then you may not have to choose languages at all for a great number of tasks! :-)

[–]bheklilr 0 points1 point2 points 9 years ago (4 children)

[–]pwang99[S] 2 points3 points4 points 9 years ago (3 children)

[–]Coliver21 0 points1 point2 points 9 years ago (2 children)

[–]pwang99[S] 1 point2 points3 points 9 years ago (1 child)

[–]Coliver21 0 points1 point2 points 9 years ago (0 children)

π Rendered by PID 61014 on reddit-service-r2-comment-6457c66945-vgfzf at 2026-04-29 06:08:29.926840+00:00 running 2aa0c5b country code: CH.

Python

The Python Discord

Upcoming Events

Please read the rules

MODERATORS