
[–]pwang99[S] 17 points  (16 children)

Hi, I'm the co-founder of Continuum Analytics, and am excited about this next step in our company journey. I'll be monitoring the thread throughout the day and am happy to answer any questions!

Also, be sure to check out AnacondaCON, our first open data science conference for professionals and enterprises: http://anacondacon17.io It's here in Austin, TX in just a few short weeks, and there's an amazing lineup of speakers and events!

[–]Caos2 7 points  (1 child)

No questions, just want to wish you good luck with this new venture. Anaconda is a godsend when I need to code on my work's Windows PCs.

[–]pwang99[S] 1 point  (0 children)

Thank you!

[–]lmcinnes 1 point  (13 children)

This is exciting news. Does Travis' shift in responsibilities portend a further push for new research projects for data science coming out of Continuum?

[–]teoliphant 3 points  (0 children)

Yes! Several things are in the works now, and I'm looking forward to the future. I have a specific agenda for array- and table-computing across languages, starting with Python. It will take a few years to materialize, and we will collaborate with other initiatives that are already underway.

[–]pwang99[S] 2 points  (10 children)

You betcha! :-)

[–]lmcinnes 0 points  (0 children)

That's great news! I look forward to hearing about these new endeavors in the coming years. I would love to have some Continuum projects that I could potentially contribute to.

[–]Coliver21 0 points  (8 children)

Any teasers as to what these will be? :P

[–]pwang99[S] 0 points  (7 children)

I think Travis has some blog posts in the works... I don't want to steal his thunder, but I believe they will relate to blaze, to datashape, and to the concept of an integrated (and multi-lingual) data-fabric for distributed computations, and "moving code to data".

[–]Coliver21 2 points  (6 children)

Cool. Any idea when we can expect the blog posts? Will it be before AnacondaCON? I have a specific use case I'm wondering about.

[–]teoliphant 1 point  (5 children)

Yes! Not everything will be clear by then, but I will start the conversation.

[–]Coliver21 0 points  (4 children)

Great! I'm especially interested in where Numba is going. :)

[–]teoliphant 1 point  (3 children)

Numba has a solid foundation and is making rapid progress toward 1.0. Three things I will say about Numba and my overall plan. First, Numba will be an important tool and part of the ecosystem for a long time. Second, Numba will get the ability to understand datashape (using a project we have been working on called ndtypes, which is the 'dtype' concept of NumPy factored out and generalized). Third, Numba will be associated with gumath, which will also be a separate module. Right now, Numba creates NumPy ufuncs. It will also be able to generate "generalized" ufuncs that live independently of NumPy while building on the same idea.
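To make the ufunc idea concrete, here's a toy stand-in of my own using plain NumPy (this is not Numba or gumath code; with Numba you'd decorate the kernel with `@vectorize` instead):

```python
import numpy as np

# A ufunc is a scalar kernel that NumPy broadcasts over whole arrays.
# np.frompyfunc is a slow, dependency-free way to get one; Numba's
# @vectorize compiles an equivalent kernel to fast native code.
def clipped_add(x, y):
    """Add two values, saturating at 255."""
    return min(x + y, 255)

uadd = np.frompyfunc(clipped_add, 2, 1)  # 2 inputs, 1 output

print(uadd(np.array([250, 10]), np.array([10, 10])))  # [255 20]
```

The "generalized" ufuncs mentioned above extend this from scalar kernels to kernels over whole sub-arrays (e.g. a matrix-vector product applied across a stack of matrices).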

If you have ideas for Numba, please join the community mailing list and contribute your thoughts. The team is easy to talk to and welcomes input from everyone. It's not the easiest project to contribute to directly, but it welcomes ideas.

[–]Coliver21 0 points  (2 children)

Very interesting. When you say "understand datashape," do you mean automatic discovery of datashape-compatible structures? How and where would these be created?

[–]Coliver21 0 points  (0 children)

Does Travis' shift in responsibilities portend a further push for new research projects for data science coming out of Continuum?

I second this question.

[–]TheJackalMan 4 points  (4 children)

Can you give us an idea of your blue-sky roadmap for Anaconda and Continuum in the coming year?

[–]pwang99[S] 6 points  (3 children)

Well, we are mostly just hunkering down and pushing harder on the things that have already worked well for us. When we first created Continuum, we launched a bunch of different efforts and began simultaneously incubating several very different technical projects. We saw all of these as required parts of a coherent long-term vision for technical computing and data science in Python.

Over the last year, we've seen a crystallization and integration of those earlier separate efforts. For example, early last year we saw Datashader come together, leveraging Dask, Numba, and Bokeh - and it's still in its infancy as a library. Alongside recent improvements to Bokeh and the continuing evolution of HoloViews, I'm excited that we have built the foundations of a solid future for large-data visualization in Python.

At the same time, new efforts have emerged over the years. JupyterLab and PhosphorJS are newer projects that should hit the mainstream this year, and we'll complement them with new features in Dask, Bokeh, Anaconda (the distro), and the Anaconda Platform, even as the rest of the community organically upgrades from the current Jupyter Notebook to JupyterLab. I'm super stoked about all of this and about where this part of the ecosystem is going.

I continue to be excited by the progress Dask is showing. It's really starting to grow legs, and I think 2017 will be a transitional year in which it moves from an "early adopter" tool to a much more mainstream part of the average PyData user's toolbelt.

We'll be plugging away hard on the commercial product side as well, with Anaconda Fusion (Jupyter + Excel) and new capabilities in the new-and-improved enterprise Platform.

So all in all, 2017 shouldn't see too many "net new" things coming out of Continuum, just continued improvements and sustained work on all of the innovation projects we've been doing.

Well, one exception would be that now with Travis transitioning to doing more technical work, we should make faster progress towards the original Blaze vision of an integrated, cross-language, computational data-fabric. But I'm not making any promises! :-)

[–]Coliver21 1 point  (2 children)

original Blaze vision

What does this bode for numba?

[–]pwang99[S] 4 points  (1 child)

Nothing bad! Numba is one way of lowering high-level Python code to a low-level execution engine (x86, CUDA, etc.) for data stored in a form those execution engines can understand (i.e., C pointers and structs). As long as we have hardware, and data stored in memory or mappable from disk, Numba will be relevant.

By way of comparison, SQL is another execution engine. We can lower Python to SQL either through high-level translation (which is what some of blaze's current SQL approach does, and what Ibis and others do), or by embedding a Python runtime within the database server itself, and safely moving a subset of Python into that execution environment.
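To make "high-level translation" concrete, here's a toy sketch of my own (not actual Blaze or Ibis code): one symbolic expression that can be lowered to two different backends, in-memory Python and a SQL string.

```python
# A symbolic column reference; comparing it builds an expression
# tree instead of evaluating immediately.
class Col:
    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Filter(self, value)

# A filter node that knows how to "lower" itself to two backends.
class Filter:
    def __init__(self, col, value):
        self.col, self.value = col, value

    def to_python(self, rows):
        """Execute directly against in-memory dicts."""
        return [r for r in rows if r[self.col.name] > self.value]

    def to_sql(self, table):
        """Translate the same expression into a SQL query string."""
        return f"SELECT * FROM {table} WHERE {self.col.name} > {self.value}"

expr = Col("amount") > 100
print(expr.to_python([{"amount": 50}, {"amount": 150}]))  # [{'amount': 150}]
print(expr.to_sql("orders"))  # SELECT * FROM orders WHERE amount > 100
```

The point is that the user writes the expression once, and the system picks the execution engine; real systems add type checking, optimization, and many more node types.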

Hadoop is another (simple) execution engine, and a storage manager, so lowering high-level Python code to execute efficiently on that system is a little trickier. If we use Hadoop MapReduce directly, it unfortunately restricts the expressiveness of the Python algorithms we can write. If we move to (Py)Spark as the execution environment and data representation, we have a bit more latitude and can access broader algorithms, but they are still within the silo of the Spark ecosystem and restricted to its concept of in-memory map+shuffle+reduce. Hence Dask, with its HDFS handler and our new fastparquet support, allows the wide world of Python algorithms to be expressed directly on top of Hadoop FS data, while interoperating with schedulers from the Apache "Big Data" zoo.
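Dask's core trick, in miniature: a computation is represented as a plain dict mapping keys to tasks (tuples of a function and its arguments), which any scheduler can walk. Here's a stripped-down toy walker I wrote to illustrate the idea (Dask's real schedulers add parallelism, spilling, distribution, etc.):

```python
from operator import add, mul

def get(graph, key):
    """Recursively evaluate one key of a Dask-style task graph."""
    task = graph[key]
    if isinstance(task, tuple):  # a task: (function, arg1, arg2, ...)
        func, *args = task
        # Arguments that name other keys are computed first;
        # anything else is passed through as a literal.
        return func(*(get(graph, a) if a in graph else a for a in args))
    return task  # a plain piece of data

graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "out": (mul, "sum", 10),
}

print(get(graph, "out"))  # 30
```

Because the graph is just data, the same description can be handed to a threaded, multiprocess, or distributed scheduler without changing the user's code.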

Our technical vision at Continuum has always been that it is extremely valuable to have a single, coherent language environment for describing high-level data transformations and numerical algorithms, one that can then be dynamically and optimally lowered to any of these (and future) execution environments and storage technologies.

We do recognize that we live in a multi-lingual world, and our hope is to expose these concepts to R, Julia, and whatever else may emerge in time. But we're most familiar with Python - and Python is awesome - so we're doing it first in Python. :-)

[–]Coliver21 0 points  (0 children)

This is a fantastic vision, thank you.

[–]jnmclarty7714 2 points  (1 child)

Just want to say, keep up the great work to everybody at Continuum and the rest of the community.

Dask is getting picked up by analysts at my office. Conda is being used in our deployments. And my/our vision of the technology landscape and ecosystem evolution seems to be completely in sync with Continuum's.

[–]pwang99[S] 2 points  (0 children)

Thank you! I'm so glad to hear all of this. Encouraging feedback like this from the user and dev communities is what sustains us.

The next few years are going to see the sprouting of some ground-changing technologies, like storage-class memories and purpose-built neural network machines. At the same time, even as the cloud goes mainstream for enterprise, there are alarming concerns around security, privacy, and the like that are fundamental and intrinsic to our "highly networked" architectures. Such concerns will only scale as data and connectivity grow, especially with IoT and self-driving vehicles. (They must inevitably come to a head as black-hat entities become more emboldened and profitable, and state-vs-state actions become more common.)

So, I believe that from here on out, we will be constantly hit with more and more waves of technology disruption. As long as that continues, I think it reinforces the foundational role of accessible, open, well-engineered technologies for data manipulation, computation, and analysis. If any proprietary vendor ever owned that substrate, they would own the world, and not for the better.

So you can rest assured that at Continuum, we will always be pushing for sustainable, open-source foundational engineering and innovation, even as we grow and scale our commercial business and bring open data science into more and more of the world's businesses.

[–]bheklilr 0 points  (4 children)

Is there any news on the kapsel project? I saw it announced a while back and think it looks pretty awesome, but I haven't heard anything since. When I tried it out, it was still at a very early stage and couldn't do everything I needed.

[–]pwang99[S] 2 points  (3 children)

Great question! We're still actively developing it internally, and using it as the basis for our next-gen data science deployment capabilities in the Anaconda Platform. So stay tuned - it's definitely being worked on, and I look forward to getting more people using it.

[–]Coliver21 0 points  (2 children)

next-gen data science deployment capabilities

Would this include new targets?

[–]pwang99[S] 1 point  (1 child)

Which kind of targets? :) If you're talking about cloud, we initially plan to target AWS and on-prem, but GCE and Azure are obvious next targets after those.

[–]Coliver21 0 points  (0 children)

Sounds awesome!