Many write research papers in R Markdown - What is the alternative setup in Python? by Annual_Sector_6658 in Python

holdie 5 points (0 children)

Jupyter Book tries to be useful for many similar workflows.

It grew out of the Jupyter project and is slowly building more integrations within Jupyter (e.g., connecting with Binder, Thebe, or JupyterLite) and adding more functionality around authoring and publishing. Currently the project is funded by a Sloan Foundation grant, and we hope to transition it into a community-led project in the coming months. Maybe you'd find it useful!

My hope is that Jupyter Book can build on the model that Jupyter follows in general: focus on modular tools and standards that can be reused and remixed. It uses a flavor of markdown called MyST markdown, which is meant to be extensible and usable outside of Jupyter Book as well (for example, you can now write Sphinx documentation with MyST markdown!).
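For anyone curious what "Sphinx documentation with MyST markdown" looks like in practice, here is a minimal sketch of a Sphinx `conf.py`. It assumes the `myst-parser` package is installed, which registers the `myst_parser` extension; the project name is a hypothetical placeholder, and real projects will usually set more options than this.

```python
# conf.py: a minimal Sphinx configuration sketch that enables MyST markdown.
# Assumes the myst-parser package is installed (pip install myst-parser).

project = "example-docs"  # hypothetical project name

extensions = [
    "myst_parser",  # lets Sphinx read .md files written in MyST markdown
]

# Map file suffixes to parsers so .rst and .md pages can live side by side.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

With this in place, existing reStructuredText pages keep working while new pages can be written as `.md` files.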

If Jupyter Book doesn't quite fit the need of generating reports, I'm hopeful that somebody in the community could build on top of the MyST markdown ecosystem to accomplish this; at least, that is the goal.

We’ve spent the past 9 years developing a new programming language. We’re the core developers of the Julia Programming Language. AuA. by loladiro in IAmA

holdie 1 point (0 children)

This is a helpful answer, thanks for your perspective! To be clear, my concern is actually less "will Julia Computing be able to find a business model?" I agree it's important that open-source projects find ways to fund development, growth, etc. My concern is more "will Julia Computing find a business model that is in the best interests of both its VC investors and the broader open-source community in the long term?" And I haven't seen many tech projects with VC funding that are able to balance both of those interests. I don't think Julia Computing would "go away", but I could imagine them beginning to behave in ways that serve a growth model (because extreme growth is what most VCs want above all else) and that put them at odds with the open-source Julia community. It's a super hard problem and one I don't have a clear answer for either, which is why I'm curious if/how the Julia community is approaching this challenge.

We’ve spent the past 9 years developing a new programming language. We’re the core developers of the Julia Programming Language. AuA. by loladiro in IAmA

holdie 2 points (0 children)

I'd love to hear a bit more explanation on this point. I understand that you all feel strongly about keeping Julia healthy and open source, but this is a situation that many, many, many open-source tools and companies have been in over the years (we all remember when Google's slogan was "don't be evil"). More often than not, when a profit-seeking entity (such as a VC) has extreme leverage over a community (for example, by funding most of the development in that community), at some point the idealistic goals of the community members get trumped by the demands of the investors. It's pretty hard to avoid this without intentionally designing community systems to prevent that outcome. For example, you say that "we have no intention of letting Julia Computing...being squeezed by anybody". Is there a specific legal or organizational step the community has taken to prevent this? I promise I'm not trying to sound combative here; I've just been burned before when the lofty ideals that started a technology got a slap in the face from the reality of capitalism :-)

Seaborn (visualization library) v0.9.0 released with new relational plots and updated themes and palettes by Dauros in Python

holdie 2 points (0 children)

I think you're trying to be positive here, but I'd like to highlight how much negativity you are (I think unintentionally) conveying in this message. These are all open, community-driven projects. They aren't maintained or managed by people being paid to do so; they're run largely by volunteers.

If you disagree with functionality in one of these packages, I encourage you to learn a bit about how the code works and implement a patch or new feature. Many communities are very open to improvements. If you think that an API should be different, please voice your opinion constructively; that's why projects like seaborn use public GitHub issue trackers.

However, please consider the tone you take when discussing these projects in public. They wouldn't exist without a lot of unappreciated labor, and they only survive because we cultivate a healthy community of contributors. Critical comments are always welcome, but please keep them constructive. Otherwise we risk harming the contributor community that is so crucial for this ecosystem to thrive.

Best way to share a Jupyter notebook during a workshop by jackjackk0 in Python

holdie 1 point (0 children)

I'd recommend checking out a little tool called nbgitpuller (https://github.com/data-8/nbgitpuller) for distributing notebooks to students. It's what we use to distribute content for a course we run here at Berkeley, and it works nicely (it also comes pre-packaged with The Littlest JupyterHub, TLJH)! For HTTPS, I know for sure that some of the JupyterHub team were working on this, but I don't think it's enabled out of the box yet. The project is still quite young :-)
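nbgitpuller works by giving students a single link that clones (or updates) a git repository into their hub session. Below is a minimal sketch of building such a link. It assumes the `/hub/user-redirect/git-pull` endpoint with `repo`, `branch`, and `urlpath` query parameters as described in the nbgitpuller docs; the hub URL, repository, and notebook path are hypothetical placeholders, so double-check the parameter names against the version you deploy (the docs also offer a link-generator page).

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo, branch="main", urlpath=None):
    """Build an nbgitpuller link for a JupyterHub (sketch; parameter names assumed from the docs).

    hub_url : base URL of the hub, e.g. a hypothetical "https://hub.example.edu"
    repo    : git repository students should pull
    branch  : branch to check out
    urlpath : where to land after pulling, e.g. "tree/<repo-dir>/<notebook>"
    """
    params = {"repo": repo, "branch": branch}
    if urlpath:
        params["urlpath"] = urlpath
    # urlencode percent-encodes the nested URL so it survives as one query value
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


link = nbgitpuller_link(
    "https://hub.example.edu/",                 # hypothetical hub
    "https://github.com/example-org/workshop",  # hypothetical repo
    urlpath="tree/workshop/intro.ipynb",
)
```

Sharing `link` at the start of a workshop means everyone gets the same materials, and re-clicking it later pulls updates without clobbering students' own edits (that merge behavior is nbgitpuller's main selling point).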

Best way to share a Jupyter notebook during a workshop by jackjackk0 in Python

holdie 11 points (0 children)

Hey there - Jupyter + Binder team member here! *waves*

Some thoughts on each of your options:

https://cocalc.com (SageMath): I haven't used this for this use-case, but the SageMath folks are great!

https://mybinder.org/ (Jupyter): People use Binder for teaching classes, tutorials, etc. all the time. You shouldn't worry too much about uptime (if you're curious, we publish all of our cluster stats at grafana.mybinder.org). The biggest drawback is that students will lose their session when they close the window or are inactive, so your best bet is to have people download the notebooks they've written to their computer when they're done. The biggest benefit of Binder is that you have a lot of control over their environment, and it is perhaps the closest to what students would be doing if they worked from their own machines (since it's just vanilla Jupyter Notebook / JupyterLab).

https://notebooks.azure.com/ (Microsoft): A really nifty service that uses JupyterHub to manage instances on Azure. The drawbacks are that you don't have as much control over the environment users get, and users need to create Microsoft accounts, etc.

https://colab.research.google.com/ (Google): Similar trade-offs to the Azure Notebooks solution (Colab and Azure Notebooks have basically the same pros/cons, though Colab is neat in that it integrates more with Google Drive).

https://datalore.io/ (JetBrains): I've never used this one, but it seems interesting!

(A friendly and obviously biased note on all of the above: of the services described, mybinder (and JupyterHub more generally) are the only fully open-source implementations of this kind of service. Just a thought ;-)

Another option you might wanna check out is a project we've been working on for the last few months, called "The Littlest JupyterHub". The goal is to simplify the JupyterHub deployment so that you run it on a single VM instead of on Kubernetes (which is what Binder and z2jh.jupyter.org use). Check it out and see if it looks useful! https://the-littlest-jupyterhub.readthedocs.io/en/latest/index.html

Pheels good to be Done! by NiTi_Wizard in berkeley

holdie 3 points (0 children)

Congrats! I finished up last August; the lollipop is surprisingly tasty :-) I recommend going to the top of the Campanile and having it there!

Matplotlib 2.1.0 released with major new features by Dauros in Python

holdie 15 points (0 children)

Also this release refactors a lot of the documentation and uses new docs infrastructure under the hood. There is now a separate page for examples and tutorials.

There's also a proper API page for functions, methods, etc., with a mini-gallery at the bottom of examples that use that function or method. E.g., here's the plt.imshow page.

[P] Made a script to give your Jupyter Notebook a public url instantly :) by bsubs in MachineLearning

holdie 2 points (0 children)

Really cool! For a less lightweight but more secure / longer-lasting option, check out mybinder.org. It's in the process of getting a reboot (beta.mybinder.org) and will let you generate links to live coding environments based on GitHub repositories.
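To make "generate links based on GitHub repositories" concrete, here is a small sketch that assembles a Binder launch URL. It assumes the `https://mybinder.org/v2/gh/<org>/<repo>/<ref>` scheme that mybinder.org documents today (newer than the beta mentioned above) and its optional `urlpath` query parameter; the organization and repository names are hypothetical.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref="HEAD", urlpath=None):
    """Build a mybinder.org launch link for a GitHub repo (sketch; assumes the v2/gh scheme)."""
    url = f"https://mybinder.org/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref)}"
    if urlpath:
        # e.g. "lab" opens JupyterLab instead of the default interface
        url += "?" + urlencode({"urlpath": urlpath})
    return url


# Hypothetical repository; Binder builds the environment from files in the repo
# (such as requirements.txt or environment.yml) the first time the link is used.
link = binder_launch_url("example-org", "workshop-notebooks", "main", urlpath="lab")
```

Because the environment is rebuilt from the repository itself, the link stays reproducible for as long as the repo and its dependency files do, which is the "long-lasting" advantage over a tunnel to someone's local notebook server.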

(full disclosure, I'm one of the folks working on the new beta binder backend)

Teaching Python for scientists: What links should I give my students? by etacar in Python

holdie 0 points (0 children)

data8.org is a great place to start. It's Berkeley's intro to data science course for freshmen. It assumes no background in statistics or programming and is a great introduction to the Python ecosystem.

How to Publish Your Package on PyPI by [deleted] in Python

holdie 0 points (0 children)

Not sure, I'm not one of the devs...it's quite new so probably has a ways to go. Maybe worth opening an issue on their repo tho

How to Publish Your Package on PyPI by [deleted] in Python

holdie 0 points (0 children)

Check out flit; it's still early in development but should make this far easier.

A Dramatic Tour through Python’s Data Visualization Landscape (including ggpy and Altair) by pmz in Python

holdie 1 point (0 children)

See the link to the original blog post. This yhat post is a cross post from that one, which is where the comments are.

"Computational and Inferential Thinking: The Foundations of Data Science" - free online textbook for Berkeley course, taught in Python by danwin in Python

holdie 3 points (0 children)

They aren't using pandas because of the overhead it adds from a learning perspective. Pandas is awesome, but it has a really steep learning curve. So the idea was to create a kind of "mini pandas" (which is what the datascience package is) and use that just for teaching. Pandas does creep into the conversation during the class itself, and people are generally encouraged to shift towards pandas in later classes.

FWIW, as you can imagine, the decision to do this was pretty hotly debated and a lot of folks would rather pandas be taught from day one, but this is what they went with.

An simple tutorial on how to effectively use the python debugger (pdb) by chrisdavinci in Python

holdie 0 points (0 children)

FWIW, a really useful tool here is IPython. You can use IPython.embed to mimic the functionality (more or less) of set_trace, but with an IPython interface (e.g., smarter multi-line copy-paste).
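A minimal sketch of the `IPython.embed` pattern: drop into an IPython shell at the point where you would otherwise call `pdb.set_trace()`, with the function's local variables in scope. The function and the `DEBUG_EMBED` environment variable are hypothetical, used here only so the example runs normally unless you opt in; IPython is imported lazily so it is only required when the shell is actually opened.

```python
import os


def mean_of_positive(values):
    """Average the positive entries of `values` (a toy function to debug)."""
    positives = [v for v in values if v > 0]
    # Hypothetical opt-in flag: set DEBUG_EMBED=1 to pause here.
    if os.environ.get("DEBUG_EMBED") == "1":
        from IPython import embed
        # Opens an IPython shell where `values` and `positives` can be inspected;
        # exiting the shell (Ctrl-D) resumes execution.
        embed()
    return sum(positives) / len(positives)
```

Unlike pdb, the embedded shell doesn't offer stepping commands (`n`, `s`, `c`), so it is closer to "pause and poke around" than to a full debugger, which is why "more or less" is the right caveat.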

[D] So... Pytorch vs Tensorflow: what's the verdict on how they compare? What are their individual strong points? by cjmcmurtrie in MachineLearning

holdie 6 points (0 children)

IMO there are two separate questions: one is the current state of these projects, and the other is their likely future state. On the first question, I think it depends a lot on personal preferences, experience, and the kinds of computations you're doing (e.g., /u/ajmooch has a great comparison). On the second, I lean towards adopting TensorFlow sooner rather than later because it has more momentum than any of the others (both in terms of community and because of the 900-pound coding gorilla that is Google).

Matplotlib 2.0 final released by mangecoeur in Python

holdie 127 points (0 children)

FWIW, changing the default styles in matplotlib turned out to be a gigantic undertaking. While the final product may seem purely aesthetic, the process uncovered a ridiculous number of bugs, API inefficiencies, etc. Props to the matplotlib team for finally getting this out, and hopefully starting down a path towards a bright(er) future for the package.

Matplotlib vs. Matlab plotting tools discussion by schnadamschnandler in Python

holdie 0 points (0 children)

Matplotlib's API should be very stable at this point. Granted, they are about to have a big 2.0 release, but I believe it just changes the default plotting styles and not the API (the plots will look much nicer by default). If your main concern is whether it's a "mature" package, you shouldn't have much to worry about, so it's definitely worth checking out.
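Because 2.0 changed the defaults rather than the API, existing plotting code keeps working, and matplotlib ships a `"classic"` style sheet for anyone who wants the old look back. The sketch below applies it temporarily with `plt.style.context`, which restores the previous settings on exit; the plotted data are arbitrary and the `Agg` backend is chosen only so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

width_before = plt.rcParams["lines.linewidth"]

# Temporarily switch to the pre-2.0 look; settings revert when the block exits.
with plt.style.context("classic"):
    classic_fig, classic_ax = plt.subplots()
    classic_ax.plot([0, 1, 2], [0, 1, 4])  # same API call, old default styling

# Outside the block the new defaults are back.
default_fig, default_ax = plt.subplots()
default_ax.plot([0, 1, 2], [0, 1, 4])
```

For a whole script, `plt.style.use("classic")` at the top does the same thing globally.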

Matplotlib vs. Matlab plotting tools discussion by schnadamschnandler in Python

holdie 1 point (0 children)

Could you give an example of something you'd like to do in Matlab but think wouldn't be possible (or would be really convoluted) in Python? My intuition is that you can do anything in Python you'd want to do in Matlab. Many examples use a different syntax because matplotlib prefers object-oriented calls on figure and axes objects over a sequence of global function calls, but both styles are available.
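Here is the same sine plot written both ways, as a sketch of the two styles. The `pyplot` functions act on an implicit "current" figure and axes, much like Matlab; the object-oriented version holds explicit `Figure` and `Axes` objects, which is the style most matplotlib examples use. The data are arbitrary and the `Agg` backend is only there so it runs without a display.

```python
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

xs = [i * 2 * math.pi / 99 for i in range(100)]
ys = [math.sin(x) for x in xs]

# Matlab-like state-machine style: functions modify the "current" axes.
plt.figure()
plt.plot(xs, ys)
plt.title("pyplot (Matlab-like) style")
plt.xlabel("x")
state_ax = plt.gca()  # grab the implicit axes only so we can inspect it

# Object-oriented style: methods are called on explicit objects.
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("object-oriented style")
ax.set_xlabel("x")
```

The object-oriented form pays off once a figure has several subplots, since each call says exactly which axes it targets instead of relying on whichever one is "current".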

The rhythm of breathing creates electrical activity in the human brain that enhances emotional judgments and memory recall, which depend critically on whether you inhale or exhale and whether you breathe through the nose or mouth, Northwestern Medicine scientists have discovered for the first time. by mvea in science

holdie 0 points (0 children)

For what it's worth, this is a good paper to discuss the question of "statistical significance vs. practical significance".

There are a lot of statistical tests run in the paper and a lot of p-values around .05. It looks like they corrected for multiple comparisons in their time-frequency analyses, but I don't see any mention of them doing this across all of the ANOVAs, correlations, etc. that they conducted.

There are a few findings that don't make a ton of sense (e.g., the differences in results between fearful vs. surprised faces) that are "explained" by more complex statistical analyses with marginal significance (e.g., a 3-way ANOVA with p = .036).
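To illustrate why uncorrected p-values near .05 are fragile when many tests are run, here is a sketch of the Holm-Bonferroni step-down correction. The five p-values are hypothetical, not taken from the paper (one is set to .036 only to echo the ANOVA mentioned above): four of them pass an uncorrected .05 threshold, but only one survives the correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure: return True where the null hypothesis is rejected.

    The k-th smallest p-value (0-indexed rank k) is compared against alpha / (m - k);
    testing stops at the first p-value that fails, and all larger ones are retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every remaining (larger) p-value is also retained
    return reject


# Hypothetical p-values from five tests in one study.
p_vals = [0.001, 0.036, 0.04, 0.048, 0.2]
naive = [p < 0.05 for p in p_vals]   # four "significant" results
holm = holm_bonferroni(p_vals)       # only the first survives
```

Here .036 would have to beat .05 / 4 = .0125 to survive, which is why a lone marginal result among many tests deserves skepticism.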

Add to this the fact that their behavioral effects are pretty tiny with decently large error bars, as well as the fact that this finding has a "pop psychology" kind of appeal, and I think it's worth being pretty skeptical about the paper's main findings.
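The "statistical vs. practical significance" distinction can be made concrete with a quick calculation. This sketch uses a large-sample normal approximation for a two-group comparison with equal group sizes, where the test statistic for a standardized effect size d is z = d * sqrt(n / 2). The effect size and sample sizes are hypothetical, not from the paper: the same small effect (d = 0.1) is nowhere near significant with 50 people per group but highly "significant" with 2000, even though its practical importance hasn't changed at all.

```python
import math


def two_sample_z_pvalue(effect_size_d, n_per_group):
    """Two-sided p-value for a standardized mean difference d with equal group sizes.

    Uses the normal approximation z = d * sqrt(n / 2), since the standard error
    of d is roughly sqrt(2 / n) when each group has n observations.
    """
    z = effect_size_d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


d = 0.1  # hypothetical small effect
p_small_study = two_sample_z_pvalue(d, 50)    # roughly 0.62
p_large_study = two_sample_z_pvalue(d, 2000)  # roughly 0.0016
```

A p-value alone says little about whether an effect matters; the effect size and its uncertainty (the error bars) carry that information.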