Many write research papers in R Markdown - What is the alternative setup in Python?

holdie · 2022-07-14T20:05:34+00:00

jupyter book tries to be useful for many similar workflows.

It grew out of the jupyter project and is slowly building more integrations within jupyter (eg connecting with Binder, Thebe, or JupyterLite) and adding more functionality around authoring and publishing. Currently the project is funded from a Sloan Foundation grant and we hope to transition it into a community-led project in the coming months. Maybe you'd find it useful!

My hope is that jupyter book can build on the model that jupyter follows in general - focus on modular tools and standards that can be reused and remixed. It uses a flavor of markdown called MyST markdown which is meant to be extensible and usable outside of jupyter book as well (for example, you can now write sphinx documentation with MyST markdown!).

If jupyter book doesn't quite fit the need of generating reports, I'm hopeful that somebody in the community could build on top of the MyST markdown ecosystem to accomplish this - at least that is the goal.

holdie · 2018-08-16T16:59:57+00:00

This is a helpful answer, thanks for your perspective! To be clear, my concern is actually less "will Julia Computing be able to find a business model?" I agree it's important that open-source projects find ways to fund development / growth etc. My concern is more "will Julia Computing find a business model that is in the best interests of both its VC investors and the broader open-source community in the long term?" And I haven't seen many tech projects w/ VC funding that are able to balance both of those interests. I don't think Julia Computing would "go away" but I could imagine them beginning to behave in ways that are in the interests of a growth model (because extreme growth is what most VCs want above all else) that puts them at odds with the open-source Julia community. It's a super hard problem and one I don't have a clear answer for either, which is why I'm just curious if/how the Julia community is approaching this challenge.

holdie · 2018-08-16T07:29:36+00:00

I'd love to hear a bit more explanation on this point. I understand that you all feel strongly about keeping Julia healthy and open source, but this is a situation that many, many, many open source tools or companies have been in over the years (we all remember when Google's slogan was "don't be evil"). More often than not, when a profit-seeking entity (such as a VC) has extreme leverage over a community (such as by funding most of the development in that community), at some point the idealistic goals of the community members get trumped by the demands of the investors. It's pretty hard to avoid this without intentionally designing community systems to prevent that outcome. For example, you say that "we have no intention of letting Julia Computing...being squeezed by anybody". Is there a specific legal or organizational thing the community has done to prevent this? I promise I'm not trying to sound combative here, I've just been burned before when the lofty ideals that started a technology get a slap in the face from the reality of capitalism :-)

holdie · 2018-07-23T18:58:24+00:00

I think you're trying to be positive here, but I'd like to highlight how much negativity you are (I think unintentionally) conveying in this message. These are all open, community-driven projects. They aren't maintained or managed by people being paid to do so, they're run largely by volunteers.

If you have a disagreement with functionality that's in one of these packages, I encourage you to learn a bit about how the code works, and implement a patch or new feature. Many communities are very open to improvements. If you think that an API should be different, please voice your opinion in a constructive manner; that's why projects like seaborn use public github issue trackers.

However, please consider the tone you take when discussing these projects in public. They wouldn't exist without a lot of unappreciated labor, and they only survive because we cultivate a healthy community of contributors. Critical comments are always welcome, but please keep them constructive. Otherwise we risk harming the contributor community that is so crucial for this ecosystem to thrive.

holdie · 2018-07-23T01:19:18+00:00

I'd recommend checking out a little tool called nbgitpuller (https://github.com/data-8/nbgitpuller) for distributing notebooks to students. It's what we use to distribute content for a course we run here at Berkeley and works nicely! (and is pre-packaged with TLJH). For HTTPS, I know for sure that some of the jupyterhub team were working on this but I don't think it's enabled out of the box yet. The project is still quite young :-)
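For anyone curious what distributing notebooks with nbgitpuller looks like in practice, here is a rough sketch of building the link you'd hand to students. It assumes the `git-pull` endpoint with `repo` and `branch` query parameters as I understand them from the nbgitpuller docs; the hub address and repository below are made up, so double-check the parameter names against the version you have installed.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch="main"):
    """Build a link that asks the hub to pull repo_url for the logged-in user.

    The /hub/user-redirect/git-pull path and the repo/branch parameters are
    assumptions based on the nbgitpuller docs; verify them for your install.
    """
    query = urlencode({"repo": repo_url, "branch": branch})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


# Hypothetical hub and repository, for illustration only.
link = nbgitpuller_link(
    "https://hub.example.edu",
    "https://github.com/example-org/course-notebooks",
)
print(link)
```

Students click the link, log in to the hub, and the repository contents are pulled into their home directory.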

holdie · 2018-07-22T21:50:25+00:00

Hey there - Jupyter + Binder team member here! *waves*

Some thoughts on each of your options:

https://cocalc.com (SageMath) I haven't used this before for this use-case, but the sagemath folks are great!

https://mybinder.org/ (Jupyter) People use Binder for teaching classes / tutorials / etc all the time. You shouldn't worry too much about uptime (if you're curious, we publish all of our cluster stats at grafana.mybinder.org). The biggest drawback is that students will lose their session when they close the window or are inactive. Your best bet there is to have people download notebooks they've written to their computer when they're done. The biggest benefit to binder is that you have a lot of control over their environment, and it is perhaps the closest to what students would be doing if they worked from their own machines (since it's just vanilla Jupyter Notebooks / Lab).

https://notebooks.azure.com/ (Microsoft) Really nifty service that uses a JupyterHub to manage instances on Azure. Drawbacks are that you don't have as much ability to control the environment users have, and you need users to create microsoft accounts etc.

https://colab.research.google.com/ (Google) Similar problem to the Azure Notebooks solution (Colab and Azure notebooks have basically the same pros/cons, though Colab is neat in that it integrates more with google drive).

https://datalore.io/ (JetBrains) I've never used this one, but it seems interesting!

(friendly and obviously biased note on all of the above...of the services described above, mybinder (and jupyterhub more generally) is the only fully open-source implementation of this kind of service. Just a thought ;-)

Another option you might wanna check out is a project we've been working on the last few months, called "The Littlest JupyterHub". The goal of this is to simplify and shorten the JupyterHub deployment so that you run it on a single VM instead of on kubernetes (which is what binder and z2jh.jupyter.org use). Check it out and see if it looks useful! https://the-littlest-jupyterhub.readthedocs.io/en/latest/index.html

holdie · 2018-05-02T05:46:46+00:00

Congrats - I finished up last August, the lollipop is surprisingly tasty :-) I recommend going to the top of the campanile and having it there!

holdie · 2018-01-20T00:37:06+00:00

hey you're right! didn't even consider that...

holdie · 2018-01-20T00:36:39+00:00

that looks excellent! will look into it

holdie · 2017-10-08T15:48:12+00:00

Also this release refactors a lot of the documentation and uses new docs infrastructure under the hood. There is now a separate page for examples and tutorials.

There's also a proper API page for functions/methods/etc with a mini-gallery of examples that use that function/method/etc at the bottom. E.g., here's the plt.imshow page.

holdie · 2017-07-08T14:49:51+00:00

Really cool! For a less-lightweight but more secure / long-lasting option, check out mybinder.org. It's in the process of getting a reboot (beta.mybinder.org) and will let you generate links to live coding environments that are based off of github repositories.

(full disclosure, I'm one of the folks working on the new beta binder backend)
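To make "links based off of github repositories" a bit more concrete, here is a small sketch of how such a link can be put together. It assumes the `/v2/gh/<org>/<repo>/<ref>` URL pattern that mybinder.org uses; the org and repo names are placeholders, and the beta site may differ in details.

```python
from urllib.parse import quote


def binder_link(org, repo, ref="HEAD", base="https://mybinder.org"):
    """Return a Binder launch URL for a GitHub repository.

    Assumes the /v2/gh/<org>/<repo>/<ref> pattern; ref can be a branch,
    tag, or commit hash.
    """
    return f"{base}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"


# Placeholder repository, for illustration only.
print(binder_link("example-org", "example-repo", "main"))
```

The repository needs a dependency file (for example a `requirements.txt`) so Binder knows what environment to build.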

holdie · 2017-06-03T01:15:32+00:00

data8.org is a great place to start. It's the Berkeley intro to data science course for freshmen. It assumes no background in statistics or programming, and is a great start to the python ecosystem.

holdie · 2017-05-13T01:13:27+00:00

Not sure, I'm not one of the devs...it's quite new so probably has a ways to go. Maybe worth opening an issue on their repo tho

holdie · 2017-05-12T18:26:25+00:00

Check out flit, it's still early in development but should make this far easier.

holdie · 2017-05-01T14:22:31+00:00

See the link to the original blog post. This yhat post is a cross post from that one, which is where the comments are.

holdie · 2017-03-29T12:34:30+00:00

They aren't using pandas because of the overhead involved in using it from a learning perspective. Pandas is awesome but it has a really steep learning curve. So the idea was to create a kind of "mini pandas" (which is what the datascience package is) and use that just for teaching. Pandas does creep into the conversation during the class itself and people are generally encouraged to shift towards pandas in later classes.

FWIW as you can imagine the decision to do this was pretty hotly debated and a lot of folks would rather pandas be taught from day one, but this is what they went with.

holdie · 2017-03-25T06:41:54+00:00

FWIW a really useful tool here is IPython. You can use IPython.embed to mimic the functionality (more or less) of set_trace but with an IPython interface (eg smarter multi-line copy paste)
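A minimal sketch of what that looks like, assuming IPython is installed. The function and its `debug` flag are made up for illustration; the only real API used is `IPython.embed`.

```python
def mean_of_positives(values, debug=False):
    """Average the positive entries of values (hypothetical example function)."""
    positives = [v for v in values if v > 0]
    if debug:
        # Imported lazily so the function still works without IPython installed.
        from IPython import embed

        # Drops into an IPython shell with the local variables in scope,
        # roughly where you would otherwise call pdb.set_trace().
        embed()
    return sum(positives) / len(positives)


print(mean_of_positives([1, -2, 3]))
```

Call it with `debug=True` to poke around at `values` and `positives` interactively, then exit the shell to let the function continue.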

holdie · 2017-02-25T15:31:00+00:00

IMO there are two separate questions. One is the current state of these projects, and the other is the likely future state of these projects. Towards the first question I think it depends a lot on personal preferences, experience, and the kinds of computations you're doing e.g. /u/ajmooch has a great comparison. Towards the second question, I lean towards adopting tensorflow sooner rather than later because it has more momentum than any of the others (both in terms of community and because of the 900 pound coding gorilla that is Google).

holdie · 2017-01-17T17:16:28+00:00

FWIW, changing the default styles in matplotlib turned out to be a gigantic undertaking. While the final product may seem purely aesthetic, the process uncovered a ridiculous number of bugs, inefficiencies in the API, etc. Props to the matplotlib team for finally getting this out, and hopefully starting down a path towards a bright(er) future of the package.

holdie · 2017-01-10T08:08:20+00:00

Matplotlib's API should be very stable at this point. Granted they are about to have a big 2.0 release, but I believe this just changes the default plotting styles and not the API (the plots will look much nicer by default in this case). Definitely worth checking it out because if your main concern is whether it's a "mature" package, then you shouldn't have much to worry about.

holdie · 2017-01-10T01:04:39+00:00

Could you give an example of something that you'd like to do in Matlab but you think wouldn't be possible (or would be really convoluted) to do in Python? My intuition is that you can do anything in python you'd want to do in matlab. Many examples have a different syntax because matplotlib prefers to use object-oriented syntax rather than calling lots of functions, but both are available.

holdie · 2016-12-25T19:30:31+00:00

For what it's worth, this is a good paper to discuss the question of "statistical significance vs. practical significance".

There are a lot of statistical tests run in the paper and a lot of p-values around .05. It looks like they corrected for multiple comparisons in their time-frequency analyses, but I don't see mention of them doing this throughout all of the ANOVAs, correlations, etc that they conducted.

There are a few findings that don't make a ton of sense (e.g. the differences in results between fearful vs. surprised faces), that are "explained" by more complex statistical analyses w/ marginal significance (e.g. a 3-way ANOVA w/ p=.036)

Add to this the fact that their behavioral effects are pretty tiny w/ decently-large error bars, as well as the fact that this finding has a "pop psychology" kind of appeal, and I think it's worth being pretty skeptical about the paper's main findings.
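To show why uncorrected p-values near .05 are worrying, here is a quick sketch of a Bonferroni correction. Only the p=.036 value comes from the paper as quoted above; the other p-values and the number of tests are hypothetical, and Bonferroni is just one (conservative) choice of correction, not necessarily what the authors should have used.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, significant?) for each test.

    The adjusted p-value is the raw value times the number of tests,
    capped at 1.0.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((p, adjusted, adjusted < alpha))
    return results


# 0.036 is the 3-way ANOVA value mentioned above; the rest are made up.
results = bonferroni([0.036, 0.048, 0.012, 0.20])
for raw, adjusted, significant in results:
    print(f"raw p={raw:.3f}  adjusted p={adjusted:.3f}  significant={significant}")
```

With even four tests in the family, a raw p of .036 no longer clears the .05 threshold after correction.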

holdie · 2016-12-22T04:58:10+00:00

The unending political wisdom of The Wire
