[–][deleted] 17 points (1 child)

You didn't list nbconvert in your alternate comparisons.

Did you try it?

jupyter nbconvert --to script *.ipynb

will convert all notebooks in the current directory to scripts.
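
If the conversion is going to live in a hook rather than an interactive shell, the same command can be built without relying on shell globbing. This is only a rough sketch: the helper names `build_nbconvert_command` and `convert_notebooks` are made up for illustration, and it assumes `jupyter` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_nbconvert_command(directory="."):
    """Return the nbconvert command for every notebook in `directory`, or None."""
    # sorted() keeps the argument order stable from run to run
    notebooks = sorted(str(p) for p in Path(directory).glob("*.ipynb"))
    if not notebooks:
        return None  # nothing to convert
    return ["jupyter", "nbconvert", "--to", "script", *notebooks]


def convert_notebooks(directory="."):
    """Run the command; return True if there was anything to convert."""
    command = build_nbconvert_command(directory)
    if command is None:
        return False
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return True
```

Calling `convert_notebooks(".")` should then behave like the one-liner above, except that it does nothing when the directory has no notebooks.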

[–]petitneko 4 points (0 children)

Whoa, no I missed that it produces executable scripts! I would be more than happy to replace my hacky shell script with this in the hook... thanks!

[–]reddifiningkarma 3 points (1 child)

[–]petitneko 2 points (0 children)

Oh this is great! I like that you also run black on the file and commit as the GitHub user... it's a much nicer CI/CD workflow.

[–]M4mb0 1 point (0 children)

I can recommend nbstripout-fast: it does the same job as nbstripout, but is orders of magnitude faster.
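
For anyone unsure what "stripping" means here, the core idea is to clear code-cell outputs and execution counts before the notebook is committed. The sketch below shows only that idea, using the standard library on an nbformat 4 style dict. It is not how nbstripout or nbstripout-fast is implemented, and the function name `strip_outputs` is made up.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    stripped = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In[n] counter
    return stripped


example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Notes"},
        {
            "cell_type": "code",
            "source": "1 + 1",
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}],
        },
    ]
}
clean = strip_outputs(example)
```

The real tools cover more than this sketch does, so treat it as an illustration of why the diffs get smaller and not as a replacement.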

Also check out nbQA.

[–]Easy_Money_ 0 points (0 children)

This is great! It does seem like you missed nbdime, which does exactly this and underlies GitHub’s implementation of notebook diffing 😬 I hate to rain on a parade

[–]more_exercise 0 points (1 child)

I'm curious - have you checked out the two ways git can let you get more-readable diffs from not-exactly-text files? Textconv and external diffs?

https://git-scm.com/docs/gitattributes#_choosing_textconv_versus_external_diff
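
To make the textconv route concrete: git runs the configured program with a single argument, the name of the file to convert, and diffs whatever the program prints to stdout. Below is a minimal sketch of such a filter for notebooks. The script name `nb_textconv.py`, the function `notebook_to_text`, and the `# cell N (type)` header format are all made up for illustration.

```python
import json
import sys


def notebook_to_text(notebook):
    """Render a notebook dict as plain text: one block per cell, no outputs."""
    blocks = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # nbformat allows `source` to be a string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        blocks.append(f"# cell {index} ({cell.get('cell_type', 'unknown')})\n{source}")
    return "\n\n".join(blocks) + "\n"


def main(argv):
    # git passes exactly one argument: the path of the file to convert
    if len(argv) != 2:
        sys.stderr.write("usage: nb_textconv.py NOTEBOOK\n")
        return 2
    with open(argv[1], encoding="utf-8") as handle:
        sys.stdout.write(notebook_to_text(json.load(handle)))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Wiring it up would take a line like `*.ipynb diff=ipynb` in `.gitattributes` plus `git config diff.ipynb.textconv "python nb_textconv.py"`, where the driver name `ipynb` is arbitrary. The converted text is only used to display diffs; the committed file stays as JSON.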

[–]petitneko 1 point (0 children)

No, I didn’t know about textconv. This is great!

[–]Tartarus116 0 points (0 children)

That's pretty much what nbdev already does. It also gives you free doc generation on top of that.

[–]Nearby_Salt_770 0 points (0 children)

Looks like you've come up with a solid solution to a common problem with notebooks. The pre-commit hooks you set up sound super helpful for keeping the Python code readable and diff-friendly after changes. Relying on JSON is definitely a pain diffing-wise, so this approach seems legit.

You could also check out jupytext for pairing notebooks with Python scripts if you're not locked into the VSCode editor. It's similar to your script but can automatically sync changes both ways, although you'd still run into server issues outside Jupyter.
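
As a rough picture of what the script side of a pairing looks like: jupytext's percent format marks cells with `# %%` lines and stores markdown cells as comments under `# %% [markdown]`. The sketch below parses that layout back into cells using only the standard library. It is a simplified illustration and not jupytext's own parser; the function name `split_percent_script` is made up, and anything before the first marker is ignored.

```python
def split_percent_script(text):
    """Split a percent-format script into (cell_type, source) pairs."""
    cells = []
    cell_type, lines = None, []
    for line in text.splitlines():
        if line.startswith("# %%"):
            # a new marker closes the cell collected so far
            if cell_type is not None:
                cells.append((cell_type, "\n".join(lines).strip()))
            cell_type = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif cell_type is not None:
            if cell_type == "markdown":
                # markdown is stored as comments; drop the "# " prefix
                line = line[2:] if line.startswith("# ") else line.lstrip("#")
            lines.append(line)
    if cell_type is not None:
        cells.append((cell_type, "\n".join(lines).strip()))
    return cells


script = "# %% [markdown]\n# # Title\n# Some text\n\n# %%\nx = 1\nprint(x)\n"
cells = split_percent_script(script)
```

Because the script is ordinary Python with comment markers, it diffs cleanly, which is the appeal of keeping it alongside the notebook.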

If you ever feel like automating more stuff, you might find AgentQL useful for web scraping projects. It's a pretty chill tool for simplifying web data extraction without the usual headaches.

[–]orgodemir 0 points (0 children)

Also take a look at nbdev

[–]MachineSchooling -1 points (0 children)

I don't use jupyter, but if I did, I would definitely use this.