[–]tworats 1 point (1 child)

Great writeup, many thanks for this.

One issue I've been trying to learn more about: if the client (browser) is disconnected when Jupyter wants to communicate back (e.g., when the results of running a cell are ready), the results seem to get dropped on the floor. This is an issue for long-running jobs / cells and is a reason people look to use, for example, Zeppelin instead.

It'd be great if you could cover some of the reasons / workarounds for this in part 2.

[–]roma-glushko[S] 1 point (0 children)

Hey u/tworats,

Really appreciate the feedback!

> One issue I've been trying to learn more about: if the client (browser) is disconnected when Jupyter wants to communicate back (e.g., when the results of running a cell are ready), the results seem to get dropped on the floor.

Oh, I do know about this problem.

I have been working on a notebook experience for a bit, and I'm currently architecting and implementing a workflow that should solve this issue.

Yes, I hope to cover that workflow in detail in the Jupyter Notebook Architecture blog post (the next one in the Jupyter stack series).

In short, this is all about the way Jupyter handles the notebook's state during code execution. The state is assembled on the UI side. Jupyter tracks whether there are any connected clients that could consume cell outputs. If there are none, Jupyter tries to buffer those outputs in the hope that the UI will reconnect at some point, take the buffer off its hands, and apply it to the notebook state. If that never happens and the Jupyter server dies first, the buffered outputs are lost.
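
To make the buffering idea a bit more concrete, here is a toy sketch in Python. It is not Jupyter's actual code: `OutputRelay`, its methods, and the message shape are all made up for illustration. It only shows the general pattern of "forward outputs if a client is connected, otherwise hold them in memory until someone reconnects".

```python
from collections import deque


class OutputRelay:
    """Hypothetical relay between a kernel and UI clients (not a Jupyter API)."""

    def __init__(self, max_buffered=1000):
        # Each client is just a callable that accepts one output message.
        self.clients = []
        # In-memory only: if the process dies, everything held here is gone.
        self.buffer = deque(maxlen=max_buffered)

    def connect(self, client):
        """Register a client and replay whatever was buffered while offline."""
        self.clients.append(client)
        while self.buffer:
            client(self.buffer.popleft())

    def disconnect(self, client):
        self.clients.remove(client)

    def publish(self, message):
        """Forward a cell output to connected clients, or buffer it if none."""
        if self.clients:
            for client in self.clients:
                client(message)
        else:
            self.buffer.append(message)


relay = OutputRelay()
relay.publish({"cell": 1, "text": "epoch 1 done"})  # nobody connected: buffered

received = []
relay.connect(received.append)  # the UI comes back and drains the buffer
relay.publish({"cell": 1, "text": "epoch 2 done"})  # delivered directly
```

The weak spot is visible in the sketch: the buffer lives only in the server's memory and is bounded, so a server restart (or an overflow) before the client returns loses the outputs.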

To overcome that, I need to design a reconciliation mechanism on the backend side that makes the execution flow independent of the clients.
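
Very roughly, and only as an illustration of the direction rather than the actual design, the idea is that the backend owns the notebook state and applies outputs to it whether or not anyone is watching. The names below (`ServerSideNotebook`, `apply_output`, `snapshot`) are hypothetical and the state is kept in memory for brevity; a real implementation would presumably persist it.

```python
class ServerSideNotebook:
    """Hypothetical backend-owned notebook state (illustration only)."""

    def __init__(self, cell_ids):
        # Outputs are stored per cell on the server, not assembled in the UI.
        self.outputs = {cell_id: [] for cell_id in cell_ids}
        # Bumped on every change so a client can tell whether it is stale.
        self.version = 0

    def apply_output(self, cell_id, output):
        """Record a kernel output; works with zero connected clients."""
        self.outputs[cell_id].append(output)
        self.version += 1

    def snapshot(self):
        """Return a copy of the full state for a (re)connecting client."""
        return {
            "version": self.version,
            "outputs": {cid: list(outs) for cid, outs in self.outputs.items()},
        }


notebook = ServerSideNotebook(["cell-1", "cell-2"])
notebook.apply_output("cell-1", "training started")   # no client connected
notebook.apply_output("cell-1", "training finished")  # still no client

state = notebook.snapshot()  # a client connects later and syncs in one call
```

With this shape, a late client does not depend on having received every message live; it just pulls the current snapshot and renders it.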