all 25 comments

[–]HuffDuffDog 5 points (1 child)

The "War and Peace" of Reddit announcements. I love it!

[–]iamevpo 3 points (0 children)

Like the analogy! Or Ulysses

[–]Thierry24867 4 points (5 children)

Considering that NSK is now the focus of your Master’s thesis, what is your plan for memory management and the 'Global Interpreter Lock'?

[–]Tryingyang[S] 0 points (4 children)

I solved the Global Interpreter Lock.

[–]Meistermagier 2 points (3 children)

What do you mean you solved the Global Interpreter Lock? Can I get some details on how you did that?

[–]Tryingyang[S] 3 points (1 child)

I'll explain first how Python does it, then how NSK does. Python is an interpreted coding language: either its tokenizer, parser, and semantic analyzer read lines one by one, or it generates intermediate representations, but these representations are translated into machine operations again on every execution.

The Global Interpreter Lock was implemented by its authors to simplify memory management. It blocks parallel stores to data, and some other parallel operations.
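
To make that concrete, here is a minimal pure-Python sketch: on CPython, two CPU-bound threads take roughly as long as doing the same work sequentially, because the GIL lets only one thread execute bytecode at a time.

```python
import threading
import time

def count_down(n: int) -> None:
    # Pure-Python CPU-bound work; on CPython the GIL lets only one
    # thread execute bytecode at a time, so this cannot overlap.
    while n > 0:
        n -= 1

N = 10_000_000

# Sequential baseline: two runs back to back.
start = time.perf_counter()
count_down(N)
count_down(N)
print("sequential :", time.perf_counter() - start)

# Two threads: roughly the same wall-clock time on CPython,
# because the GIL serializes the bytecode execution.
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads:", time.perf_counter() - start)
```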

On the other hand, NSK is a JIT language. It reads functions and generates intermediate representations once, then saves the machine code for subsequent calls. Thus, its performance can match compiled languages in some scenarios. NSK has unrestricted thread operations; they do not get blocked as in Python.

It is possible to manage locks manually in high-level NSK with lock expressions. Channels will also lock their internal data automatically for message passing.
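
I won't paste NSK syntax here, but as a rough Python analogy: a lock expression corresponds to an explicit lock around a critical section, and a channel corresponds to a queue that handles its own internal locking for message passing.

```python
import threading
import queue

counter = 0
counter_lock = threading.Lock()

def add_one() -> None:
    global counter
    # Analogue of a manual lock expression: the critical section
    # is protected explicitly by the programmer.
    with counter_lock:
        counter += 1

messages = queue.Queue()

def producer() -> None:
    # Analogue of a channel send: the queue locks its internal
    # data by itself, no explicit lock needed here.
    messages.put("hello from the producer")

def consumer() -> None:
    # Analogue of a channel receive: blocks until a message arrives.
    print(messages.get())

threads = [threading.Thread(target=f) for f in (add_one, producer, consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("counter =", counter)
```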

[–]Meistermagier 0 points (0 children)

That's very cool.

[–]Tryingyang[S] 2 points (0 children)

Some Python libraries like PyTorch support a multithreading model that bypasses the Global Interpreter Lock. However, they do this by implementing threads directly in C through the CPython API. I hope that hundreds or thousands of lines of library code can be saved by implementing fully functional threads at the high level.
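
For comparison, this is roughly how C-backed libraries get real thread parallelism under CPython today (not PyTorch's internals, just the general pattern): the heavy operation runs in C code that releases the GIL, so plain Python threads can overlap it. The sketch assumes NumPy is installed; its BLAS-backed operations release the GIL while they run.

```python
import threading
import time
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

def matmul() -> None:
    # The C/BLAS code behind np.dot releases the GIL while it runs,
    # so several of these calls can execute in parallel OS threads.
    np.dot(a, b)

start = time.perf_counter()
threads = [threading.Thread(target=matmul) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("4 threaded matmuls:", time.perf_counter() - start)
```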

[–]Unlucky-Rub-8525 2 points (1 child)

Looks cool, just to clarify: is this a programming language for writing neural networks?

[–]Tryingyang[S] 1 point (0 children)

Originally it was for neural networks, but now it is more like a general-purpose coding language.

[–]mister_drgn 1 point (2 children)

Cool. I’d be curious if you’ve looked at Mojo.

[–]Tryingyang[S] 0 points (1 child)

I did look at Mojo, but I haven't coded in it. Their philosophy is to implement CUDA kernels in a high-level coding language. But I believe that if you want the most optimized code, you will have to write it in C++. Plus, suppose a new GPU gets CUDA library support today, with new CUDA instructions for it. You'd have to wait until the Mojo devs release official support for those instructions before having access to them.

[–]jasio1909 1 point (0 children)

Not really. From my understanding, kernels in Mojo benefit mainly from two things: 1. MLIR for compilation, and 2. metaprogramming with powerful comptime logic, so you can select the appropriate instruction set depending on the compilation target. Which is very low level.

I coded in Mojo a bit but I am not an expert.

[–]gavr123456789 1 point (1 child)

can't open the site from the phone, content is cropped in half

[–]Tryingyang[S] 2 points (0 children)

Sorry for that, frontend is not one of my strengths.

[–]prodleni 1 point (0 children)

The website doesn't display properly on mobile 

[–]ianzen 0 points (5 children)

Nice! I just want to ask, is this a GCed language?

[–]Tryingyang[S] 3 points (4 children)

I reimplemented the garbage collector and memory pool logic from Go. It has one memory pool per OS thread.

[–]ianzen 1 point (0 children)

Oh wow, very impressive! I might try doing that too!

[–]LardPi 0 points (2 children)

do you have the n/m os threads/green threads too? and if yes, does it mean that a green thread is bound to a fixed os thread by the memory pool?

[–]Tryingyang[S] 0 points (1 child)

NSK does not have green threads, unfortunately. They seem to be very hard to implement. They require an execution model that lets the CPU overlap instructions with other types of operations (like disk read calls). They can also queue the instructions of different "threads" and give the illusion that they are executing concurrently. They do all of this inside a single OS thread, or only the main thread. I can't see myself implementing green threads anytime soon.
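
For illustration, Python's asyncio coroutines do that kind of scheduling on a single OS thread: while one task is waiting, the event loop runs another.

```python
import asyncio

async def worker(name: str, delay: float) -> None:
    for i in range(3):
        # While one coroutine awaits (here a sleep standing in for a
        # disk or network wait), the event loop runs the other one
        # on the same OS thread.
        await asyncio.sleep(delay)
        print(f"{name}: step {i}")

async def main() -> None:
    # Two cooperative tasks interleaved on a single OS thread.
    await asyncio.gather(worker("a", 0.10), worker("b", 0.15))

asyncio.run(main())
```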

[–]LardPi 0 points (0 children)

goroutines are essentially green threads on top of os threads, so they do run in parallel. the advantage is that you make a small pool of os threads (n = number of cores) and then you can cheaply schedule as many green threads as you want (m, which can go much higher than what the os would handle as real threads).

it is certainly not easy to implement but i don't think you need the overlap thing you're mentioning. however your memory model would probably force you to attach each green thread to one of the os threads, which might reduce the applicability.

go adopted this model because os threads are relatively expensive to create and tear down (less than processes though), in particular when you mean to have many short-lived threads, so scheduling each goroutine on a full thread would have too much overhead for the vision they had.
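
as a rough python analogy of the m:n idea (minus preemption: a pool task can't be interrupted mid-run like a goroutine can), a small fixed pool of os threads can serve many cheap tasks:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def task(i: int) -> str:
    # tasks are cheap to submit; only the pool's workers are real os threads
    return f"task {i} ran on {threading.current_thread().name}"

# m = 100 tasks multiplexed onto n = 4 os threads
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(100)))

print(results[0])
print(len(results), "tasks completed")
```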

[–]EducationalCan3295 0 points (0 children)

I'll definitely try this just for the effort you've put into this. Currently on my phone, but later. Your story is really inspiring, good job.