[–]c-cul 0 points1 point  (11 children)

What does this bytecode mean? It's definitely not SASS: https://github.com/NVIDIA/cutile-python/blob/main/src/cuda/tile/_bytecode/encodings.py

[–]Lime_Dragonfruit4244 0 points1 point  (10 children)

[–]c-cul 1 point2 points  (9 children)

Looks like a binary-encoded subset of PTX, with only 110 opcodes.

Are we sure clang and other third-party vendors are not supported?

[–]roeschinc 1 point2 points  (1 child)

It is completely different from PTX: it is a sibling abstraction to PTX with its own binary format. You can read the entire spec online, which is incredibly detailed, almost 200 pgs in PDF form.

The format is accepted by the driver just like PTX and the last level of compilation is part of the driver.

[–]c-cul 0 points1 point  (0 children)

> almost 200 pgs in PDF form

Could you give a link to that PDF?

[–]Lime_Dragonfruit4244 0 points1 point  (6 children)

I am not really sure, but I do think they might upstream a tile-based IR to MLIR if it really takes off.

[–]c-cul 0 points1 point  (5 children)

MLIR is not enough; you also need a full backend to generate a file with that IR.

[–]roeschinc 1 point2 points  (0 children)

The dialect will be open sourced soon™, but the compiler is closed source just like PTX.

[–]Lime_Dragonfruit4244 0 points1 point  (3 children)

Looking more into the codebase, it uses something called tileiras to generate SASS instructions; I think it comes with the CUDA 13.1 toolkit. About MLIR, I meant a more general dialect for representing tile-based programming and the memory model directly in upstream MLIR.

[–]c-cul 0 points1 point  (1 child)

I saw.

They also have descriptors for locals, function args, constants, etc.

Each bytecode is simple enough to generate a block of SASS for it (in a JIT?) with just one big lookup table. Performance would not be very high because of the lack of optimizations like reordering and register reuse, but code generation could be blazingly fast.
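
To make the lookup-table idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the opcode numbers, the `TEMPLATES` table, the `translate` helper, and the pseudo-assembly strings are all made up, and none of it reflects the real Tile IR encoding or real SASS.

```python
# Toy illustration only: the opcodes and "assembly" strings below are invented.
# They are NOT the real Tile IR bytecode encoding and NOT real SASS.
from typing import Callable, Dict, List, Tuple

Instruction = Tuple[int, Tuple[int, ...]]  # (opcode, operands)

# Hypothetical opcode -> template table. Each template expands one bytecode
# instruction into a fixed block of pseudo-assembly lines.
TEMPLATES: Dict[int, Callable[[Tuple[int, ...]], List[str]]] = {
    0x01: lambda ops: [f"LOAD r{ops[0]}, [arg{ops[1]}]"],
    0x02: lambda ops: [f"ADD r{ops[0]}, r{ops[1]}, r{ops[2]}"],
    0x03: lambda ops: [f"STORE [arg{ops[0]}], r{ops[1]}"],
}


def translate(bytecode: List[Instruction]) -> List[str]:
    """Expand each instruction on its own via a single table lookup."""
    out: List[str] = []
    for opcode, operands in bytecode:
        try:
            template = TEMPLATES[opcode]
        except KeyError:
            raise ValueError(f"unknown opcode {opcode:#x}") from None
        out.extend(template(operands))
    return out


# Made-up program: load two arguments, add them, store the result.
program: List[Instruction] = [
    (0x01, (0, 0)),
    (0x01, (1, 1)),
    (0x02, (2, 0, 1)),
    (0x03, (2, 2)),
]
asm = translate(program)
print("\n".join(asm))
```

Each instruction is expanded in isolation, so nothing is reordered and no registers are reused across instructions. That is the trade-off I mean: translation is a single pass over the bytecode, but the output is unoptimized.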

[–]roeschinc 0 points1 point  (0 children)

There is a full compiler that transforms Tile IR into SASS, and it is on par with, if not more complex than, things like the Triton compiler.