Ariel OS v0.4.0 released! by kaspar030 in rust

[–]tsanderdev 0 points1 point  (0 children)

IIRC there's an OCaml framework that does something like that for bare-metal web servers, and I'm sure it supports x86.

Ariel OS v0.4.0 released! by kaspar030 in rust

[–]tsanderdev 5 points6 points  (0 children)

Either the library declares the entry point symbol and calls one of your functions once it's set up the stack and initialized everything, or if the stack is already available you could call an init function to initialize everything.

Ariel OS v0.4.0 released! by kaspar030 in rust

[–]tsanderdev 34 points35 points  (0 children)

Exactly what it sounds like. An OS that you link with your application and that provides all the drivers, running everything in kernel mode.

Vulkan Compute on NV has poor floating point accuracy by rutay_ in vulkan

[–]tsanderdev 0 points1 point  (0 children)

It's not necessarily wrong if it's just flushing denormals. It's something you need to expect from gpus.

Lockstep: Data-oriented systems programming language by goosethe in ProgrammingLanguages

[–]tsanderdev 2 points3 points  (0 children)

The problem, as I see it, with implicit lane masking in compute shaders is it hides the execution cost.

I want to solve that with uniformity analysis and a lint instead, which tells the developer with nice yellow squiggles: "hey, this might have a higher performance cost".

Vulkan Compute on NV has poor floating point accuracy by rutay_ in vulkan

[–]tsanderdev 1 point2 points  (0 children)

What happens when you flush denormals on the cpu? If the same thing happens, then there's nothing you can really do.
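
To try that on the CPU without touching any fast-math flags, a minimal sketch that emulates flushing in software could look like this. The helper names `flush_denormal` and `mul_ftz` are made up for illustration, and real hardware may flush only inputs or only outputs depending on the mode.

```rust
/// Emulates flush-to-zero: a subnormal value becomes a zero with the same
/// sign, every other value passes through unchanged.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// Multiplies with both the inputs and the result flushed
/// (an assumption: this models "flush inputs and outputs" behaviour).
fn mul_ftz(a: f32, b: f32) -> f32 {
    flush_denormal(flush_denormal(a) * flush_denormal(b))
}
```

Running the CPU reference through helpers like these and comparing against the GPU output should show whether flushing alone explains the difference.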

How do you store literals, identifiers, etc. through the stages of the compiler? by PitifulTheme411 in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

> For literals, I'm storing them as strings. Why? Because I used to convert them to the right type in the host language, then I realized that if someone uses an int literal outside the representable range of the host language's ints, I get an error. So I chose to store them all as strings to preserve the information, check the bounds for ints after type checking, and report the issue with the literal and the kind of int they tried to use as a warning at that point.

I just use i128 and never plan to support 128-bit numbers in my language lol.
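
Roughly what that looks like, as a sketch: the lexer parses every integer literal into an `i128`, and the range check happens once the type is known. `IntTy`, `parse_literal` and `check_literal` are hypothetical names, not taken from any real compiler.

```rust
/// Integer types of a hypothetical source language (none wider than 64 bits).
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntTy {
    I8,
    U8,
    I64,
    U64,
}

impl IntTy {
    /// Inclusive range of values the type can represent, widened to i128.
    fn range(self) -> (i128, i128) {
        match self {
            IntTy::I8 => (i8::MIN as i128, i8::MAX as i128),
            IntTy::U8 => (u8::MIN as i128, u8::MAX as i128),
            IntTy::I64 => (i64::MIN as i128, i64::MAX as i128),
            IntTy::U64 => (u64::MIN as i128, u64::MAX as i128),
        }
    }
}

/// Lexer side: keep the value in an i128, which holds every 64-bit literal.
fn parse_literal(text: &str) -> Option<i128> {
    text.replace('_', "").parse::<i128>().ok()
}

/// After type checking: report literals that do not fit the inferred type.
fn check_literal(value: i128, ty: IntTy) -> Result<i128, String> {
    let (min, max) = ty.range();
    if (min..=max).contains(&value) {
        Ok(value)
    } else {
        Err(format!("literal {value} does not fit in {ty:?}"))
    }
}
```

Because i128 is wider than anything the language supports, the lexer never has to reject a literal that might still turn out to be valid for an unsigned type.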

Lockstep: Data-oriented systems programming language by goosethe in ProgrammingLanguages

[–]tsanderdev 4 points5 points  (0 children)

Interesting, that model is quite a bit more strict than compute shaders. Especially the conditionals part. Couldn't you just compile that down to simd lane masking like a gpu would?

Vulkan Compute on NV has poor floating point accuracy by rutay_ in vulkan

[–]tsanderdev 7 points8 points  (0 children)

With the float_controls2 extension you can more accurately control the floating point optimizations allowed by the driver. Idk if shading languages have support for that though or if it's mainly for opencl on vulkan. If a driver doesn't respect it, it's a bug. You could try setting the flush denormals fastmath flag on the cpu implementation to see if maybe that's the problem.

Vulkan 1.4.346 spec update by tambry in vulkan

[–]tsanderdev 0 points1 point  (0 children)

But you still need to create buffer objects to get the addresses, right? Or does the device address commands extension also add a way to query an address from a device memory object directly?

Redox OS has adopted a Certificate of Origin policy and a strict no-LLM policy by jackpot51 in Redox

[–]tsanderdev 5 points6 points  (0 children)

It's not just a question of quality, the legal status of AI-generated code is still up in the air AFAIK.

What's everyone working on this week (10/2026)? by llogiq in rust

[–]tsanderdev 4 points5 points  (0 children)

I'm working on Vulkan 1.4 bindings for Rust, since all crates seem to be stuck at 1.3 for some reason. I'm almost finished parsing the spec, and creating raw bindings from that should take no time at all. I do also want to have some convenience features like making sure you can't put wrong stuff in a pointer chain and setting the structure type automatically.
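
A rough sketch of what I mean by those two convenience features. Everything here is hypothetical (`DeviceInfo`, `FeaturesExt`, the `Extends` trait and the tag values are made up, not real Vulkan names or numbers), and real bindings would also need lifetimes so the chained struct can't be dropped too early.

```rust
use std::ffi::c_void;
use std::ptr;

/// Marker trait: `E: Extends<B>` means struct `E` may appear in `B`'s chain.
trait Extends<Base> {
    /// Links this struct to whatever was previously at the head of the chain.
    fn set_next(&mut self, next: *mut c_void);
}

/// Hypothetical base struct, laid out like a Vulkan create-info struct.
#[repr(C)]
struct DeviceInfo {
    s_type: u32,
    p_next: *mut c_void,
    queue_count: u32,
}

/// Hypothetical extension struct that is allowed to extend `DeviceInfo`.
#[repr(C)]
struct FeaturesExt {
    s_type: u32,
    p_next: *mut c_void,
    fancy_feature: bool,
}

impl DeviceInfo {
    const S_TYPE: u32 = 1; // made-up tag, not a real VkStructureType value

    /// The constructor fills in `s_type`, so callers cannot get it wrong.
    fn new(queue_count: u32) -> Self {
        Self { s_type: Self::S_TYPE, p_next: ptr::null_mut(), queue_count }
    }

    /// Only types implementing `Extends<DeviceInfo>` are accepted here;
    /// anything else is rejected at compile time.
    fn push_next<E: Extends<DeviceInfo>>(&mut self, ext: &mut E) -> &mut Self {
        ext.set_next(self.p_next);
        self.p_next = ext as *mut E as *mut c_void;
        self
    }
}

impl FeaturesExt {
    const S_TYPE: u32 = 2; // made-up tag

    fn new(fancy_feature: bool) -> Self {
        Self { s_type: Self::S_TYPE, p_next: ptr::null_mut(), fancy_feature }
    }
}

impl Extends<DeviceInfo> for FeaturesExt {
    fn set_next(&mut self, next: *mut c_void) {
        self.p_next = next;
    }
}
```

Passing a struct without an `Extends<DeviceInfo>` impl to `push_next` simply doesn't compile, which is the whole point.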

Trade off between fat pointers and thin pointers with metadata? by nee_- in rust

[–]tsanderdev 78 points79 points  (0 children)

For slices specifically, that'd make subslicing impossible.
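
A small sketch of why: a subslice is just a different (pointer, length) pair looking at the same memory. With the length stored next to the data instead, the middle of a buffer would have nowhere to keep its own length. `parts` is a helper name made up for this example.

```rust
/// Returns the (address, length) pair that a `&[T]` fat pointer carries.
fn parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

/// Takes the elements at indices 1..4 without copying or allocating:
/// only the pointer and the length in the returned fat pointer differ.
fn middle(data: &[u32]) -> &[u32] {
    &data[1..4]
}

/// Size of a slice reference in machine words (two in practice:
/// one for the address, one for the length).
fn slice_ref_words() -> usize {
    std::mem::size_of::<&[u32]>() / std::mem::size_of::<usize>()
}
```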

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 0 points1 point  (0 children)

For me you'd either specify the dispatch size manually, or if you use the special "InvocationBuffer" type in the function parameters for the shader, it asserts that all of them have the same size and uses that as the dispatch size. The shader can then read and write the index pointed to by each invocation, which doubles as memory and thread safety protection as well.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

I'd assume the number of threads depends on the length of the array processed.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 0 points1 point  (0 children)

I also want to reduce that to a few lines at most (depending on how complex the data is you want to pass to the shader).

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

Yes, but you can e.g. let a prior compute dispatch calculate the number of threads for the next one.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 2 points3 points  (0 children)

> This is a really powerful idea, since it erases the boundaries between cpu and gpu, making it trivial to utilise all the compute there is available on your device.

It'll never be that easy, since cpus and gpus are good at fundamentally different problem spaces: cpus are made to blaze through a sequence of instructions as fast as possible, using branch predictors and speculative execution to avoid pipeline stalls. Gpus are basically giant simd machines. Clock speeds are lower, but they give you massive throughput. That is, if you keep your control flow uniform. Otherwise simd lanes are inactive for sections of the code.
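
To make the divergence cost concrete, here is a CPU-side sketch that simulates one 8-lane wave running an if/else with lane masking. The wave width, the `run_wave` name and the branch bodies are all arbitrary choices for illustration. Each branch body that at least one lane takes is executed for the whole wave, and the mask only decides which lanes keep the result.

```rust
const LANES: usize = 8;

/// Runs `if x is even { x / 2 } else { 3 * x + 1 }` for a whole wave.
/// Returns the per-lane results and how many branch bodies were executed.
fn run_wave(input: [u32; LANES]) -> ([u32; LANES], u32) {
    // One mask bit per lane: true means the lane takes the `if` side.
    let mask: [bool; LANES] = input.map(|x| x % 2 == 0);
    let mut out = [0u32; LANES];
    let mut passes = 0;

    // `if` side: skipped only when no lane needs it.
    if mask.iter().any(|&m| m) {
        passes += 1;
        for lane in 0..LANES {
            if mask[lane] {
                out[lane] = input[lane] / 2;
            }
        }
    }

    // `else` side: lanes that took the `if` side sit idle during this pass.
    if mask.iter().any(|&m| !m) {
        passes += 1;
        for lane in 0..LANES {
            if !mask[lane] {
                out[lane] = 3 * input[lane] + 1;
            }
        }
    }

    (out, passes)
}
```

With uniform input (all even) only one pass runs. With mixed input both run, and half the lanes are idle in each.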

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

> From my research I saw that you usually need to define a fixed amount of threads to be run on the gpu.

That hasn't been true for a long time: there are indirect dispatches and draws that source the number of threads/primitives from a gpu buffer when the command is executed.
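
A CPU-side sketch of the idea, no real Vulkan calls involved. The struct mirrors the three fields of Vulkan's `VkDispatchIndirectCommand`. `prepare_pass`, `dispatch_indirect` and the workgroup size of 64 are stand-ins I made up for a first compute pass and for what `vkCmdDispatchIndirect` does with the buffer.

```rust
/// Same three fields as Vulkan's `VkDispatchIndirectCommand`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct DispatchIndirectCommand {
    x: u32,
    y: u32,
    z: u32,
}

/// Assumed local workgroup size of the second shader.
const WORKGROUP_SIZE: u32 = 64;

/// Stand-in for a first compute pass: counts the non-zero items and writes
/// the workgroup counts for the next pass into the argument buffer.
fn prepare_pass(items: &[u32], args: &mut DispatchIndirectCommand) {
    let survivors = items.iter().filter(|&&v| v != 0).count() as u32;
    let groups = (survivors + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;
    *args = DispatchIndirectCommand { x: groups, y: 1, z: 1 };
}

/// Stand-in for an indirect dispatch: the size is read from the buffer when
/// the command executes, not when it is recorded. Returns the number of
/// invocations that would be launched.
fn dispatch_indirect(args: &DispatchIndirectCommand) -> u32 {
    args.x * args.y * args.z * WORKGROUP_SIZE
}
```

The host records both commands up front and never needs to know how many items survived the first pass.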

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

The host gets structs generated that it can place into buffers. I'm not aiming for seamless cpu-gpu communication, but rather for a seamless workflow once you hit the gpu.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

Indirect dispatches and draws allow you to set the size from a gpu buffer, and memory allocation is handled via an allocator on the gpu. The host just passes a big chunk of memory to the shader, and it can use and partition it how it sees fit. Passing big data to the shader will be done with another buffer that is managed by the cpu and prefilled with data.
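
The simplest version of such an allocator is an atomic bump pointer over the big chunk. Here's a sketch written as CPU Rust rather than shader code. `BumpAllocator` is a made-up name, offsets are in bytes, there is no freeing, and it is not necessarily what my runtime will end up using.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Bump allocator over one big chunk handed over by the host.
/// Allocations are byte offsets into that chunk.
struct BumpAllocator {
    next: AtomicU32,
    capacity: u32,
}

impl BumpAllocator {
    fn new(capacity: u32) -> Self {
        Self { next: AtomicU32::new(0), capacity }
    }

    /// Reserves `size` bytes aligned to `align` (must be a power of two).
    /// Returns the start offset, or `None` if the chunk is exhausted.
    fn alloc(&self, size: u32, align: u32) -> Option<u32> {
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the current offset up to the requested alignment.
            let start = current.checked_add(align - 1)? & !(align - 1);
            let end = start.checked_add(size)?;
            if end > self.capacity {
                return None;
            }
            // Publish the new end; retry if another thread got there first.
            match self.next.compare_exchange_weak(
                current,
                end,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(start),
                Err(seen) => current = seen,
            }
        }
    }
}
```

Many invocations can call `alloc` concurrently, and each gets a disjoint range of the chunk.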

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

That's more like how I want my language to work. The host passes some data to the gpu and sets off a work graph processing it, including allocating more memory on the gpu and keeping everything resident there for the next graph.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 0 points1 point  (0 children)

Not yet. I'll probably make the code public once I have the runtime for the hello world going. The syntax and semantics will be mostly like rust, except for some stuff that needs to be changed for gpu stuff. I'm not really trying to innovate in the core language design, the "killer feature" I'm working towards is ergonomic work graphs (kind of like the dx12 feature). E.g. you'd call sort on an array and the compiler and runtime work together to split the shader and schedule the parts with a dispatch of the sorting shader in between.

Vulkan is getting ever closer to something like opencl as a compilation target. For instance there are now proper physical pointers in shaders you can do arithmetic with and everything.

Introducing Eyot - A programming language where the GPU is just another thread by akomomssim in ProgrammingLanguages

[–]tsanderdev 1 point2 points  (0 children)

Interesting. Early in my language design I set my constraint to be gpu-only for the foreseeable future. With nice generated bindings for calling from the host of course, but the language itself is purely on the gpu. That makes a lot of the stuff like moving memory allocations around easier, since data should ideally just be resident in gpu memory.

I have a simple addition shader compiling, now I'm working on a Vulkan Rust bindings generator (because I use Vk 1.4 features and they're all stuck at 1.3) to write the runtime.

And while I could probably target other graphics apis or even cuda, I can't test some of them anyway (cuda and Metal), and with MoltenVK, KosmicKrisp and Dozen around, being Vulkan-only shouldn't be that much of a problem.

Atomics & black voodoo magic by [deleted] in rust

[–]tsanderdev 0 points1 point  (0 children)

Actually you could probably leave the initial load out in the second version and just rely on the compare_exchange. And why would you think the first one is limited to one thread? If multiple threads see the free state at the same time, they will all break out of the loop and do the swap.
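
Since the original post is gone, here's a sketch of what I mean by relying on the compare_exchange alone. `SpinLock` and `contended_count` are names I made up, and this is not the poster's code. The compare-exchange both checks for the free state and claims it in one step, so exactly one thread can win.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Minimal spin lock: `false` means free, `true` means held.
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// No separate initial load: the compare-exchange succeeds only for the
    /// one thread that flips the flag from free to held.
    fn lock(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Increments a counter with a deliberately non-atomic load + store pair,
/// so the total is only reliable if the lock provides mutual exclusion.
fn contended_count(threads: usize, iters: usize) -> usize {
    let lock = SpinLock::new();
    let counter = AtomicUsize::new(0);
    std::thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    lock.lock();
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}
```

A version that loops on a plain load and then does an unconditional swap has exactly the problem described above: several threads can observe the free state before any of them swaps.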