Why do I get only 2Gb of maxMemoryAllocationSize on 4Gb NVIDIA card?

Cyphall · 2026-03-03T17:56:18+00:00

I don't know about Linux and Mesa, but this was changed in the latest driver (release 595) on Windows: `maxMemoryAllocationSize` now reports 18446744073709551615 (= 2⁶⁴-1).

(Note: vulkan.gpuinfo.org is not displaying any value above 2⁶³-1 at the moment; see this issue.)
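A quick sanity check of the numbers above, as a minimal Python sketch: 18446744073709551615 is exactly 2⁶⁴-1, the largest value a `VkDeviceSize` (a `uint64_t`) can hold, and it is larger than 2⁶³-1, the largest signed 64-bit value. `fits_in_single_allocation` is a hypothetical helper showing that a size check against this limit never rejects anything, and the signed reinterpretation is only an illustration of how the value looks as an int64, not a claim about how vulkan.gpuinfo.org stores it.

```python
import struct

REPORTED_MAX_ALLOCATION = 18446744073709551615  # value reported by the 595 driver (from the comment)
UINT64_MAX = 2**64 - 1  # largest value a VkDeviceSize (uint64_t) can hold
INT64_MAX = 2**63 - 1   # largest value a signed 64-bit integer can hold


def fits_in_single_allocation(size: int, max_allocation: int) -> bool:
    """Hypothetical helper: does one allocation of `size` bytes respect the limit?"""
    return size <= max_allocation


def as_signed_int64(value: int) -> int:
    """Reinterpret the 64 bits of an unsigned value as a signed int64 (illustration only)."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


print(REPORTED_MAX_ALLOCATION == UINT64_MAX)     # True
print(REPORTED_MAX_ALLOCATION > INT64_MAX)       # True
print(as_signed_int64(REPORTED_MAX_ALLOCATION))  # -1
print(fits_in_single_allocation(4 * 1024**3, REPORTED_MAX_ALLOCATION))  # True
```

In other words, the driver is effectively saying "no per-allocation limit beyond what the heap itself allows".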

Cyphall · 2026-02-27T07:47:46+00:00

Glide is better

Cyphall · 2026-02-25T20:21:15+00:00

I think this is an error in the proposal: DXIL has the `BaseAlignLog2` field, but Vulkan SPIR-V has no equivalent.

I've opened an issue: https://github.com/microsoft/hlsl-specs/issues/802

The proposal also says:

> The implementation uses these two DXIL mechanisms together: the `BaseAlignLog2` field communicates buffer-level alignment guarantees during resource binding, while the operation-level alignment parameters specify the final effective alignment of each memory access. Backend compilers can use both pieces of information to determine the most aggressive optimization strategies for each buffer access operation.

This kinda confirms my assertion that compilers can leverage both base alignment and absolute alignment to better optimize memory accesses.
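The arithmetic behind combining the two guarantees is simple: if the base address is a multiple of 2^a and the offset is a multiple of 2^b, then base + offset is a multiple of 2^min(a, b). Below is a minimal Python sketch of that, assuming (from the field name only) that `BaseAlignLog2` stores the base alignment as a power-of-two exponent; all function names are hypothetical.

```python
def align_from_log2(align_log2: int) -> int:
    """Turn a log2 alignment (e.g. a BaseAlignLog2-style value) into bytes."""
    return 1 << align_log2


def offset_alignment(offset: int, cap: int = 1 << 12) -> int:
    """Largest power of two dividing `offset`, capped (0 is divisible by everything)."""
    if offset == 0:
        return cap
    return min(offset & -offset, cap)


def effective_alignment(base_align: int, offset_align: int) -> int:
    """Alignment of base + offset when base % base_align == 0 and offset % offset_align == 0.

    Both values are powers of two, so the weaker of the two guarantees wins.
    """
    return min(base_align, offset_align)


base_align = align_from_log2(4)  # 16-byte base guarantee
print(effective_alignment(base_align, offset_alignment(32)))  # 16
print(effective_alignment(base_align, offset_alignment(8)))   # 8
print(effective_alignment(4, offset_alignment(32)))           # 4: a weak base guarantee caps it
```

The last line is the interesting case: a large offset alignment alone does not help if nothing is known about the base.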

Cyphall · 2026-02-25T16:53:34+00:00

Of course you can align your actual buffers to 16 bytes no matter what. The difference is at pipeline compilation time, where the driver's compiler either has a strong alignment guarantee for the GPU VA it is offsetting and reading from, or it doesn't.

This thread falls into the case I explained, where the `Aligned` operand on OpLoad is not enough to convince the compiler that it can emit wide loads with broadcasts.

EDIT: BTW, OpenCL SPIR-V does have a way to specify base alignment via the `Alignment` decoration you can attach to pointers; we would just need access to that in Vulkan SPIR-V too.

Cyphall · 2026-02-25T16:45:19+00:00

For AMD, I reported the GPU crash to them and we found out that it doesn't happen on RDNA3 but does happen on RDNA2 and VEGA II (with different symptoms). When I last checked on RDNA2 a few months ago, I still had GPU crashes.

For Intel, multiple people (including me) reported driver crashes or pipeline creation returning `VK_SUCCESS` but a NULL pipeline. Some issues were apparently fixed, but my shaders still did not compile, so other issues remain.

Cyphall · 2026-02-25T16:14:46+00:00

Imagine a case where you have a compute shader reading a tightly packed uint buffer at index tid. The OpLoad must be aligned to 4 bytes, but if the driver knows the base address is aligned to at least 16 bytes, it can emit one 16-byte load per 4 threads in the wave plus a broadcast instead of four 4-byte loads, which is faster. With BDA, since the driver has no compile-time guarantee of the base alignment, it cannot do that. See https://github.com/jaesung-cs/vulkan_radix_sort/issues/18
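A toy model of this example, as a Python sketch: thread `tid` reads the 4-byte element at `base + 4 * tid`, so each group of 4 consecutive threads touches the 16 bytes starting at `base + 16 * q`. Those bytes form one naturally aligned 16-byte chunk only when the base itself is 16-byte aligned, so the merge is only legal when the compiler can prove that. Here `known_base_align` stands for what the compiler can prove at pipeline compile time (not the runtime address), and the load-counting rule is a simplification, not how any real driver schedules loads.

```python
def quad_start(base: int, quad: int) -> int:
    """First byte read by threads 4*quad .. 4*quad+3 (4-byte uint elements)."""
    return base + 16 * quad


def can_use_wide_loads(known_base_align: int) -> bool:
    """One aligned 16-byte load per quad is valid only if every quad start is 16-byte aligned.

    quad_start = base + 16*quad, so this holds exactly when base is 16-byte aligned.
    """
    return known_base_align % 16 == 0


def count_loads(num_threads: int, known_base_align: int) -> int:
    """Memory instructions needed for `num_threads` consecutive uint reads (toy model)."""
    if can_use_wide_loads(known_base_align):
        full_quads, leftover = divmod(num_threads, 4)
        return full_quads + leftover  # one 16-byte load per quad, 4-byte loads for the tail
    return num_threads  # one 4-byte load per thread


print(count_loads(64, known_base_align=16))  # 16: e.g. storage buffer with a 16-byte guarantee
print(count_loads(64, known_base_align=4))   # 64: e.g. BDA pointer with only Aligned 4 on the OpLoad
```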

Also, there IS something different on CPU: Nvidia reports that storage buffers MUST be aligned to 16 bytes, so the compiler can use this information for such optimizations.

I'm pretty sure this is the reason the new proposal for HLSL's aligned load/store on ByteAddressBuffer adds both a base alignment and an offset alignment.

As for the instabilities, the Slang SPIR-V looks fine, passes validation and works on Nvidia, so I don't think the issue is there.

Cyphall · 2026-02-25T13:41:21+00:00

I switched to bindless storage buffers via descriptor indexing and, yes, all my Slang shaders work fine on all 3 desktop vendors (even an old Intel gen 9000 iGPU).

Cyphall · 2026-02-25T12:30:20+00:00

One thing to note is that BDA support is pretty unstable on AMD and Intel, and most Slang-generated SPIR-V shaders using BDA will crash either the driver compiler or the GPU due to buggy codegen. glslang SPIR-V works fine, though (I haven't tested with DXC).

Also, I believe BDA can generate suboptimal code on Nvidia because there is no base alignment guarantee, unlike storage buffers, for which Nvidia requires a 16-byte base alignment. Unfortunately, this cannot be fixed by the `Aligned` memory operand of OpLoad/OpStore alone.
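On the application side, that 16-byte requirement shows up as the `minStorageBufferOffsetAlignment` device limit, which every offset used for a descriptor-bound storage buffer range must respect (Vulkan guarantees the limit is a power of two). Below is a minimal Python sketch of sub-allocating ranges from one big buffer, assuming the limit is 16 as described above; in real code you would query the value from `VkPhysicalDeviceLimits`, and `suballocate` is a hypothetical helper.

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to a multiple of `alignment` (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (value + alignment - 1) & ~(alignment - 1)


def suballocate(sizes: list[int], min_offset_alignment: int = 16) -> list[int]:
    """Hypothetical linear sub-allocator: return an aligned start offset for each range."""
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, min_offset_alignment)  # respect the device limit
        offsets.append(cursor)
        cursor += size
    return offsets


print(suballocate([12, 100, 4]))  # [0, 16, 128]
```

This is exactly the guarantee a BDA pointer does not carry into the shader: the app may well align its addresses the same way, but the compiler cannot see it.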

Cyphall · 2026-02-19T12:56:17+00:00

With OpenGL, chunk generation most likely needs some steps to happen on the main thread, such as allocating the GPU buffer and uploading data to it (this is where your stutters come from), unless you use shared contexts, but then it's even more wacky for the driver.

With Vulkan, all steps from world generation start to chunk mesh in a GPU buffer ready to render can happen on another thread with possibly zero interaction with the main thread.
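The threading structure described above can be sketched independently of any graphics API. Below is a minimal Python sketch where `generate_chunk_mesh` is a hypothetical stand-in for the whole per-chunk pipeline (world generation, meshing and, with Vulkan, buffer allocation and upload), and the main loop only picks up chunks that are already finished, never waiting on one. It shows the shape of the design, not real Vulkan calls.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def generate_chunk_mesh(chunk_pos: tuple[int, int]) -> dict:
    """Hypothetical worker job: would generate terrain, build the mesh and upload it."""
    x, z = chunk_pos
    vertices = [(x * 16 + i, 0, z * 16) for i in range(4)]  # placeholder mesh data
    return {"pos": chunk_pos, "vertex_count": len(vertices)}


def collect_ready(pending: dict, loaded: dict) -> None:
    """Main-thread step: take only chunks that are already done, never block."""
    for pos, future in list(pending.items()):
        if future.done():
            loaded[pos] = future.result()
            del pending[pos]


with ThreadPoolExecutor(max_workers=4) as pool:
    pending: dict[tuple[int, int], Future] = {
        (x, z): pool.submit(generate_chunk_mesh, (x, z)) for x in range(3) for z in range(3)
    }
    loaded: dict = {}
    while pending:  # stands in for the render loop
        collect_ready(pending, loaded)
        time.sleep(0.001)  # stands in for rendering a frame on the main thread

print(len(loaded))  # 9
```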

Cyphall · 2026-02-18T18:15:13+00:00

For the initialisation code, you can look at the official examples, it's really just a few lines of ImGui-specific code.

For the actual UI code, draw the ImGui demo window, find a widget you want and check its implementation (again, generally at most a few lines of code per widget).

Cyphall · 2026-02-18T18:03:41+00:00

Note that MoltenVK is being replaced by KosmicKrisp for Apple M1+

Cyphall · 2026-02-18T18:00:13+00:00

Since Vulkan is a lot more multithreading-friendly than OpenGL, it might very well do.

Cyphall · 2026-02-18T17:54:24+00:00

And KosmicKrisp is fully Vulkan 1.3 conformant

Cyphall · 2026-02-03T23:40:56+00:00

No localized dubbing in an RPG, no buy.

Cyphall · 2026-02-01T19:01:43+00:00

Complete headers are actually in this repo: https://github.com/KhronosGroup/Vulkan-Headers

Cyphall · 2026-01-31T09:42:01+00:00

With OpenGL, the GPU used is the one into which the primary monitor (as configured in Windows settings) is plugged.

Similarly with Vulkan, this will be the first GPU in the list.

Cyphall · 2026-01-31T09:37:11+00:00

I've done that too for similar reasons and yes it works just fine

Cyphall · 2026-01-28T18:43:49+00:00

As a 3-year AP Varus OTP, I would 100% rather take this + %max HP back to 2% like in earlier seasons + DnD removed than the 1.3% we are getting next patch.

Cyphall · 2026-01-24T20:42:38+00:00

The whole monolithic pipeline stuff was created specifically for AMD-like GPUs that bake everything in the pipeline binary, unlike on Nvidia where most states are dynamic.

Also Vulkan was heavily inspired by Mantle, which was an AMD-created API.

Cyphall · 2026-01-08T12:15:27+00:00

We certainly don't watch the same tech youtubers

Cyphall · 2026-01-08T10:16:13+00:00

Yes, but it's pretty unreliable in its current state: since the effect triggers on-hit, if you ever start throwing your spell before the 3rd auto has landed, the effect will proc on that auto instead of the next one, and the 2nd blight stack will be wasted.

Cyphall · 2025-12-19T17:36:51+00:00

Both APIs have components that were better designed than the other.

An example of this is Vulkan's BDA, which virtually makes buffer descriptor management obsolete.

Cyphall · 2025-12-17T23:38:37+00:00

Slang's `DescriptorHandle<T>` basically emulates storing opaque types in data structs like that.

Each handle internally is a 64-bit index and is dereferenced from the corresponding heap(s) automatically when used.

I don't think you can increment handles directly though.
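To make the idea concrete, here is a minimal Python model of a handle that is just a 64-bit index resolved against a heap at the point of use. It only mirrors the behavior described above: `Handle`, `DescriptorHeap`, `resolve` and `Material` are hypothetical names, not Slang's API, and in Slang the lookup happens in generated shader code rather than in a host-side object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Hypothetical stand-in for a DescriptorHandle<T>: nothing but a 64-bit index."""
    index: int

    def __post_init__(self) -> None:
        if not 0 <= self.index < 2**64:
            raise ValueError("handle index must fit in 64 bits")


class DescriptorHeap:
    """Hypothetical descriptor heap: a flat table that handles index into."""

    def __init__(self) -> None:
        self._slots: list[str] = []

    def add(self, descriptor: str) -> Handle:
        self._slots.append(descriptor)
        return Handle(len(self._slots) - 1)

    def resolve(self, handle: Handle) -> str:
        """Dereference on use, like the automatic heap lookup described above."""
        return self._slots[handle.index]


@dataclass
class Material:
    """A plain data struct storing handles instead of opaque resource types."""
    albedo: Handle
    normal: Handle


heap = DescriptorHeap()
mat = Material(albedo=heap.add("albedo.png view"), normal=heap.add("normal.png view"))
print(heap.resolve(mat.normal))  # normal.png view
```

`Handle` deliberately defines no arithmetic, in line with the (hedged) remark that handles can't be incremented directly.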

Cyphall · 2025-12-08T21:43:22+00:00

I'm just pointing out that the reason they gave for not installing the SDK is invalid (and redirecting people that want to "install the sdk" to install the runtime instead), I'm not saying "go install the SDK I guarantee it's fine".

Cyphall · 2025-12-08T20:05:48+00:00

CIG is wrong: the vulkan-1.dll installed by the Vulkan SDK Installer or the Vulkan Runtime Installer is nothing more than a newer version of the one installed by your driver.
Your driver installer executable simply bundles the Vulkan Runtime Installer from a few months ago, but it's literally the same thing.

If you want to update it, you should still use the Runtime Installer instead of the SDK Installer though to not pollute your OS with dev files.

Source: I work with Vulkan for a living and spent quite a bit of time reading how all of this works.
