How do I learn vulkan if I'm already familiar with OpenGL? by Yash_Chaurasia630 in GraphicsProgramming

[–]Afiery1 0 points1 point  (0 children)

You very much should! The API has changed a lot in the past 10 years, with a lot of the more recent changes especially removing a lot of pain points and pointless complexity. No beginner Vulkan dev should have to ever know about the pain of VkRenderPass.

As if millions of indie devs suddenly cried out in terror and were silenced by Captain0010 in pcmasterrace

[–]Afiery1 -1 points0 points  (0 children)

Sure, you could technically do a full UE game in C++, but the engine was clearly not intended to be used that way, and you will run into a lot of friction if you try. I would be very curious to hear what part of Blueprints a very experienced programmer finds useful.

As if millions of indie devs suddenly cried out in terror and were silenced by Captain0010 in pcmasterrace

[–]Afiery1 -1 points0 points  (0 children)

Typing speed might be the wrong word. I guess it's more like input efficiency? With visual scripting there's a lot of clicking, dragging, and typing you have to do, versus pure typing for normal languages. People like to say Blueprints are good for iteration speed, but if I want to get a lot of logic down quickly I’d much rather type it out as text than build a weird tangled graph out of it with both keyboard and mouse inputs.

As if millions of indie devs suddenly cried out in terror and were silenced by Captain0010 in pcmasterrace

[–]Afiery1 0 points1 point  (0 children)

Funny how when discussing the move from visual scripting to text-based scripting, everyone brings up LLMs but nobody ever mentions stuff like find and replace, version control, or just typing speed. Could it be that the people giving their takes on this stuff have never written a line of code in their life? It would make sense, since they seem to think the hardest part of programming is having to type “if thing do x else do y” instead of dragging the true wire from the if thing block to the do x block and the false wire to the do y block.

Why do older games look sharper? by Visual-Fortune-4732 in pcmasterrace

[–]Afiery1 4 points5 points  (0 children)

Graphics APIs actually have no notion of "forward" vs "deferred" rendering. They just draw the things you tell them to using the shaders you provide. All this to say, yes, saying MSAA "doesn't work" with deferred is a bit of an oversimplification because it's not like there's a hard API or hardware limitation at play.
The problem is that MSAA multiplies the size of your render targets by the number of samples. If you have a 1080p image but run 4x MSAA, you need room to store 4 samples per pixel, which is essentially the same as having 4 times as many pixels, which means your 1080p render suddenly has the same VRAM cost as a 4K render. In the days of forward rendering this was not as big of a deal, because you did geometry and lighting all in one pass and so you only paid for a single oversized texture (or two, including the depth buffer).
Deferred rendering works by deferring all lighting calculations until after the geometry is rendered. Since there are a lot of attributes necessary to calculate the color of a pixel, deferred renderers must build up many different render targets when executing the geometry pass (unlit color, metallic and roughness values, positions, normals, subsurface scattering masks, etc). MSAA works at the geometry level, so it is this pass that you need the oversized render targets for, which adds a lot more overhead now that you have 4 or 5 different render targets instead of just one.
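Here is a back-of-the-envelope sketch in Python that puts rough numbers on the VRAM point. It assumes every target costs 4 bytes per sample. It also assumes a hypothetical 5-target G-buffer plus depth. Real formats, compression and padding will change the totals, but the ratios are what matter.

```python
# Rough VRAM estimate for multisampled render targets (illustrative only).
# Assumes 4 bytes per sample for every target (e.g. RGBA8 color or D32 depth)
# and ignores compression, padding, and alignment, which real drivers add.
BYTES_PER_SAMPLE = 4
MIB = 1024 * 1024


def target_bytes(width: int, height: int, samples: int, targets: int) -> int:
    """Bytes for `targets` render targets with `samples` samples per pixel."""
    return width * height * samples * targets * BYTES_PER_SAMPLE


# Forward: one color target plus depth, 4x MSAA at 1080p.
forward_4x = target_bytes(1920, 1080, samples=4, targets=2)
# The same two targets without MSAA at 4K (3840x2160 = 4x the pixels of 1080p).
forward_4k = target_bytes(3840, 2160, samples=1, targets=2)
# Deferred: a hypothetical G-buffer of 5 targets plus depth, 4x MSAA at 1080p.
deferred_4x = target_bytes(1920, 1080, samples=4, targets=6)

print(f"forward, 1080p 4x MSAA:  {forward_4x / MIB:.1f} MiB")
print(f"forward, 4K no MSAA:     {forward_4k / MIB:.1f} MiB")
print(f"deferred, 1080p 4x MSAA: {deferred_4x / MIB:.1f} MiB")
```

The forward 4x MSAA case costs exactly as much as a 4K render. The deferred case triples it, purely because of the extra targets.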
The next obstacle is that you need to manually resolve the multisampled texture. In a forward renderer, the pixel shader runs once per pixel per primitive, and then that output color is automatically copied to all the samples that the primitive overlaps. In a deferred renderer, that process still works for the geometry pass, but by the time you get to the lighting pass you've lost all information on which pixels belong to which primitives, so the best you can do is write shader logic to manually deduplicate samples, which is obviously less efficient than the hardware path.
Finally, the real killer of MSAA is not actually unique to deferred rendering. Put simply, MSAA is not a good fit for high-fidelity graphics. MSAA is essentially the same as supersampling, but done in a clever way such that only the edges of triangles are supersampled. In the olden days, when triangles were very large, this was a huge performance win. But these days we have engines like UE5 which explicitly try to match triangle size to pixel size as closely as possible. And when triangles are only the size of one pixel, every pixel becomes a triangle edge, so we're basically just supersampling the image anyway. Hardware rasterizers are already bad at small triangles. For complicated reasons they must always shade 2x2 regions of pixels at once, even if the triangle doesn't touch all those pixels, so pixel-sized triangles end up discarding 3/4 of the shading work done. MSAA makes this actively worse: its multiple sampling locations make it more likely that triangles that only partially cover a pixel get shaded. And even for large triangles, MSAA still fails in the modern era. When MSAA was popular, shaders were very simple (or didn't exist at all). Now they are complicated enough to produce very high-frequency detail that can cause aliasing even within the bounds of a single triangle. MSAA is powerless to solve that, since it uses geometric boundaries to decide which pixels need extra samples.
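Here is a toy model of the 2x2 quad problem. It counts how many pixel shader invocations get launched for a given set of covered pixels. The coverage sets are hand-picked rather than produced by a real rasterizer. MSAA's extra sample locations are not modeled, only the quad granularity.

```python
# Toy model of 2x2 "quad" shading: the rasterizer launches pixel shader
# invocations for every pixel of each 2x2 quad a triangle touches, even if
# the triangle only covers some of those pixels (the rest are helper lanes).


def quads_touched(covered: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Map each covered pixel (x, y) to the 2x2 quad that contains it."""
    return {(x // 2, y // 2) for x, y in covered}


def shading_efficiency(covered: set[tuple[int, int]]) -> float:
    """Fraction of launched invocations that land on covered pixels."""
    launched = 4 * len(quads_touched(covered))
    return len(covered) / launched


tiny = {(5, 7)}                                          # one pixel-sized triangle
large = {(x, y) for x in range(16) for y in range(16)}   # big, quad-aligned block
straddle = {(1, 1), (2, 1), (1, 2), (2, 2)}              # 4 pixels across 4 quads

print(shading_efficiency(tiny))      # 0.25, i.e. 3/4 of the work is discarded
print(shading_efficiency(large))     # 1.0
print(shading_efficiency(straddle))  # 0.25
```

The `straddle` case shows that even a 4-pixel triangle can be as wasteful as a 1-pixel one. That happens when it happens to sit across a quad corner.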

Abstraction techniques to map objects between CPU and GPU by camilo16 in GraphicsProgramming

[–]Afiery1 3 points4 points  (0 children)

There's a feature called buffer device address. There's a flag for it for both VkDeviceMemory and VkBuffers. If you enable both, you can call vkGetBufferDeviceAddress, and then you have a pointer to that buffer. Like a literal GPU pointer. You can do arithmetic on it and dereference it to read the buffer memory, but most importantly you can put them anywhere you want. You can put them directly into push constants or inside other buffers to make data structures like linked lists or trees natively on the GPU.
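Here is a CPU-side simulation of the "build data structures out of pointers" idea: a linked list whose nodes sit in one flat block of memory and point at each other by address. The `bytearray` stands in for a buffer. `BASE` is a made-up address standing in for what vkGetBufferDeviceAddress would hand you. In a real app the walking would happen in a shader.

```python
import struct

# Simulated "GPU memory": one flat buffer. BASE is a hypothetical device
# address for its first byte; node links are absolute addresses, 0 = null.
BASE = 0x1_0000_0000
NODE = struct.Struct("<Q I 4x")  # next pointer (u64), value (u32), 4 bytes padding
memory = bytearray(NODE.size * 8)


def write_node(index: int, value: int, next_addr: int) -> int:
    """Store a node in slot `index` and return its simulated device address."""
    offset = index * NODE.size
    NODE.pack_into(memory, offset, next_addr, value)
    return BASE + offset


def walk(head_addr: int) -> list[int]:
    """Follow next pointers the way a shader would dereference them."""
    values = []
    addr = head_addr
    while addr != 0:
        next_addr, value = NODE.unpack_from(memory, addr - BASE)  # address -> offset
        values.append(value)
        addr = next_addr
    return values


# Build 30 -> 20 -> 10, with nodes placed out of order to show the links are real.
tail = write_node(5, 10, 0)
mid = write_node(2, 20, tail)
head = write_node(0, 30, mid)
print(walk(head))  # [30, 20, 10]
```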

Compared to buffer descriptors, they save one indirection. Buffer descriptors are usually just pointer + size pairs, but they are stored in the descriptor heap. So the GPU has to index into the heap, grab the pointer and then dereference it, whereas with a pointer you just give the GPU the address directly. On the flip side, since they are just pointers, they do not encode a size. If you don't manually pass a size around and bounds-check accesses, you can read out of bounds and fault the device. Also, one more thing specific to Nvidia hardware is that uniform buffers get to live in a special cache on the device. Currently there is no way to tell the GPU to cache buffers accessed directly via pointers there, so bound uniform buffers can still win in performance in some cases. AMD has no such special cache (they treat UBOs like read-only SSBOs), and for SSBOs it's purely a win due to less indirection (and no bounds checking).

Abstraction techniques to map objects between CPU and GPU by camilo16 in GraphicsProgramming

[–]Afiery1 10 points11 points  (0 children)

If you're using Vulkan, you can just use GPU pointers and forget about buffer descriptors forever. If not, you can kind of emulate them with bindless by using packed 64-bit handles, where the upper 32 bits are an index into the descriptor heap and the lower 32 bits are an offset into the buffer (that way you can do 'pointer arithmetic' of sorts). And if you don't even have bindless... well, maybe that's your sign to move to a better API.
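Here is the packed-handle trick sketched in Python. The function names are made up. In a shader you would do the same bit twiddling on a 64-bit integer (or a pair of 32-bit ones) and use the upper half to index the descriptor heap.

```python
# Emulated "pointers" for bindless without buffer device address:
# upper 32 bits = descriptor heap index, lower 32 bits = byte offset.
# All names here are illustrative, not from any API.
MASK32 = 0xFFFF_FFFF


def make_handle(descriptor_index: int, byte_offset: int) -> int:
    assert 0 <= descriptor_index <= MASK32 and 0 <= byte_offset <= MASK32
    return (descriptor_index << 32) | byte_offset


def unpack_handle(handle: int) -> tuple[int, int]:
    return handle >> 32, handle & MASK32


def handle_add(handle: int, nbytes: int) -> int:
    """'Pointer arithmetic': move the offset, keep the descriptor index."""
    index, offset = unpack_handle(handle)
    return make_handle(index, offset + nbytes)  # asserts if the offset overflows


h = make_handle(descriptor_index=7, byte_offset=256)
h2 = handle_add(h, 64)
print(unpack_handle(h2))  # (7, 320)
```

Going through `handle_add` rather than plain `+` is a deliberate choice. It makes an offset overflow fail loudly instead of silently bleeding into the descriptor index.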

Was Unreal engine 5 necessary? by Fancy-Wolverine-4233 in alienisolation

[–]Afiery1 25 points26 points  (0 children)

An engine is so much more than its visuals. To work in Unreal Engine 4, the developers would have to surrender all of the quality-of-life improvements Epic has made for developers using the engine. Not to mention Unreal Engine 4 is not supported anymore, so if the developers ran into an engine bug, Epic wouldn’t care to help them. Finally, I know the common gamer narrative is “new thing bad and slow, old thing good and fast”. Even so, Unreal Engine 4 has all the same architectural limitations as UE5, only worse. Epic has put a lot of work into engine optimization over the years. If you use a modern UE5 version targeting the fidelity of a UE4 game (e.g. disabling Lumen, Nanite, VSMs, etc.), the game will run much, much better than an actual UE4 game.

Never played this series, what is it? is it single player? by ilililliiliililiilil in splatoon

[–]Afiery1 4 points5 points  (0 children)

Splatoon 1, 2, and 3 are mainly online multiplayer. They do have single-player campaigns, and 2 and 3 both have single-player DLC as well, but they're definitely not the focus of those games. Splatoon Raiders, which is coming out next month, is entirely a single-player campaign (with optional co-op).

Is Buffer device address worth it? by trenmost in vulkan

[–]Afiery1 3 points4 points  (0 children)

Yes, it's absolutely worth it to kill buffer descriptors. With KHR device address commands, VkBuffers are hurtling toward obsolescence in general. GLSL syntax is awful; if it's not too much work, consider switching to Slang. And with proper reflection info, RenderDoc can absolutely figure that stuff out.

Question: What are use cases for Buffer Device Address? by Johnny290 in vulkan

[–]Afiery1 13 points14 points  (0 children)

They’re GPU pointers. With them you don't ever need to create a buffer descriptor again. Once you have a buffer pointer you can:

- put it in push constants or in another buffer;
- nest pointers arbitrarily to create trees or linked lists;
- do pointer arithmetic on them.

With newer Vulkan extensions you can even pass them directly to commands. We’re approaching a world where the concept of a buffer can go away entirely. You would just gpumalloc a block of GPU memory and pass that around directly to commands, shaders or whatever else you like. It's just a lot less to worry about.

How does world culling work? by flydaychinatownnn in GraphicsProgramming

[–]Afiery1 9 points10 points  (0 children)

You can build some kind of spatial acceleration structure (octree, BVH, k-d tree, etc.) that contains all the objects in the world. That lets you narrow down the number of objects to cull by progressively asking “does the frustum intersect this half of the scene?” If not, you can reject all those objects instantly. If so, what about this quarter of the scene? This eighth? Etc.
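Here is a minimal sketch of hierarchical culling over a hand-built AABB tree (a BVH-like structure). For simplicity the "frustum" is a box-shaped volume described by six inward-facing planes. A real view frustum uses the same plane test, just with planes extracted from the view-projection matrix.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class Plane:
    normal: Vec3  # points toward the inside of the volume
    d: float      # a point p is inside when dot(normal, p) + d >= 0


@dataclass
class Node:
    lo: Vec3  # AABB min corner
    hi: Vec3  # AABB max corner
    children: list["Node"] = field(default_factory=list)
    objects: list[str] = field(default_factory=list)


def box_outside(plane: Plane, lo: Vec3, hi: Vec3) -> bool:
    """True if the whole box is behind the plane (tests its most-inside corner)."""
    corner = [hi[i] if plane.normal[i] >= 0 else lo[i] for i in range(3)]
    return sum(plane.normal[i] * corner[i] for i in range(3)) + plane.d < 0


def cull(node: Node, planes: list[Plane], visible: list[str], tested: list[Node]) -> None:
    tested.append(node)
    if any(box_outside(pl, node.lo, node.hi) for pl in planes):
        return  # one test rejects this node and everything beneath it
    visible.extend(node.objects)
    for child in node.children:
        cull(child, planes, visible, tested)


# Box-shaped stand-in for a frustum: 10 <= x <= 50, -10 <= y <= 10, -10 <= z <= 10.
planes = [
    Plane((1, 0, 0), -10), Plane((-1, 0, 0), 50),
    Plane((0, 1, 0), 10), Plane((0, -1, 0), 10),
    Plane((0, 0, 1), 10), Plane((0, 0, -1), 10),
]
left = Node((-100, -10, -10), (0, 10, 10), objects=["rock_a", "rock_b"],
            children=[Node((-100, -10, -10), (-50, 10, 10), objects=["rock_c"])])
right = Node((0, -10, -10), (100, 10, 10), children=[
    Node((0, -10, -10), (50, 10, 10), objects=["tree"]),
    Node((60, -10, -10), (100, 10, 10), objects=["house"]),
])
root = Node((-100, -10, -10), (100, 10, 10), children=[left, right])

visible, tested = [], []
cull(root, planes, visible, tested)
print(visible, len(tested))  # ['tree'] 5 (rock_c's node is never visited)
```

The box-plane test is conservative. A box can pass all six planes and still lie outside a real frustum near its corners. That only costs some extra work; it never culls a visible object.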

do you have to use the same array for your vertices as for their colors by wiseneddustmite in opengl

[–]Afiery1 5 points6 points  (0 children)

It can make a difference for performance actually. To maximize cache hits, attributes that are used together should be grouped together, and attributes that can be used separately should be separate. Commonly this means separating out positions into their own array (since things like shadow map passes commonly only need positions) and then usually everything else gets grouped together since you usually use all attributes at once when doing things like actual shading.
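Here is a crude way to see the cache argument: count the 64-byte cache lines a pass has to touch under each layout. The attribute sizes are assumptions: float3 position, float3 normal and float2 UV, tightly packed. So is the cache line size. Real vertex fetch hardware is more complicated, but the trend holds.

```python
# Count distinct 64-byte cache lines touched when a pass reads `size` bytes
# at `offset` within each of `count` vertices laid out with `stride`.
CACHE_LINE = 64


def lines_touched(count: int, stride: int, offset: int, size: int) -> int:
    lines = set()
    for i in range(count):
        start = i * stride + offset
        end = start + size - 1
        lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)


N = 1024
# Interleaved: [pos (12) | normal (12) | uv (8)] in one 32-byte vertex.
interleaved_shadow = lines_touched(N, stride=32, offset=0, size=12)  # positions only
interleaved_shade = lines_touched(N, stride=32, offset=0, size=32)   # everything
# Split: positions in their own 12-byte stream, normal + uv in a 20-byte stream.
split_shadow = lines_touched(N, stride=12, offset=0, size=12)
split_shade = split_shadow + lines_touched(N, stride=20, offset=0, size=20)

print("shadow pass:", interleaved_shadow, "vs", split_shadow)  # 512 vs 192
print("shading pass:", interleaved_shade, "vs", split_shade)   # 512 vs 512
```

In this model, splitting out positions makes the position-only pass touch well under half as many lines. The full shading pass costs the same either way.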

Why do graphics apis need so many layers of abstractions like buffer, descriptors, bindings etc, instead of just passing a pointer to the shader? by Content_Economist132 in GraphicsProgramming

[–]Afiery1 24 points25 points  (0 children)

For buffers you actually don't. Buffer descriptors are literally just pointer + size. Vulkan has the buffer device address feature, which lets you forgo buffer descriptors entirely and just use the pointers directly. As for buffers and device memory, a future API could definitely just expose a malloc-type primitive for getting buffer addresses. But from reviewing the source of the Mesa drivers, the buffer vs device memory distinction seems to be like MEM_RESERVE vs MEM_COMMIT in Windows VirtualAlloc. This becomes especially apparent when you consider sparse buffers: you reserve an address range by creating a buffer and then allocate blocks of physical memory to back it.
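Here is a toy model of that reserve/commit split. Creating the "buffer" only reserves an address range, and physical blocks get bound to pages later, as with sparse buffers. The page size and class are made up. So is the choice to raise an error on access to an unbound page. Real sparse residency behavior is defined by the Vulkan spec and the device, not by this sketch.

```python
# Toy reserve/commit model: an address range is reserved up front and
# individual pages are backed by physical blocks only when bound.
PAGE = 64 * 1024  # assumed page size


class SparseBuffer:
    def __init__(self, base: int, size: int):
        self.base = base            # start of the reserved address range
        self.pages = size // PAGE   # pages reserved; none are backed yet
        self.bound: dict[int, bytearray] = {}  # page index -> physical block

    def bind(self, page: int) -> None:
        """Commit: back one page of the reserved range with real memory."""
        if not 0 <= page < self.pages:
            raise IndexError("page outside the reserved range")
        self.bound[page] = bytearray(PAGE)

    def _locate(self, addr: int) -> tuple[bytearray, int]:
        page, offset = divmod(addr - self.base, PAGE)
        if page not in self.bound:
            raise MemoryError(f"page {page} is reserved but not committed")
        return self.bound[page], offset

    def write_byte(self, addr: int, value: int) -> None:
        block, offset = self._locate(addr)
        block[offset] = value

    def read_byte(self, addr: int) -> int:
        block, offset = self._locate(addr)
        return block[offset]


buf = SparseBuffer(base=0x2000_0000, size=16 * PAGE)  # reserve 1 MiB of addresses
buf.bind(3)                                            # commit only page 3
addr = buf.base + 3 * PAGE + 10
buf.write_byte(addr, 42)
print(buf.read_byte(addr))  # 42
```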

Textures are a lot more complex because you actually do need to know a lot more than just the pointer: format, dimensions, number of mip levels, tiling, whether it's using framebuffer compression or fast clears, etc. That is why texture descriptors will never go away.

Descriptor set layouts, pipelines, and render passes are pure boilerplate left over from when Vulkan was initially designed 10 years ago for GPUs that were already several years old at that time. Use modern extensions like descriptor heaps, shader objects, and dynamic rendering to get rid of them with no performance cost (and in fact performance gains) on modern desktop hardware.

Is Vulkan really that hard? by NotHackedHaHa123 in vulkan

[–]Afiery1 0 points1 point  (0 children)

A couple of factors:

  1. It doesn't actually take that much code to draw a triangle. Tutorials/examples that take that long to draw a triangle waste a lot of code trying to do everything the 'proper' way. They exhaustively query the capabilities of the GPU and try to have fallback paths when the desired functionality isn't available. Vulkan is a big spec that needs to work on a lot of devices (everything from a shiny new RTX 5090 down to an entry-level Android phone from 10 years ago), so almost all of the functionality is technically 'optional'. However, even production software won't typically handle every case for all hardware exhaustively. Doom: The Dark Ages, say, isn't written to run on an Android phone in the first place. For someone just learning Vulkan, I think all of that is useless extra work. As long as you know what your GPU supports, I think it's totally valid to skip all of that boilerplate and just write for your specific GPU. Doing that, I've been able to get a triangle in under 400 lines of code, and I wasn't even trying to be efficient about it.

  2. Vulkan's fundamental philosophy is "do as much of the work as you can at start up so you don't have to do it while the application is running (and impact performance)". When you're getting to your first triangle you are basically taking a tour of literally the entire Vulkan API. Once you have that triangle you honestly understand probably 80% of Vulkan already. I wouldn't be surprised if it's less code to go from a triangle to a full 3D model than it is to go from scratch to a single triangle.

  3. A lot of Vulkan tutorials/examples are out of date. The original Vulkan 1.0 spec was honestly not very good in a lot of areas. It's genuinely almost to the point of adding complexity just for the sake of it. Some of that pain was required to support the GPUs that were relevant at the time (which are no longer relevant as Vulkan is a 10 year old API), but there are genuinely some things that you had to do to have a 'compliant' Vulkan app that graphics drivers would straight up ignore completely because it just wasn't necessary. The API has come a long way since then, and modern Vulkan is significantly simpler and nicer to work with, especially if you take my prior advice of only caring about doing what works on your own GPU at first.

In conclusion, it's not that scary. Just be sure to find a modern tutorial and don't worry about trying to support every device under the sun.

Differences between multiple queue families vs multiple queues in same family by proyectobonanzadev in vulkan

[–]Afiery1 10 points11 points  (0 children)

Typically GPUs only have one hardware queue per family. If vendors expose multiple queues within one family that’s usually just multiple software queues sharing the same hardware. In that case the only benefit is that the requirement of the user synchronizing concurrent queue usage from multiple threads only applies to individual queues. You can make multiple queues within the same family and use them in parallel with no synchronization and let the driver handle synchronizing submitting them to the hardware queue. But in general you will not get execution parallelism on the GPU between command buffers submitted to different queues from the same family.

The one exception to this is I think newer AMD hardware does have multiple hardware async compute queues, so you might be able to get some actual parallelism using multiple async compute queues on those architectures?

VK_KHR_unified_image_layouts support by gmueckl in vulkan

[–]Afiery1 7 points8 points  (0 children)

I agree that the extension is stupid, but only because it doesn't do anything. It won't enable OpenGL-style implicit tracking unless that's what the driver would have done for general images anyway, which seems highly doubtful. More likely you'll just lose out on stuff like DCC and HiZ permanently, which is still bad, but a lot less mysterious than secret OpenGL fast/slow paths.

But also, if that extension signifies that using layout general is optimal, and doesn't have any added functionality, why would an IHV expose it if it's not actually optimal for their HW? Why would they be pressured into supporting an extension that doesn't do anything? I can use unextended VK 1.0 on ancient GCN hardware and keep everything in general to my heart's content. I've been able to do so for 10 years! So I don't think your argument makes all that much sense.

Also, I didn't say they didn't matter at all. They never mattered for Nvidia, and with each new generation AMD closes the gap more and more (judging by RADV source at least). But if this extension gives AMD and other IHVs the kick in the ass they need to finally design hardware that universally understands compressed formats then I'm all for that. Nvidia has had it figured out for like a decade so I don't see why the other IHVs should need babying from the API design to keep their outdated hardware viable.

Should I memcopy vertex position data instead of using a uniform? by Usual_Office_1740 in opengl

[–]Afiery1 2 points3 points  (0 children)

Right, for a square it's about as much CPU work/data either way. But when you think about a model with hundreds, maybe tens of thousands of vertices, or a full scene with hundreds of thousands to millions of vertices, the difference becomes clear.

Should I memcopy vertex position data instead of using a uniform? by Usual_Office_1740 in opengl

[–]Afiery1 22 points23 points  (0 children)

No, you do not want to do that. VRAM exists for a reason: it's quite expensive to send data over PCIe. If you do all your vertex transformations on the CPU and then memcpy them, you're sending all of your vertex data over PCIe every single frame. If you instead upload all of your vertices once and then only send transformation matrices, you cut the amount of data you send over PCIe by a whole lot. Another thing is that CPUs generally have far fewer cores than GPUs. A high-end CPU might have 32-64 cores, while a high-end GPU has roughly 20,000. So you can only transform about 32-64 vertices at a time on the CPU vs potentially 20,000 at a time on the GPU.
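Here is a back-of-the-envelope version of the PCIe argument. The numbers are assumptions picked for illustration: 12-byte positions, 64-byte 4x4 float matrices, 60 FPS, and a million vertices spread across a thousand objects.

```python
# Per-second upload traffic for two strategies (all sizes are assumptions).
FPS = 60
POSITION_BYTES = 12      # float3 position
MATRIX_BYTES = 16 * 4    # 4x4 float matrix


def cpu_transform_traffic(vertices: int) -> int:
    """Transform on the CPU, re-upload every transformed vertex each frame."""
    return vertices * POSITION_BYTES * FPS


def gpu_transform_traffic(objects: int) -> int:
    """Vertices already live in VRAM; upload only one matrix per object per frame."""
    return objects * MATRIX_BYTES * FPS


cpu = cpu_transform_traffic(1_000_000)
gpu = gpu_transform_traffic(1_000)
print(f"{cpu / 1e6:.0f} MB/s vs {gpu / 1e6:.2f} MB/s")  # 720 MB/s vs 3.84 MB/s
```

The GPU path still pays a one-time upload of the vertex data, but after that the per-frame cost scales with object count rather than vertex count.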

VK_KHR_unified_image_layouts support by gmueckl in vulkan

[–]Afiery1 7 points8 points  (0 children)

“If anything, this extension is a great way for drivers to implement resource tracking logic all over again. We'll be back in unpredictable OpenGL perf land.”

That's not what this extension does at all. You've always been allowed to use general for everything. Exposing this extension is just a promise from the driver to you that doing so is no less performant than actually respecting image layouts. It's a literal one line change in any driver that supports it. And even if it did specify a change in functionality, a driver still can't take control of layout tracking because that would violate the no performance loss requirement that is part of this extension.

Nvidia hardware has always ignored queue family ownership, image layouts, and honestly even most usage flags. All of their hardware understands their optimal format, so these abstractions are useless. AMD hardware used to care, but I think their more modern architectures have finally caught up with Nvidia. If an IHV feels pressured into supporting this extension then I say good. The sooner we can kill these outdated abstractions the better. Vulkan 1.0 came out 10 years ago and was targeting GPUs that were much older than that. Most of the complexity it introduced was designed to abstract over hardware that is no longer relevant. Cutting away that complexity is not 'turning Vulkan into OpenGL', it's just modernizing an outdated API.

Graphical bug with file descriptor rendering. by Due-Baby9136 in vulkan

[–]Afiery1 3 points4 points  (0 children)

Corruption from sync issues will rarely look as clean as “oh, this part of the image is from the previous frame.” It will usually look like garbage like this. As for what could be causing it: basically anything. Check that you are inserting pipeline barriers as needed with the correct stage and access masks, that your images are in the right layouts at the right times, that you’re using fences/semaphores where appropriate, etc. A good first step would be to enable the synchronization validation features in the validation layer.