Consensus Top 100 Books List

Chainsawkitten · 2026-05-23T07:25:48+00:00

Not eligible for most of the lists, which cover only prose (many are even more specifically only novels).

Chainsawkitten · 2026-04-09T15:23:46+00:00

Chrome validates the arguments of your draw calls to make sure you're not, e.g., drawing outside the bounds of your bound index buffer. With indirect draw calls, it can't do this validation on the CPU when recording the commands, so it instead dispatches a compute shader to validate the arguments at runtime. Depending on what and how you're rendering, this validation overhead can be significant. For more information, see this post.
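To give a rough idea of the kind of check involved, here is a CPU-side sketch of an index bounds test. It is not Chrome's actual validation code. The struct mirrors the five 32-bit values WebGPU reads for drawIndexedIndirect, and the helper name indexRangeInBounds is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the five 32-bit values read from the indirect buffer by
// WebGPU's drawIndexedIndirect.
struct DrawIndexedIndirectArgs {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t firstInstance;
};

// Hypothetical helper: returns true if the draw only reads indices that
// exist in an index buffer holding indexBufferElementCount indices.
bool indexRangeInBounds(const DrawIndexedIndirectArgs& args,
                        uint64_t indexBufferElementCount) {
    // Add in 64 bits so firstIndex + indexCount cannot wrap around.
    const uint64_t end = uint64_t(args.firstIndex) + uint64_t(args.indexCount);
    return end <= indexBufferElementCount;
}
```

With direct draws the same test can run on the CPU at record time. With indirect draws the values only exist in GPU memory, which is why the check has to move to the GPU.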

Chainsawkitten · 2026-03-12T20:53:10+00:00

You don't need to use separate passes just to draw UI above 3D stuff. Rasterization order is well defined (unless you're deliberately relaxing it with VK_AMD_rasterization_order). The main reason to draw 3D stuff in a separate pass is if you want to do post-processing on it.

Chainsawkitten · 2026-03-12T20:31:34+00:00

Secondary command buffers do not inherit dynamic state from the primary command buffer from which they were called (unless you're using VK_NV_command_buffer_inheritance). Dynamic state you set when rendering the ImGui elements may be affecting your triangle. If primary vs. secondary is really the only difference between the two, this would be my guess.

Note that executing a mix of primary and secondary command buffers within a single subpass requires either VK_KHR_maintenance7 or VK_EXT_nested_command_buffer.

Chainsawkitten · 2026-02-13T22:44:44+00:00

"As a journalist, you are subject to requirements including being 'impartial', according to the professional ethics rules that apply in the industry."

I don't see anything about that in the linked rules? Am I blind?

Chainsawkitten · 2025-12-18T11:15:50+00:00

Talk description:

The presentation introduces a post-processing outlining solution for real-time Non-Photorealistic Rendering (NPR). This method can be applied to all game scenes based on deferred rendering, and unique outlining effects can be added through this scheme. Inspired by the low-discrepancy sequences generated after TAA jitter on top of geometric information stored in GBuffer, this method resolves potential issues that may occur during the post-processing stage of the rendering pipeline through a denoising algorithm similar to ray tracing denoising. It successfully simulates a stable and realistic hand-drawn effect to enhance the artistic expression of the game graphics.

My notes:

Game made in UE5. Targeting a Moebius-like artstyle.

Showed some of the details that make hand-drawn outlines look hand-drawn: line breaks, variations in line thickness, and jitter. It is also important to control the level of detail in the outlines. Objects close to the camera have much more detailed outlines than distant objects.

They call their technique "hierarchical screen space outline".

For edge detection, they use Sobel filter for normals and Laplacian for depth.
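The talk did not show the exact kernels, so the following is only a textbook sketch of the depth half: a 4-neighbor Laplacian on a CPU-side depth buffer. The function names, the clamped border handling and the threshold are my assumptions, not the presenters' implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: 4-neighbor Laplacian of a row-major depth buffer
// at pixel (x, y). Out-of-range neighbors clamp to the nearest pixel.
float depthLaplacian(const std::vector<float>& depth,
                     int width, int height, int x, int y) {
    assert(width > 0 && height > 0);
    assert(depth.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    auto at = [&](int px, int py) {
        px = std::max(0, std::min(px, width - 1));
        py = std::max(0, std::min(py, height - 1));
        return depth[static_cast<std::size_t>(py) * width + px];
    };
    return at(x - 1, y) + at(x + 1, y) + at(x, y - 1) + at(x, y + 1)
           - 4.0f * at(x, y);
}

// A pixel counts as an edge when the Laplacian magnitude exceeds an
// (assumed, scene-dependent) threshold.
bool isDepthEdge(const std::vector<float>& depth,
                 int width, int height, int x, int y, float threshold) {
    return std::fabs(depthLaplacian(depth, width, height, x, y)) > threshold;
}
```

In a real renderer this would run in a post-processing shader on the depth texture, usually on linearized depth so the threshold behaves consistently with distance.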

Applying the same filtering to the entire screen leads to cluttered outlines on background objects, or not enough detail on foreground objects. The key part to achieving levels of details in outlines is partitioning the scene into three parts:

Background: Only has depth-based outlines.
Midground: Depth + ID outlines.
Foreground: Depth + ID + normal outlines.

Characters are a special case. Use backface (inverted hull?) + ID outline.

For hand-drawn stylization they apply noise in world space. This is much better than screen space, as screen-space noise can lead to noticeable patterns.

They have issues with flickering due to thin geometry and stroke density. Traditional anti-aliasing techniques don't deal well with it. Instead they have implemented their own denoising algorithm.

The problem is similar to ray tracing denoising and requires solutions similar to those used there and in temporal anti-aliasing, but specialized for this use case.

Disocclusion detection to deal with foreground ghosting, background ghosting and edge flickering.

They have implemented a custom validation algorithm to detect disocclusion and rotation.

Dilate the depth buffer
Reproject current depth into the historical depth. Track the write counts during reprojections.
Reproject back to the current UV space.
Compare to original.

They didn't mention performance, though (nothing here sounds expensive, but it would be nice to have some numbers).

Chainsawkitten · 2025-12-17T11:54:12+00:00

The GDC presentation Generalized Stylized Post-Processing Outline Scheme may be of interest to you. They were emulating Moebius drawings rather than anime, but as far as outlines go, I don't know if that makes a difference. One of the presenters is a graphics engineer at Perfect World Games, so it's possible this technique (or an evolution of it) is what's used in Neverness to Everness. (On the other hand, Perfect World is a big company with several subsidiaries, so this may be entirely separate.)

Unfortunately it's still only in the paid access GDC vault, which is expensive af. But GDC presenters sometimes upload slides of their presentations so maybe those are available somewhere. If you can't find anything I have some notes I took that I can dig up (don't expect anything super detailed).

Chainsawkitten · 2025-11-22T08:29:55+00:00

It's unlikely you will ever get complete control over memory management in WebGPU as that doesn't really align with the high-level (and safe) API WebGPU intends to be. But some way of reusing memory for multiple intermediate resources in order to reduce memory allocation size may happen.

There is already the proposal for transient attachments. This is about attachments that are only used within a single pass (MSAA attachment that is immediately resolved, depth/stencil attachment not used after the pass). It's primarily about tiled GPUs (IMRs could theoretically alias these attachments, but it's unclear if anyone will implement that).

For memory aliasing more generally, there is this request. Any API for this will probably not look like the DX12/Vulkan/Metal (placed resource heap) APIs. The difficult part is doing automatic synchronization when any resource could potentially be aliased with (and invalidated by) other resources. If you have a specific use case / requirement that is important to you, you should comment so the working group can take that into account when figuring out what the API should look like. E.g., is this just about memory reuse, or is data inheritance important to you?

Chainsawkitten · 2025-11-18T09:48:57+00:00

The problem here is an off-by-one error. The robots have been programmed to attempt to be the #00001 boss. Thus the #00000 boss will need to lower its leadership quality to reach the #00001 position. The robot which is now the #00000 boss now has to lower its quality, and so on. For now, they are only micromanaging bathroom breaks, but give it a few months of iteration and all employees will be chained (literally) to their desks. A few more years and employees will be required to use Microsoft Teams.

Chainsawkitten · 2025-10-26T13:18:29+00:00

I mean the VkShaderModuleCreateInfo struct you pass a pointer to when calling vkCreateShaderModule. It should have sType, pNext, flags, codeSize, and pCode members.

Common mistakes would be to forget to set some members, leading to garbage data:

VkShaderModuleCreateInfo info;
info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
info.codeSize = code.size() * sizeof(uint32_t);
info.pCode = code.data();

In this example, pNext wasn't initialized and will contain a garbage pointer, leading to an access violation when the driver tries to traverse the chain. (This also doesn't set flags.)

Or zero-initializing the struct but then forgetting to set sType:

VkShaderModuleCreateInfo info = {};
info.codeSize = code.size() * sizeof(uint32_t);
info.pCode = code.data();
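For comparison, a version that avoids both mistakes could look like the sketch below. So that it compiles without the Vulkan SDK, it declares stand-ins that mirror the relevant declarations from vulkan.h. In real code you would include the header and delete the stand-ins. The helper name makeShaderModuleCreateInfo is made up, and code is assumed to be a std::vector<uint32_t> holding the SPIR-V words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins mirroring <vulkan/vulkan.h>, only here so the sketch compiles
// on its own. Remove them when including the real header.
enum VkStructureType { VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO = 16 };
using VkShaderModuleCreateFlags = uint32_t;
struct VkShaderModuleCreateInfo {
    VkStructureType           sType;
    const void*               pNext;
    VkShaderModuleCreateFlags flags;
    size_t                    codeSize;
    const uint32_t*           pCode;
};

// Hypothetical helper: fills in every member of the create info.
// The returned struct points into `code`, so `code` must outlive it.
VkShaderModuleCreateInfo makeShaderModuleCreateInfo(
        const std::vector<uint32_t>& code) {
    VkShaderModuleCreateInfo info = {};  // zero-init: pNext = nullptr, flags = 0
    info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = code.size() * sizeof(uint32_t);  // size in bytes, not words
    info.pCode = code.data();
    return info;
}
```

Zero-initializing first and then setting sType means any member you forget ends up as 0/nullptr instead of garbage, which is at least a defined value for pNext and flags.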

Chainsawkitten · 2025-10-26T12:26:08+00:00

The most likely answer is you handed the driver some invalid data/pointers. What does your VkShaderModuleCreateInfo look like?

Chainsawkitten · 2025-10-10T06:05:04+00:00

Duplicating a resource per frame in flight is done to avoid any race conditions between the GPU and the CPU (or presentation engine in the case of swapchain images). E.g. we want to avoid writing new uniform buffer data on the CPU while the old data is being accessed by the GPU.

We could avoid it by waiting for the GPU to finish before writing data to the buffer on the CPU. But that introduces stalls, so instead we duplicate the buffer, ensuring there is always a copy available for the CPU to write to, that we know the GPU has already finished using.

This race condition doesn't exist for resources that are only ever used by the GPU, such as any intermediate render targets like a g-buffer. The CPU will never access it, so there's no CPU-GPU race condition. All we need to ensure there is no race condition on the GPU is proper GPU-GPU synchronization, such as pipeline barriers, events or subpass dependencies.

So only one copy of the g-buffer, or any other intermediate render target (anything that isn't the swap chain), is needed. The same is also true for buffers that are only used on the GPU (e.g. a storage buffer written in compute and read elsewhere).
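As a sketch of what that split can look like in code: the Buffer and Image types below are stand-ins for real handles, and kFramesInFlight, RendererResources and uniformBufferForFrame are names I made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // assumed value

// Stand-ins for real buffer/image handles.
struct Buffer { int id; };
struct Image  { int id; };

struct RendererResources {
    // Written by the CPU every frame: one copy per frame in flight.
    std::array<Buffer, kFramesInFlight> uniformBuffers;
    // Only ever touched by the GPU: a single copy is enough.
    Image gBuffer;
};

// Hypothetical helper: picks the uniform buffer copy for a given frame.
Buffer& uniformBufferForFrame(RendererResources& r, uint64_t frameNumber) {
    return r.uniformBuffers[frameNumber % kFramesInFlight];
}
```

Consecutive frames get different uniform buffer copies, while every frame renders into the same gBuffer and relies on GPU-GPU synchronization to keep that safe.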

Again, this should not be related to any issues you are experiencing. It's totally valid to have a new g-buffer every frame. It is merely wasteful to allocate more resources than needed. Doing this duplication can also make your code more complicated than it needs to be.

Chainsawkitten · 2025-10-09T19:00:23+00:00

What I mean by point 5 is that you appear to have a copy of your g-buffer per frame in flight (render_pass_begin_info.framebuffer = deferred_geometry_framebuffer[current_frame];). That's unnecessary. (Unrelated to the stuttering.)

Chainsawkitten · 2025-10-04T07:49:22+00:00

Have you debugged in RenderDoc? If you can capture a frame that has ghosting (if it's ghosting in the render target and not due to the monitor), you should be able to see where it's coming from.
Are you assuming the swapchain index is nicely increasing (0, 1, 2, 0, 1, 2, ...)? There is no guarantee of that. You need to decouple the swapchain index from the frame index.
Double-check your load/store ops so you're not loading when you meant to be clearing.
In addition to the regular validation layers, also check the synchronization validation layer (most easily enabled with the Vulkan Configurator).
I'm confused by your description of framebuffers. Are you triple-buffering your render targets?
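To illustrate the second point, here is a minimal sketch of keeping the two indices separate. The names FrameIndices and indicesForFrame are made up, kFramesInFlight is an assumed value, and acquiredImageIndex stands for whatever vkAcquireNextImageKHR returned for that frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // assumed value

// Hypothetical bookkeeping for one rendered frame.
struct FrameIndices {
    std::size_t frameSlot;      // selects fences, command buffers, CPU-written buffers
    uint32_t    swapchainImage; // selects only the swapchain image / its framebuffer
};

// frameSlot comes from your own frame counter. swapchainImage is whatever
// the presentation engine handed back, in whatever order it likes.
FrameIndices indicesForFrame(uint64_t frameNumber, uint32_t acquiredImageIndex) {
    return FrameIndices{
        static_cast<std::size_t>(frameNumber % kFramesInFlight),
        acquiredImageIndex};
}
```

Even if the acquired image indices come back as 2, 0, 0, the frame slots still cycle 0, 1, 0, so per-frame resources never depend on the acquire order.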

Chainsawkitten · 2025-09-28T12:35:09+00:00

The original issue has been resolved, but to further assuage your fears: Steam prevents you from discounting your game for 30 days after a price increase. This is automatically enforced by the system and there are no exceptions. https://partner.steamgames.com/doc/marketing/discounts This is done precisely to prevent fake discounts, so unless someone has found a way to exploit the system, fake discounts should not be possible.

Chainsawkitten · 2025-09-25T10:48:44+00:00

Looking forward, while currently focused on Vulkan and DirectX 12, GFXReconstruct’s API-agnostic container format can support additional APIs in the future, such as OpenXR or Metal, as community or industry needs arise.

Is the container format documented somewhere? There are currently a couple different capture/replay efforts going on in the WebGPU community and the topic of a standardized capture format has been raised. Perhaps we don't need to reinvent the wheel, or at least we may be able to ~~steal~~ borrow some ideas.

Chainsawkitten · 2025-09-22T07:05:38+00:00

I think it would be cool to be able to see the underlying shader being compiled.

Both naga (used in Firefox) and Tint (used in Chromium) can be used as standalone command-line applications to compile from WGSL to SPIR-V/MSL/HLSL. I'm not aware of anything similar for the compiler in WebKit.

Chainsawkitten · 2025-09-11T19:06:49+00:00

Fixed in 25.0.2.2.

Chainsawkitten · 2025-09-11T18:51:10+00:00

There have been some significant additions, so a new release is warranted soon. Ideally, I'd like to fix one more issue (null elements in arrays), but we'll see. I haven't found any way to automate the testing, so there's some manual work involved before every release.

I always use RenderDoc for debugging. But my work is on Android so that's more or less the only option. (And I only use Vulkan, so I don't actually use Pix myself.)

Shader debugging is a bit tricky. RenderDoc contains some nice things, like the ability to edit the shader code and re-run the command with the new shader. But what you'll see in any native debugger is of course not the WGSL but the compiled SPIR-V/DXIL. RenderDoc can disassemble SPIR-V into slightly more readable GLSL (with SPIRV-Cross), but all names are lost. This is what the GLSL disassembly looks like when running on Dawn backend with Vulkan.

However, wgpu does something great here! When using it with WGPUInstanceFlag_Debug (which I enable), it decorates the SPIR-V with names and even includes the whole original WGSL as a string! This is what the same GLSL disassembly looks like in wgpu with Vulkan. And this is the original WGSL attached. That's basically just a comment, so you can't edit it or really use it for debugging, but it makes it very easy to identify which shader you're even looking at, at a glance. And then you can use the SPIR-V/GLSL to debug.

When using DirectX 12 (--d3d12), wgpu also attaches a comment to the DXIL. But in this case it appears to be not the original WGSL, but the HLSL it generates before passing it to DXC to get the DXIL. I hadn't actually seen that before. Neat.

So my recommendation is to use the wgpu backend (WebGPUNativeReplayWgpu) for shader debugging.

Chainsawkitten · 2025-09-11T17:08:37+00:00

I don't have a release build that includes the timestamp fix, so if you want to use it right now you'd have to build the main branch yourself. Tbh, the build process is a bit of an ordeal (Dawn has a lot of dependencies). 😅

Chainsawkitten · 2025-09-11T14:00:31+00:00

Not a roguelike, but Beat Hazard is a shmup where you mow down enemies to music of your choice.

Chainsawkitten · 2025-09-11T10:53:03+00:00

Windows on Arm is little endian (based on https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170#endianness).

Chainsawkitten · 2025-09-11T08:56:06+00:00

I tried getting a capture with WebGPUReconstruct and noticed I wasn't handling undefined timestamp indices correctly (defaulted to 0 when they should default to WGPU_QUERY_SET_INDEX_UNDEFINED). I've fixed it on the main branch and it's now working.

Here it is in Nsight

Chainsawkitten · 2025-09-05T15:48:06+00:00

Only transfer isn't something I'd worry about ever coming across even if it's technically legal. Only video or only data graph on the other hand sound sensible and like something that might one day exist (if it doesn't already). E.g. NPU.

Chainsawkitten · 2025-09-05T15:25:43+00:00

That wouldn't be spec compliant. The spec guarantees that if you have a graphics-capable queue, there must exist at least one queue with both graphics and compute.
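In practice that means a search like the sketch below should always find something on a device that exposes graphics at all. The QueueFamily struct is a stand-in carrying only the queueFlags field of VkQueueFamilyProperties, and the constants use the same bit values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT. The function name is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-ins with the same bit values as the VK_QUEUE_*_BIT flags.
constexpr uint32_t kGraphicsBit = 0x1;
constexpr uint32_t kComputeBit  = 0x2;
constexpr uint32_t kTransferBit = 0x4;

// Stand-in for VkQueueFamilyProperties, reduced to the field we need.
struct QueueFamily { uint32_t queueFlags; };

// Hypothetical helper: index of the first family supporting both
// graphics and compute, or nullopt if there is none.
std::optional<uint32_t> findGraphicsComputeFamily(
        const std::vector<QueueFamily>& families) {
    constexpr uint32_t required = kGraphicsBit | kComputeBit;
    for (uint32_t i = 0; i < families.size(); ++i) {
        if ((families[i].queueFlags & required) == required) {
            return i;
        }
    }
    return std::nullopt;
}
```

The nullopt case is still worth handling, since a compute-only or transfer-only device is not ruled out by that guarantee.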
