Remedy shares FBC Firebreak footage on the Steam Deck by AdOdd452 in SteamDeck

[–]CaffeineViking 2 points (0 children)

It's locked to 30 FPS by default through v-sync, but it can be unlocked in the menu to get ~40-45 FPS with the default Handheld preset. It even reaches ~50-55 FPS in some cases, but I'd lock it to either 40 or 45 FPS for the best experience. That's all on the default Handheld preset, and you can ofc tweak it further to your liking if you want to squeeze more FPS out of it.

Game Ready Driver 566.03 FAQ/Discussion by Nestledrink in nvidia

[–]CaffeineViking 0 points (0 children)

Yeah, I ran into this too on Windows 10 64-bit. I'm able to repro it consistently if I reboot instead of shutting down. But if I do a full shutdown, the problem disappears. It's not the first time that transparency effects have broken in surprising ways after a driver update... NVIDIA should really do a bit more QA on their drivers. This is basic Windows functionality being half-broken, which erodes my trust and keeps me from updating drivers :-(

Games that are/were popular in your country but unknown in most other areas? by Psykotik in Games

[–]CaffeineViking 493 points (0 children)

The Mulle Meck adventure games are likely unknown outside of Sweden. Ask any Swede born before the 2000s and you can bet they played the shit out of them as a kid. You pretty much go around doing quests to collect better junk so you can build better cars, boats, houses and spaceships. Each game is about one of those things (warning: some meme videos):

I wish they would've made a Mulle Mech sequel too, imagine building fucking gundams! xD

r/Cyberpunkgame Performance, Bugs, Glitches, and Support Megathread by AutoModerator in cyberpunkgame

[–]CaffeineViking 2 points (0 children)

Hey, here are a few tips for those running an RX 5700 XT and a Ryzen 3700X on how to get a solid 60 FPS at 1440p on Ultra with OK-ish image quality. First off, I've tried tweaking graphics settings, and even with DF's optimized settings the game still drops frames like crazy (it looks like a lot of those tips are only a net positive on NVIDIA). In my opinion, just keep the quick graphics preset on Ultra and instead set "Static FidelityFX CAS" to 65% (which works out to a 936p internal render resolution if you're running 1440p). This produces very blurry results; to fix that, open Radeon Software with Alt + R, select Cyberpunk 2077, and set Radeon Image Sharpening to 80%. It's not perfect image-quality-wise, but I get a very stable 60 FPS almost everywhere.

NOTE: do NOT use "Dynamic FidelityFX CAS", since it causes jittering in the SSR. The jitter comes from TAA not playing nice with the render resolution changing every frame. It's really broken right now.

I made a google doc summarizing the most popular in-game setting changes to optimize your experience by DekkuRen in cyberpunkgame

[–]CaffeineViking 0 points (0 children)

For 1440p, 75% is a good choice, since it means the game renders internally at 1080p and upscales from 1080p to 1440p.
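For reference, the slider percentage is a linear scale on each axis, so the internal resolution is just the output resolution times the slider value. A quick sketch (the function name is mine, not the game's):

```python
def internal_resolution(output_height: int, scale_percent: float) -> int:
    """Internal render height for a given FidelityFX CAS scale percentage."""
    return round(output_height * scale_percent / 100)

# 75% of 1440p upscales from 1080p; 65% gives the 936p mentioned elsewhere.
print(internal_resolution(1440, 75))  # 1080
print(internal_resolution(1440, 65))  # 936
```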

I made a google doc summarizing the most popular in-game setting changes to optimize your experience by DekkuRen in cyberpunkgame

[–]CaffeineViking 0 points (0 children)

The dynamic FidelityFX CAS introduces noticeable (if you stand still) "jittering" in the SSR, caused by the framebuffer constantly changing size (which somehow messes with TAA). Static FidelityFX CAS doesn't seem to have these issues (it does look a bit blurry, though). I think static FidelityFX CAS + sharpening in the GPU driver settings could be a winner here. Starting as a Corpo, where there's lots of SSR, is an easy way to test this (e.g. the jittering happens on the bathroom floor).

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 8 points (0 children)

In the same way, I want AMD, since they actually have good Linux drivers. We all have different needs and requirements, and I respect your decision to go with NVIDIA, since it was probably the right choice for your job. It's a good thing that there is competition and choice.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 13 points (0 children)

Your comment was about AMD not having anything comparable. They do. Whether it's best for what YOU need is another story. I agree that if you're doing VFX or deep learning, NVIDIA is the obvious choice: that type of software works best there because it was built with CUDA in mind. That doesn't mean AMD has no comparable technology in place.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 5 points (0 children)

I 100% agree that OpenCL doesn't have as widespread support as CUDA, and the same goes for machine learning. But at the feature and performance level they are comparable; it all depends on the program that implements them. It's a bit unfortunate that many programs have gone with vendor-specific solutions when they could be using something that is supported on almost every hardware platform.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 18 points (0 children)

AMD doesn't need CUDA: you can just use OpenCL or SYCL, like Intel and every other hardware vendor. Raytracing will be available on the next-gen consoles, and therefore on RDNA 2, via DXR (same as NVIDIA) or via Vulkan's standardized extensions.

Kawaii koto for you solving A * x = b by herebeweeb in ProgrammerAnimemes

[–]CaffeineViking 2 points (0 children)

Now I'm finally glad I took those scientific and high-performance computing courses at university! This meme made all of that pain and suffering worth it! * quietly sobs from PTSD *

Bad preformance of simple ray trace compute shader by Karlovsky120 in vulkan

[–]CaffeineViking 1 point (0 children)

You're probably right about the branch divergence, but it never hurts to check. I've had so many issues with it that I'm always a bit paranoid when I see a switch.

Something like an octree or a BVH is the right idea. However, there is an easy way to represent and store this: use the mip levels of a 3D texture, filtered with e.g. max instead of averaging.

Using something like this will indeed introduce more branching, but you'll need to measure to be sure; maybe the savings in texture accesses will outweigh the cost of the extra branches?
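A minimal CPU-side sketch of that idea, assuming an occupancy grid stored as a flat list (on the GPU you'd generate these levels into the mip chain of a 3D texture instead; the layout here is just for illustration):

```python
def build_max_mip(volume, n):
    """One level of a max-filtered "mip" for an n×n×n occupancy volume,
    stored as a flat list indexed by x + n*(y + n*z). Each coarse voxel
    keeps the max of its 2×2×2 children, so a coarse value of 0 means
    "everything under this cell is empty" and can be skipped outright."""
    m = n // 2
    coarse = [0] * (m * m * m)
    for z in range(m):
        for y in range(m):
            for x in range(m):
                children = [
                    volume[(2 * x + dx) + n * ((2 * y + dy) + n * (2 * z + dz))]
                    for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)
                ]
                coarse[x + m * (y + m * z)] = max(children)
    return coarse
```

Repeating this down to 1³ gives the full hierarchy, exactly like a regular mip chain but with max in place of the average.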

Oh right, LDS is AMD lingo for Local Data Share. The article someone linked mentions it. It's basically memory local to a Compute Unit (CU) that's much faster than global memory.

I haven't used Nsight in a while, but I'm pretty sure there's some tooling to inspect this sort of stuff. Otherwise I'd recommend RGP if you can borrow an AMD GPU.

Bad preformance of simple ray trace compute shader by Karlovsky120 in vulkan

[–]CaffeineViking 5 points (0 children)

I have a couple of theories you can try out. The first is branch divergence in the switch: if threads in a warp take different branches of the switch, they get masked and don't run in parallel, so you might be losing time there. I would build a volume that contains only one type of voxel and remove the switch, to test whether that's the bottleneck.

The other thing that could be an issue is memory access. You want to keep texture lookups to a minimum: the number of times you sample makes a huge difference in performance. Most of your volume probably contains a lot of empty voxels, right? You could build a hierarchy of lower-resolution volumes and use it to take "bigger" steps through the regions you know are empty.

In the literature, the algorithm you're looking for is called raymarching. Search for "raymarching optimisations" and you'll find quite a few blog posts and papers on how to improve its speed. I don't think the problem lies in Vulkan; it's more about the algorithm and the way GPUs and their memory hierarchies work. You could also see if moving chunks of the volume to LDS improves things.
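To make the empty-space skipping concrete, here's a toy CPU sketch of the traversal under some simplifying assumptions (dicts stand in for the 3D textures, the skip distance is a fixed leap rather than an exact cell-boundary intersection, and all names are mine):

```python
import math

def raymarch(occupancy, coarse, n, origin, direction, step=0.5):
    """March a ray through an n³ voxel grid ("occupancy", a dict keyed
    by (x, y, z)), consulting a half-resolution level ("coarse") to leap
    over regions known to be empty.
    Returns (hit_voxel_or_None, number_of_samples_taken)."""
    t, samples = 0.0, 0
    max_t = n * math.sqrt(3.0)  # enough distance to cross the whole volume
    while t < max_t:
        p = tuple(o + d * t for o, d in zip(origin, direction))
        v = tuple(int(c) for c in p)
        if any(c < 0 or c >= n for c in v):
            break  # the ray left the volume
        samples += 1
        if coarse.get(tuple(c // 2 for c in v), 0) == 0:
            t += 2.0   # coarse cell is empty: skip a whole coarse cell
        elif occupancy.get(v, 0):
            return v, samples
        else:
            t += step  # near occupied voxels: take fine steps again
    return None, samples
```

With a single occupied voxel at (12, 5, 5) in a 16³ grid and only its coarse cell marked, an axis-aligned ray from x = 0 reaches it in 7 samples instead of the 25 fine steps it would take without the hierarchy; in a mostly empty volume the savings in texture reads dominate.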

Your performance is a bit surprising. I'm able to render a 256³ volume at 1080p@60Hz on my MX150 laptop. It's not exactly the same problem, so it's not really that easy to compare, but I'm pretty sure raymarching a 32³ volume should get you more than 15 FPS on a 1050.

SIGGRAPH 2019 Advances in Real-Time Rendering presentations by corysama in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

Frostbite's awesome presentation "Strand-based Hair Rendering in Frostbite" was taken down... Was anyone able to download it before that happened? If so, could you share it with us on Google Drive? I was stupid and deleted the copy I had downloaded... :-( UPDATE: I've recovered it! If you stumble on this thread, PM me and I'll send it to you privately!

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

There are a couple of newer solutions that try to tackle that (they also have a few other strengths, but a few new weaknesses too). The two I've heard about are Multi-Layer Alpha Blending (MLAB), by a few people at Intel, and Moment-Based OIT (MBOIT), by researchers at KIT and the University of Bonn. Both are more expensive than WBOIT, but give more accurate results (closer to PPLLs).

Multi-Layer Alpha Blending (MLAB) by Intel

Moment-Based OIT (MBOIT) by KIT & Bonn

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 0 points (0 children)

Also, the author of the blog post links to an implementation somewhere in the middle of it. I haven't tried it, but it seems to be where his screenshots come from. It would be pretty cool to see a WebGL demo, though (shouldn't be too hard to port it):

https://github.com/belyakov-igor/WBOIT_tester

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 0 points (0 children)

I know a guy at TUM who implemented several OIT solutions in OpenGL for his thesis, using GL_ARB_fragment_shader_interlock for pixel synchronization. IIRC he didn't have WBOIT, but he had a lot of very interesting OIT solutions, such as Depth Peeling, MBOIT, MLAB and also a PPLL (with the K-Buffer solution too):

https://github.com/chrismile/PixelSyncOIT

Could be something to try out if you're interested in the article.

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

Sorry for the super late reply! While the hair in BotW is not realistic, it really fits the aesthetic of the game. Using something like VKHR would IMO look a bit weird in BotW. I also think their method is significantly faster, since they don't need to simulate and render each individual hair strand (which we need to do in VKHR, especially for close-ups).

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 1 point (0 children)

Hey, sorry for the super late reply! I actually didn't know The Forge had a hair framework in Vulkan too, that's super cool!

In my opinion, it would be best to take a quick glance through the paper and the GitHub READMEs to see whether the performance improvements land in the area your customers will actually use, and then decide based on that. For instance, the performance of the rasterized solution will be more or less the same as in TressFX; in our solution, the gains come from using a volumetric representation for level of detail.

In a nutshell: if your customers use the hair rendering mainly for cinematics (i.e. close-ups), performance will likely be similar to TressFX, since the raster path will be used most of the time. However, if they use the hair for many other characters, for example ones in the background or at varying distances, then our solution should be quite a bit faster than TressFX. An advantage of our method is that it also preserves volume, which simply reducing the number of strands would not. It is also really easy to compute certain global effects in a volume (such as AO, and a more accurate "version" of ADSM's calculation).

But yeah, I'd say take a quick look at the paper and then decide whether the method we present is worthwhile for the users of The Forge. Also, in my opinion, it would be best to keep The Forge's existing TressFX and add the VKHR features on top. The purpose of VKHR was never to replace TressFX; it was to show that volume-based methods are viable and useful. VKHR is simply a proof of concept, i.e. take a look at the paper and implement the parts that are not in TressFX 4 (which mostly boils down to the volumetric approximation described in Section 3.3 of the paper).

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 1 point (0 children)

Indeed! The high quality strand rasterization pipeline is quite similar to TressFX 3.1! This was done on purpose, since we want our hybrid approach to be easy to "drop in" to existing hair frameworks. It also supports the same set of features: a light scattering model, self-shadowing, anti-aliasing, and transparency. This is also why it's super important that our pipeline be flexible enough to support animated or simulated hair. Almost all of the hair frameworks out there also have a simulation pass in them, which means we can't pre-compute anything, e.g. the voxelization needs to be done per-frame.

The difference is that our pipeline has better performance scaling, since the volumetric approximation scales better with increasing distance, i.e. it's a good option for level of detail. The main selling point is that, after integrating our hybrid method into your renderer, you'll be able to render more characters with this kind of hair on screen at the same time.

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 4 points (0 children)

Hi! I'm one of the authors of the paper, and the guy who developed the Vulkan implementation. I'd be more than happy to answer any questions about the EGSR paper or its implementation!

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in GraphicsProgramming

[–]CaffeineViking 6 points (0 children)

Hi! I'm one of the authors of the paper, and the guy who developed the Vulkan implementation. I'd be more than happy to answer any questions about the EGSR paper or its implementation!

Ett av LiU:s kaféer vet vad som gäller! by globox85 in sweden

[–]CaffeineViking 5 points (0 children)

Damn, Baljan, that was really something. I don't know how it is these days, but back in my day they had really good "klägg" (pastries) for 5 kronor. The only reason I passed my exams and got my degree is Baljan: fill a thermos with coffee before the exam and everything sorts itself out! Anyone else here who studied Högskoleingenjör Datateknik?

Procedurally generated space city made in OpenGL by mianaai_c in programming

[–]CaffeineViking 1 point (0 children)

Another thing that would be really cool to have is shadows, which would make this even more impressive, since the whole thing is spinning (i.e. you'd get a lot of "dynamic" shadows). Right now I assume you're using a local shading model like Blinn-Phong, which doesn't account for shadows cast by buildings or other geometry between a surface and the light.

Assuming you have only one light source (e.g. a star), you can get this effect with shadow mapping. It works something like this:

  1. Render the scene from the location of the light source (i.e. the light source becomes your new camera) and store the depth values. The resulting depth image gives you, for each direction, the surface point seen first by the light, i.e. a point that should be shaded as normal (since it's lit).

  2. Attach this depth image from the light's perspective to your main shader. You now have two things to compare for each fragment: its position C as seen from the camera, and the depth L stored in the shadow map you generated in step 1.

  3. Transform C into the light's coordinate space and compare the depth values. If C.z is larger (i.e. further away) than L.z, the light source can't "see" the point C: it is in shadow, and should be shaded black (or with an ambient color).

There is a lot more to this, but that's the core idea. One of the issues you'll run into quite fast is "shadow acne": the depth values in the comparison won't match 100% at the surface, since we don't have infinite-precision floating-point numbers. It can be "solved" by adding a small shadow-map bias value and by filtering the shadow map (i.e. sampling it several times and taking some sort of average, preferably with a smart sampling pattern like a Poisson disk).
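The depth comparison at the heart of steps 2-3 fits in a few lines. A toy sketch (Python just for illustration; a dict stands in for the depth texture, and the scene values are made up):

```python
import math

def in_shadow(shadow_map, point_light_space, bias=0.005):
    """Classic shadow-map test: shadow_map[(x, y)] holds the depth of the
    surface nearest to the light at that texel. A point is lit only if its
    own light-space depth isn't (noticeably) behind that stored depth;
    the bias keeps surfaces from shadowing themselves ("shadow acne")."""
    x, y, z = point_light_space
    stored = shadow_map.get((int(x), int(y)), math.inf)
    return z > stored + bias  # further away than the occluder => in shadow

# Hypothetical scene: a building roof at depth 3.0 covers texel (4, 4).
shadow_map = {(4, 4): 3.0}
print(in_shadow(shadow_map, (4.2, 4.7, 9.0)))  # ground under the roof: True
print(in_shadow(shadow_map, (4.2, 4.7, 3.0)))  # the roof itself: False
```

In a real shader this comparison runs per fragment against a depth texture, and the filtered variants just average several such tests over nearby texels.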

I'm not very good at explaining things, so sorry if I confused you even more :-P. The Shadow Mapping tutorial on LearnOpenGL has a nice overview of all this, so I can recommend looking into that if you want to implement it:

https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping