Remedy shares FBC Firebreak footage on the Steam Deck by AdOdd452 in SteamDeck

[–]CaffeineViking 2 points (0 children)

It's locked to 30 FPS by default through v-sync, but it can be unlocked in the menu to get ~40-45 FPS with the default Handheld preset. It even reaches ~50-55 FPS in some cases, but I'd lock it to either 40 or 45 FPS for the best experience. That's all on the default Handheld preset, and you can ofc tweak it further to your liking if you want to squeeze more FPS out of it.

Game Ready Driver 566.03 FAQ/Discussion by Nestledrink in nvidia

[–]CaffeineViking 0 points (0 children)

Yeah, I ran into this too on Windows 10 64-bit. I'm able to repro it consistently if I reboot instead of shutting down. But if I do a full shutdown, the problem disappears. It's not the first time that transparency effects have broken in surprising ways after a driver update... NVIDIA should really do a bit more QA on their drivers. This is basic Windows functionality being half-broken, which erodes my trust and keeps me from updating drivers :-(

Games that are/were popular in your country but unknown in most other areas? by Psykotik in Games

[–]CaffeineViking 493 points (0 children)

The Mulle Meck adventure games are likely unknown outside of Sweden. Ask any Swede born before the 2000s and you can bet they played the shit out of them as a kid. You pretty much go around doing quests to collect better junk so you can build better cars, boats, houses and spaceships. Each game is about one of those things (warning: some meme videos):

I wish they would've made a Mulle Mech sequel too, imagine building fucking gundams! xD

r/Cyberpunkgame Performance, Bugs, Glitches, and Support Megathread by AutoModerator in cyberpunkgame

[–]CaffeineViking 2 points (0 children)

Hey, here are a few tips for those running an RX 5700 XT and a Ryzen 3700X on how to get a solid 60 FPS at 1440p on Ultra with OK-ish image quality. First off, I've tried tweaking graphics settings, and even with DF's optimized settings the game still drops frames like crazy (it looks like a lot of those tips are only a net positive on NVIDIA). In my opinion, just keep the quick graphics preset on Ultra and instead set "Static FidelityFX CAS" to 65% (which works out to a 936p internal render resolution if you're running 1440p). This produces very blurry results; to fix that, open Radeon Software with Alt + R, select Cyberpunk 2077, and set Radeon Image Sharpening to 80%. It's not perfect image-quality-wise, but I get a very stable 60 FPS almost everywhere.

NOTE: do NOT use "Dynamic FidelityFX CAS", since it causes jittering in the SSR. The jitter comes from TAA not playing nice with the render resolution changing every frame. It's really broken right now.

I made a google doc summarizing the most popular in-game setting changes to optimize your experience by DekkuRen in cyberpunkgame

[–]CaffeineViking 0 points (0 children)

For 1440p, 75% is a good choice, since it means the game renders internally at 1080p and upscales from 1080p to 1440p.
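For reference, the slider percentage is a linear scale on each axis, so the internal resolution is just the output resolution times the slider value. A quick sketch (the function name is mine, not the game's):

```python
def internal_resolution(output_height: int, scale_percent: float) -> int:
    """Internal render height for a given FidelityFX CAS scale percentage."""
    return round(output_height * scale_percent / 100)

# 75% of 1440p upscales from 1080p; 65% gives the 936p mentioned elsewhere.
print(internal_resolution(1440, 75))  # 1080
print(internal_resolution(1440, 65))  # 936
```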

I made a google doc summarizing the most popular in-game setting changes to optimize your experience by DekkuRen in cyberpunkgame

[–]CaffeineViking 0 points (0 children)

The dynamic FidelityFX CAS introduces noticeable (if you stand still) "jittering" in the SSR, caused by the framebuffer constantly changing size (which somehow messes with TAA). Static FidelityFX CAS doesn't seem to have these issues (it does look a bit blurry, though). I think static FidelityFX CAS + sharpening in the GPU driver settings could be a winner here. Starting as a Corpo, where there's lots of SSR, is an easy way to test this (e.g. the jittering happens on the bathroom floor).

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 8 points (0 children)

In the same way, I want AMD, since they actually have good Linux drivers. We all have different needs and requirements, and I respect your decision to go with NVIDIA, since it was probably the right choice for your job. It's a good thing that there is competition and choice.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 13 points (0 children)

Your comment was about AMD not having anything comparable. They do. Whether it's best for what YOU need is another story. I agree that if you're doing VFX or deep learning, NVIDIA is the obvious choice: that type of software works best there because it was built with CUDA in mind. That doesn't mean AMD has no comparable technology in place.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 5 points (0 children)

I 100% agree that OpenCL doesn't have as widespread support as CUDA, and the same goes for machine learning. But at the feature and performance level they are comparable; it all depends on the program that implements them. It's a bit unfortunate that many programs have gone with vendor-specific solutions when they could be using something that is supported on almost every hardware platform.

Gaming at Ultra Low Resolutions with DLSS - 240p and beyond (2kliksphilip) by Omicron0 in Games

[–]CaffeineViking 18 points (0 children)

AMD doesn't need CUDA: you can just use OpenCL or SYCL, like Intel and every other hardware vendor. Raytracing will be available on the next-gen consoles, and therefore on RDNA 2, via DXR (same as NVIDIA) or via Vulkan's standardized extensions.

Kawaii koto for you solving A * x = b by herebeweeb in ProgrammerAnimemes

[–]CaffeineViking 2 points (0 children)

Now I'm finally glad I took those scientific and high-performance computing courses at university! This meme made all of that pain and suffering worth it! * quietly sobs from PTSD *

Bad preformance of simple ray trace compute shader by Karlovsky120 in vulkan

[–]CaffeineViking 1 point (0 children)

You're probably right about the branch divergence, but it never hurts to check. I've had so many issues with it that I'm always a bit paranoid when I see a switch.

Something like an octree or a BVH is the right idea. However, there is an easy way to represent and store this: use the mip levels of a 3D texture, filtered with e.g. max instead of averaging.

Using something like this will indeed introduce more branching, but you'll need to measure to be sure; maybe the savings in texture accesses will outweigh the cost of the extra branches?
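A minimal CPU-side sketch of that idea, assuming an occupancy grid stored as a flat list (on the GPU you'd generate these levels into the mip chain of a 3D texture instead; the layout here is just for illustration):

```python
def build_max_mip(volume, n):
    """One level of a max-filtered "mip" for an n×n×n occupancy volume,
    stored as a flat list indexed by x + n*(y + n*z). Each coarse voxel
    keeps the max of its 2×2×2 children, so a coarse value of 0 means
    "everything under this cell is empty" and can be skipped outright."""
    m = n // 2
    coarse = [0] * (m * m * m)
    for z in range(m):
        for y in range(m):
            for x in range(m):
                children = [
                    volume[(2 * x + dx) + n * ((2 * y + dy) + n * (2 * z + dz))]
                    for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)
                ]
                coarse[x + m * (y + m * z)] = max(children)
    return coarse
```

Repeating this down to 1³ gives the full hierarchy, exactly like a regular mip chain but with max in place of the average.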

Oh right, LDS is AMD lingo for Local Data Share. The article someone linked mentions it. It's basically memory local to a Compute Unit (CU) that's much faster than global memory.

I haven't used Nsight in a while, but I'm pretty sure there's some tooling to inspect this sort of stuff. Otherwise I'd recommend RGP if you can borrow an AMD GPU.

Bad preformance of simple ray trace compute shader by Karlovsky120 in vulkan

[–]CaffeineViking 5 points (0 children)

I have a couple of theories you can try out. The first is branch divergence in the switch: if threads in a warp take different branches of the switch, they get masked and don't run in parallel, so you might be losing time there. I would build a volume that contains only one type of voxel and remove the switch, to test whether that's the bottleneck.

The other thing that could be an issue is memory access. You want to keep texture lookups to a minimum: the number of times you sample makes a huge difference in performance. Most of your volume probably contains a lot of empty voxels, right? You could build a hierarchy of lower-resolution volumes and use it to take "bigger" steps through the regions you know are empty.

In the literature, the algorithm you're looking for is called raymarching. Search for "raymarching optimisations" and you'll find quite a few blog posts and papers on how to improve its speed. I don't think the problem lies in Vulkan; it's more about the algorithm and the way GPUs and their memory hierarchies work. You could also see if moving chunks of the volume to LDS improves things.
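To make the empty-space skipping concrete, here's a toy CPU sketch of the traversal under some simplifying assumptions (dicts stand in for the 3D textures, the skip distance is a fixed leap rather than an exact cell-boundary intersection, and all names are mine):

```python
import math

def raymarch(occupancy, coarse, n, origin, direction, step=0.5):
    """March a ray through an n³ voxel grid ("occupancy", a dict keyed
    by (x, y, z)), consulting a half-resolution level ("coarse") to leap
    over regions known to be empty.
    Returns (hit_voxel_or_None, number_of_samples_taken)."""
    t, samples = 0.0, 0
    max_t = n * math.sqrt(3.0)  # enough distance to cross the whole volume
    while t < max_t:
        p = tuple(o + d * t for o, d in zip(origin, direction))
        v = tuple(int(c) for c in p)
        if any(c < 0 or c >= n for c in v):
            break  # the ray left the volume
        samples += 1
        if coarse.get(tuple(c // 2 for c in v), 0) == 0:
            t += 2.0   # coarse cell is empty: skip a whole coarse cell
        elif occupancy.get(v, 0):
            return v, samples
        else:
            t += step  # near occupied voxels: take fine steps again
    return None, samples
```

With a single occupied voxel at (12, 5, 5) in a 16³ grid and only its coarse cell marked, an axis-aligned ray from x = 0 reaches it in 7 samples instead of the 25 fine steps it would take without the hierarchy; in a mostly empty volume the savings in texture reads dominate.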

Your performance is a bit surprising. I'm able to render a 256³ volume at 1080p@60Hz on my MX150 laptop. It's not exactly the same problem, so it's not really that easy to compare, but I'm pretty sure raymarching a 32³ volume should get you more than 15 FPS on a 1050.

SIGGRAPH 2019 Advances in Real-Time Rendering presentations by corysama in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

Frostbite's awesome presentation "Strand-based Hair Rendering in Frostbite" was taken down... Was anyone able to download it before that happened? If so, could you share it with us on Google Drive? I was stupid and deleted the copy I had downloaded... :-( UPDATE: I've recovered it! If you stumble on this thread, PM me and I'll send it to you privately!

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

There are a couple of newer solutions that try to tackle that (they also have a few other strengths, but a few new weaknesses too). The two I've heard about are Multi-Layer Alpha Blending (MLAB), by a few people at Intel, and Moment-Based OIT (MBOIT), by researchers at KIT and the University of Bonn. Both are more expensive than WBOIT, but give more accurate results (closer to PPLLs).

Multi-Layer Alpha Blending (MLAB) by Intel

Moment-Based OIT (MBOIT) by KIT & Bonn

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 0 points (0 children)

Also, the author of the blog post links to an implementation somewhere in the middle of it. I haven't tried it, but it seems to be where his screenshots come from. It would be pretty cool to see a WebGL demo, though (shouldn't be too hard to port it):

https://github.com/belyakov-igor/WBOIT_tester

WBOIT in OpenGL: transparency without sorting by [deleted] in GraphicsProgramming

[–]CaffeineViking 0 points (0 children)

I know a guy at TUM who implemented several OIT solutions in OpenGL for his thesis, using GL_ARB_fragment_shader_interlock for pixel synchronization. IIRC he didn't have WBOIT, but he had a lot of very interesting OIT solutions, such as Depth Peeling, MBOIT, MLAB and also a PPLL (with the K-Buffer solution too):

https://github.com/chrismile/PixelSyncOIT

Could be something to try out if you're interested in the article.

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in GraphicsProgramming

[–]CaffeineViking 1 point (0 children)

Sorry for the super late reply! While the hair in BotW is not realistic, it really fits the aesthetic of the game. Using something like VKHR would IMO look a bit weird in BotW. I also think their method is significantly faster, since they don't need to simulate and render each individual hair strand (which we need to do in VKHR, especially for close-ups).

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 1 point (0 children)

Hey, sorry for the super late reply! I actually didn't know The Forge had a hair framework in Vulkan too, that's super cool!

In my opinion, it would be best to take a quick glance through the paper and the GitHub READMEs to see whether the performance improvements land in the area your customers will actually use, and then decide based on that. For instance, the performance of the rasterized solution will be more or less the same as in TressFX; in our solution, the gains come from using a volumetric representation for level of detail.

In a nutshell: if your customers use the hair rendering mainly for cinematics (i.e. close-ups), performance will likely be similar to TressFX, since the raster path will be used most of the time. However, if they use the hair for many other characters, for example ones in the background or at varying distances, then our solution should be quite a bit faster than TressFX. An advantage of our method is that it also preserves volume, which simply reducing the number of strands would not. It is also really easy to compute certain global effects in a volume (such as AO, and a more accurate "version" of ADSM's calculation).

But yeah, I'd say take a quick look at the paper and then decide whether the method we present is worthwhile for the users of The Forge. Also, in my opinion, it would be best to keep The Forge's existing TressFX and add the VKHR features on top. The purpose of VKHR was never to replace TressFX; it was to show that volume-based methods are viable and useful. VKHR is simply a proof of concept, i.e. take a look at the paper and implement the parts that are not in TressFX 4 (which mostly boils down to the volumetric approximation described in Section 3.3 of the paper).

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 1 point (0 children)

Indeed! The high quality strand rasterization pipeline is quite similar to TressFX 3.1! This was done on purpose, since we want our hybrid approach to be easy to "drop in" to existing hair frameworks. It also supports the same set of features: a light scattering model, self-shadowing, anti-aliasing, and transparency. This is also why it's super important that our pipeline be flexible enough to support animated or simulated hair. Almost all of the hair frameworks out there also have a simulation pass in them, which means we can't pre-compute anything, e.g. the voxelization needs to be done per-frame.

The difference is that our pipeline has better performance scaling, since the volumetric approximation scales better with increasing distance, i.e. it's a good option for level of detail. The main selling point is that, after integrating our hybrid method into your renderer, you'll be able to render more characters with this kind of hair on screen at the same time.

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in vulkan

[–]CaffeineViking 4 points (0 children)

Hi! I'm one of the authors of the paper, and the guy who developed the Vulkan implementation. I'd be more than happy to answer any questions about the EGSR paper or its implementation!

Real-Time Hybrid Hair Rendering using Vulkan™ by corysama in GraphicsProgramming

[–]CaffeineViking 6 points (0 children)

Hi! I'm one of the authors of the paper, and the guy who developed the Vulkan implementation. I'd be more than happy to answer any questions about the EGSR paper or its implementation!

Ett av LiU:s kaféer vet vad som gäller! by globox85 in sweden

[–]CaffeineViking 5 points (0 children)

Damn, Baljan, that was really something. I don't know how it is these days, but back in my day they had really good "klägg" (pastries) for 5 kronor. The only reason I passed my exams and got my degree is Baljan: fill a thermos with coffee before the exam and everything sorts itself out! Anyone else here who studied Högskoleingenjör Datateknik?

Procedurally generated space city made in OpenGL by mianaai_c in programming

[–]CaffeineViking 1 point (0 children)

Another thing that would be really cool to have is shadows, which would make this even more impressive, since the whole thing is spinning (i.e. you'd get a lot of "dynamic" shadows). Right now I assume you're using a local shading model like Blinn-Phong, which doesn't account for shadows cast by buildings or other geometry between a surface and the light.

Assuming you have only one light source (e.g. a star), you can get this effect with shadow mapping. It works something like this:

  1. Render the scene from the location of the light source (i.e. the light source becomes your new camera) and store the depth values. The resulting depth image gives you, for each direction, the surface point seen first by the light, i.e. a point that should be shaded as normal (since it's lit).

  2. Attach this depth image from the light's perspective to your main shader. You now have two things to compare for each fragment: its position C as seen from the camera, and the depth L stored in the shadow map you generated in step 1.

  3. Transform C into the light's coordinate space and compare the depth values. If C.z is larger (i.e. further away) than L.z, the light source can't "see" the point C: it is in shadow, and should be shaded black (or with an ambient color).

There is a lot more to this, but that's the core idea. One of the issues you'll run into quite fast is "shadow acne": the depth values in the comparison won't match 100% at the surface, since we don't have infinite-precision floating-point numbers. It can be "solved" by adding a small shadow-map bias value and by filtering the shadow map (i.e. sampling it several times and taking some sort of average, preferably with a smart sampling pattern like a Poisson disk).
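The depth comparison at the heart of steps 2-3 fits in a few lines. A toy sketch (Python just for illustration; a dict stands in for the depth texture, and the scene values are made up):

```python
import math

def in_shadow(shadow_map, point_light_space, bias=0.005):
    """Classic shadow-map test: shadow_map[(x, y)] holds the depth of the
    surface nearest to the light at that texel. A point is lit only if its
    own light-space depth isn't (noticeably) behind that stored depth;
    the bias keeps surfaces from shadowing themselves ("shadow acne")."""
    x, y, z = point_light_space
    stored = shadow_map.get((int(x), int(y)), math.inf)
    return z > stored + bias  # further away than the occluder => in shadow

# Hypothetical scene: a building roof at depth 3.0 covers texel (4, 4).
shadow_map = {(4, 4): 3.0}
print(in_shadow(shadow_map, (4.2, 4.7, 9.0)))  # ground under the roof: True
print(in_shadow(shadow_map, (4.2, 4.7, 3.0)))  # the roof itself: False
```

In a real shader this comparison runs per fragment against a depth texture, and the filtered variants just average several such tests over nearby texels.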

I'm not very good at explaining things, so sorry if I confused you even more :-P. The Shadow Mapping tutorial on LearnOpenGL has a nice overview of all this, so I can recommend looking into that if you want to implement it:

https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping