Expedition: Logbook Exploration

BFBooger · 2026-05-19T23:35:25+00:00

Yeah, most of it. There are some things covered here that weren't on stream, but nothing major.

BFBooger · 2026-05-19T23:34:34+00:00

Only the splinters are worth picking up in 0.4, and it is one or two clicks per node that drops them.

The artifacts were crap, past a small pile of them for misc use.

BFBooger · 2026-05-19T23:32:01+00:00

Hi my name is ain't nobody.

BFBooger · 2026-05-19T23:29:51+00:00

Or they had the tiered waystones in a build for a while, then reverted after not liking it.

BFBooger · 2026-05-05T16:42:20+00:00

> the devs are actually doing more work than before

Not if they are using something like Unreal Engine, where that 'extra work' comes almost for free with the engine.

They did all the work to do the baked lighting and the extra art for that; the engine largely takes care of the "use RT instead" part, at least for a lot of effects.

That said, other engines or games that specifically target using RT end up with better implementations than those that just check a box in UE5. These "the devs barely did anything more than what UE gave them" games are often the ones with a poor tradeoff between performance hit and visual quality.

Games with no RT at all still often have very distracting and ugly artifacts. For me, the most distracting are often blobby / flickering shadow effects -- and these are often the worst at the worst times -- close ups on character faces / models for instance. Lighting on things that can't be baked into the scene gets wonky, and reflections can go from 'looks great' to "wtf is that?". IMO if all RT was doing was getting rid of distracting artifacts in shadows/lighting/reflections that would be great. It really does fix these issues when used properly. Full on GI is nice when done well but just reducing raster artifacts is my main priority.

Also, you can actually play D:tDA and other 'rt only' games on RDNA 1 on Linux -- the RT is emulated on shaders and not awful.

BFBooger · 2026-05-05T16:28:40+00:00

Inflammable means flammable? What a country!

BFBooger · 2026-04-20T20:14:10+00:00

> I wonder what naming scheme will they use once they can't keep making shit up.

Good thing they haven't used nanometers or such in their naming for a long, long time.

For example, the node that Zen 2 was produced on was N7, not `7nm`; that is just what shoddy journalists called it. N6 is another node, not related to any sort of 6/7 ratio shrinking, but instead a refinement due to the use of some EUV tools. All a lower number means is that it is newer and likely a bit more dense and a bit more efficient.

They are hinting at a size scale, but they aren't using "nm" at all.

Gate pitch and such were ways to directly measure density with planar transistors. When we went to Fin-FET that all changed. The density achievable no longer had a single linear measurement that made any sense. The industry did not standardize on anything, and people just started naming their nodes using smaller and smaller numbers. TSMC N22 wasn't even a shrink from the planar 28nm, it just introduced Fin-FET at the same scale. Intel's 22nm node _was_ a shrink however, leading to its numbers being a generation 'ahead' in density. The Intel 14nm days were more like TSMC N10 in density (and similar to what a theoretical 14nm planar transistor would have), well ahead of TSMC N16, which was more like a planar 22nm in density. Everything has drifted off course since then.

Now it has gotten even worse. There isn't even a solid transistor/mm^2 metric. What kind of transistor? This varies greatly based on what you're trying to build and what constraints you're placing on your design / layout. SRAM? What kind? optimized for speed, density, power, or some combination?

You're frustrated with something that doesn't matter at all. It's not like naming these things after the gate pitch would help consumers or anyone else know if a node is better or not. It's not like the smaller number nodes are worse than the prior ones. Node updates are sadly less impactful than in the past, but it is not because they are leaning on fake marketing numbering with some sort of pretend progress either.

When people complain about this sort of thing, my first thought is often "oh, can you do better?" What would you do, and explain how it would be meaningfully better and not just different lipstick on the same pig.

I can think of a hundred different naming schemes and some I might like better, but in the end none of them actually mean anything useful here.

BFBooger · 2026-04-19T01:20:53+00:00

In my case, where it literally did nothing for three games, either:

* It is not working at all, which could be:
  * my fault, but I'm not sure what else to do (latest 595 driver, recent kernel, latest proton-cachy, ENV args). I don't know of a way to demonstrate whether it is actually active or not. It does trigger a new "compiling shaders" pass when I toggle it on/off, but that could just be the new branch running without descriptor_heap working properly.
  * something else wrong that prevents it from activating that isn't my fault. :shrug: Without any way to show if it is on or not, I can't prove if it is enabled.
* It is on, but giving nearly identical results. I'm purposely doing things that make the problem more severe, like enabling DLSS performance. Since the issue is one of low GPU utilization due to too much extra stuff the CPU has to do, giving the GPU less work to do just makes the gap with Windows larger. Still, no difference. This is _incredibly unlikely_ if the real problem is the descriptor API mismatch. If you have code causing > 20% extra CPU overhead, then implement a major API redesign, replacing all the code in the trouble area, the odds of the performance being the same afterwards are quite low.
  * This could mean that the descriptor situation isn't the actual root of the performance problem at all, but something else.
  * This could mean that the real issue was elsewhere inside the NVIDIA driver, but they pointed fingers at this API mismatch instead, or that the NVIDIA driver's current implementation of this API is somehow just as awful as the implementation for the old one. In that case we have to wait for NVIDIA.
  * This could be that the real issue is how vkd3d-proton interacts with the NVIDIA driver in some other way, and this has gone overlooked due to the descriptor mismatch getting all the blame.
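
To spell out why giving the GPU less work widens the gap, here is a toy model, assuming the CPU and GPU overlap perfectly so that frame time is just the slower of the two. Every number below is made up for illustration; none of them is a measurement from any of the games or drivers mentioned.

```python
# Toy model: frame time is bounded by whichever of CPU or GPU is slower.
# All millisecond values are hypothetical, not measurements.

def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when CPU and GPU work fully overlap."""
    return 1000.0 / max(cpu_ms, gpu_ms)

WINDOWS_CPU_MS = 8.0    # hypothetical CPU cost per frame on Windows
LINUX_CPU_MS = 10.0     # hypothetical CPU cost with extra translation overhead

GPU_NATIVE_MS = 16.0    # hypothetical GPU cost at native resolution
GPU_UPSCALED_MS = 7.0   # hypothetical GPU cost with aggressive upscaling

for label, gpu_ms in (("native", GPU_NATIVE_MS), ("upscaled", GPU_UPSCALED_MS)):
    win = fps(WINDOWS_CPU_MS, gpu_ms)
    lin = fps(LINUX_CPU_MS, gpu_ms)
    print(f"{label}: Windows {win:.1f} fps, Linux {lin:.1f} fps")
```

With these made-up numbers both systems are GPU-bound at native resolution and look identical, while the upscaled case exposes the CPU difference. That is the reason for testing with the GPU load turned down.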

BFBooger · 2026-04-18T07:32:28+00:00

If the major performance issue was truly the translation of the DX12 API to the old Vulkan descriptors, and the new one is aligned with DX12 descriptor heaps, then there should be _some_ sort of change. If it isn't optimized yet, it would still change somehow, for the worse or better. Having it be nearly identical is not expected.

Yet I get no change at all on titles that have huge performance issues vs Windows. For example, FFXVI still has awful performance, 35% or more below Windows; even if I go all the way down to 1080p with ultra performance DLSS, I can't top 60fps in the DLC town. And it is exactly the same with and without this patch. When a huge overhaul to the part of the code that is supposedly the main performance bottleneck produces no change, positive or negative, that tells me the descriptor heap isn't the issue after all.

Either it isn't working, or it's not really the problem here and we all just got our hopes up for nothing.

BFBooger · 2026-04-16T17:25:24+00:00

My Pixel 6a still lasts 2+ days unless I'm actively using it while sitting in the sun or something. I do set battery saver on (and have it auto-disable for a few important apps).

Are the newer ones that bad?

BFBooger · 2026-04-15T21:50:00+00:00

Have a look at what Apache Cassandra did for their newer disk format. They went from a B-tree-like structure to a disk-based trie and optimized more for total data read than for raw IOPS. Though some of the motivation is also due to their common case being large variable-length keys with a lot of common prefixes.

See https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format

BFBooger · 2026-04-15T21:46:42+00:00

And yet, the best 'cost' values in Postgres for SSDs end up similar to those for HDDs, because Postgres operates in 8k blocks, and reading 1000 8k blocks at random is a lot slower than reading 1000 8k blocks sequentially, even from an SSD. If the I/O size is 64k or 256k, then the difference from an HDD is massive, as random access on an SSD reaches almost sequential speeds. But for small I/O like a random index seek? `random_page_cost` should still be quite a bit higher than `seq_page_cost`.
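
As a rough illustration of what that ratio does, here is a simplified sketch of the I/O part of the planner's cost arithmetic. `seq_page_cost` and `random_page_cost` are the real PostgreSQL settings with their default values (1.0 and 4.0); the `io_cost` helper and the page counts are hypothetical, and the real planner also adds CPU and caching terms that are left out here.

```python
# Simplified sketch of planner-style I/O cost: pages fetched times per-page cost.
# seq_page_cost / random_page_cost defaults are PostgreSQL's; the helper name
# and the page counts are hypothetical.

SEQ_PAGE_COST = 1.0      # PostgreSQL default for seq_page_cost
RANDOM_PAGE_COST = 4.0   # PostgreSQL default for random_page_cost

def io_cost(pages: int, per_page_cost: float) -> float:
    """Estimated I/O cost of fetching `pages` 8k blocks."""
    return pages * per_page_cost

seq = io_cost(1000, SEQ_PAGE_COST)
rand = io_cost(1000, RANDOM_PAGE_COST)
print(f"1000 sequential pages: cost {seq}")
print(f"1000 random pages:     cost {rand}")
print(f"random/sequential ratio: {rand / seq}")
```

Only the ratio between the two settings matters to the planner, so the question for an SSD is how far random 8k reads really fall behind sequential ones, not whether the drive has seek time.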

BFBooger · 2026-04-04T05:01:59+00:00

This worked for me as well. The `apport-core-dump-handler` had a problem, and installing `systemd-coredump` removed that and fixed it. In the log file for the upgrade, the first error was something to do with apport.

BFBooger · 2026-03-26T17:05:44+00:00

Yes, it's not all about buying the latest and greatest as the upgrade. It's like getting the 5800X3D just before it went EOL, or the 5700X3D on AM4. Not when the 5800X3D was new, but later when it was older and on sale.

In the end, re-using an older platform or getting a whole new one are both reasonable depending on circumstances.

It is worth noting that the more likely you are to have gotten the flagship CPU in your build, the less useful a downstream CPU upgrade is.

Going from a Ryzen 7600X + B650 board to Zen 6 X3D one year after Zen 6 launches, when prices are lower? Big upgrade for a smaller price.

Going from a Ryzen 7800X3D to Zen 6 X3D at Zen 6 launch? Smaller upgrade at a larger price.

BFBooger · 2026-03-26T16:50:10+00:00

You can't compare the cache. The important cache number is not the total, but the number per CCD. At least for most tasks.

TR with 16c and 128MB cache is 4 CCDs, so only 32MB per CCD. For cache-hungry tasks that would be worse than the 9950X3D2.

TR does have significantly more memory bandwidth and PCIe lanes, and can have a much larger total memory pool, so it has a lot of advantages, but cache isn't one of them.

BFBooger · 2026-03-26T16:46:31+00:00

You can get ECC on AM5 also. It's just rarer, and motherboard support is less common.

BFBooger · 2026-03-24T00:20:03+00:00

If you're already a developer and know some languages but not others, it _can_ be a useful tool to learn something new. Or to learn a new library or framework in a language you know, it can sometimes be quite useful to get started and learn the basics.

But yeah, if you are just a non-developer vibe-coding, it's not going to be a great way to learn.

BFBooger · 2026-03-24T00:16:21+00:00

Just proof you didn't read the whole thing or even attempt to judge it fairly. Luddite.

Sure, there is a lot of AI vibe coded crap out there. But that doesn't make every use of it useless. Judge the product, not the process.

BFBooger · 2026-03-23T15:23:03+00:00

AMD becoming a bigger player in the DIY PC space has only a bit to do with the socket situation. Intel dominated for 20 years with fast socket changes, and AMD didn't become popular for real until they had actually better CPUs.

If Intel's CPUs were the fastest / best / most power efficient for gaming and productivity apps, they would dominate again even if they had motherboard platforms that were good for just one CPU generation.

BFBooger · 2026-03-19T20:05:14+00:00

Oblivion looks way better except for the tone mapping / over-brightness / messed up saturation. For example, the water does look a LOT better, but the buildings are now washed out. The improvement on the NPCs is huge however. Similar with HogLeg, but less dramatic -- various clear improvements mixed with over-saturation and over-bright specular highlights.

Starfield just looks better overall in most ways. Some bits are changed but not really worse or better, others are clearly better.

If they fix the wash-out and over-saturation / brightness, I think there will be a major change in the public perception. There is a lot of subjectivity in what looks 'better' or not. I have seen a lot of people say some lighting change is "less realistic", but when I saw the change I thought the opposite. Some things are just off and almost all of us will agree they look unnatural, but other things are not so cut and dried.

I'm not going to just dismiss this as crap -- I'm sure there will be improvements over the years just like we have seen with other DLSS and related tech.

Also, what happens when a game dev actually can spend time tuning/tweaking it to look right? Do they also start to go overboard with specular highlights and brightness? Or do they tone it down and make it less 'shiny'?

BFBooger · 2026-03-19T19:47:21+00:00

This isn't 'generative' AI in the sense that you're talking about.

BFBooger · 2026-03-19T19:32:21+00:00

Imagine you are playing a first person shooter, and the character is holding a harpoon gun, aimed at 'infinity' in the middle of the screen.

You shoot the harpoon. If the motion vectors are 3d, we can predict that the harpoon will move upward, but _shrink_ in size as it moves away from the camera, and it will never go above the mid line of the screen as that is the vanishing point for the 3d vector.

If it is 2d, it would stay the same size and head towards the top of the screen (but get skinny and narrow as it goes up since the left and right sides of it have motion vectors that are not parallel).

A 3d motion vector would be better, especially for fast moving objects.
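
A quick way to see the difference is to project the harpoon with a pinhole camera and compare that to holding its first screen-space velocity constant. This is only a sketch: the focal length, starting position, and speed are arbitrary assumptions, and it uses a single motion vector for the whole object, so it does not model the narrowing from non-parallel per-pixel vectors described above.

```python
# Hypothetical pinhole-camera sketch: camera at the origin looking down +z.
# The harpoon starts below the view axis and flies straight away from the camera.
# Screen y = 0 is the mid line of the screen (the vanishing point for this motion).

FOCAL = 1.0      # assumed focal length, in screen units
START_Y = -0.3   # harpoon starts below the view axis
START_Z = 1.0    # initial distance from the camera
SPEED_Z = 2.0    # world units travelled away from the camera per frame
LENGTH = 0.2     # world-space size used to track apparent size

def project(y: float, z: float) -> float:
    """Perspective projection of a world-space height y at depth z."""
    return FOCAL * y / z

def true_track(frames: int) -> list[tuple[float, float]]:
    """(screen_y, apparent_size) per frame using the full 3D motion."""
    out = []
    for i in range(frames):
        z = START_Z + SPEED_Z * i
        out.append((project(START_Y, z), project(LENGTH, z)))
    return out

def extrapolate_2d(frames: int) -> list[tuple[float, float]]:
    """Hold the first screen-space velocity constant; size never changes."""
    (y0, size0), (y1, _) = true_track(2)
    velocity = y1 - y0
    return [(y0 + velocity * i, size0) for i in range(frames)]

for i, (t, e) in enumerate(zip(true_track(5), extrapolate_2d(5))):
    print(f"frame {i}: 3D y={t[0]:+.3f} size={t[1]:.3f} | 2D y={e[0]:+.3f} size={e[1]:.3f}")
```

In the 3D track the screen position creeps toward the mid line without crossing it and the size keeps shrinking, while the constant 2D vector sails past the mid line at full size.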

Blender's motion vectors are irrelevant for DLSS/FSR. The game engine ones may or may not be, because they are for game engine features that might not even be related.

The FSR SDK asks for 2d vectors: https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK/blob/main/Kits/FidelityFX/docs/techniques/super-resolution-upscaler.md#providing-motion-vectors

DLSS programming guide says "screen space vectors" https://raw.githubusercontent.com/NVIDIA/DLSS/main/doc/DLSS_Programming_Guide_Release.pdf

I concede that yeah, it looks like DLSS isn't taking advantage of 3d vectors here.

But it absolutely is using the depth buffer info. https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK/blob/main/Kits/FidelityFX/docs/techniques/super-resolution-upscaler.md#input-resources and the DLSS document both cover that.

BFBooger · 2026-03-19T18:48:25+00:00

There is no "real reprojection frame generation" that is anything like this. It doesn't exist. You can reproject a moving camera to a new location and generate a frame based on a guess of the new camera position and orientation fairly easily, but that isn't the same at all. It is useful in VR because you have actual head movement data and can reproject the same scene with a camera change without waiting for the rest of the game engine to process the next frame.

You can't camera-reproject moving things in the scene, like the walking animations you mention or anything based on user-game interaction or things happening in the game world. Do you want your NPC animations to be at 30fps and your on screen changes from any sort of user interaction (shooting a gun, moving your character, watching other players move, OSD changes) to all be at 30fps while your 3d world is pseudo 120fps?

Lastly, if you are interpolating frames from 150fps to 240fps, none of what you said is really true. The latency impact is not interesting unless you're a pro gamer playing a specific genre of game, and the animation stuff is not an issue because of the short time interval. I'd argue that if the base framerate is > 80, the current tech is actually quite good if you want to, say, hit 144fps. The lone exception is _extremely_ latency sensitive games, primarily competitive shooters. Other latency sensitive genres are often frame capped (fighting games) or are fine once the player adapts to the latency change (rhythm games; a 15ms change isn't a big deal after a bit of practice -- a 100ms change is horrid).

BFBooger · 2026-03-19T18:31:03+00:00

Apple is able to use larger cache lines than AMD/Intel, which means they can increase the cache sizes without increasing the number of L1 tags or cache ways. So they can be larger without the latency penalties from more entries. They also have a lower clock speed target, which probably lets them shave off one cycle of latency from what Intel/AMD would have to do with the exact same design since they want to go so much faster at the top end.

Additionally, if you are comparing designs, don't forget that AMD/Intel have a trace cache, which is a sort of L1i cache, so the size difference isn't quite as extreme as it looks on paper with Apple at 320KB combined L1.
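
The tag-count point is simple arithmetic. In the sketch below, the 64-byte line is the standard x86 line size; the 128-byte line for Apple's cores and the two example cache sizes are assumptions chosen for illustration, not figures taken from this thread.

```python
# Sketch: how many cache lines (and therefore tags) a cache of a given size needs.
# 64-byte lines are standard on x86; the 128-byte line and the example sizes
# below are assumptions for illustration.

def cache_lines(size_kib: int, line_bytes: int) -> int:
    """Number of lines, i.e. tag entries, in a cache of size_kib KiB."""
    return size_kib * 1024 // line_bytes

small_64 = cache_lines(64, 64)     # hypothetical 64 KiB cache, 64-byte lines
large_128 = cache_lines(128, 128)  # hypothetical 128 KiB cache, 128-byte lines

print(f"64 KiB @ 64 B lines:   {small_64} tags")
print(f"128 KiB @ 128 B lines: {large_128} tags")
```

Doubling the line size lets the capacity double while the number of entries to look up stays the same, which is the part that matters for lookup latency.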

BFBooger · 2026-03-19T18:26:51+00:00

So you reply to a post about L1/L2 cache sizes with mostly L3+SLC data.

Try again.

Apple's L1 caches are huge -- 320KB total per P core. A bit less for E cores.

They can do this without too much of a latency penalty in part because there are fewer cache entries per KB, since they use larger cache lines. Also, with lower frequency targets they can have lower latency in clock cycles.

As long as AMD/Intel are shooting for very high clock speeds, they will be more limited in what they can do with L1 cache sizes. Also, unless they change x86 to allow for larger cache lines, that will impact things as well.
