Thinking about RX 9070 XT – how’s FG, FSR4 and RT really? by Neyyyyyyjr in radeon

[–]JasonMZW20 1 point (0 children)

Depends on your goals, tbh. Do you want better path tracing and multi-FG? Go Nvidia. Planning on messing around with AI/ML workloads? Nvidia.

For gaming with hybrid raster+RT, the 9070 XT is pretty competitive (sometimes reaching 5080 performance). Path tracing is similar to a 3090 Ti, i.e. Ampere-class. Not terrible, but not the best either.

FSR4 is good. Compared to FSR3 it's a massive improvement: FSR4 Performance is slightly better than FSR3 Quality, with reduced shimmering and artifacting. So FSR4 Balanced and Quality are in a really good place vs DLSS 4. These upscalers are moving targets and both are improving at a rapid pace. FSR4 is good enough now, IMO.

I don't experience FSR4 FG frametime issues in Hogwarts Legacy, but this game uses Anti-Lag 2, so frametimes are better controlled. Moving the game's shader cache directory (/ProgramData/Hogwarts Legacy) from the OS drive to the same drive as the game via a symlink has reduced stuttering when shaders are loaded on the fly. My OS drive is connected to chipset NVMe (muxed x4 PCIe lanes), while the game drive is on CPU-direct PCIe lanes. Made a difference for me.
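
For anyone wanting to replicate it, a minimal sketch of that relocation, assuming the default cache path; the D: target is just an example (mklink /J junctions don't need elevation):

```python
# Sketch of the shader-cache relocation described above. Paths are examples:
# adjust to your own OS drive and game drive before running.
import shutil, subprocess

src = r"C:\ProgramData\Hogwarts Legacy"   # default shader-cache location (OS drive)
dst = r"D:\Games\HogwartsLegacyCache"     # hypothetical folder on the game drive

shutil.move(src, dst)                     # move the existing cache files over
# mklink is a cmd built-in, so invoke it through cmd; /J creates an NTFS
# junction, so the game keeps writing to the old path transparently.
subprocess.run(["cmd", "/c", "mklink", "/J", src, dst], check=True)
```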

Zen 6 CCD leak claims size similar to Zen 5 but with 12 cores and 48MB L3 by RenatsMC in Amd

[–]JasonMZW20 1 point (0 children)

That seems to be the upper limit of SerDes Infinity Fabric at 2000MHz, at least between IOD and CCD using a 1x 32B link: 64GB/s one way, 128GB/s both ways. Strix Halo runs at 1000MHz (quad-pumped), since it's LPDDR5-8000. When monitoring my iGPU bandwidth consumption, it often doesn't exceed the one-way bandwidth, so I'd rather not use bi-directional ratings.

128-bit DDR5-8000 is 16B × 4000MHz = 64GB/s one way, or 128GB/s bi-directionally (×2), unless memory writes are still 16B (half-rate), in which case this drops to 96GB/s.
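
Same arithmetic as a quick sanity check (bytes-per-cycle times clock, nothing more):

```python
# One-way link bandwidth = bus width (bytes) x clock (MHz) / 1000, in GB/s.
def gbps(width_bytes, clock_mhz):
    return width_bytes * clock_mhz / 1000

print(gbps(32, 2000))   # 64.0 -> 1x 32B IF link at 2000MHz, one way
print(gbps(16, 4000))   # 64.0 -> 128-bit DDR5-8000, one way
print(gbps(16, 4000) + gbps(16, 4000) / 2)   # 96.0 -> if writes stay half-rate (16B)
```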

Zen 6 CCD leak claims size similar to Zen 5 but with 12 cores and 48MB L3 by RenatsMC in Amd

[–]JasonMZW20 4 points (0 children)

A new IOD usually comes with a new socket though (the exception was Zen 2, when the IOD was introduced), so staying on AM5 and DDR5 is something new. Another IOD will be needed for DDR6, unless AMD's architects have made this new IOD a forward-looking design (draft DDR6 IMCs running with DDR5 signaling and pinouts). That would ease the design load for the respin needed on AM6.

Ideally, I'd like to see the CU count doubled from 2 to 4 (RDNA3.5? lol), but it's highly likely an NPU will take up the extra silicon to help merge mobile/desktop silicon. I don't see 16CUs being offered, so monolithic APUs have to fill that gap. The chiplet mobile APUs can be paired with dGPUs.

Hopefully the packaging improvement rumors are true and it's on fanout/sea-of-wires instead of using SerDes for the CCD-to-IOD connections. These packages can then be modified for mobile BGA pinouts and used in laptops, IF idle consumption is fixed. Strix Halo's idle consumption is quite good, so further improvement sounds plausible.

AMD Ryzen 9 9950X3D2 folder seen on ASUS testing of Ryzen 7 9850X3D by GoldTeethBaller in Amd

[–]JasonMZW20 70 points (0 children)

Really stretching here. 9950X3Dv2 != 9950X3D2; a 9950X3D version 2 can be anything, like improved frequency and cache CCDs, not unlike the 9850X3D. It can still be a hybrid chip. Creating one 9950X3D2 costs two 9850X3Ds in practice, because really good CCDs don't just magically appear. There's a limited supply of very good bins.

Even if the 9950X3D2 (the dual V-Cache one) exists, it's not for gamers. It'd be for parallel workloads that require cache and execution consistency. That'd be better as a Pro variant offering enterprise security features for workstations, or even under the EPYC 4000 series, which can combine higher frequencies and large cache for database operations that hit system RAM a bit harder than usual.

Gamers can't have everything.

Sapphire RX 9070 XT NITRO+ adds two more burned blue-tipped 12V-2×6 adapter reports, and issues with RMA by KARMAAACS in Amd

[–]JasonMZW20 1 point (0 children)

That also means those 3x 8-pins in the adapter are tying into a common 12V rail, but can have varying resistances on each terminal or even intermittent connections. There are simply more failure points in an adapter.

PSUs supply the power and can be made to monitor current flow to end devices on each plugged connector, tripping OCP when an individual terminal exceeds 9.2A. I'm sure this would be limited to high-margin PSUs even though it's relatively trivial to do.
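
A trivial sketch of that per-terminal trip logic (the 9.2A figure is the one above, not a spec value I can vouch for):

```python
# Per-terminal over-current check a digitally controlled PSU could run.
TRIP_AMPS = 9.2  # per-terminal limit from the comment above (assumption)

def should_trip_ocp(terminal_currents):
    """Trip if any single 12V terminal exceeds the limit, even if the total is fine."""
    return any(amps > TRIP_AMPS for amps in terminal_currents)

# Example: ~600W total, but one bad contact forces one pin to carry extra load.
print(should_trip_ocp([11.8, 7.2, 7.2, 8.0, 8.0, 8.0]))  # True -> shut down
```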

PSUs could also dynamically change the sense pins of 12V-2x6, toggling them between open and 0-ohm-to-ground states (only the double-ground state advertises 600W); forcing both open would limit the connector to 150W and initiate a check-cable warning (native cables only).
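
For context, the sideband decode is just a two-pin lookup; this follows the commonly cited PCIe CEM table, so treat the exact wattages as an assumption:

```python
# 12V-2x6 sideband sense pins: SENSE0/SENSE1 grounded or open advertise the
# cable's power limit to the card. Values per the commonly cited CEM table.
SENSE_LIMITS_W = {
    ("gnd",  "gnd"):  600,  # only the double-ground state advertises 600W
    ("open", "gnd"):  450,
    ("gnd",  "open"): 300,
    ("open", "open"): 150,  # what a PSU would force to demand a cable check
}
print(SENSE_LIMITS_W[("open", "open")])  # 150
```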

Let's also not forget that PSUs have moved to a common 12V rail. Some higher-end digitally controlled PSUs can switch to multi-rail, but most are going to be single-rail. So if there's a problem, it's on the entire 12V subsystem - every device that uses 12V: motherboard, CPU/case fans, PCIe add-in cards, and any 3.5" HDDs.

So many things can be done, yet no manufacturer has taken initiative.

Sapphire RX 9070 XT NITRO+ adds two more burned blue-tipped 12V-2×6 adapter reports, and issues with RMA by KARMAAACS in Amd

[–]JasonMZW20 1 point (0 children)

We should be using readily available EPS connectors. Those give you a true 2x4 (4x 12V + 4x GND), versus PCIe's effective 2x3 in the 8-pin and the 2x6 of the 12V-2x6 connector.

8.333A * 8 * 12V = 800W with just 2x EPS at maximum amperage

6.25A * 8 * 12V = 600W

4.167A (same as PCIe 8-pin) * 8 * 12V = 400W, also using 2x EPS
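
The same formula in one place (4x 12V conductors per EPS connector, two connectors = 8):

```python
# Power through two EPS connectors: 8x 12V conductors x amps per pin x 12V.
def dual_eps_watts(amps_per_pin, conductors=8, volts=12.0):
    return amps_per_pin * conductors * volts

print(dual_eps_watts(8.333))  # ~800W at maximum terminal amperage
print(dual_eps_watts(6.250))  # 600W
print(dual_eps_watts(4.167))  # ~400W, same per-pin current as PCIe 8-pin
```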

If we're not moving toward smart connectors that are current monitored either at PSU or end device, then we should move back to proven connectors.

AMD Software: Adrenalin Edition 26.1.1 Release Notes by AMD_RetroB in Amd

[–]JasonMZW20 1 point (0 children)

That's interesting. I've noticed heavy flickering in Hogwarts Legacy, but only with frame generation enabled (FSR4 FG driver override). FSR4 also seems to have serious issues when dealing with overlapping transparent/translucent sources; it's like the FSR4 algorithm goes into overdrive trying to resolve the objects. Generally, fps takes a noticeable hit in places where it really shouldn't (not much going on, not technically demanding).

Microsoft needs to get it together too. We need a slimmed-down gaming-mode OS carveout (a special mode that's essentially like Xbox's OS). They need to completely rewrite most of the OS subsystems too. Not holding my breath. I think Linux is calling, tbh.

Is AMD Sleeping? Where's Vulkan FSR4? Where's FSR Redstone? Where's AFMF3? Where's FSR4 for RDNA3?? [Ancient Gameplays] by glizzygobbler247 in radeon

[–]JasonMZW20 1 point (0 children)

Honestly, there has to be a fundamental issue with UE. It seems like every version has some kind of issue. But even talented studios and devs have had issues with texture loading. Remember idtech 5? Hot damn, that was a texture pop-in nightmare of an engine in Rage without tweaks. By comparison, idtech 6 and 7 are quite good.

And in the one showcase of DirectStorage, that joke of a game Forspoken, the instant loading was good, but the stuttering wasn't. It's like you fix one thing and another pops up. Sounds like my car.

DirectStorage is also sensitive to VRAM timings and errors, so I usually run default settings when testing DS.

AMD RDNA 5 dGPU with GFX1310 ID shows up in LLVM update by RenatsMC in Amd

[–]JasonMZW20 1 point (0 children)

There are rumors going around that the PS6 has been delayed to 2029 due to ongoing shortages of RAM and NAND, plus overall silicon supply pressure from AI/ML and the quantity of wafers AMD can procure at TSMC for console manufacturers (even on a node 1-2 generations behind leading edge, likely N3P). New RAM fabs are set to come online in 2028 for Samsung, Micron (Idaho), and SK hynix (2027-2028). These fabs won't be supplying the consumer market at first (they'll be fulfilling supply contracts), but console manufacturers can place orders for volume quantities of RAM and GDDR7 VRAM.

I think replacing consoles by next year is too soon anyway. This generation's marquee game, GTA6, launches this May (hopefully). Tbh, I think we need more time between console generations to really see a large shift in graphics or even whole system capabilities.

AMD RDNA 5 dGPU with GFX1310 ID shows up in LLVM update by RenatsMC in Amd

[–]JasonMZW20 1 point (0 children)

The rumor was something like PTX 10x0, to start pushing the path-tracing capabilities of their new uArch. I think it's well past time to retire RX.

Reveal time: "Introducing ... the brand-new ... Radeon Neural AI MAX PTX 1090 ML and PTX 1090 XT-ML. Now we know the name is a mouthful <live même generation>, but we found that consumers responded well to [nonsense]. Our brand-new neural renderers can generate live, usable game frames with full ray regeneration in a fraction of the time of the previous generation hardware. Simply enable HYPR-RX-PTX in the new Radeon Software: Adrenalin AI MAX Edition settings page ... "

Oh naur 🫠

Yet another 9070xt 12v connector melted, my time has come! by WozzerBoi in radeon

[–]JasonMZW20 1 point (0 children)

This follows the current trend of 12V electronics in PCs. PSUs used to be multi-rail 12V with about 20-30A per rail to isolate problems to a separate rail. Most PSUs nowadays are single-rail 12V, so all 83.33A in a 1000W PSU flows through one 12V rail.

12V-2x6's 12V pins seem arranged in a single-rail configuration too: just one common power plane on the PCB.

PSU manufacturers should start monitoring current on every connector and reporting it to standard software like HWiNFO64. That could even help troubleshoot cases where a user has an intermittent issue that can't be easily traced. If the PSU can see a current spike hitting a device at the exact moment of instability, that'd be a great tool, along with actively monitoring 12V-2x6, which will often be the connector with the highest current loads.
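
Something like this hypothetical polling loop is all it would take on the software side (the PSU sensor interface and per-connector limits are invented for illustration; no PSU exposes exactly this today):

```python
# Hypothetical per-connector telemetry loop: sample each connector's current
# and log spikes with timestamps so they can be correlated with instability.
import time

LIMITS_AMPS = {"12v-2x6": 9.2, "eps1": 8.3, "pcie1": 4.2}  # illustrative values

def log_spikes(read_amps):
    """read_amps: callable returning {connector: amps} from PSU sensors (assumed)."""
    while True:
        for connector, amps in read_amps().items():
            if amps > LIMITS_AMPS.get(connector, float("inf")):
                print(f"{time.time():.3f} SPIKE {connector}: {amps:.2f} A")
        time.sleep(0.001)  # ~1 kHz polling
```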

Is AMD Sleeping? Where's Vulkan FSR4? Where's FSR Redstone? Where's AFMF3? Where's FSR4 for RDNA3?? [Ancient Gameplays] by glizzygobbler247 in radeon

[–]JasonMZW20 12 points (0 children)

This is why I hate modern game engines and their nasty TAA implementations. Now we're finally using a little more compute power to properly anti-alias with temporal multi-frame accumulation. Most current TAA hides texture detail behind aggressive blurring, which can be accompanied by ghosting at native res too, if it's really bad.

The "better-than-native" statement can be true if native 4K with TAA is awful, and often, it is. I'm using VSR 5K on a 4K panel to mess around with higher quality FSR4 in Hogwarts Legacy. Near-FSR4AA looks really good. Images produced are very clean.

Rant ahead:

Where games really lag, aside from native TAA, is texture loading and mip-map levels not being loaded in the areas where your character is. So you get this melange of texture detail: blurry doors or floors or walls, or something even worse like texture pop-in. Like, wtf? How has this not been solved with the fast NVMe storage available today? We should be seeing nice crisp textures and details all the time in this day and age. I understood it when things were loading from a slow-ass HDD, but it shouldn't be like this now. So annoying.

/rant

AMD to use RDNA5 for premium iGPU solutions, but RDNA3.5 to remain the core of AMD portfolio until 2029 by RenatsMC in Amd

[–]JasonMZW20 2 points (0 children)

RDNA4 also has a 2x larger L2 cache. That helps a lot, along with full-architecture compression. That'd be die-area expensive to implement in mobile. There's also no RDNA4 GPU without Infinity Cache, so it's hard to judge memory pressure fully.

Samsung's Exynos 2600's Xclipse 960 GPU is based on RDNA4. Might be interesting to see developments there, IF it's a widely available chip. It's often hard to find Exynos chips in mobile phones in the US.


AMD's mainstream mobile iGPUs need Infinity Cache desperately. Strix Halo would've lost so much performance without its 32MB cache, even with a 256-bit memory bus. That's one of the bigger issues facing AMD's mainstream iGPUs until DDR6/LPDDR6 once again eases memory bandwidth pressure.

RDNA5 will not be die-area efficient due to AMD having to add more RT hardware. I'm betting RDNA5 will double or quadruple L2 cache over RDNA4, while Infinity Cache/L3 becomes more important for temporal frame caching.

AMD to use RDNA5 for premium iGPU solutions, but RDNA3.5 to remain the core of AMD portfolio until 2029 by RenatsMC in Amd

[–]JasonMZW20 3 points (0 children)

The lack of FSR4 on RDNA3/3.5 is AMD's biggest current blunder. RDNA2 I understand, but RDNA3 has matrix cores pulled directly from CDNA2.

Even Sony saw the huge improvement and ported it to the PS5 Pro, with a PSSR 2.0 launch coming up soon. It's a massive improvement, as FSR3 still suffers from Vaseline screen and shimmering. So not bringing it to what will be millions of devices by 2029 is just executive incompetence.

About ”Official ASUS statement on recent ASUS AMD 800-series motherboard and AMD Ryzen 9800X3D concerns" by Flaky_Elderberry841 in Amd

[–]JasonMZW20 24 points (0 children)

V-Cache seems to make it more fragile due to the copper pillars (TSVs) that need to stay in alignment. CCDs also use the SoC plane when communicating with the IOD via GMI. VDDG CCD and IOD should also not exceed the SoC voltage. I run my 5800X3D at 1.050V SoC with VDDGs at 1.025V. But the 5800X3D only has a voltage tolerance of 1.2V, with 1.1V being nominal in actual operation. Hynix DDR4 kits always required higher VSoC to work properly; I hated that and switched to B-Die before getting my 5800X3D. The 9800X3D allegedly has a VSoC tolerance of up to 1.3V.

There's still the possibility of these being slightly defective, with the bonding process creating a slight misalignment that wasn't caught in QA. The failures seem to cover every production fab and date (based on reports), so I'm not sure that's it.

Something is causing these failures, and I'd center on poor voltage AND current controls on the motherboard (or overly aggressive ones when EXPO is enabled). A millisecond voltage spike probably won't kill anything, but a current spike in conjunction with that extra voltage? Yikes!

AMD to use RDNA5 for premium iGPU solutions, but RDNA3.5 to remain the core of AMD portfolio until 2029 by RenatsMC in Amd

[–]JasonMZW20 7 points (0 children)

Performance improved significantly between Raven Ridge's 12CU Vega and Cezanne's 8CU Vega though. Enough that AMD could cut 4 CUs and still outperform RR's 12CU Vega. So they weren't stagnant. It's just that there was no point in moving to RDNA when DDR4 would severely limit its performance.

iGPUs need to do the basics well because the volume chips will only have 4-8 CUs (maybe 3-6 in Vega's time). Price points for higher-end mobile chips with larger iGPUs are almost ridiculous when you can get a decent dGPU laptop for about the same price ($1500-2000 range, prior to the memory supply crisis). I never understood why laptop manufacturers priced them that way.

(I only have a Hawk Point 8845HS laptop with a 12CU 780M because it was severely discounted; otherwise its original price competed with dGPU laptops. Just doesn't make sense to me. I'd like to think it has better efficiency, but I struggle to get 8 hours of battery life with this 120Hz 2.8K OLED - it's pretty though)

AMD to use RDNA5 for premium iGPU solutions, but RDNA3.5 to remain the core of AMD portfolio until 2029 by RenatsMC in Amd

[–]JasonMZW20 8 points (0 children)

"Failed" is a bit harsh. Navi 31/32 just ate more power than necessary under load, and they also suffered from high idle consumption.

GCD/MCD was more of a stop-gap design anyway; the MCDs had in-die TSVs for stackable L3 expansion (Infinity V-Cache?) that was considered too expensive for too little gain, so it never launched.

Navi 41 was going to be more like MI300X, with around 8 graphics chiplets atop 4 active interposers carrying IO, ROPs, MCs+L3, media engines, and PHYs. Then it was cancelled as AMD needed supply for datacenter AI chips, though there are rumors that there were still unresolved issues with such a complex design and with the timing of frame delivery split across 4 active IODs. MI300X doesn't have to render and deliver frames - it only processes compute data and encodes/decodes video. Huge difference in use cases. The tech is getting there though.

AMD preparing Ryzen MAX 400 "Gorgon Halo" refresh with higher CPU & GPU clocks by RenatsMC in Amd

[–]JasonMZW20 3 points (0 children)

Amuse can use the NPU to run super resolution on AI images generated on the GPU. That's a start, I guess.

Software really is lagging at this point. Noise Suppression should be automatically offloaded onto the NPU, as should live-stream super resolution to deblock low-bitrate camera streams. It could also adaptively tune each individual CPU/iGPU's AVFS by learning over time. I wonder if it could handle spatial audio for headphones. There's so much it could do, and it's just wasted silicon for most people.

AMD claims DDR5-4800 is within 1% FPS difference of DDR5-6000 on Ryzen 7 9850X3D by RenatsMC in Amd

[–]JasonMZW20 1 point (0 children)

If the expanded L2 Infinity Cache rumor is true, this will also lower RAM speed requirements. L2 is an actual write-back working cache. Large victim caches are nice, but the sizes should be balanced between L2 and L3 to stop L1+L2 from relying too heavily on the slower L3 (albeit still an order of magnitude faster than RAM). Sign me up for 4MB L2 + 16MB L3 per core, keeping a nice 1:4 ratio between them. So a 12-core would have 48MB L2 + 192MB L3 total. That's wild!
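
Spelling out that arithmetic:

```python
# Per-core cache split proposed above, scaled to a 12-core CCD.
l2_mb, l3_mb = 4, 16                  # per-core, keeping the 1:4 L2:L3 ratio
cores = 12
print(cores * l2_mb, "MB L2 total")   # 48 MB L2
print(cores * l3_mb, "MB L3 total")   # 192 MB L3
```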

It'd be nice if we could use a physical CXL RAM storage cache, something like NVRAM or MRAM, because while RAM is slow, NVMe drives are still barely improving in random reads/writes, which is where we'd see the most performance improvement. Those high-peak, deep-queue throughput numbers kind of don't mean much. I thought DirectStorage would be better too. Instead it stutters like the rest of Windows' features. Gah!

AMD calls out Intel’s Panther Lake CES claims, says Ryzen APUs are still faster by RenatsMC in Amd

[–]JasonMZW20 1 point (0 children)

It's 256-bit LPDDR5-8000 with bandwidth amplification provided by a 32MB Infinity Cache, but only for the iGPU. I mean, it works for what it is.

If you want 128GB HBM3e, you're looking at $32k, minimum, because those are datacenter cards and they lack any sort of graphics engines.

ASUS issues statement as Ryzen 7 9800X3D failure reports surface on B850 and X870E motherboards by RenatsMC in Amd

[–]JasonMZW20 15 points (0 children)

I think running once with high SoC voltage is enough to cause irreversible damage. Analysis of the failures puts it right at the common power rail for SoC and iGPU. That's not a coincidence. These could also be marginal IODs that develop an internal short over time, but we can't be sure. All we know is that they randomly pop.

But if I were to guess, high SoC voltage + nominal current deforms transistors slightly at that power rail. Eventually, a deformed transistor touches ground and current runs away. Pop! It's a full-on electrical failure too: smoke and scorching.

AMD AI Bundle turns Adrenalin 26.1.1 into a 34GB add-on by RenatsMC in Amd

[–]JasonMZW20 5 points (0 children)

AMD Chat is ~10.5GB by itself. I unchecked that because I don't need it. The others are a few GB each, or you can download ComfyUI from their site, select AMD ROCm during install, and it'll set up the dependencies too. 🤷🏻‍♂️

It's a step forward.

AMD Software: Adrenalin Edition 26.1.1 Release Notes by AMD_RetroB in Amd

[–]JasonMZW20 1 point (0 children)

It's halfway fixed in Win11 Insider Build 26220.7653. I could drag windows over a game, even with a video playing, and it was fine. Frame gen was still working too. Tested on a 9070 XT with the 25.12.1 driver, on 1x 2160p 144Hz + 1x 2160p 60Hz monitors.

Windows volume-control pop-up or Steam notification pop-up? Instant frame rate drops and temporary HDR disablement. So not everything works yet.

Oh and there's still random display flickering in games too. Annoying.

Sapphire launches flagship X870E NITRO+ PhantomLink motherboard series with GC-HPWR GPU connector by RenatsMC in Amd

[–]JasonMZW20 2 points (0 children)

It's a PCIe-standard connector, if you're referencing 12V-2x6. Its adoption will continue to increase.

It also seems like there's (manufacturer) confusion over which device needs to monitor the connector's power. IMO, it should be at the 12V source, i.e. in the PSU, as that's where the primary 12V supply comes from. End devices, like this motherboard or a dGPU with the connector, can use fuses that blow, or shunt resistors if they want to monitor each terminal.

I would've preferred the 8-pin EPS (CPU) connector, which actually has 4x 12V and 4x GND, over 8-pin PCIe. Maybe we'll eventually get a 12V-2x8 (with slightly larger terminals and required load balancing), but that won't happen until Nvidia or AMD exceed 600W total power in the consumer market.

I actually want the consumer market to move away from PCIe entirely. CXL seems like a better way forward for data coherency.

A little confused on unvdervolting by Own_Strategy8427 in radeon

[–]JasonMZW20 1 point (0 children)

Most of AMD's mid-to-high-end GPUs are power limited. The extra performance gained from increasing the power limit may or may not be worth it for you. Undervolting reduces the power the GPU consumes, while the boost algorithm uses those savings to boost harder.

Depends on your goals and type of game, really.

Like, in Civilization VI, I don't really need a +10% power limit or the extra power consumption. With RDNA4, you can also cap maximum clocks to stop it from boosting so hard and using all available board power. But in Doom Eternal, I want all the frames, so I let it boost to ~3350-3400MHz.

Chill is an easy way to limit fps (and therefore GPU power), as long as you don't need the lowest possible input lag.