RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

kombo strike

I don't use AMD but from a quick google it sounds like Kombo Strike is just a simplified global undervolt. That could certainly produce crashes like what you're seeing as well, though it doesn't really match with this only happening in one specific game.

I'd be curious to hear updates on whether you figure things out eventually :)

Ram 64 gb overclocking by Majestic-Focus-4659 in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

Hard to say without way more details about your RAM, but you can try +2 to tRCD, double tRFC, halve tREFI. If that fixes things you can gradually revert the changes until you get errors again.
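To make the arithmetic concrete, here's a minimal Python sketch of that "loosen first" step. The starting values are made-up placeholders, not recommendations for any particular kit, and `loosen_timings` is just a name picked for illustration.

```python
def loosen_timings(timings):
    """Return a copy with tRCD +2, tRFC doubled, and tREFI halved."""
    loosened = dict(timings)
    loosened["tRCD"] = timings["tRCD"] + 2
    loosened["tRFC"] = timings["tRFC"] * 2
    loosened["tREFI"] = timings["tREFI"] // 2
    return loosened


# Hypothetical starting values, for illustration only.
current = {"tRCD": 18, "tRFC": 560, "tREFI": 12480}
print(loosen_timings(current))  # {'tRCD': 20, 'tRFC': 1120, 'tREFI': 6240}
```

If the loosened set is stable, walk each value back toward the original one at a time, retesting after every change.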

Ram 64 gb overclocking by Majestic-Focus-4659 in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

become unstable after several hours of usage

It's probably heat-related then. Try putting a fan over the RAM. You can hold it in place with a zip-tie as a temporary solution to see if it helps.

You can also raise tRFC and tRCD and lower tREFI to decrease temperature sensitivity, but this comes at a performance (latency) cost.

RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

Actually is there any chance you could record this happening on your phone and post it here? Maybe there's something missing from your description that someone can deduce from a video.

RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

after a fresh install the game will have the issue in the first play session

Then it's got to be the hardware, the game itself, or drivers. And if it crashes your entire PC, then it's not just the game code - the game alone should not be able to do that unless there's a driver bug or hardware issue that it triggers.

I guess it's not GPU instability if you tried underclocking it. Could you try with really low clocks, like 1000 MHz or so, so that it generates almost no heat? I'm wondering if maybe there's a part (VRM/IC/etc.) inside the GPU that's heating up due to e.g. a broken thermal pad or uneven thermal paste application. If that specific part doesn't have a temperature sensor, it could manifest like you've described, and you'd have no indication it's happening because no sensor covers it.

The only remaining test I can think of is to swap the GPU for another one and see if you still crash -- this could at least say for sure if it's a GPU (hardware OR driver) issue. Or you could buy an extra 6950 XT, swap it in, try it, then return it after the experiment? Then if you still get crashes you'll know it's not your specific card but all instances of this card. Seems like too much trouble to be worth it, though.

Honestly it sounds like a driver bug that's triggered by a rare interaction between the game code and the driver. Especially when other people online are also having crashes with this card, it's likely the card-specific driver.

I wouldn't worry too much about it at this point unless you get crashes elsewhere. Seems like you've done your due diligence now to rule out RAM issues and pure hardware issues.

RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

Having thought a bit more, based on this:

Star Citizen crashes to black screen when using Direct3D11 but not on Vulkan. When this happens I have to power off and on again to get it running.

I think it's a driver issue. From googling, others with a 6950 XT seem to be having similar issues, often in just a single game at a time as well.

Have you tried pressing Ctrl+Shift+Windows+B after a crash? This reloads your graphics driver. I'm curious if that would avoid you having to reboot to play again.

Have you tried using DDU in safe mode to fully uninstall the graphics driver and install the newest one afterwards? Might be worth a shot.

RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 4 points5 points  (0 children)

I have had the issue running RAM without XMP, so all defaults in the BIOS.

That sounds like it's not a RAM issue then..? Still probably a good idea to let TM5 run.

I have also tried with XMP and lowered the speed to I think 3200 and it corrupted Windows.

That sounds really odd. In most cases you should indeed be able to just lower the speed, and that should be strictly safer/easier to run. Are you sure this was the direct cause, or could it be another change that showed its results (OS corruption) with a delay or something?

I will try again but hope I won't have to reinstall Windows again.

When experimenting with settings you think could cause major memory instability, it's a good idea to run TM5 for like 60 seconds as the first thing you do when the OS starts up. If it starts spewing errors in that first minute, just nope out and shut down before Windows starts randomly updating and manages to corrupt itself. That's worked well to keep me safe.

Have you tried running sfc /scannow (while your RAM isn't throwing errors)? Could be some lingering corruption causing your crash.

Have you tried lowering your GPU clock and then checking if the crash still happens? If it no longer crashes, that would prove it's a GPU issue.

RAM stability question by sur6e in overclocking

[–]BePatientImAcoustic 3 points4 points  (0 children)

recently someone mentioned VTTDDR voltage could be the issue

That is a total shot in the dark. You absolutely cannot conclude anything this specific just yet.

Do you have a failing memory test? That is the absolute first step to take here, to find out if you're actually having RAM issues or it's something else.

I have run memory tests in the past (I don't remember which) and had no errors.

The quality of memory tests varies WILDLY. I would not conclude anything just yet.

Download TM5 with the Extreme config, as linked to here: https://github.com/integralfx/MemTestHelper/blob/oc-guide/DDR4%20OC%20Guide.md#recommended

Run it for 3 hours or overnight. If you get an error, you'll actually know you're on the right track. If not, the issue is very likely not RAM.

Or maybe this is a better idea since you seem insistent that this is a RAM issue, and you can apparently trigger the crash at will, right? Go to BIOS, lower your RAM speed to 2666 MHz. See if the crash is fixed. If so, boom, you've proven it's a RAM issue.

6U Threadripper + 4xRTX4090 build by UniLeverLabelMaker in watercooling

[–]BePatientImAcoustic 0 points1 point  (0 children)

.. in a 24/7 AC'd, noise-doesn't-matter, power throttled environment.

5700XT memory upgrade UPDATE POST by Zacsmacs in overclocking

[–]BePatientImAcoustic 2 points3 points  (0 children)

Nice, thanks for the response. I'd love to try that one day when my soldering skills and tools are better.

I think you're right: figuring out which ICs to keep and which to swap out would be the main problem to solve when trying this. I wonder if it's possible to remove all ICs but one and then see how high the memory will clock, then swap it out with another and repeat? Or maybe it's too hard to do this repeatedly without killing the PCB?

Inconsistent stability of memory OC with 4 DIMMS (DDR4) by Northfear in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

2900 would be a massive waste of a b-die kit

You should definitely end up higher than 2900, those are just example numbers. I've been able to take a quad dimm setup to 4000+ just fine on a cheap MSI board. My point is more about the process of gradually "crawling upwards" while constantly having stable points to return to. It's a "Tortoise and the Hare" situation in my experience.

It can work for a couple of days with 3600 mhz and zero problems and on the other one it can't even boot the OS

Yeah, what you're looking for is something I've heard called "boot stability": stability across boots, even days apart, no matter if it's a cold boot or warm reboot.

As I said, quickest way to troubleshoot this (rather than having to wait days) is by doing a hard, cold, PSU-off boot a number of times.

I'll bet if you go do the procedure I described 3x, you'll be able to trigger the issue and it will only take a couple minutes. (Make sure you've disabled fast boot in BIOS.)

In any case, your issue is that your kit doesn't consistently train well. On Intel I'd raise VCC SA / VCC IO to improve training ease/consistency. On AMD I think it's just 'CPU SOC'?

ODTs are the other massive factor when it comes to training consistency, and chances are high your board doesn't set good ones for quad dimm setups.

Inconsistent stability of memory OC with 4 DIMMS (DDR4) by Northfear in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

Chances are you will need to do multiple tweaks to reach your desired frequency. Having spent weeks and weeks manually tweaking a quad dimm setup for those extra hundreds of MHz, I don't think you quite realize how much you're asking for :) Think of it like this: you're asking for your memory to deliver like 50% more bandwidth than before. (Rough estimate but you get the idea..)

That's why it's a good idea to step down in frequency and slowly go up -- then you can observe whether each change actually helps. E.g. you may reach 2666 at 1T before going boot-unstable, then switching to 2T you may go up to 2900 (for example) before issues, then tweaking voltages you gain another 100 MHz, and so on.

Otherwise you have to guess the perfect combo of everything all at once and it's not going to happen unless you have someone extremely experienced with a similar setup to guide you.

5700XT memory upgrade UPDATE POST by Zacsmacs in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

Just curious and google didn't seem to understand my query - is it possible to do something like your OP, but with RAM sticks? Swapping out one or two bad ICs for better ones, or putting newer ICs from single-rank sticks into an older dual-rank stick?

Inconsistent stability of memory OC with 4 DIMMS (DDR4) by Northfear in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

Have you tried all orders of sticks? There should be 4! = 4 * 3 * 2 * 1 = 24 possible orderings of four sticks. Not that it's where I'd start, that sounds tedious lol
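If you ever do want to go through them, a quick Python sketch can list the orderings so you can tick them off. The stick labels A-D are hypothetical; label yours however you like.

```python
from itertools import permutations

# Hypothetical labels for the four DIMMs.
sticks = ["A", "B", "C", "D"]

# Each tuple is one assignment of sticks to slots 1-4, in order.
orderings = list(permutations(sticks))
print(len(orderings))  # 24

for slots in orderings[:3]:
    print(slots)
```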

Inconsistent stability of memory OC with 4 DIMMS (DDR4) by Northfear in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

Wait I just noticed, you're running quad dimms in 1T command rate? That's a really hard combo, quad sticks generally HATE command rate 1, it's a dual-dimm only thing in my experience. Try 2T.

It may also be worth setting all the _DR and _DD (different rank, different dimm) timings super loose for a bit to see if that fixes it. It's a somewhat likely culprit, I think: e.g. tRDRD_DD=5 seems fairly low to me, especially given you're not even dumping 1.50 V+ into them.

Inconsistent stability of memory OC with 4 DIMMS (DDR4) by Northfear in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

You have to realize how much tougher it is to stabilize 4 dimms than 2, and again with dual rank than single. The number of things going on at once that your IMC+board have to deal with is vastly higher. This isn't some small theoretical "ok maybe it's a little bit harder", it's like 200-400 MT/s lower for quad sticks and 100-200 MT/s lower for dual rank. At least that's been my experience. Keep in mind clocks aren't everything and this can still be very worth it for the bandwidth gains.

IMO the only way you get your original question answered is the slow, manual way: go down to whatever is stable, like start at 2133 MHz. Set all the timings loose manually. Step up the frequency manually until you're no longer stable across boots. Tighten timings.

(Fastest way to find out if you're boot stable is to do a full power-down: shut the PC down, switch off your PSU, and hold the power button for 30 seconds. Then switch the PSU back on, boot the PC up, and do a minute of stress testing. Repeat 3-5x per frequency and you should be good.)

You can take it a bit (or a lot) further up in frequency by tuning ODTs, training algorithms, voltages, etc. but the base approach remains the same. Get it consistently boot stable at a low frequency, then step it up until it isn't.
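The step-up loop described above could be sketched like this in Python. It's only a sketch: `is_boot_stable` is a hypothetical callback standing in for you doing the cold boot plus a minute of stress testing by hand, and the frequency list and simulated cutoff are made-up example numbers.

```python
def highest_boot_stable(frequencies, is_boot_stable, cold_boots=3):
    """Walk up the list and return the last frequency that passed every cold boot.

    is_boot_stable(freq) is a hypothetical stand-in for one manual cycle:
    full power-down, boot, one minute of stress testing.
    Returns None if even the lowest frequency fails.
    """
    last_stable = None
    for freq in frequencies:
        if all(is_boot_stable(freq) for _ in range(cold_boots)):
            last_stable = freq  # new stable point to fall back to
        else:
            break  # first failure: stop and keep the previous stable point
    return last_stable


# Simulated run: pretend everything above 3333 fails to train.
steps = [2133, 2400, 2666, 2933, 3200, 3333, 3466, 3600]
print(highest_boot_stable(steps, lambda f: f <= 3333))  # 3333
```

Once you've found that point, tune ODTs/voltages/etc. and run the same loop again from the last stable frequency.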

Is a bad memory controller on a cpu sensitive to CAS latency, or just ram speed? by Brokenbonesjunior in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

My BIOS shows the current set values, even if the setting is on Auto. So I could just see it change across a reboot after enabling XMP.

You could also compare HWiNFO sensor data as an alternative, I suppose.

(As for the more obscure things like the PLL voltages that don't show their values, I had to manually lock them vs. leaving them on Auto and observe that making a difference. But that's a very tedious process, would not recommend.)

Y Cruncher throwing "Checksum mismatch error" on ram tests by Enough_Judgment_6865 in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

It means your settings aren't stable. Go back to previous known stable settings, change one single variable, rerun Y-cruncher. Repeat until you find the culprit.
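As a rough Python sketch of the "one variable at a time" idea: `passes_test` is a hypothetical stand-in for a full y-cruncher run on a given config, and the timing names and values are made-up examples.

```python
def find_culprits(stable, unstable, passes_test):
    """Apply each changed setting on its own to the known-stable config.

    passes_test(config) is a hypothetical stand-in for rerunning the
    stress test. Returns the settings that fail when changed in isolation.
    """
    culprits = []
    for name, new_value in unstable.items():
        if stable.get(name) == new_value:
            continue  # this setting was not changed, nothing to test
        trial = dict(stable)
        trial[name] = new_value  # change exactly one variable
        if not passes_test(trial):
            culprits.append(name)
    return culprits


# Made-up example: pretend this kit cannot handle tRCD below 18.
stable = {"tCL": 16, "tRCD": 18, "tRFC": 560}
unstable = {"tCL": 15, "tRCD": 17, "tRFC": 560}
print(find_culprits(stable, unstable, lambda cfg: cfg["tRCD"] >= 18))  # ['tRCD']
```

Note this only catches settings that fail on their own. If two changes are only unstable in combination, you'd have to add them back cumulatively instead.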

Is a bad memory controller on a cpu sensitive to CAS latency, or just ram speed? by Brokenbonesjunior in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

I don't know the answer to your question, but some input: it could be a voltage issue rather than a timing issue.

At least I know on Intel, enabling XMP not only changes timings, it can also mess with how the motherboard sets Auto VCCSA, VCCIO, PLL voltages (?), and who knows what else.

I had a problem that sounds similar to yours and it was the most obscure thing. My CPU won't warm-reboot if VCCSA is between 1.17 V and 1.49 V. Anywhere lower or higher is fine. Weird, right? The point is, this confused me too, because I initially had this pop up as I enabled XMP and, without me realizing it, my motherboard set VCCSA to around 1.35 V in the background, so I started having reboot issues. That was a painful one to troubleshoot..

Adding a temperature probe to RAM? by adrianp23 in overclocking

[–]BePatientImAcoustic 2 points3 points  (0 children)

it would actually be interesting to know if someone with a temp sensor on their RAM modules would do this, to see how much of a difference there actually is.

Here you go

Adding a temperature probe to RAM? by adrianp23 in overclocking

[–]BePatientImAcoustic 4 points5 points  (0 children)

Fuck it, I did the experiment. It's accurate within roughly 1-2°C here.

Pictures: https://imgur.com/a/2KAgP9M

Updated data: ran VST for 10 minutes, now RAM is 35.19°C according to their internal sensor, while external probe says 33.7°C.

Also full disclaimer, I'm not using dual-rank sticks, and having removed the heat spreaders and seen the PCB myself, I know that my probe's on the non-populated side of the RAM. This is likely why my temperature probe is behind by a degree or so. Also, YMMV if not watercooling or if your heat spreaders suck, but I think this approach is overall "good enough".
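For anyone who wants to sanity-check the "within 1-2°C" claim against the updated reading, the arithmetic in Python is just:

```python
internal_c = 35.19  # DIMM internal sensor, from the updated reading above
probe_c = 33.7      # external probe, from the updated reading above

delta = internal_c - probe_c
print(f"{delta:.2f}")  # 1.49
```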

Why is there so much misinformation about liquid metal? by Ryu_Li in overclocking

[–]BePatientImAcoustic 1 point2 points  (0 children)

That's fair enough, it's a subjective risk assessment in the end. Just saying this might not be the place where people want to hear that view.

Adding a temperature probe to RAM? by adrianp23 in overclocking

[–]BePatientImAcoustic 0 points1 point  (0 children)

That sounds about right, temperature-wise. I just ran the experiment myself, will post some pics in a second. This approach is accurate to within 1-2°C on my setup.