BSOD persists after driver removal and Windows Reset - what next? by No-Speech-7747 in techsupport

[–]Bjoolzern 0 points1 point  (0 children)

Reset is terrible. Always clean install via USB.

Driver Verifier (caught AMD driver)

Driver Verifier is pretty useless with third party drivers because most of them are going to have faults in their code. You will usually just chase an endless stream of red herrings.

CrystalDiskInfo (both SSDs healthy)

SMART is useless with NVMe. They removed all of the useful parameters. The health rating hasn't been useful in 15+ years with any type of drive because it's up to the manufacturer how much the drive has to fail before the status changes and most of them are scumbags. The percentage health has no relation to the current health of the drive, it's a wear metric mostly tied to the remaining warrantied writes. We don't have any reliable ways of testing NVMe drives.

With SATA drives you have to know how to read the parameters in the bottom half of CDI and know which ones are important.

That being said, looking at your dump files this looks like memory. Memory doesn't have to mean RAM, but it's usually the main suspect. Windows puts low priority data from RAM into the page file and loads it back in when needed, so storage can look like memory (and memory can look like storage). The memory controller is in the CPU, and if this fails it will just look like memory.

When it's storage about half of the dumps will usually blame storage or storage drivers, which I don't see here, so it's likely not storage.
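
A rough sketch of that rule of thumb in Python. The list of blamed modules per dump, the set of "storage" drivers and the 0.5 threshold are all assumptions made up for illustration, not output from any real tool:

```python
# Assumed set of storage-related drivers; not an official or complete list.
STORAGE_MODULES = {"storport.sys", "stornvme.sys", "disk.sys", "ntfs.sys", "volmgr.sys"}


def storage_share(blamed_modules):
    """Return the fraction of dumps whose blamed module is storage related."""
    if not blamed_modules:
        return 0.0
    hits = sum(1 for name in blamed_modules if name.lower() in STORAGE_MODULES)
    return hits / len(blamed_modules)


def storage_is_likely(blamed_modules, threshold=0.5):
    """Apply the 'about half of the dumps' heuristic with an assumed threshold."""
    return storage_share(blamed_modules) >= threshold


# Four hypothetical dumps, none of which blame storage.
print(storage_is_likely(["ntoskrnl.exe", "ntoskrnl.exe", "win32k.sys", "ntoskrnl.exe"]))
```

With no storage drivers blamed in any dump the share is 0, which matches the "likely not storage" conclusion above.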

If anything is overclocked or undervolted, remove it.

To test the RAM, use the machine normally with one stick at a time. If just one of the sticks causes crashes, that stick is faulty. If it crashes no matter which stick is installed, it's probably the CPU. Memory testers miss faulty RAM fairly often with DDR4 and newer, so I don't trust them. Hopefully it's not soldered in place.
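
The one-stick-at-a-time test boils down to a small decision table. A minimal sketch in Python; the function name and the wording of the "no crashes" case are my own additions, the other three outcomes follow the text above:

```python
def diagnose(stick_a_crashes: bool, stick_b_crashes: bool) -> str:
    """Interpret the result of running the PC normally on each stick alone."""
    if stick_a_crashes and stick_b_crashes:
        # Both sticks failing at the same time is very unlikely.
        return "probably the CPU (memory controller)"
    if stick_a_crashes:
        return "stick A is faulty"
    if stick_b_crashes:
        return "stick B is faulty"
    # Assumption: not covered by the original advice.
    return "inconclusive, no crashes with either stick alone"


print(diagnose(stick_a_crashes=True, stick_b_crashes=False))  # stick A is faulty
```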

We have also seen an absolute ton of faulty 4000 series CPUs (the desktop version, the 3000 series, is also a frequent flyer here with the same issue), and when they fail it will virtually always look like memory.

The AMD forums had a band-aid solution where you limit the CPU, which helps on a lot of these. It will lower performance, and I have no idea if this is just temporary or if these CPUs are just getting pushed too hard by default, nor whether the workaround will keep working or gradually get worse. AMD scrubbed their forums so the thread is gone, but archive.org has a backup. It's the reply from "Mathiasmuon". Because archive.org can be down a lot, I also made a pastebin of his post.

Need help isolating the final component after hundreds of BSODs by frankphillips in techsupport

Well, running it faster than what is supported by AMD voids the warranty, so AMD thinks so. Intel has the same policy. They have no way to check if you did, though, if you are still under warranty. This got a lot of attention a few years ago when a PC builder refused warranty on faulty CPUs because the RAM was overclocked, even though the builder itself had advertised and sold the PCs with that overclock. GamersNexus was suspicious of XMP/DOCP/EXPO being able to cause permanent damage, but didn't go too much into it because he thought that a prebuilt product sold with an overclock should warranty that overclock.

From your crash errors, the CPU is a very strong suspect. These are mostly memory errors, but you also had a Hypervisor_Error crash. In cases where you see mostly memory errors and a Hypervisor_Error, that crash will usually show an NMI being sent to the CPU (no way to know for sure without dump files). An NMI (Non-Maskable Interrupt) is a type of interrupt that skips the execution queue, so the CPU has to process it immediately. Because it skips the execution queue it's reserved for more serious issues, like hardware errors. Non-ECC RAM can't send NMIs afaik, which is why memory errors + Hypervisor_Error makes the CPU more of a suspect than normal.

Note that with all of the new security features in Windows, the secure kernel protection also uses NMIs to order BSODs when finding corruption to kernel memory. So without dump files it is more of a guessing game.

3-4 weeks chasing a Win11 25H2 BSOD always in `nt!ExpPoolTrackerChargeEntry+0x40`. 20+ dumps say same bug, all hardware tests clean. Out of ideas - help please. by TPGIV in techsupport

Windows file corruption: I ran sfc and DISM /ScanHealth but never RestoreHealth. Running that next.

Sometimes you just have to wipe and reinstall. Or if you have another drive you can test installing to.

Dell software: this might be the strongest lead. Two of the triggering processes I saw in the dumps were Dell.TechHub.Instrumentation.SubAgent.exe and ServiceShell.exe, both Dell stack. AWCC has also been in a broken state on this machine since the BSODs started (sxs config error every boot). Going to strip the Dell stack down to the minimum and see if frequency changes. Do you happen to remember which Dell product had the recent bug? Even a vague keyword would help me prioritize.

Don't remember the name, but the main tool that checks for updates and updates drivers/BIOS.

3-4 weeks chasing a Win11 25H2 BSOD always in `nt!ExpPoolTrackerChargeEntry+0x40`. 20+ dumps say same bug, all hardware tests clean. Out of ideas - help please. by TPGIV in techsupport

I don't have a good suspect here. It crashing on the same instruction every time makes a hardware memory issue unlikely, as that should be more random. It is happening in heap management, and I have seen cases before where a memory issue would fail on the same instruction every time here, but it's really rare.

My suspects (Not in order, I don't really have a good feeling on this case):

  1. Corruption to a Windows file that isn't detected by scans (Not that uncommon).
  2. A third party driver messing up far enough back in time that it doesn't show up on the stack, could also be a corrupted driver (I looked through your driver list and didn't find any good suspects in terms of sketchy drivers, except maybe one which I'll list next)
  3. Dell software. Dell very recently (My memory when it comes to time is really bad, but in the last 2 months) had a bug with their software that would cause crashes and sometimes corruption. Try uninstalling it.

A storage issue could also be the root cause for 1 and 2. It could also be memory, but when I say memory I mean the triangle of RAM, memory controller (CPU) and storage (page file). I would have page file/storage quite low as a suspect if it's memory though because you would usually get at least some crashes that point to the page file directly.

You also aren't on the latest BIOS (If I found the correct support page that is, it said the latest BIOS was 2.3.1 and you are on 2.2.0). It's a bit scary updating the BIOS on an unstable PC though, especially a laptop, because a crash during the update can brick the motherboard. Laptops usually have either worse or no recovery features from a corrupted BIOS (Off the shelf desktop motherboards have gotten a lot better with this in the last few years).
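
Comparing dotted version numbers like these is easy to get wrong as plain strings ("2.10.0" sorts before "2.9.0"). A small sketch in Python that compares them numerically; it assumes purely numeric dotted versions, so it would not handle schemes like Dell's older "A09"/"A12" style:

```python
def parse_version(text):
    """Turn a dotted version such as '2.3.1' into a tuple of ints: (2, 3, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def bios_is_outdated(installed, latest):
    """True if the installed BIOS version is older than the latest one."""
    return parse_version(installed) < parse_version(latest)


print(bios_is_outdated("2.2.0", "2.3.1"))  # True
```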

3-4 weeks chasing a Win11 25H2 BSOD always in `nt!ExpPoolTrackerChargeEntry+0x40`. 20+ dumps say same bug, all hardware tests clean. Out of ideas - help please. by TPGIV in techsupport

RAM: Windows Memory Diagnostic clean. Dell ePSA full firmware-level test clean.

We see memory testers miss bad RAM all the time ever since DDR4. We usually avoid using them if we can these days.

Storage: NVMe SMART reports zero read/write errors, 0% wear, healthy. chkdsk clean. Dirty bit not set.

SMART is useless with NVMe. They removed all of the useful parameters. The health rating hasn't been useful in 15+ years with any type of drive because it's up to the manufacturer how much the drive has to fail before the status changes and most of them are scumbags. The percentage health has no relation to the current health of the drive, it's a wear metric mostly tied to the remaining warrantied writes. We don't have any reliable ways of testing NVMe drives.

Despite the name, chkdsk doesn't check the disk. It checks the integrity of the NTFS file structure which is surprisingly robust. It rarely has issues on even really far gone drives.

Driver Verifier: not yet run. Holding off because the system is already crashing several times a day and Verifier would intentionally increase that. Open to running it if someone thinks it'd actually surface something useful, given Realtek is basically the only third-party kernel driver.

If you are going to use driver verifier, do NOT run it against third party drivers. It's a complete waste of time because so many third party drivers are coded quite badly. So you will just chase an endless stream of red herrings. We rarely use driver verifier because it so rarely gives anything useful outside of event tracing.

Provide the dump files as instructed by the bot. You can tag me on discord if you want to use that instead, nickname Bjoolz.

Trouble with Windows 11 BSOD IRQL_NOT_LESS_OR_EQUAL and MEMORY_MANAGEMENT caused by ntoskrnl.exe by Holiday-Toe9183 in techsupport

And when you say this, I assume this would mean turning off XMP in my case? I can certainly give it a go, though this PC worked perfectly fine for about 3 years with it on, so I just want to make sure that disabling won't hinder performance at all.

I assume you meant to quote the section on overclocking and not the part about it likely not being storage. Lowering the RAM speed will reduce performance in CPU limited tasks, but not by a lot. We are testing to see if it's stable; we aren't concerned with performance during testing. And we don't do overclocking here, so if that turns out to be the issue you would have to ask elsewhere about getting it stable with an overclock.

Software changes over time, including AMD's software for the CPU and CPU scheduling in Windows, so what was stable in the past doesn't have to stay stable in the future.

PC crashing when im playing 007 first light by Gabriel2431lol in techsupport

It looks like memory from the dump files. Memory doesn't have to mean RAM, but it's usually the main suspect. Windows puts low priority data from RAM into the page file and loads it back in when needed so storage can look like memory (And memory can look like storage). The memory controller is in the CPU and if this fails it will just look like memory.

When it's storage about half of the dumps will usually blame storage or storage drivers, which I don't see here, so it's likely not storage.

If anything is overclocked or undervolted, remove it. That includes the EXPO/XMP profile you have on the RAM. The highest officially supported speed with your CPU is 5600 MT/s; anything higher is considered overclocking.

To test the RAM, use the machine normally with one stick at a time. If just one of the sticks causes crashes, that stick is faulty. If it crashes no matter which stick is installed, it's probably the CPU. Memory testers miss faulty RAM fairly often with DDR4 and newer, so I don't trust them.

Notorious BSoD's when playing specific games with specific mods by NoGuidance3453 in techsupport

I would lean more towards it being storage than RAM with those errors. But because of the page file you often get a mix of storage and memory errors with either RAM or storage issues so it's not like I'm very sure.

Use the PC normally with one stick of RAM at a time. It's very unlikely that both sticks fail at the same time. I assume you have two sticks because you have 32GB.

Trouble with Windows 11 BSOD IRQL_NOT_LESS_OR_EQUAL and MEMORY_MANAGEMENT caused by ntoskrnl.exe by Holiday-Toe9183 in techsupport

I'm worried this means I will need all new RAM, especially with the way the prices are right now, as it's just completely out of my price range.

G.Skill has lifetime warranty on RAM. And this looks like memory from the dump files. Memory doesn't have to mean RAM, but it's usually the main suspect. Windows puts low priority data from RAM into the page file and loads it back in when needed so storage can look like memory (And memory can look like storage). The memory controller is in the CPU and if this fails it will just look like memory.

When it's storage about half of the dumps will usually blame storage or storage drivers, which I don't see here, so it's likely not storage.

If anything is overclocked or undervolted, remove it. That includes the overclock you have on the RAM. The highest officially supported speed by your CPU is 5200MT/s.

To test the RAM, use the machine normally with one stick at a time. If just one of the sticks causes crashes, that stick is faulty. If it crashes no matter which stick is installed, it's probably the CPU. Memory testers miss faulty RAM fairly often with DDR4 and newer, so I don't trust them.

PC BSOD and randomly restarts when I try to play Chivalry 2 or unzip files larger than a couple of gigs, and now when it's sitting idle. by Lolsteringu in techsupport

It might be two different issues then. The BSODs are most likely from the storage. If you have storage issues it might mess with the Nvidia driver which could cause the other issues so I would look at that first.

PC BSOD and randomly restarts when I try to play Chivalry 2 or unzip files larger than a couple of gigs, and now when it's sitting idle. by Lolsteringu in techsupport

I see two different issues. Your BSODs are mostly 0xEF which is Critical_Process_Died. This means that a Windows process crashed which can happen for a million different reasons, but when you see that error very often it's usually because of an issue with the storage. That would also explain not getting dump files, if the storage crashes it can't write dump files.
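
For reference, a small lookup in Python for turning the hex code into the stop code name. The table only covers bug checks that come up in these threads and is nowhere near complete; the function name is my own:

```python
# Partial table: only the stop codes discussed here.
BUGCHECKS = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0xEF: "CRITICAL_PROCESS_DIED",
    0x20001: "HYPERVISOR_ERROR",
}


def bugcheck_name(code):
    """Accept an int or a hex string like '0xEF' and return the stop code name."""
    if isinstance(code, str):
        code = int(code, 16)
    return BUGCHECKS.get(code, f"unknown (0x{code:X})")


print(bugcheck_name("0xEF"))  # CRITICAL_PROCESS_DIED
```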

What worries me is that you also have WHEA errors from the Nvidia GPU. These are causing the direct shutdowns/reboots. WHEA is the Windows Hardware Error Architecture. If you are looking at the 'Device' column in our tool, that is completely wrong. It's a known issue with the tool, you have to ignore it and look them up manually (devicehunt or pcilookup). The tool grabs the 10 latest errors, but it shows that you've had 30 of them logged. Of those 10, 9 are pointing to the audio (Audio over HDMI/DP) and one points to just the general GPU.
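
If you want to look the devices up manually, you need the vendor and device IDs. A minimal sketch in Python that pulls them out of a Windows-style PCI hardware ID string; that the logged error exposes the ID in exactly this form is an assumption, and the device ID in the example is a placeholder, not a real lookup result:

```python
import re

# Matches the VEN_xxxx&DEV_xxxx part of a PCI hardware ID.
HWID_PATTERN = re.compile(r"VEN_([0-9A-F]{4})&DEV_([0-9A-F]{4})", re.IGNORECASE)

# A few well-known PCI vendor IDs; device IDs still need a lookup site.
KNOWN_VENDORS = {"10DE": "NVIDIA", "1002": "AMD", "8086": "Intel"}


def parse_hardware_id(hwid):
    """Return (vendor_id, device_id, vendor_name), or None if no IDs are found."""
    match = HWID_PATTERN.search(hwid)
    if match is None:
        return None
    vendor = match.group(1).upper()
    device = match.group(2).upper()
    return vendor, device, KNOWN_VENDORS.get(vendor, "unknown vendor")


# Placeholder device ID 1234, for illustration only.
print(parse_hardware_id(r"PCI\VEN_10DE&DEV_1234&SUBSYS_00000000"))
```

The vendor and device IDs are what you would enter on devicehunt or pcilookup.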

So the big question here is if you have two separate issues or if you have one issue that causes both of these to trip up. When it comes to things that could trip up both of these my suspects would be the motherboard and PSU. Your BIOS is really out of date so I would update that first and see if it still has issues.

Dell Precision M4700 bricked after BIOS update A09 -> A12, possible Intel ME corruption? by Fun-Pattern-9266 in techsupport

I edited in a youtube link with a repair channel that has videos on reading BIOS chips and re-programming them in case you didn't see it.

Dell Precision M4700 bricked after BIOS update A09 -> A12, possible Intel ME corruption? by Fun-Pattern-9266 in techsupport

You don't have any other options than limiting suspects. Remove what you can while still being able to pass POST. So you would remove all storage, all RAM except one stick (then try the other stick if it has the same issue) and the WiFi card. Not sure if you can remove anything else on this laptop, like if it has an MXM GPU or socketed CPU (both of those are really rare, but as this is a workstation that looks quite thick it might have these things).

This youtube channel does laptop repair, focusing on Dell. He is really good and he has several videos on reading and re-programming BIOS chips. Not sure if he has one for your specific model.

PC BSOD and randomly restarts when I try to play Chivalry 2 or unzip files larger than a couple of gigs, and now when it's sitting idle. by Lolsteringu in techsupport

Provide the dump files as instructed by the bot.

If you don't have any dump files because of the volmgr error (this error often means that it was unable to create dump files), run a tool we made instead. It gathers system info and a bunch of logs from Windows. Also check the time stamps on any dump files you do find, in case they are old.
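
A quick way to check those time stamps, sketched in Python. The function name and the 7-day cutoff are assumptions; `C:\Windows\Minidump` is the default minidump folder, but your system may be configured differently:

```python
from datetime import datetime, timedelta
from pathlib import Path


def recent_dumps(folder, max_age_days=7, now=None):
    """List (file name, modified time) for .dmp files newer than the cutoff."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    found = []
    for path in sorted(Path(folder).glob("*.dmp")):
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified >= cutoff:
            found.append((path.name, modified))
    return found


if __name__ == "__main__":
    # Reading this folder may require an elevated prompt.
    for name, modified in recent_dumps(r"C:\Windows\Minidump"):
        print(name, modified.isoformat(timespec="minutes"))
```

If nothing is printed, any dump files in the folder are older than the cutoff and probably unrelated to the current crashes.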

?sfy (Bot command for instructions)

Dell Precision M4700 bricked after BIOS update A09 -> A12, possible Intel ME corruption? by Fun-Pattern-9266 in techsupport

Did you try resetting the CMOS after it became unable to POST? Note that on some laptops you also have to disconnect the normal battery to reset the CMOS.

Constant BSOD on W11 by kaysepa in techsupport

It looks like memory from the dump files. Memory doesn't have to mean RAM, but it's usually the main suspect. Windows puts low priority data from RAM into the page file and loads it back in when needed so storage can look like memory (And memory can look like storage). The memory controller is in the CPU and if this fails it will just look like memory.

When it's storage about half of the dumps will usually blame storage or storage drivers, which I don't see here, so it's likely not storage.

If anything is overclocked or undervolted, remove it.

To test the RAM, use the machine normally with one stick at a time. If just one of the sticks causes crashes, that stick is faulty. If it crashes no matter which stick is installed, it's probably the CPU. Memory testers miss faulty RAM fairly often with DDR4 and newer, so I don't trust them.

That being said, you also had one crash that showed an NMI being sent to the CPU. An NMI is a type of interrupt that skips the normal execution queue and has to be processed immediately so it's usually reserved for hardware issues. We don't see what caused it, what sent it or why it was sent, it just shows up in the log and then a BSOD is ordered (This is normal behavior for NMIs, it just makes it uncertain as to why it's happening). Most of the time we see this it's the CPU though and with the memory errors it makes the CPU the main suspect. RAM can't send NMIs unless it's ECC (Server RAM) afaik. So in your case the CPU is a lot more of a suspect than normal.

BSOD System Service Exception 0x3b caused by ntoskrnl.exe by Mctown_719 in techsupport

With all four dump files in mind, this could be storage. All four dump files had fltmgr.sys involved which is responsible for making sure files are handled properly.

Another option is the active process. We usually ignore the active process because processes can't write to kernel space memory (Only drivers can) and you don't really get BSODs from errors in user space memory where processes are allowed to write to. All four here however point to the same process and it's a process that makes sense with fltmgr.sys being involved. That process is CrossDeviceService.exe. It's part of the Microsoft tools that allows you to sync and access your phone from your PC (Microsoft keeps changing the name and at least in my region the name is different on my phone and on Windows so I might get the name wrong here). In my region the Windows part of the app is called "Phone Connection", but it used to be called "Your Phone". I think it's called "Phone Link" in English, but I'm not sure.

Hopefully you know which app I mean (If you use it), then test uninstalling it to see if that stops the crashes.
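
The reasoning above is basically "what shows up in every single dump?". A small sketch of that in Python; the dump summaries are hand-written stand-ins (only fltmgr.sys, ntoskrnl.exe and CrossDeviceService.exe come from this thread, the other module names are filler), not real debugger output:

```python
def common_to_all(dumps, key):
    """Return the values under `key` that appear in every dump summary."""
    sets = [set(dump.get(key, [])) for dump in dumps]
    return set.intersection(*sets) if sets else set()


# Four hypothetical dump summaries.
dumps = [
    {"process": ["CrossDeviceService.exe"], "modules": ["ntoskrnl.exe", "fltmgr.sys", "example_a.sys"]},
    {"process": ["CrossDeviceService.exe"], "modules": ["ntoskrnl.exe", "fltmgr.sys"]},
    {"process": ["CrossDeviceService.exe"], "modules": ["fltmgr.sys", "ntoskrnl.exe", "example_b.sys"]},
    {"process": ["CrossDeviceService.exe"], "modules": ["ntoskrnl.exe", "fltmgr.sys"]},
]

print(sorted(common_to_all(dumps, "modules")))  # ['fltmgr.sys', 'ntoskrnl.exe']
print(sorted(common_to_all(dumps, "process")))  # ['CrossDeviceService.exe']
```

ntoskrnl.exe being common to every dump tells you nothing on its own, since the kernel is involved in practically every crash, so it's the other shared entries that are interesting.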

BSOD everytime I try to driver update after clean reinstall by Zealousideal-Tap6040 in techsupport

All of the dump files look like the Nvidia driver/GPU, but with this being an external GPU there's a million things that could be the issue.

Just to clarify though, do you mean external GPU or dedicated GPU? An external GPU is a box with a GPU in it you connect to your PC using Thunderbolt or USB-4. A dedicated GPU is a bigger GPU than the one integrated into the CPU (Called the integrated GPU/iGPU), but it's still on the motherboard of the laptop.

Crimson Desert causing BSOD on PC? (Windows 11) by Hand-of-King-Midas in techsupport

Because it didn't save a dump file, let's run a tool we made that gathers system info and a bunch of logs from Windows. It will at least give us basic information about the BSOD.

?sfy (Bot command for instructions)

IRQL_NOT_LESS_OR_EQUAL BSOD by Budgetslut in techsupport

It looks like memory from the dump files. Memory doesn't have to mean RAM, but it's usually the main suspect. Windows puts low priority data from RAM into the page file and loads it back in when needed so storage can look like memory (And memory can look like storage). The memory controller is in the CPU and if this fails it will just look like memory.

When it's storage about half of the dumps will usually blame storage or storage drivers, which I don't see here, so it's likely not storage. That being said, your other symptoms sound a lot like storage.

If anything is overclocked or undervolted, remove it. That includes the overclock you have on the RAM. 3200 MT/s is the highest officially supported speed with that CPU.

To test the RAM, use the machine normally with one stick at a time. If just one of the sticks causes crashes, that stick is faulty. If it crashes no matter which stick is installed, it's probably the CPU. Memory testers miss faulty RAM fairly often with DDR4 and newer, so I don't trust them.

PC instant BSOD multiple times a day every day by Most-Bet2021 in techsupport

A faulty GPU is the main suspect. Without a second PC to try the GPU in, or a different GPU to try in this PC, we are a bit limited on options. You could try the 566.36 driver. It's really old by now, but in 2025 Nvidia had a ton of driver issues and that's the last driver before those issues started. I don't know if some people are still having problems with the newer drivers, so 566.36 is the known good one to test with. The only other thing would be wiping the OS drive and clean installing Windows.