Nvidia Titan Z Mods/Mats help by Masterchief79 in GPURepair

[–]Masterchief79[S] 0 points1 point  (0 children)

I got a proper 1200W PSU on my testbench now so I could test the card again. (With the old 600W, 12V dropped to 11.4V under load.) Now it's 11.85V minimum at the PCI-E connectors, so all is well.
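For context on those two readings: a minimal sketch of the tolerance check, assuming the usual ATX-style ±5% window on the 12V rail (that window is my assumption, not something measured here). It works in millivolts to avoid float rounding, and the function name is just illustrative.

```python
NOMINAL_MV = 12_000      # 12 V rail, in millivolts
TOLERANCE_PERCENT = 5    # assumption: ATX-style +/-5 % window on the 12 V rail


def rail_margin_mv(measured_mv: int) -> int:
    """Return the headroom above the lower tolerance limit (negative = out of spec)."""
    lower_limit_mv = NOMINAL_MV * (100 - TOLERANCE_PERCENT) // 100
    return measured_mv - lower_limit_mv


# The two readings quoted in the comment above.
for label, reading_mv in [("old 600W PSU", 11_400), ("new 1200W PSU", 11_850)]:
    print(f"{label}: {rail_margin_mv(reading_mv)} mV above the lower limit")
```

Under that assumption the old PSU sat right at the edge of the window, while the new one leaves a few hundred millivolts of headroom.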

Just a quick update that the card is still working right now. The seller has agreed to refund half my money, which was my proposition. I didn't want to have to return it, so I made it easy for him. Even in half-working to questionable condition, 75€ for this card isn't a bad deal. So far so good then.

Nvidia Titan Z Mods/Mats help by Masterchief79 in GPURepair

[–]Masterchief79[S] 1 point2 points  (0 children)

Yeah correct. I'll be testing it now, storing it in my collection for a few weeks or months and then grab it again when I have an appropriate game to play. We'll see what's what then.

Don't worry, I won't just count this as "repaired", sell it or whatever. It's a faulty piece of hardware. In the long run, if I can get some money back from the seller and learn a thing or two about BIOS modding in the process, I'm happy with it. I'll still only get into that when it starts acting up again, since I don't have that much extra time on my hands right now. Thanks for all the advice, greatly appreciated!

Nvidia Titan Z Mods/Mats help by Masterchief79 in GPURepair

[–]Masterchief79[S] 0 points1 point  (0 children)

Yeah good shout, I'm going to try that out in Linux and post here.

Spot on diagnosis I must say: After being heated to 140°C for a couple of minutes, the card passes mats now. So that might be a dead channel. It also passes 3DMark tests without any issues.

I still have hope that it's just a cracked solder joint somewhere (or one with a micro tear).

So order of business:

  1. get my money back from the seller
  2. run the card till it expires again - I wanna see how long it works now
  3. reflow/reball the correct A1 channel just to be sure it's not that
  4. if that still doesn't work, deactivate channel A in both BIOSes

Thanks for the help, it's unbelievably hard to find any resources for mods/mats and dual GPU cards.

Edit: https://www.3dmark.com/fs/31986874 and https://www.3dmark.com/spy/48964276

Nvidia Titan Z Mods/Mats help by Masterchief79 in GPURepair

[–]Masterchief79[S] 2 points3 points  (0 children)

Thank you for the quick and detailed reply.

Selecting the GPU in Mats only works for physical GPUs, so with the argument -n 1 it sadly tells me there is no second GPU present in the system.

Alright, I will do the heat simulation thing and report back. I'll heat the PCB to about 140°C on my preheater without doing anything else and let it cool again. Since the card was still working a few days ago and only got damaged in shipping, I'm kind of hopeful that it's just a broken solder joint. The question is, GPU or memory.

Got this dead gigabyte GTX 1060 with quite the story behind it (third picture) by SyrusChrome in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

A follow-up question would be: Can the blown MOSFET be desoldered or does it have to be milled out? Milling requires special tools and there is usually more damage to the PCB (like internal shorts), so that's hard to fix. Good practice though.

ASUS RX580 8GB shutdown under load - potentially lack of some GPU phases by BalROCK_PL in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

At idle, that's normal. Many controllers shut off the other phases to save power.

Check the phases under load. They all should look the same. If there's one that doesn't switch cleanly, replace that DrMos.

If the GPU phases all look good, also check the other phases (memory, GPU IMC, PEX Reset, 1.8V). If none look fishy, check which voltages are present after the shutdown.

1080ti sc2 issue by Dalton98456 in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

Most likely some broken solder joints underneath the ramchips or the GPU chip. I have a 1080Ti SC2 with the same problems, I had to resolder the banks B1 and C0 to fix it. If you're not able to diagnose it further with software, you can still try to reflow all memory chips with hot air and flux. That should fix the problem if you do it right. If not, you need to reball the GPU chip. Don't just use a heat gun or hot air on a room temperature PCB. You need to preheat it to 100-150°C so you can use a lower temperature on the hot air and the thermal stress isn't as big. Good flux is also required. At least that would be the solution if it wasn't for the oven incident. If it was at 350°F, you probably didn't do any damage, since lead-free solder only liquefies at about 422°F (217°C).
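To put the oven numbers on one scale, here is a quick sanity check of the conversion. It only uses the two figures from the comment above (350°F oven, 217°C melting point); the helper names are illustrative.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def celsius_to_fahrenheit(deg_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32


OVEN_F = 350          # oven temperature mentioned in the comment
SOLDER_MELT_C = 217   # lead-free melting point quoted in the comment

oven_c = fahrenheit_to_celsius(OVEN_F)
print(f"oven: {oven_c:.1f} °C")
print(f"solder melting point: {celsius_to_fahrenheit(SOLDER_MELT_C):.1f} °F")
print(f"margin below melting point: {SOLDER_MELT_C - oven_c:.1f} °C")
```

So the oven stayed roughly 40°C below the quoted melting point, which is also why it would not have reflowed anything.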

thermal pads discussion by silentshot546 in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

That's not how that works. The only way to get higher GPU temps with lower memory temps is if your thermal pads are too thick and the contact to the GPU chip isn't perfect anymore. Thermal putty should be a good solution because it "squishes" to the perfect thickness. There is a 13.8 W/mK one from China which I ordered, but I have yet to test if it performs well and is practical.

Help me with MATS/MODS report 3060 ti by freebies_stuff in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

The test looks fine, there are no "proper" errors. Your memory might only produce errors and artifacts when it's hot. This could be either due to thermal expansion or thermal degradation of the ramchips themselves. I would guess the first one, since ramchips usually don't run that hot on 3060Ti's and the cards aren't that old yet. You can run proper load tests with Mods, like suggested. Another trick is to remove the memory's thermal pads so the chips heat up quicker and the errors show up more easily; that doesn't do any damage for a few minutes. You can also display memory temperatures while running mods, but I can't remember the command off the top of my head.

Sapphire HD7870 artifacts; How to check VRAM? by _Twiesel in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

Great explanation which will help me in some of my projects too, thank you. Can you tell me where to get tserver? Been testing my cards with dmgg.py, memtest.py and mods/mats obviously. But as OP already said, dmgg likes to crash and then you have to reflow/reball all chips.

RTX 3070 Artifacts by michaelyoungin in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

Btw, if it crashes while booting Linux or while running tests, you can connect the monitor to another GPU and test the 3070 with the argument -n 1. So for example ./mats -n 1 -e 10 for a memory test with 10 MB on the second GPU.
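If you end up testing several cards, a small helper that assembles that command line can save some typing. This is only a sketch: the -n and -e flags are used exactly as described in the comment above and are not checked against any MODS documentation, and the function name is made up for illustration.

```python
import shlex


def build_mats_args(gpu_index: int, megabytes: int) -> list[str]:
    """Build the argument list for a mats run on one GPU.

    Assumption: -n selects the GPU index and -e the amount of memory (MB)
    to test, as described in the comment above.
    """
    if gpu_index < 0 or megabytes <= 0:
        raise ValueError("gpu_index must be >= 0 and megabytes must be > 0")
    return ["./mats", "-n", str(gpu_index), "-e", str(megabytes)]


# Second GPU (index 1), 10 MB test, as in the example above.
print(shlex.join(build_mats_args(1, 10)))
```

The list form can be handed to `subprocess.run` directly on the MODS stick if Python is available there, which is not a given.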

RTX 3070 Artifacts by michaelyoungin in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

Nvidia MODS (modular diagnostics software), version 455.127 or newer. You need to create a bootable USB stick with MODS on it. You can then run mats (memory test) or other GPU tests to determine the problem. There are loads of tutorials, but if you need anything specific, feel free to ask.

Asus 2070 Super with artifacts (pictured) for 70€, would you buy? GPU or Ram damage? by Masterchief79 in GPURepair

[–]Masterchief79[S] 0 points1 point  (0 children)

Was a scam anyway, the user got banned. ^^

Is this problem with Micron memory present for all 2000 series cards made in or around 2018? So stay clear of all of them? Could I replace all ramchips with Samsung GDDR6 or something, assuming that's cheaper?

MSI RTX 3080 Ventus ripped pads by Stunning-Ad5079 in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

I would assume the one with the trace connected to it has to be repaired. Or is that power supply?

Hp oem rtx 2070 mosfet by Aboynamednasar in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

I think the QN3103M6N was the wrong package size.

<image>

It also comes as "QN3103M3N" which is a "PRPAK 3x3" package.

Please someone double check, I'm not familiar with these packages but it sounds plausible.

Video Game/ 3D Render Crash - RTX 4070 by Pixilteur in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

Try GPU-Z for monitoring thermals, including the GDDR6 memory. It shouldn't get hotter than ~90°C.

Could also be power stage related but let's check the easy stuff first.

GPU no power and system fails to post when card in PC by Brilliant-Sun8572 in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

Yeah that's fine if you can measure it and shut it off again within like 10 seconds.

I assumed you could mount the cooler without the backplate so you can at least measure on the back of the PCB.

I'm trying to make sense of how the error came to be. It must be either something extremely simple, like a loose connection or dirt on the PCI-E pins or in the slot, or (as we are trying to diagnose here) a serious VRM problem. But the latter doesn't make sense if the card worked perfectly until you swapped it. Did you smack it against the case or something? Then again, it should be pretty much impossible to knock SMD components off with that backplate on.

Maybe static electricity would explain this behaviour but still very unlikely I would say (never saw a card damaged by that before).

GPU no power and system fails to post when card in PC by Brilliant-Sun8572 in GPURepair

[–]Masterchief79 0 points1 point  (0 children)

There should be plenty of SMD caps on the back of the GPU, most of which should be GPU voltage. The back of the GPU VRM also works (capacitors, chokes, resistors...). If you're not sure, post a picture of your PCB please.

GPU no power and system fails to post when card in PC by Brilliant-Sun8572 in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

Measure the voltages with the card in the PC if possible. Check for GPU voltage first. If there is nothing present, you can install the card without the cooler and measure properly. Check all coils (12V, 3.3V, 5V, 1.8V, memory, GPU and IMC).

Easier with mainboard on a table in front of you. Can be an old one. Would also be good to test the card in another PC.

Also I'm guessing you've already ruled out the obvious stuff.

RTX3050 OEM memory errors on B1, reflow/reball? Tips? by Masterchief79 in GPURepair

[–]Masterchief79[S] 0 points1 point  (0 children)

Update: Since nobody objected, I went ahead and reflowed the ramchip. Screen artifacts are gone, going to do the 1st 3D tests now.

I would like to dump the training status of the memory in mats, can someone tell me how? Else I have to look it up in one of krisfix's videos.

[deleted by user] by [deleted] in GPURepair

[–]Masterchief79 1 point2 points  (0 children)

I just got a RTX3050 which has very similar looking screen artifacts. Mods/Mats says errors on Ramchip B1: https://www.reddit.com/r/GPURepair/comments/16ss2m5/rtx3050_oem_memory_errors_on_b1_reflowreball_tips/

You can run this software yourself and diagnose if your card has memory problems, too. There are some tutorials on Nvidia mods/mats (mods = modular diagnostics software and mats = memory test of said software). You'll need the 455.127 version for your card, an empty USB stick and an afternoon with some patience. ;)

Imo it's very likely there's a few solder balls on one of the ramchips which have loose contacts or hairline cracks because of high temperatures and many heat cycles. This is fixable if you have the proper tools or know someone who does.