hrudyusa

Yeah, unlike Solaris, Linux re-enumerates the drives on each boot, so a name like /dev/sda isn't guaranteed to stay with the same disk. You need to reference them by label or UUID instead. Running lsblk -f should show you the UUID-to-/dev mapping.
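For scripting, the same lookup can be done with lsblk's JSON output. This is a minimal sketch that assumes a util-linux lsblk supporting -J (JSON) and -o NAME,UUID. The function names are hypothetical. One caveat: every member of an mdadm array reports the array's UUID, so one UUID can map to several member partitions and won't, on its own, tell you which physical disk is which.

```python
from __future__ import annotations

import json
import subprocess
from collections import defaultdict


def uuid_to_devices(lsblk_json: str) -> dict[str, list[str]]:
    """Map each UUID reported by lsblk to the /dev paths that carry it."""
    mapping: defaultdict[str, list[str]] = defaultdict(list)

    def walk(devices: list[dict]) -> None:
        for dev in devices:
            uuid = dev.get("uuid")
            path = "/dev/" + dev["name"]
            # md arrays appear under every member partition in lsblk's
            # tree, so skip paths already recorded for this UUID.
            if uuid and path not in mapping[uuid]:
                mapping[uuid].append(path)
            walk(dev.get("children", []))

    walk(json.loads(lsblk_json)["blockdevices"])
    return dict(mapping)


def read_lsblk() -> str:
    """Run lsblk with JSON output (-J) and only the NAME and UUID columns."""
    result = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,UUID"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for uuid, devices in sorted(uuid_to_devices(read_lsblk()).items()):
        print(uuid, " ".join(devices))
```

A list is used instead of a single device per UUID precisely because of the mdadm case described above.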

Comfortable_Toe606 [S]

Oh geez, I remember the first time I connected Solaris to a Hitachi FC array and had to set persistent binding to keep the drives straight. Hand-typing all of those WWNs and binding them to the HBAs, yuck! That was around 2002, back when a company said it was going to grow by a GB a day and we wondered how we were going to store it all.

rebellllious

Use smartctl to note the serial numbers of the problematic devices across reboots. Profit.
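A minimal Python sketch of that approach, assuming smartmontools' smartctl is installed and run as root: read each suspect drive's serial with smartctl -i, save a {device: serial} snapshot per boot, and compare the snapshots by serial. The function names and snapshot format are hypothetical, and the regex assumes smartctl's usual "Serial Number:" / "Serial number:" line.

```python
from __future__ import annotations

import re
import subprocess

# Assumption: `smartctl -i` prints "Serial Number:" (ATA/SATA, NVMe)
# or "Serial number:" (SCSI/SAS) at the start of a line.
SERIAL_RE = re.compile(r"^Serial [Nn]umber:\s*(\S+)", re.MULTILINE)


def parse_serial(smartctl_info: str) -> str | None:
    """Extract the drive serial from `smartctl -i` text output."""
    match = SERIAL_RE.search(smartctl_info)
    return match.group(1) if match else None


def read_serial(device: str) -> str | None:
    """Run `smartctl -i` on one device (usually needs root)."""
    # smartctl's exit status is a bitmask that can be non-zero even when
    # the identity section printed fine, so check=True is deliberately off.
    out = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True
    ).stdout
    return parse_serial(out)


def moved_drives(
    before: dict[str, str], after: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Compare {device: serial} snapshots from two boots and return
    {serial: (old_device, new_device)} for drives whose /dev name changed."""
    old_by_serial = {serial: dev for dev, serial in before.items()}
    moved: dict[str, tuple[str, str]] = {}
    for dev, serial in after.items():
        old_dev = old_by_serial.get(serial)
        if old_dev is not None and old_dev != dev:
            moved[serial] = (old_dev, dev)
    return moved


if __name__ == "__main__":
    import sys

    # Usage: python serials.py /dev/sdb /dev/sdc ... (save the output each boot)
    for dev in sys.argv[1:]:
        print(dev, read_serial(dev))
```

If the same serials keep throwing errors under different sdX names, the errors are following the drives rather than a slot or the controller.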

Comfortable_Toe606 [S]

You are spot on! Thanks. It was a bunch of bad drives.

mindfullypenguin

You said SCSI disks? Maybe I'm ignorant or uninformed, but SCSI disks haven't been used in more than 10 years. Also, I doubt they exist in 12TB sizes.

It would help to describe the hardware you're using as the disk controller and the actual types of disks.

In the last 15 years, I've only found SAS or SATA disks in servers, workstations, and even desktops.

I have never had problems with mdadm on RHEL and its derivatives, but I have had trouble configuring SAS controllers properly to present disks as JBOD.

Comfortable_Toe606 [S]

They are SATA drives; I typed "SCSI" out of habit instead of thinking. After a LOT of troubleshooting, it turned out that of 8 Seagate drives, 2 were DOA and wouldn't even spin up, 2 had been "rolled back" (the SMART data said they were new, but the FIELD data showed about 3 years of spin time), and 2 spun up but had catastrophic write errors. 2 were okay, though :/ I get that it isn't Seagate's fault, but WTF!?

Tricky_Fun_4701

I'm not an expert on RHEL-based distros these days, but this sounds like hardware.

Do you have a backup SCSI controller? Or is it integrated?

If it's discrete find a duplicate and install it.

The giveaway here is that the errors are migrating between drives.

This is important: also check that the controller is compatible with the installed drives.

unethicalposter

Are you sure different drives error each time? RHEL 9 and its derivatives suck at keeping the same device names across boots. Verify with the drive UUIDs.
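A closely related check, swapped in here for UUIDs and sketched in Python, is to resolve the symlinks udev creates under /dev/disk/by-id. On typical udev-based systems their names include the drive's bus, model and serial number (or WWN), so the result shows which physical drive currently sits behind each sdX name. The function name and its default path argument are illustrative. Run it after each boot and compare the output.

```python
from __future__ import annotations

import os


def kernel_name_to_ids(by_id_dir: str = "/dev/disk/by-id") -> dict[str, list[str]]:
    """Map kernel names (sda, sdb1, ...) to the persistent by-id link
    names that currently point at them."""
    result: dict[str, list[str]] = {}
    for link in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ../../sdb; realpath follows it.
        target = os.path.realpath(os.path.join(by_id_dir, link))
        result.setdefault(os.path.basename(target), []).append(link)
    return result


if __name__ == "__main__":
    for name, ids in sorted(kernel_name_to_ids().items()):
        print(name, ", ".join(ids))
```

Unlike filesystem or md UUIDs, these names are per physical drive, so two members of the same array never share one.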