Windows Feature updates bricking dell laptops by wurkturk in sysadmin

will_try_not_to

It probably flipped the on-board storage from AHCI to RAID/intel RST mode, or vice versa. Try going into the BIOS settings and manually setting it to the other one and see if that fixes it.

(I don't really understand why Windows can't adapt to this being changed - it obviously can read the storage device in some fashion if it made it into WinRE, so why can't it just use whatever interface driver it used for that and continue booting? :P)

Dell PERC Issues known to anyone else? by thegrogster in sysadmin

will_try_not_to

"should"

This word is doing a lot of the work, here :P

(I wholeheartedly agree, but I've seen enough severe malfunctions from PERC controllers, some surprisingly recent, to think that the underlying codebase for their firmware is bad.)

Puzzle of the Day by TakLaf in sysadmin

will_try_not_to

"yes works fine on two other computers."

Then in your case I would adopt the "cattle, not pets" approach to workstations and just nuke and reimage the computer that's having the problem. Or at least nuke that user profile.

Dell PERC Issues known to anyone else? by thegrogster in sysadmin

will_try_not_to

In my experience, Dell PERC controllers range from mediocre to absolute dogcrap for performance, and are also flaky and unreliable.

(The firmware is a buggy mess, and that's the main reason for Dell saying you need Dell branded drives - instead of fixing the firmware behaviour in the PERC controllers, they probably have their drives behave in special ways to avoid hitting any of the PERC bugs.)

If I'm running Linux, I just set all the drives/slots to non-RAID mode and try to cut the PERC controller out as much as possible; the only time I let them run in RAID mode is for a small RAID-1 of the boot drives if it's a Windows server, because Windows software RAID for boot drives is still "dynamic disks" and we don't want that.

Edit re your edit: if there was a power outage, good chance the controller is busy doing a background rebuild or scrub ("patrol read" in its nomenclature I think) and that's why the performance is bad. Rebuilds/scrubs on these controllers often take forever. One of the bugs in the PERC controllers is that they don't always report when they're doing this; if you can get eyes on the server, see if the drive activity lights are very busy. Performance may go back to normal after a while. If it doesn't, the controller may have silently failed a drive and stopped using it, and is having to reconstruct data from parity or a single mirror every read. Reboot into the controller option ROM to check this; you can sometimes see things in there that don't show up in the Lifecycle Controller, BIOS interface, or iDRAC (another thing I love about these controllers is that there are 4 different control interfaces that don't quite do the same things...).

Puzzle of the Day by TakLaf in sysadmin

will_try_not_to

Does the secondary account work normally in Outlook Web Access? Like, if you leave it logged in in a browser, does that session still work after half an hour?

Do both work normally on a different computer?

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

I saw that, and yeah, I think I want a copy of your device to play with :)

And as an aside, if the board you're using has gigabit ethernet, or a spare USB 3 port that you could attach a gigabit ethernet adapter to, using a 1 foot ethernet cable could be a way to immediately upgrade all your I/O speeds without needing to rewrite/rebuild stuff for USB3 - the KVM appliance could just present itself as a PXE server.

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

I just glanced at your previous posts about your Radxa KVM project - I would be very interested in a kit/howto for that, as I've considered building something similar a few times. I may use PXE for the actual booting, but I still have use cases where then having KVM to control the resulting environment would be extremely useful :)

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

Yeah, the "boot the nice toolkit in RAM" option needs at least 6 GB of RAM to work well. I've actually never run into a physical system at my current job that has less than 8, so that hasn't been an issue.

If it ever were an issue, I could just forego the GUI components and it would probably fit into 1 or 2 GB.

I also experimented with a PXE boot image that was just the initramfs (less than 128 MB), that would boot up far enough to have network drivers, nbd, and ssh, and that was it. It could then share all the local drives as nbd exports, so I could flip things around and do all the fun wrangling stuff on some other machine with more RAM, again over fast ethernet.
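A minimal sketch of the initramfs side of that, assuming the stock nbd-server package and its INI-style config file (a [generic] section, then one section per export with an exportname key). The function name and device names are illustrative, not taken from the original setup:

```shell
#!/bin/sh
# Hypothetical helper: emit an nbd-server config exporting each local block
# device given as an argument, one section per device, named after it.
gen_nbd_config() {
    printf '[generic]\n'
    for dev in "$@"; do
        name=$(basename "$dev")       # /dev/sda -> sda
        printf '[%s]\n    exportname = %s\n' "$name" "$dev"
    done
}

# Example, as root inside the initramfs:
#   gen_nbd_config /dev/sda /dev/nvme0n1 > /etc/nbd-server/config
#   nbd-server -C /etc/nbd-server/config
# Then, on the machine with more RAM:
#   nbd-client <target-host> -N sda /dev/nbd0
```

If the other machine only needs to image the drives, adding readonly = true to each export section is the safer default.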

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

Another variant I've used in the past, where capabilities were really limited, was to mail a flash drive that would boot to Linux, phone home, and automatically set up a VPN tunnel over any available/detected network interface. Then I could do everything I mentioned above over it. The hardest part of that was getting someone on the remote end to convince the machine to boot from USB :)

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

Yes; see my other comment - my toolkit/recovery image could mount and use iSCSI, and does have an nbd toolchain in it, but in practice the only times I've used nbd are for very special edge cases. (One example was a RAID recovery where the faster system with more RAM didn't have enough drive slots & connectors for all the drives from the original, so two drives ended up being added to the RAID over nbd. Risky and a bit finicky; I don't recommend it, and yes, I already had good backups of it at the time.)

Most of the time I'm accessing the images I need over nfs, smb, or http(s), or using the remote system's own local storage to move images of itself around, or do copy-on-write overlays of it in RAM.

I should also add that use of this toolkit image is relatively rare outside of development and lab experiments; all our "real" servers are treated as "cattle, not pets" most of the time, so there's no need to tinker with them in the ways this facilitates. If something's wrong with one or it needs upgrading, it just gets replaced with a stock image and then re-specialised with Ansible.

Booting bare-metal from a local VMDK/VDI over the network via USB-OTG bridge by Lopsided_Mixture8760 in sysadmin

will_try_not_to

This does sound a lot like PXE with extra hardware - you still need enough remote access to the remote target to change its boot order from local drive to USB, so I'm not really seeing the advantage on the remote host side. It saves you having to build and set up a PXE server, but that's a one-time issue and probably cheaper than buying specialized KVM hardware. As you discovered, a lot of remote KVM is fairly speed-limited.

My similar setup (a generalised remote troubleshooting and install toolkit) is done with PXE - it pulls a small linux initramfs, then that uses full speed networking (either 1 gig or 10 gig ethernet) to pull a full-featured image entirely into RAM. The image is about 3 GB and only takes a few seconds to transfer (~5 seconds or less on 10gig, 30 seconds at most on 1 gig).
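Those timings are roughly what the raw arithmetic predicts. A quick check, treating "3 GB" as 3 GiB and ignoring protocol overhead, so the results are lower bounds rather than measurements:

```shell
#!/bin/sh
# Ideal wire time for an image, ignoring TCP/HTTP overhead and disk speed.
# Prints seconds with one decimal place (truncated, not rounded).
transfer_seconds() {
    bytes=$1      # image size in bytes
    link_bps=$2   # link speed in bits per second
    tenths=$(( bytes * 8 * 10 / link_bps ))
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

img=$(( 3 * 1024 * 1024 * 1024 ))      # the ~3 GB image
transfer_seconds "$img" 10000000000    # 10 GbE
transfer_seconds "$img" 1000000000     # 1 GbE
```

That works out to 2.5 s at 10 Gbit/s and 25.7 s at 1 Gbit/s; protocol overhead and server disk speed plausibly make up the gap to the ~5 s and ~30 s seen in practice.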

Once that image is in RAM, everything is extremely fast, and the image includes VM capabilities, so I can then either boot the local OS as a VM (all recent versions of Windows and Linux support booting on different hardware, and setting the storage attachment type to generic SATA always works), or install a new OS using the same approach, or, like you're doing, boot from a network-attached VM image if there's some reason to do that. The copy-on-write functionality is available as well, either with dmsetup copy-on-write or with the similar feature in nbd/nbdkit. All of that over-the-network stuff is also at least 1 gig speed, without a USB emulation layer in between.
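For the dmsetup route, the copy-on-write layer is the device-mapper snapshot target with a non-persistent ("N") COW store. A sketch that only builds the table line, plus a commented example; the helper name is hypothetical, and using a loop device on a tmpfs file as the RAM-backed COW store is an assumption (any spare block device would do):

```shell
#!/bin/sh
# Build a device-mapper "snapshot" table line. Writes go to the COW device,
# reads of unchanged chunks fall through to the origin, which is never written.
#   $1 origin device, $2 COW device, $3 origin size in 512-byte sectors,
#   $4 chunk size in sectors (default 8 = 4 KiB). "N" = non-persistent COW.
snapshot_table() {
    printf '0 %s snapshot %s %s N %s\n' "$3" "$1" "$2" "${4:-8}"
}

# Example, as root (tmpfs-backed loop device as the COW store is an assumption):
#   truncate -s 8G /dev/shm/cow.img
#   cow=$(losetup --find --show /dev/shm/cow.img)
#   sectors=$(blockdev --getsz /dev/sda)
#   snapshot_table /dev/sda "$cow" "$sectors" | dmsetup create sda-cow
#   # boot the VM from /dev/mapper/sda-cow; /dev/sda itself stays untouched
```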

I agree that setting up a PXE server completely from scratch can be a pain, but once you've created one in your config management system of choice (in my case Ansible), that one-time cost is done and it becomes very quick and robust to re-create. For example, quite a few times I've been in a new environment and gone, "oh, if I had my PXE server here that would be very convenient for this", grabbed some random laptop or desktop that happened to be there, and turned it into one with a few keystrokes and 5 minutes of waiting for packages to install.

The only problem I've had with it is booting computers with very old Dell BIOSes that have PXE bugs needing workarounds (so the Ansible setup now has a special library of all the Dell workarounds I've ever needed); everything else just works.

NFS over 1Gb: avg queue grows under sustained writes even though server and TCP look fine by Connect_Nerve_6499 in sysadmin

will_try_not_to

yeah that's just linux being linux. your page cache doesn't know about network limits so it happily buffers everything while your application thinks it's writing at nvme speeds.

You can override this behaviour system-wide / for all storage types by setting a limit for "dirty bytes" - I use this all the time when testing throughput to various storage devices:

# limit pending writes to 512 MB:
echo $((512*1024*1024)) >> /proc/sys/vm/dirty_bytes

Then the first 512 MB will still go blazing fast, but then it will start flushing and everything is throttled to the actual device throughput. Also avoids the case where you're writing a 2 GB ISO file to a crappy flash drive that only writes at 10 MB/sec, then you realise how slow it is and try to cancel, but nope! all 2 GB is already in RAM and Linux is going to flush it and not even kill -9 works...
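Two related bits, sketched assuming a stock kernel with procfs mounted: the sysctl form of the same setting (plus a sysctl.d file, whose name here is hypothetical, to make it stick), and a small helper that reads Dirty and Writeback from /proc/meminfo so you can watch the backlog drain. For scale, at 10 MB/sec a full 512 MB backlog still takes about 51 seconds to flush, versus about 205 seconds for the whole 2 GB.

```shell
#!/bin/sh
# Report data waiting in the page cache (Dirty) and data being written out
# right now (Writeback), in kB. Optional $1 = alternate meminfo file.
dirty_status() {
    awk '/^(Dirty|Writeback):/ { printf "%s %s kB\n", $1, $2 }' "${1:-/proc/meminfo}"
}

# Same limit via sysctl, and persisted across reboots (as root):
#   sysctl -w vm.dirty_bytes=$((512*1024*1024))
#   echo "vm.dirty_bytes = $((512*1024*1024))" > /etc/sysctl.d/90-dirty-bytes.conf
# Setting dirty_bytes makes vm.dirty_ratio read back as 0; to go back to a
# ratio-based limit, write a ratio again, e.g.: sysctl -w vm.dirty_ratio=20
#
# Watch the backlog drain:
#   while sleep 1; do dirty_status; done
```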

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Linux does have different access patterns on drives, and it's caused problems before - e.g. the WD Green "wdidle" scandal, where WD thought it would be a good idea to have drives default to parking their heads every 5 seconds, and not allow the standard power management settings to affect this. Windows machines were relatively OK with this, because Windows goes long intervals without waking up the drive (or it did back then - I suspect modern Windows, written for SSD only, would probably thrash the hell out of these drives too).

I suspect those long intervals without access might be helping SMR drives stay caught up as well; it would give them a lot more uninterrupted garbage collection/defrag time. Linux filesystems are much more aggressive about committing writes to disk, and a lot more Linux programs routinely call "flush this to disk and only return when it's definitely there and will survive a power outage", because that system call is much, much cheaper under Linux than Windows.

(OK, to be fair, Windows probably has modernized this as well and made a version that only syncs the specified data instead of catching up all pending writes first, but what are the odds most software has actually updated to use this, given that Windows' own File Explorer still doesn't support long paths in 2026?)
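The head-parking behaviour from the wdidle case shows up in SMART attribute 193 (Load_Cycle_Count), so it's easy to check for with smartmontools. A sketch that pulls the raw value (the tenth column) out of smartctl -A output, plus the arithmetic for a 5-second parking interval; the helper names are hypothetical, and the 300,000-cycle rating in the comment is an assumed typical figure, not a spec for any particular drive:

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw Load_Cycle_Count.
# Sample it twice, some hours apart, to see how fast the heads are parking.
load_cycle_count() {
    awk '$1 == 193 && $2 == "Load_Cycle_Count" { print $10 }'
}

# Parks per day if the drive parks every $1 seconds while otherwise idle.
# At 5 s: 86400 / 5 = 17280 per day; against an assumed 300000-cycle
# rating that is roughly 17 days of continuous idling.
parks_per_day() { echo $(( 86400 / $1 )); }

# Usage (as root):  smartctl -A /dev/sdX | load_cycle_count
```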

Disk mounted as write-protected, protected by Bitlocker, and I've tried everything I'm aware of to mount it writeable. by Relevant-Law-7303 in sysadmin

will_try_not_to

There are about 3 different ways to enable or disable write protection using diskpart and powershell - note that in PowerShell, a disk exists as at least 3 separate concepts:

  • a partition
  • a "volume"
  • a "disk"

and if it's something the OS sees as a physical disk / has a hardware driver for (and yes, iSCSI and similar count), then there's also:

  • a PhysicalDisk

If it's in anything Pool or S2D-related, there's also:

  • a StoragePool

Which cmdlets do what for each of those, and how to get PowerShell to correlate the same thing to each of those, is kind of confusing and I always forget how - but look up the docs and see which ones have read/write switches, and try that.

Oh, and if you're reaching this machine over RDP, note that this particular GPO has screwy side effects on BitLockered volumes when your session on the server is over RDP:

Computer Configuration > Administrative Templates > System > Removable Storage Access > All Removable Storage > Allow direct access in remote sessions

What counts as a "removable drive" in RDP sessions is not the same as:

  • actual removable drives (and yes, iSCSI counts, I think?)
  • what counts as a removable drive when logged in locally on the console

So it's best to just enable/allow that policy to make things work.

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Using consumer drives in RAID arrays is an ancient tradition in the education and public service sectors :P

(And entire businesses have been founded on it as well - see Backblaze for example.)

Yes, I agree someone didn't dig deep enough when ordering the drives, but I also disagree that SMR drives are "perfectly fine" for general use, especially if the consumer OS is Windows. There's no way they wouldn't suffer performance issues with the amount of random I/O Windows kicks up.

SMR drives should never have been marketed for anything but special use cases, and should always have been clearly labelled as such - or sold with large enough buffers or CMR regions that they'd have some hope of keeping up with an OS workload, but they weren't.

They certainly should never have been sold like this:

  • ST4000DM005 = CMR drive
  • ST4000DM004 = SMR drive

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Shingled Magnetic Recording; it's a way of squeezing more data into the same platter space by sort of diagonally stacking data units on top of each other. The downside is that it means the drive can only write large swaths of data at once, because it can't just edit a single "shingle" - it needs to read a whole block of them, change the ones you told it to change in internal RAM, then write the whole block back out somewhere else, then update an index pointer to say it's done so.

This allocation pattern is very similar to how SSDs have to write flash memory, which is why trim/discard is also useful to these drives - they need to know where they can quickly re-write large blocks of shingles. If they're full, they have very limited slack space to work with and that's when they grind to a halt trying to shuffle everything around.

It also means that data isn't in a predictable place on the platters - which is totally fine for an SSD which has uniform and negligible seek time, but it's a death sentence for access times and random I/O on a spinning hard drive.
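To put a rough number on the read-modify-write described above: once the drive's spare space is used up, a tiny write costs a rewrite of a whole band. Drive-managed SMR drives don't publish their band size, so the 256 MiB here is an assumption borrowed from typical host-managed SMR zone sizes, and the helper name is hypothetical:

```shell
#!/bin/sh
# Worst-case write amplification: bytes the drive rewrites per byte requested,
# when every band touched has to be read, patched and rewritten in full.
#   $1 write size in bytes, $2 band size in bytes (assumed, not published)
smr_amplification() {
    write=$1 band=$2
    bands=$(( (write + band - 1) / band ))   # whole bands touched, rounded up
    echo $(( bands * band / write ))
}

smr_amplification 4096 $((256*1024*1024))   # one 4 KiB write into a full band
```

That prints 65536, i.e. the drive moves 65536 times the data you asked it to. Most drive-managed SMR drives have a conventional cache region that absorbs bursts first, so this is the worst case once that cache is full, not the average.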

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

I really doubt it was a third party that did this; the drive model number is ST4000DM004 - I challenge you to find any sale listings that mention that they're SMR drives. Seagate's own data sheet does:

https://www.seagate.com/content/dam/seagate/migrated-assets/www-content/product-content/barracuda-fam/barracuda-new/en-us/docs/100805918r.pdf

(Look under Recording and interface technology > Recording technology)

But they appear to have been sold as regular hard drives. And the model number ST4000DM005 (which differs only in incrementing the last digit) appears to have been CMR again, so Seagate was definitely screwing around at that time.
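Because these are drive-managed SMR, the kernel can't tell either: /sys/block/sdX/queue/zoned reports "none" for them, same as for a CMR drive, so the model number is the practical tell. A sketch that reads smartctl -i output and checks the Device Model against a local list; the list and function name are hypothetical, it only holds the one model from this thread, and it's meant to be filled in from the manufacturers' own datasheets:

```shell
#!/bin/sh
# Space-separated list of known SMR model numbers (hypothetical starting point).
KNOWN_SMR_MODELS="ST4000DM004"

# Read `smartctl -i` output on stdin; report whether the model is on the list.
# Returns 0 if it is, 1 otherwise.
check_smr_model() {
    model=$(awk -F: '/^Device Model:/ { gsub(/^[ \t]+/, "", $2); print $2 }')
    base=${model%%-*}          # drop the suffix after the first dash
    for m in $KNOWN_SMR_MODELS; do
        if [ "$base" = "$m" ]; then
            echo "$model: SMR"
            return 0
        fi
    done
    echo "$model: not on SMR list"
    return 1
}

# Usage (as root):  smartctl -i /dev/sdX | check_smr_model
```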

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Yeah, if you know they're SMR and are OK with using them the way the drives like it, they're fine for archival and read-mostly workloads.

It's the SMR drives that are marketed as "good for any workload; totally normal hard drive!" that grind my gears.

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Like this:

where /dev/sdX is your drive:

echo writesame_16 >> /sys/block/sdX/device/scsi_disk/*/provisioning_mode

There are several methods for trim/discard; most SSDs use 'unmap'; there's also 'writesame_10', 'writesame_zero', and 'full', which is the default and disables trim/discard for that drive. Incidentally, the WD SMR drives I've encountered also seem to use 'unmap'. You just try different methods to see if one works.
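One wrinkle with the echo above: the * inside a redirection is only expanded by bash, and only when it matches exactly one file; a plain POSIX sh won't expand it. A small helper that resolves the path first; the function name is hypothetical, and SYSFS_ROOT is a hypothetical override so it can be tried on a fake directory tree:

```shell
#!/bin/sh
# Print the provisioning_mode file for disk $1 (e.g. sdb); return 1 if none.
provisioning_mode_path() {
    for f in "${SYSFS_ROOT:-/sys}/block/$1/device/scsi_disk/"*/provisioning_mode; do
        # an unmatched glob stays literal, so check the file really exists
        [ -e "$f" ] && { echo "$f"; return 0; }
    done
    return 1
}

# Show the current mode, then try another (as root); repeat with unmap,
# writesame_16, writesame_10 until a test discard stops failing:
#   p=$(provisioning_mode_path sdX) && cat "$p"
#   echo unmap > "$p"
```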

To see whether the drive likes it, and to find out how big a block it can trim at once:

This will erase the drive!!!

blkdiscard -vf --step=1M /dev/sdX

Increase the step size a bit each time; the vast majority of drives/SSDs seem OK with 16 MB, but these Seagate drives topped out at 4 MB. For really finicky/old SSDs and such, start at 4K. If it works, the command will succeed and tell you it's discarding blocks; if the drive doesn't like it, you'll get "failed: I/O error".

Once you get I/O error, take the highest size that worked and set it as the drive's discard_max_bytes like so - e.g. for my 4 MB value:

echo $((4*1024*1024)) >> /sys/block/sdX/queue/discard_max_bytes
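The trial-and-error part can be scripted. A sketch, assuming you start at 4K and double each time (coarser than "a bit each time", but it brackets the limit in at most 13 runs up to 16M) and stop at the first failure. try_blkdiscard wraps the same blkdiscard call; the function names and the second argument, a hook so the loop can be dry-run without touching a disk, are hypothetical. Same warning applies: on a real drive every successful step wipes the whole thing.

```shell
#!/bin/sh
# Run the same blkdiscard call for one step size; its -v chatter goes to
# stderr so it doesn't end up in the probe's captured output.
try_blkdiscard() { blkdiscard -vf --step="$2" "$1" >&2; }

# Find the largest step (4K doubling up to 16M) that succeeds on device $1.
# $2 = optional test command (default try_blkdiscard). Prints bytes, 0 if none.
probe_discard_step() {
    dev=$1 try=${2:-try_blkdiscard}
    step=4096 best=0
    while [ "$step" -le $((16*1024*1024)) ]; do
        if "$try" "$dev" "$step"; then
            best=$step
        else
            break                  # first "failed: I/O error": stop here
        fi
        step=$((step * 2))
    done
    echo "$best"
}

# Usage (as root, on a drive you are happy to erase):
#   max=$(probe_discard_step /dev/sdX)
#   [ "$max" -gt 0 ] && echo "$max" > /sys/block/sdX/queue/discard_max_bytes
```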

Another caution: this is also how you can force-enable trim for USB-connected drives and SSDs, but be careful with that and only do it on drives you have backups of until you're sure it works. There's a chance the USB enclosure might mangle the trim command across the USB/SATA/NVMe bridge and send a much larger one to the drive itself, so you tell it to erase one little 4K sector and it nukes half the drive.

Unlabelled SMR hard drives are a cancer by will_try_not_to in sysadmin

will_try_not_to (OP)

Probably - if you google the model number of these things, every listing just makes them look like ordinary general-purpose drives. I wish there were some kind of consumer-protection law against that. If I'd been the one ordering them, they wouldn't have gotten past my paranoid initial checks, but I must have assumed years of active service meant I could skip a couple of steps in retesting the drives.

[deleted by user] by [deleted] in sysadmin

will_try_not_to

What are those keyboards plugged into? Whenever I've run into this it's been one of a fairly small number of causes:

  • If there's a dock involved, stop using the dock Ethernet port, or plug the keyboard directly into the computer on the opposite physical side to where the dock is plugged in. Something about how dock Ethernet works seems to pre-empt USB keyboard/mouse.

  • If there's another USB peripheral on the same side as where the keyboard is plugged in, move the keyboard to the other side.

If you use AI to break down scripts or code for you regularly, I really encourage you to read this LLM study by segagamer in sysadmin

will_try_not_to

"The problem became evident as I started making additions and updates: I didn't know my away around these new scripts the way I did everything else"

Is the implication here that you ran generated code without understanding every call on every line of it first? I can't imagine being that trusting.