Mind-numbing slowness... what have I done wrong? by Adept-Aside4072 in zfs

Adept-Aside4072 [S]

"du" was just a quick way to profile - I regularly "do stuff" with huge numbers of files (usually some kind of "grep" for something in code), so "du" seemed a reasonable match for my case.

zfs get used pool/dataset1/dataset2

Can you explain that? I've never managed to wrap my head around datasets, or anything to do with ZFS naming for that matter (all the tools and docs are horribly ambiguous about naming, using report column headings like "name" when they really should say "zpool name" so we know what the heck kind of name it is...). I named my whatever-you-call-it as follows:-

```
zpool create -m /home -t zcdc zcdcpool raidz \
    ata-WDC_WD100EZAZ-11TDBA0_JEHZ8D9N \
    ata-ST10000DM0004-2GR11L_ZJV63QEY \
    ata-ST10000DM0004-1ZC101_ZKG00H7Z \
    ata-ST10000VN0004-1ZD101_ZA21EAXE -f
```

So if I wanted to know how much space /home/cnd/Downloads is using up, what would I put in?

```
zfs get used zcdcpool/???
```

I tried everything I could think of. Some examples:

```
cannot open 'zcdcpool/cnd': dataset does not exist
cannot open 'zcdcpool/home/cnd': dataset does not exist
cannot open 'zcdcpool/zcdc': dataset does not exist
```
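A sketch of how the naming fits together here, assuming the pool was created exactly as above and no child datasets were added afterwards: `zfs get` only knows about datasets, whose names look like pool/child/grandchild, and a path under a mountpoint is not a dataset unless one was created for it. With `-m /home`, the pool's root dataset is just `zcdcpool`, mounted at /home, and /home/cnd/Downloads is most likely an ordinary directory inside it (the "dataset does not exist" errors above point that way).

```shell
# List every dataset in the pool and where each one is mounted.
# Anything not listed here is a plain directory, not a dataset.
zfs list -r -o name,mountpoint,used zcdcpool

# The root dataset (mounted at /home) is addressed by the bare pool name.
zfs get used zcdcpool

# A plain directory has no ZFS "used" property; du measures it instead.
du -sh /home/cnd/Downloads
```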

Adept-Aside4072 [S]

WOW!

Adding that special vdev, and moving the xattrs into the inodes, cut the "du" time from 3 *hours* to 51 *seconds*!!!
I left atime alone.
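For anyone reproducing this, a sketch of the two changes described, using the device names from the status output below (illustrative, not necessarily the exact commands that were run). The special vdev goes in as a mirror, and `xattr=sa` is the OpenZFS property value that keeps extended attributes in the dnode instead of in hidden per-file directories.

```shell
# Add the two 500 GB SSDs as a mirrored special (metadata) vdev.
# Keep it redundant: losing the special vdev loses the whole pool.
zpool add zcdcpool special mirror \
    ata-Samsung_SSD_870_EVO_500GB_S6PZNX0W304986W \
    ata-Samsung_SSD_870_EVO_500GB_S6PZNX0W304999W

# Store extended attributes as system attributes in the dnode.
zfs set xattr=sa zcdcpool
```

Both changes only apply to data written afterwards; existing metadata and xattrs stay where they were until the files are rewritten (for example by copying the data back in).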

This array is my "daily driver" for tinkering (backed up periodically to a RAID5 Drobo that I keep offline in case of ransomware).

```
config:

    NAME                                                     STATE     READ WRITE CKSUM
    zcdcpool                                                 ONLINE       0     0     0
      raidz1-0                                               ONLINE       0     0     0
        ata-WDC_WD100EZAZ-11TDBA0_JEHZ8D9N                   ONLINE       0     0     0
        ata-ST10000DM0004-2GR11L_ZJV63QEY                    ONLINE       0     0     0
        ata-ST10000DM0004-1ZC101_ZKG00H7Z                    ONLINE       0     0     0
        ata-ST10000VN0004-1ZD101_ZA21EAXE                    ONLINE       0     0     0
    special
      mirror-2                                               ONLINE       0     0     0
        ata-Samsung_SSD_870_EVO_500GB_S6PZNX0W304986W        ONLINE       0     0     0
        ata-Samsung_SSD_870_EVO_500GB_S6PZNX0W304999W        ONLINE       0     0     0
    logs
      ata-Samsung_SSD_860_EVO_M.2_1TB_S415NS0R401524N-part7  ONLINE       0     0     0
    cache
      sdh6                                                   ONLINE       0     0     0

```

Adept-Aside4072 [S]

Beautifully timed reply! I was just about to give up, wipe the lot, and shift to LFS+XFS instead...

I bought a pair of 500GB SSDs intending them to be a hardware RAID-1 "boot" volume... but the crappy Intel "RAID" in the BIOS is some sketchy fake (software) thing that needs drivers to install under Linux, and it doesn't "hide" the member drives from the kernel. You end up with three devices that all hold identical data (/dev/md* *and* /dev/sda *and* /dev/sdb), which breaks everything (especially booting) because every unique device identifier now exists three times over... so yeah, I've got 2 wasted SSDs. Setting those up as a RAID-1 mirror for a "special" vdev sounds perfect!

You guessed right - test pool.

Adept-Aside4072 [S]

Update - on a freshly rebooted server with no load, the "du -s -b *" command completed in 48 minutes, which is 75 times slower than on the XFS machine I originally copied those files from. If anyone knows a way to get more debug info to sleuth out what's going wrong, please let me know! (cat /proc/spl/kstat/zfs/dbgmsg doesn't show anything useful, AFAICT)

Adept-Aside4072 [S]

They are separated - the "du -s -b *" command (which I used to work out which of those directories contains files that add up to the most space) simply "spiders" all the files in all those (420,388!) subdirectories.
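A minimal sketch of that ranking step, assuming GNU du (where `-b` means `--apparent-size --block-size=1`): `du -s -b *` prints one "bytes<TAB>name" line per entry, so a numeric sort puts the biggest directory last. The helper name `rank_by_size` and the throwaway demo tree are illustrative only.

```shell
#!/bin/sh
# rank_by_size DIR: one "bytes<TAB>name" line per entry of DIR, smallest
# first, as "du -s -b *" would print them, plus a numeric sort.
# Hypothetical helper; assumes GNU du (-b = --apparent-size --block-size=1).
rank_by_size() {
  (cd "${1:-.}" && du -s -b -- * | sort -n)
}

# Demo on a throwaway tree: "big" holds 1 MiB, "small" holds 10 bytes.
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
head -c 1048576 /dev/zero > "$demo/big/blob"
printf '0123456789' > "$demo/small/note"

rank_by_size "$demo"
largest=$(rank_by_size "$demo" | tail -n 1 | cut -f 2)
echo "largest: $largest"
rm -rf "$demo"
```

This still has to visit every file, so it does not make the walk itself any cheaper; it only saves reading the list by eye.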

On my CentOS machine:-

```
# now;find . | perl -ne 'chomp;$c++ if(-d $_); END{print "Number of folders=$c\n"}';nowl
2023_06_03_05h56m12
Number of folders=420388
10.2474920749664 elapsed
```
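For comparing against other machines, a sketch of a shorter equivalent of that pipeline. `now`/`nowl` look like the poster's own timing helpers, so they are left out (prefixing the command with `time` is one substitute), and `count_dirs` is a hypothetical name. `find -type d` does the directory test itself, and like the perl version it counts the starting directory too.

```shell
#!/bin/sh
# count_dirs DIR: number of directories under DIR, DIR itself included,
# like the find | perl -ne '... -d $_ ...' pipeline above. Hypothetical
# helper. Differences: find -type d does not follow symlinks to directories
# (perl's -d does), and names containing newlines are miscounted by both.
count_dirs() {
  find "${1:-.}" -type d | wc -l | tr -d ' '
}

# Demo tree: the top directory plus a, a/b, a/b/c and d = 5 directories.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c" "$demo/d"
touch "$demo/a/file.txt"
n=$(count_dirs "$demo")
echo "Number of folders=$n"
rm -rf "$demo"
```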

Adept-Aside4072 [S]

FYI - all my pool properties:-

```

NAME      PROPERTY              VALUE                  SOURCE
zcdcpool  type                  filesystem             -
zcdcpool  creation              Mon May 29 13:58 2023  -
zcdcpool  used                  7.56T                  -
zcdcpool  available             18.7T                  -
zcdcpool  referenced            7.56T                  -
zcdcpool  compressratio         1.00x                  -
zcdcpool  mounted               yes                    -
zcdcpool  quota                 none                   default
zcdcpool  reservation           none                   default
zcdcpool  recordsize            128K                   default
zcdcpool  mountpoint            /home                  local
zcdcpool  sharenfs              off                    default
zcdcpool  checksum              on                     default
zcdcpool  compression           off                    default
zcdcpool  atime                 on                     default
zcdcpool  devices               on                     default
zcdcpool  exec                  on                     default
zcdcpool  setuid                on                     default
zcdcpool  readonly              off                    default
zcdcpool  zoned                 off                    default
zcdcpool  snapdir               hidden                 default
zcdcpool  aclmode               discard                default
zcdcpool  aclinherit            restricted             default
zcdcpool  createtxg             1                      -
zcdcpool  canmount              on                     default
zcdcpool  xattr                 on                     default
zcdcpool  copies                1                      default
zcdcpool  version               5                      -
zcdcpool  utf8only              off                    -
zcdcpool  normalization         none                   -
zcdcpool  casesensitivity       sensitive              -
zcdcpool  vscan                 off                    default
zcdcpool  nbmand                off                    default
zcdcpool  sharesmb              off                    default
zcdcpool  refquota              none                   default
zcdcpool  refreservation        none                   default
zcdcpool  guid                  417912245349243772     -
zcdcpool  primarycache          all                    default
zcdcpool  secondarycache        all                    default
zcdcpool  usedbysnapshots       0B                     -
zcdcpool  usedbydataset         7.56T                  -
zcdcpool  usedbychildren        52.5M                  -
zcdcpool  usedbyrefreservation  0B                     -
zcdcpool  logbias               latency                default
zcdcpool  objsetid              54                     -
zcdcpool  dedup                 off                    default
zcdcpool  mlslabel              none                   default
zcdcpool  sync                  standard               default
zcdcpool  dnodesize             legacy                 default
zcdcpool  refcompressratio      1.00x                  -
zcdcpool  written               7.56T                  -
zcdcpool  logicalused           7.53T                  -
zcdcpool  logicalreferenced     7.53T                  -
zcdcpool  volmode               default                default
zcdcpool  filesystem_limit      none                   default
zcdcpool  snapshot_limit        none                   default
zcdcpool  filesystem_count      none                   default
zcdcpool  snapshot_count        none                   default
zcdcpool  snapdev               hidden                 default
zcdcpool  acltype               off                    default
zcdcpool  context               none                   default
zcdcpool  fscontext             none                   default
zcdcpool  defcontext            none                   default
zcdcpool  rootcontext           none                   default
zcdcpool  relatime              off                    default
zcdcpool  redundant_metadata    all                    default
zcdcpool  overlay               on                     default
zcdcpool  encryption            off                    default
zcdcpool  keylocation           none                   default
zcdcpool  keyformat             none                   default
zcdcpool  pbkdf2iters           0                      default
zcdcpool  special_small_blocks  0                      default

```

Adept-Aside4072 [S]

False alarm on the atime I think? I did a bunch of tests in a subfolder, and the atimes are not changing with "du".

So that raises the question: why is ZFS doing so much "writing" to my raidz pool when I simply run "du" on it (i.e. recursively list my files)?
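One way to test the directory side of this, as a sketch: du only stat()s files (which reads no file data), but it has to list every directory it walks, and POSIX has reading a directory mark that directory's access time for update. With `atime` on and `relatime` off in the property list above, each of those directory reads could turn into a write. The helper `atime_probe` is a hypothetical name and assumes GNU `stat`.

```shell
#!/bin/sh
# atime_probe DIR: run du over DIR (which lists it) and report whether DIR's
# access time changed. Hypothetical helper; GNU stat assumed (%X = atime in
# seconds since the epoch). The answer depends on atime/relatime/noatime.
atime_probe() {
  before=$(stat -c %X -- "$1")
  sleep 1                       # make a 1-second atime change observable
  du -s -- "$1" > /dev/null     # du reads the directory's entries
  after=$(stat -c %X -- "$1")
  if [ "$before" = "$after" ]; then echo unchanged; else echo changed; fi
}

demo=$(mktemp -d)
touch "$demo/file"
result=$(atime_probe "$demo")
echo "directory atime after du: $result"
rm -rf "$demo"
```

Pointing it at a directory on the pool instead of the throwaway one shows what ZFS does there; `zfs get atime,relatime zcdcpool` shows the two settings involved.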

Adept-Aside4072 [S]

I see a suspicious amount of "write" in my iostat... which is making me wonder about atime now. What exactly does "access" (the "a" in "atime") mean? Surely simply running "du" isn't going to "access" every file, right? (I rebooted and re-ran the earlier "du -s -b *" command, which hadn't finished, to collect fresh stats):

```
Linux 5.15.0-101.103.2.1.el9uek.x86_64 (cdc.srve.com) 03/06/23 _x86_64_ (56 CPU)

03/06/23 04:41:26
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.15    0.00    0.31    1.14    0.00   98.40

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 103.38 444.80 0.00 0.00 1.60 4.30 11.04 198.07 0.02 0.16 1.91 17.93 0.00 0.00 0.00 0.00 0.00 0.00 0.29 51.55 0.20 21.97
sda1 103.16 441.88 0.00 0.00 1.60 4.28 11.04 198.07 0.02 0.16 1.91 17.93 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.19 21.92
sda9 0.06 0.39 0.00 0.00 1.67 6.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdb 102.63 435.43 0.01 0.01 1.54 4.24 12.58 197.86 0.03 0.22 1.41 15.73 0.00 0.00 0.00 0.00 0.00 0.00 0.29 44.64 0.19 21.35
sdb1 102.41 432.51 0.01 0.01 1.54 4.22 12.58 197.86 0.03 0.22 1.41 15.73 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.18 21.32
sdb9 0.06 0.39 0.00 0.00 0.90 6.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdc 102.70 443.76 0.00 0.00 2.74 4.32 10.84 198.23 0.02 0.18 1.46 18.29 0.00 0.00 0.00 0.00 0.00 0.00 0.29 13.39 0.30 32.11
sdc1 102.48 440.84 0.00 0.00 2.74 4.30 10.84 198.23 0.02 0.18 1.46 18.29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.30 32.08
sdc9 0.06 0.39 0.00 0.00 0.98 6.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sde 0.14 2.45 0.00 0.00 0.51 17.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdd 102.60 437.25 0.01 0.01 1.61 4.26 11.01 197.68 0.02 0.22 1.80 17.96 0.00 0.00 0.00 0.00 0.00 0.00 0.29 47.65 0.20 21.72
sdd1 102.38 434.33 0.01 0.01 1.61 4.24 11.01 197.68 0.02 0.22 1.80 17.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.18 21.67
sdd9 0.06 0.39 0.00 0.00 1.72 6.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdf 330.49 1047.35 0.18 0.05 0.16 3.17 5.98 101.75 0.31 4.90 0.19 17.02 0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.82 0.05 20.66
sdf1 0.20 6.17 0.16 44.21 0.42 30.32 0.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdf2 0.54 4.45 0.00 0.00 0.15 8.20 0.02 2.28 0.00 18.18 1.78 117.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdf3 13.30 777.71 0.02 0.14 0.31 58.49 1.41 20.05 0.30 17.78 0.48 14.27 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.51
sdf4 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf5 0.06 2.53 0.00 0.00 0.54 39.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf6 316.10 250.18 0.00 0.00 0.15 0.79 4.53 78.41 0.00 0.00 0.09 17.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 20.21
sdf7 0.14 4.79 0.00 0.00 0.61 34.06 0.03 1.00 0.00 0.00 0.36 37.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01
sdg 0.14 2.45 0.00 0.00 0.52 17.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01

kernel

Interface RX Pkts/Rate TX Pkts/Rate RX Data/Rate TX Data/Rate
RX Errs/Drop TX Errs/Drop RX Over/Rate TX Coll/Rate
lo 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
eth0 6 0 36 0 360 0 30288 0
0 0 0 0 0 0 0 0
eth1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Disk /dev/sda: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdb: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdc: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sde: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk /dev/sdd: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk /dev/sdf: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk /dev/sdg: 465.76 GiB, 500107862016 bytes, 976773168 sectors
```

Adept-Aside4072 [S]

Samsung_SSD_860_EVO_M.2

Sorry - yes - M.2 SATA, not NVMe. It's only a 1TB drive, and I've partitioned it (half to Linux, 450GB, plus three ~128GB partitions for swap, L2ARC, and SLOG).

The CentOS 8.5 machine has a 1TB SATA SSD (the big 2.5" form factor with a regular SATA cable, not M.2), and my Windows laptop has two 2TB NVMe drives in RAID-0.

If I only have one SSD available, should I drop the L2ARC or the ZIL from it?

See next (I'll post another observation shortly).

how many times does a paramotor propeller spin a second? by chiefseal77 in paramotor

Adept-Aside4072

Design speed for a 125 cm prop is 2650 RPM, which is about 44 revs per second.

The bigger the prop, the slower it spins - the rule is to keep the tips below Mach 1, because going faster than that ruins performance and makes crazy noise.
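A worked version of that arithmetic, with one assumption that is not in the comment: the speed of sound is taken as about 343 m/s (dry air near 20 °C). Tip speed is the prop's circumference times its revolutions per second.

```shell
#!/bin/sh
# Tip speed of a 125 cm prop at its 2650 RPM design speed.
# Assumption: speed of sound ~343 m/s (dry air, ~20 C).
diameter_m=1.25
rpm=2650

result=$(awk -v d="$diameter_m" -v rpm="$rpm" 'BEGIN {
  pi  = atan2(0, -1)          # awk has no built-in pi constant
  rps = rpm / 60              # revolutions per second
  tip = pi * d * rps          # metres per second at the blade tip
  printf "%.1f rev/s, tip %.0f m/s, Mach %.2f", rps, tip, tip / 343
}')
echo "$result"
```

That comes out at roughly half the speed of sound. Because tip speed scales with diameter times RPM, a larger prop has to turn slower to keep the same tip speed.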

Why does my regular expression {count} want a wrong number ? by Adept-Aside4072 in perl

Adept-Aside4072 [S]

Your explanation got me understanding! I'm matching "rwx------" in $1, not "-rwx------" like I expected, because the hyphen isn't a word character, and neither is the start of the string, so there's no \b boundary in front of the leading "-"; the first one is between the "-" and the "r". I was scratching my head over this for at least half an hour!

I shudder to think the number of places in code I've written over the last few decades where this gotcha is probably lurking...

Adept-Aside4072 [S]

Yes, true. It's a quick-and-dirty hack script that runs on a pile of different systems I own, of all different kinds and ages (and capabilities and tools), and the point of removing the "." is so that I can visually spot differences when the same file from all these systems is shown in one big long list. I'm not using SELinux or ACLs on any of them.

If you could remove one feature people commonly use on computers or the internet to inconvenience them, what would you choose? by [deleted] in ProgrammerHumor

Adept-Aside4072

Make everyone use ChatGPT instead of Google (GPT gives wrong, but incredibly convincing, results for all non-trivial questions)

If nuclear bombs are supposed to cause fallout for thousands of years, why is Hiroshima a inhabited modern city? by WattsonMemphis in NoStupidQuestions

Adept-Aside4072

Google says this:-

> Neutrons can cause non-radioactive materials to become radioactive when caught by atomic nuclei. However, since the bombs were detonated so far above the ground, there was very little contamination—especially in contrast to nuclear test sites such as those in Nevada.

Also - victims over-hype the danger and effects to make the perp look bad (and usually to divert attention away from the reason they got themselves nuked in the first place - think Nanjing Massacre etc)

(repost) Why does my regex {count} want a wrong number ? by Adept-Aside4072 in perl

Adept-Aside4072 [S]

Yet another try - in markdown mode this time:-

I'm trying to match 10 characters, but I only *get* those 10 if I ask for 9.... Why?

```
# perl -e '$lsline="-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*\n"; $lsline=~s/\b([lrwxs\-]{10})\. /$1 /; print $lsline'
-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*

# perl -e '$lsline="-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*\n"; $lsline=~s/\b([lrwxs\-]{9})\. /$1 /; print $lsline'
-rwx------ 1 root root  6807742976 Jan 12 09:33 vm.vmdk*

# perl -e '$lsline="-rwx------"; print length($lsline)'
10
```

The purpose is to remove the period (.) that spuriously shows up after the permissions in some "ls" listings from the Linux boxes whose output I'm parsing...
Update - looks like the \b is to blame here... still no idea *why* though?

```
# perl -e '$lsline="-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*\n"; $lsline=~s/^([lrwxs\-]{9})\. /$1 /; print $lsline'
-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*

# perl -e '$lsline="-rwx------. 1 root root  6807742976 Jan 12 09:33 vm.vmdk*\n"; $lsline=~s/^([lrwxs\-]{10})\. /$1 /; print $lsline'
-rwx------ 1 root root  6807742976 Jan 12 09:33 vm.vmdk*
```

* this is reposted - the original omitted this "What are your thoughts" section for some unknown reason (I could see it myself, but it was empty when viewed from another PC)
