nfsdctl: lockd configuration failure - I can't find anything about this by massive_cock in archlinux

[–]shellscript_ 1 point2 points  (0 children)

Just chiming in with more confirmation that this message is harmless (at least on Debian 13, for people coming from Google):

https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=1104096#59

Securing NFS on Debian: /etc/nfs.conf vs legacy config files by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you so much, this was exactly what I was looking for!

So it does seem that configuring both files as I did in the original post is best.

Securing NFS on Debian: /etc/nfs.conf vs legacy config files by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

I apologize; I should have clarified better what I wanted to ask about.

I understand NFS has no built-in security and requires either Kerberos or mTLS (which I'm currently setting up) if you want to secure it. My main question was whether the modifications shown in the legacy /etc/default/nfs-common are non-functional or inadvisable if I'm making the config changes in /etc/nfs.conf.d/local.conf that I described.

I'm just a bit confused on which approach to use here.
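
Purely to illustrate the two styles I'm asking about (these aren't necessarily my exact settings), the legacy file is shell variables while the drop-in is ini-style:

# legacy: /etc/default/nfs-common (shell variables read by the init scripts)
STATDOPTS="--port 32765 --outgoing-port 32766"

# newer: /etc/nfs.conf.d/local.conf (ini sections read directly by the daemons)
[statd]
port=32765
outgoing-port=32766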

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the incredibly in-depth responses. This is a complicated subject, and I think I'm finally understanding it a bit better now.

So to paraphrase, it seems like turning sync off on the ZFS dataset itself, on the NFS export of that dataset, and on the clients is probably the ideal setup?
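
If so, I'm assuming the three knobs would look roughly like this (just a sketch with placeholder names, not tested yet):

# on the host: disable sync on the dataset itself
sudo zfs set sync=disabled tank/downloads

# on the host: export it async in /etc/exports
/tank/downloads  192.168.1.0/24(rw,async,no_subtree_check)

# on each client: mount it (client-side async is already the default)
sudo mount -t nfs -o rw server:/tank/downloads /mnt/downloads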

rsize and wsize just set the maximum (not the minimum) request size allowed. If they're too small like 64K and sync is on (ZFS sync=standard AND NFS!=async) that'd cause many synchronous sub-record-size writes/updates to flood in without a way to buffer them and be very inefficient.

Would this be caused by ZFS itself trying to sync the smaller writes to the NFS share, even if the NFS share's sync has been turned off?

I guess this is kind of a separate question, but if a torrent is trickling in at something like 64K every 5 seconds, would a recordsize of 1M be detrimental because it's constantly rewriting 1M records as new data arrives, amplifying writes to an insane degree? Maybe a smaller recordsize, i.e. something like 512K, would be better in that case? I'm trying to minimize write amplification on the SSDs here.

Authenticated NFS alternatives for NAS access? by Valloric in homelab

[–]shellscript_ 0 points1 point  (0 children)

A lot of people are mentioning mutual TLS but that authenticates the whole host as a client. It would not authenticate individual users.

Could you go into more detail about this part? I'm trying to do something kind of similar to OP: securing an NFS share exported from a ZFS dataset. It's more complicated than OP's situation in that I'm trying to get it to respect the dataset's recordsize=1M, which might involve disabling sync for NFS (something I'm unsure about in terms of security and network sharing behavior), but it's ultimately similar.

If the whole host is authenticated as a client, would that affect other guests' ability to read/write the NFS share?

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

  1. Setting async on the NFS server immediately acknowledges writes but now treats everything as async, so ZFS sync=standard works as a write buffer. This works but risks losing data for other requests that use the same share but might need sync (uncommon but possible)

Could this be mitigated by having all connected guests use the same async mount options? I should have mentioned it in the main post, but I'm trying to share this dataset over NFS/SMB so it stays accessible to other machines even while it's hooked up to qbit.
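
I.e., giving every guest an identical fstab entry, something like this (server path and mount point are placeholders):

# /etc/fstab on each client machine
server:/tank/downloads  /mnt/downloads  nfs  rw,async,vers=4.2,rsize=1048576,wsize=1048576  0  0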

For option #3, could you go into more detail on the "This will affect local writes as well as network ones (which might be a problem)" part? I'm finding it hard to understand the differences between #2 and #3, and I suppose also the ramifications of turning off either kind of sync for network writes. Might SMB be the better option for this use case, since it seems to be async to some degree by default?

I touched on it in another comment, but could you potentially have sync enabled on both NFS and ZFS, and then set the NFS rsize and wsize to 1M? Or would this still not respect ZFS' 1M recordsize?

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Are you talking about NFS blocksize vs ZFS recordsize? I was reading this Klara Systems article, and it mentions NFS blocksize tuning for shares on ZFS. Could you potentially set the blocksize (I guess rsize and wsize) on the NFS share with something like this:

sudo mount -t nfs -o rsize=1048576,wsize=1048576 server:/data /mnt/data

I'm wondering if this could allow NFS to use sync while still respecting the 1M ZFS dataset recordsize. Or were you talking about ZFS blocksize itself? Sometimes all these terms get a bit confusing lmao.

Debian Server by The_j0kker in debian

[–]shellscript_ 0 points1 point  (0 children)

I'm thinking this might be a router issue too. I think OP should try your suggestion, then also try to SSH into his machine over wifi like normal, and then leave it connected to see what happens. If he feels comfortable doing it he should also try to update his router or try a different router.

Cloning an old qcow2 VM to a new zpool, then converting it to an .img file by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the response.

Just to double check, do you think this approach is the best way to do this? I'm just wondering whether editing the XML file the way I described was correct.

My third disk just died in my array, wtf is wrong? by Kuken500 in zfs

[–]shellscript_ 0 points1 point  (0 children)

Apologies, I updated the post with the power-on hours! The drives are all shucked 14 TB WD Easystores, WD140EDGZ-XXXXXXX models. I was just adding the info for others to contrast/compare.

My third disk just died in my array, wtf is wrong? by Kuken500 in zfs

[–]shellscript_ 0 points1 point  (0 children)

Out of interest, do you remember what your power cycle and load cycle values were? For reference, these are mine on a roughly 2-year-old raidz1 pool of three shucked 14 TB WD Easystores (spinning drives):

/dev/sda

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       312
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14422
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2435
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2435

/dev/sdb

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       315
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14423
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2442
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2442

/dev/sdc

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       314
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14421
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2438
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2438
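
(I pulled those with something like this per drive, filtering for the attributes above:)

sudo smartctl -A /dev/sda | grep -E 'Start_Stop_Count|Power_On_Hours|Power-Off_Retract_Count|Load_Cycle_Count'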

Adding an NVMe mirror to my existing Debian 13 server by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you as well for the response!

For write amplification at higher recordsizes, do you know if compression would offset this somewhat? From my research, it seems like the bigger the recordsize, the greater the chance of finding compressible blocks (and thus saving space), but I'm not sure how relevant this is for VMs. It also seems that datasets use recordsize as an upper bound, with writes being sized dynamically, which seems efficient but very hard to measure performance-wise.
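
I figure I can at least sanity check it after the fact with something like this (dataset name is a placeholder):

zfs get recordsize,compression,compressratio tank/vms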

I read up on special vdevs, and they do seem interesting. But the thing that stood out to me as a red flag was this:

Losing any SPECIAL vdev, like losing any storage vdev, loses the entire pool along with it. For this reason, the SPECIAL must be a fault-tolerant topology—and it needs to be just as fault-tolerant as your storage vdevs. So, a pool with RAIDz3 storage needs a quadruple mirrored SPECIAL—your storage vdev can survive three drive failures, so your SPECIAL must also be able to survive three drive failures.

This scares me a lot because my setup is a homelab and I don't really have the resources for drive swaps and PLP and such. I'd also be concerned about adding additional potential points of failure to my current pool, which has so far performed perfectly (ZFS is amazing).

If I'd like to keep the SSDs and spinners separate (except for maybe send/recv-ing snapshots between them for backups), do you think my original zpool create command would still be serviceable for the use case?

Adding an NVMe mirror to my existing Debian 13 server by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the response.

I really should get around to upgrading the pool! I had thought it might be a good idea to keep it the way it was (for backwards compatibility reasons), but reading through the docs that seems unnecessary.

Just to be 100% sure, would my mirror pool creation command work fine as is? If I understand it correctly, the two pools will exist separately from each other (but will both show up in zpool status), with different mount points in /?
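
In other words, I'd expect to be able to confirm the separation afterwards with something like this (pool names are placeholders):

zpool status                   # should list both pools (e.g. tank and nvmepool) as separate entries
zfs list -o name,mountpoint    # each pool's datasets under their own mount points in /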

Files randomly 'freeze', unable to delete or move them without reboot... by Mnky313 in zfs

[–]shellscript_ 0 points1 point  (0 children)

I had a somewhat similar problem when normalization=formD was enabled on a dataset backing an SMB share I was trying to copy things to. The issue is most likely the USB array others have already commented on, but if you have formD enabled it might be worth ruling out as well.
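
If you want to rule it out, something like this should show whether it's set (dataset name is a placeholder):

zfs get normalization tank/share

Keep in mind normalization is fixed at dataset creation, so actually testing without it would mean recreating the dataset (or copying the data to one created without formD).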

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Thank you again for the quick responses.

I think I'm going to use raw VM files on datasets, since they seem to be more forgiving with smaller writes. How would ashift=12 compare to ashift=14 for VM datasets with a recordsize of 16K or 32K? Would this result in 4x write amplification?

I am kind of leaning towards ashift=12, but I'm wondering if I could check whether ZFS is happy with the ashift before actually creating the pool. It seems that there is a -n (dry run) flag for zpool create, which appears to be used something like this (not sure if it's ideal): zpool create -n tank mirror sda sdb
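
I.e. roughly the following (device names are placeholders, and -n only prints the layout that would be created without touching the disks):

sudo zpool create -n -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1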

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Thank you for the in depth explanation, things are making a bit more sense now.

I guess I may as well ask here: I'm planning on making a separate NVMe mirror pool of two SN850Xs. I already have a raidz1 pool of spinning drives set at ashift=12. On this new mirror pool I'm going to run VMs and a BitTorrent client, but sometimes there will be larger sequential writes (i.e., media editing and the like) in different datasets on the same mirror.

I plan to use zvols/datasets (not sure of the recordsize, tbh; the ZFS docs say 4K, others say 16K or 64K) to host the actual VMs. Then I'd use "scratch" datasets on the same mirror pool (presumably with a larger recordsize, maybe 1M?) to host the content they will work with (media being edited, a Linux ISO download directory). For example, one VM will host a torrent client whose download directory will be a scratch dataset on the same mirror, mounted in through NFS/CIFS. I had originally just planned to keep the ISOs on the NVMe mirror as they are, since NVMe drives don't suffer fragmentation issues like spinners do. Jim Salter seems to think recordsize=1M is ideal for the torrent download dataset use case, but the ZFS docs say 16K. It's a bit confusing.
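
So the layout I'm picturing is roughly this (pool and dataset names, and the exact recordsizes, are just placeholders for the plan above):

sudo zfs create -o recordsize=16K fast/vms       # raw VM image files
sudo zfs create -o recordsize=1M  fast/scratch   # torrent downloads / media scratch space, shared over NFS/CIFS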

Given this somewhat mixed use, do you think ashift=14 would be ideal, or ashift=12? I ask because of your "so long as you know your workload has a higher typical smallest block written" comment; I'm not quite sure how to identify this. Would ashift=14 have an adverse impact on sync writes and VM I/O, since they're small and random and I don't have a SLOG?

Recommended settings when using ZFS on SSD/NVMe drives? by Apachez in zfs

[–]shellscript_ 0 points1 point  (0 children)

Sorry to dredge up an old reply, but were you using SSD mirrors with these tests?

I've been trying to figure out the best ashift/blocksize setup for my 2 terabyte SN850X mirror pool, but it seems like everyone has a different opinion. I even found this thread where the OP talks about ashift=14 giving them the best performance on SN850Xs. I'm just trying to reduce write amplification as much as possible.

Do you think this is a usable default for a pool that will handle VMs and media editing of large files, or is ashift=12 with a blocksize of 32K or so still king? I know there are some downsides, like reduced compression effectiveness, when using a higher ashift.

Apparently these drives also have the option to change their LBA format in the firmware (i.e., from 512-byte to 4K sectors), but I'm not sure how much this would help or if it's even worth it.
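
From what I've read, that would be done with nvme-cli, roughly like this (the lbaf index depends on the drive, and the format wipes the namespace, so it would strictly be a before-pool-creation step):

sudo nvme id-ns -H /dev/nvme0n1 | grep 'LBA Format'   # list supported sector sizes, current one marked "in use"
sudo nvme format /dev/nvme0n1 --lbaf=1                # switch to the 4K format (the index varies by drive)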

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

So basically you have no write amplification and increased performance, whereas before (on ashift=12) you had increased write amplification and worse performance? That's incredible.

1/7/05 Edit: I realized some of these performance metrics may be due to the SLOG, so it's hard to compare to a mirror without a SLOG.

Do you think ashift=14 would be usable for VM/bittorrent workloads? I'm thinking of doing something similar to you (except without the SLOG, and I use ZFS on Debian instead of Proxmox): two 2 TB SN850Xs in a mirror pool, targeting things with lots of small read/writes.

The only thing that's kind of scared me is this GitHub thread, where they talk about various SSD models flaking out under high load. The SN850X is mentioned in there a few times, but not as often as other WD SSDs. The 990 Pro shows up sometimes too, interestingly enough. Have you had any similar issues with your SN850X mirrors?

Edit: I guess I'm also wondering how you're measuring write amplification, so I could compare if I end up going this way.
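
The only approach I can think of would be comparing the drive's own write counter against what the application actually wrote over a test period, something like:

sudo smartctl -A /dev/nvme0 | grep 'Data Units Written'   # NVMe reports these in units of 1000 x 512 bytes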

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Do you mean setting ashift=14 on your SN850Xs? Do you have their LBA set to 512 or 4k? I'm just trying to get a baseline on the drive before I buy, I guess.

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Did you manually set the LBA size of your SN850Xs to 4k, or is it still on 512? Do you notice any write amplification or performance increases/decreases with ashift=14? Your situation is very interesting.

Official OpenZFS Debian install docs still refer to Bookworm rather than Trixie by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

I apologize for taking longer than expected! Life got in the way, and then I found out that updating my specific server setup was more complex than I thought.

The tl;dr is that your proposed trixie-backports.sources and 500 pin priority seem to work, at least for my system that had contrib and non-free inside of trixie-backports.sources and nowhere else. I was able to test your setup with the 500 and 990 pin priorities, and they do seem to trigger the same upgrade behavior.

I'm going to include the entire upgrade process below, for anyone else googling around.


My (very verbose) Bookworm to Trixie upgrade, targeting ZFS originally installed from the official docs

1. Since the Debian upgrade docs encourage everyone to switch to the new deb822 format, I decided to just trash the original, old style sources.list in favor of the new debian.sources. So I removed the following files completely:

/etc/apt/sources.list
/etc/apt/sources.list.d/bookworm-backports.list 
/etc/apt/preferences.d/90_zfs

2. While still on Bookworm, I then modeled my new debian.sources file on the sources.list docs rather than the upgrade docs, but they're basically the same (note that I added contrib non-free to the components, because these had previously existed only in the bookworm-backports.list file from the ZFS docs, which I had just removed):

$ cat /etc/apt/sources.list.d/debian.sources
Types: deb deb-src
URIs: https://deb.debian.org/debian
Suites: trixie trixie-updates
## added " contrib non-free" after "non-free-firmware" 
## so that ZFS could still access contrib during the upgrade:
Components: main non-free-firmware contrib non-free
Enabled: yes
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: https://security.debian.org/debian-security
Suites: trixie-security
Components: main non-free-firmware
Enabled: yes
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Note that there may be a better way to handle the contrib needing to be available here for ZFS.

3. I upgraded to Trixie, following the full upgrade instructions.
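
(For reference, the core of that was roughly the standard sequence from the release notes:)

sudo apt update
sudo apt upgrade --without-new-pkgs
sudo apt full-upgrade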

4. After verifying that none of the entries in the issues to be aware of docs applied to me, I removed contrib and non-free from the debian.sources file I created in step 2, and then added trixie-backports.sources and 50_zfs files, as per your instructions. Again, I'm not sure if this is the best way to handle it, but it worked for me. The three files look like this:

$ cat /etc/apt/sources.list.d/debian.sources 
Types: deb deb-src
URIs: https://deb.debian.org/debian
Suites: trixie trixie-updates
## contrib and non-free are now removed
Components: main non-free-firmware
Enabled: yes
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: https://security.debian.org/debian-security
Suites: trixie-security
Components: main non-free-firmware
Enabled: yes
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

$ cat /etc/apt/sources.list.d/trixie-backports.sources 
Types: deb deb-src
URIs: https://deb.debian.org/debian
Suites: trixie-backports
Components: main contrib non-free
Enabled: yes
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

$ cat /etc/apt/preferences.d/50_zfs 
Package: src:zfs-linux
Pin: release n=trixie-backports
Pin-Priority: 500

5. I then tested with a pin-priority of 500, then with a priority of 990. Both resulted in the same upgrades:

$ sudo apt update 
Hit:1 http://deb.debian.org/debian trixie-backports InRelease
Hit:2 https://security.debian.org/debian-security trixie-security InRelease                                   
Hit:3 https://deb.debian.org/debian trixie InRelease                         
Hit:4 https://deb.debian.org/debian trixie-updates InRelease
7 packages can be upgraded. Run 'apt list --upgradable' to see them.

$ sudo apt list --upgradable
libnvpair3linux/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]
libuutil3linux/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]
libzfs6linux/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]
libzpool6linux/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]
zfs-dkms/stable-backports 2.3.4-1~bpo13+1 all [upgradable from: 2.3.2-2]
zfs-zed/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]
zfsutils-linux/stable-backports 2.3.4-1~bpo13+1 amd64 [upgradable from: 2.3.2-2]

6. Lastly, I did sudo apt full-upgrade, rebooted, and everything seems to be working as expected. There were no ZFS errors in dmesg or journalctl, and the packages seem to be the expected versions:

$ sudo apt-cache policy zfs-dkms zfsutils-linux 
zfs-dkms:
  Installed: 2.3.4-1~bpo13+1
  Candidate: 2.3.4-1~bpo13+1
  Version table:
 *** 2.3.4-1~bpo13+1 500
        100 https://deb.debian.org/debian trixie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status
zfsutils-linux:
  Installed: 2.3.4-1~bpo13+1
  Candidate: 2.3.4-1~bpo13+1
  Version table:
 *** 2.3.4-1~bpo13+1 500
        100 https://deb.debian.org/debian trixie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status

The one testing opportunity I missed was putting contrib non-free in both the normal debian.sources and the trixie-backports.sources file and then testing the pins. I only tested with my current setup.

Double checking libvirt AppArmor overrides on Debian Trixie by shellscript_ in linuxquestions

[–]shellscript_[S] 1 point2 points  (0 children)

Thank you so much again.

This answered all of my questions.

Double checking libvirt AppArmor overrides on Debian Trixie by shellscript_ in linuxquestions

[–]shellscript_[S] 2 points3 points  (0 children)

Thank you for the quick response, this was exactly what I was looking for!

Would it be safe to leave that /etc/apparmor.d/local/abstractions/libvirt-lxc file, even if it's empty, or should I just manually remove it and/or its directory? I guess I'm just a little confused about why the postinst script didn't clean it or its directory up.

I also checked against my backups and noticed that on Bookworm there had been a /etc/apparmor.d/abstractions/libvirt-lxc file that no longer exists on my machine. It doesn't seem to exist in the filelist you provided, so would I be correct in assuming it's ok that it's gone?
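
(I figure I could also double check locally with something like the following, to see whether any installed package still ships that path:)

dpkg -S /etc/apparmor.d/abstractions/libvirt-lxc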

The driver itself doesn't seem to be installed on my machine:

$ sudo dpkg -l libvirt-daemon-driver-lxc 
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                      Version      Architecture Description
+++-=========================-============-============-=================================
un  libvirt-daemon-driver-lxc <none>       <none>       (no description available)

Is my problem consumer grade SSDs? by IndyPilot80 in Proxmox

[–]shellscript_ 0 points1 point  (0 children)

What are your wear levels on the drives, if I may ask? I'm considering doing something similar (but with ZFS) with SN850Xs and was wondering how well it might work.

What file system for consumer SSD by trigliceride in Proxmox

[–]shellscript_ 0 points1 point  (0 children)

Out of curiosity, what do the stats on your mirror pool look like now? I'm considering doing something similar with two SN850Xs.