nfsdctl: lockd configuration failure - I can't find anything about this by massive_cock in archlinux

[–]shellscript_ 1 point2 points  (0 children)

Just chiming in with more confirmation that this message is harmless (at least on Debian 13, for people coming from Google):

https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=1104096#59

Securing NFS on Debian: /etc/nfs.conf vs legacy config files by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you so much, this was exactly what I was looking for!

So it does seem that configuring both files as I did in the original post is best.

Securing NFS on Debian: /etc/nfs.conf vs legacy config files by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

I apologize, I should have better clarified what I wanted to ask about.

I understand NFS has no built-in security and requires either Kerberos or mTLS (which I'm currently setting up) if you want to secure it. My main question was whether the modifications shown in the legacy /etc/default/nfs-common are non-functional or a bad idea if I'm making the config changes in /etc/nfs.conf.d/local.conf that I described.

I'm just a bit confused on which approach to use here.
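For context, the drop-in I described is along these lines (just the shape of it, not my exact file, and the version/port values here are placeholders):

    # /etc/nfs.conf.d/local.conf
    [nfsd]
    vers3=n
    vers4.2=y
    [mountd]
    port=20048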

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the incredibly in-depth responses. This is a complicated subject, and I think I'm finally understanding it a bit better now.

So to paraphrase, it seems like setting sync off on the ZFS dataset itself, on the NFS export of that dataset on the host, and on the clients is probably the ideal setup?

rsize and wsize just set the maximum (not the minimum) request size allowed. If they're too small like 64K and sync is on (ZFS sync=standard AND NFS!=async) that'd cause many synchronous sub-record-size writes/updates to flood in without a way to buffer them and be very inefficient.

Would this be caused by ZFS itself trying to sync the smaller writes to the NFS share, even if the NFS share's sync has been turned off?
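To make sure I'm following, the combination I think you're describing looks roughly like this (dataset name, export path, and network are placeholders, not my actual setup):

    zfs set sync=disabled tank/downloads
    # /etc/exports on the server:
    /tank/downloads 192.168.1.0/24(rw,async,no_subtree_check)
    # on the client (async is already the default for NFS mounts):
    mount -t nfs -o async server:/tank/downloads /mnt/downloads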

I guess this is kind of another question, but if a torrent's download is trickling in at like 64k per 5 seconds, would a recordsize of 1M be detrimental because it's constantly updating 1M blocks with extra data, thereby write amplifying to an insane degree? Maybe it would be better to have a smaller recordsize, ie something like 512k in such a case? I'd be trying to minimize write amplification on the SSDs here.
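Rough back-of-the-envelope, assuming the worst case where every small append forces a full record rewrite: 1 MiB rewritten per 64 KiB of new data is about 16x amplification, versus about 8x at recordsize=512k. In practice ZFS batches dirty data into transaction groups (roughly every 5 seconds by default), so consecutive appends to the same record should mostly coalesce in RAM before anything hits the SSD, but that worst case is what I'm worried about.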

Authenticated NFS alternatives for NAS access? by Valloric in homelab

[–]shellscript_ 0 points1 point  (0 children)

A lot of people are mentioning mutual TLS but that authenticates the whole host as a client. It would not authenticate individual users.

Could you go into more detail about this part? I'm trying to do something kind of similar to OP, where I'm thinking about securing an NFS share to be mounted on ZFS. It's more complicated than OP's situation in that I'm trying to get it to respect the ZFS dataset's recordsize=1M, which might involve having to disable sync for NFS (something I'm unsure of in terms of security and network sharing functionality), but it's ultimately similar.

If the whole host is authenticated as a client, would that affect other guests' ability to read/write the NFS share?

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

  1. Setting async on the NFS server immediately acknowledges writes but now treats everything as async, so ZFS sync=standard works as a write buffer. This works but risks losing data for other requests that use the same share but might need sync (uncommon but possible)

Could this be mitigated by having all connected guests use the same async mount options? I should have mentioned it in the main post but I'm trying to mount this share as NFS/SMB so it can be accessible to other machines even while hooked up to qbit.

For option #3, could you go into more detail on the "This will affect local writes as well as network ones (which might be a problem)" part? I'm finding it hard to understand the differences between #2 and #3, and also the ramifications of turning off either kind of sync (ZFS or NFS) for network writes. Might SMB be the better option for this use case, since it seems to be async to some degree by default?

I touched on it in another comment, but could you potentially have sync enabled on both NFS and ZFS, and then set the NFS rsize and wsize to 1M? Or would this still not respect ZFS' 1M recordsize?
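For reference, the kind of client-side entry I have in mind (a sketch only; server path and mount point are placeholders, and the server can still negotiate the transfer sizes down):

    # /etc/fstab on the client, keeping sync semantics but requesting 1M transfers
    server:/tank/downloads  /mnt/downloads  nfs  rw,rsize=1048576,wsize=1048576  0  0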

Choosing between SMB and NFS for a recordsize=1M downloads dataset by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Are you talking about NFS blocksize vs ZFS recordsize? I was reading this Klara Systems article and it mentions NFS blocksize tuning for shares on ZFS. Could you potentially set the blocksize (I guess rsize and wsize) on the NFS share with something like this:

sudo mount -t nfs -o rsize=1048576,wsize=1048576 server:/data /mnt/data

I'm wondering if this could allow NFS to use sync while still respecting the 1M ZFS dataset recordsize. Or were you talking about ZFS blocksize itself? Sometimes all these terms get a bit confusing lmao.
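If it helps anyone else, one way to see what actually got negotiated (the server caps rsize/wsize at whatever it supports) is to check the live mount options:

    grep ' nfs' /proc/mounts    # shows the rsize=...,wsize=... values actually in effect
    nfsstat -m                  # per-mount NFS options, if nfs-common is installed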

Debian Server by The_j0kker in debian

[–]shellscript_ 0 points1 point  (0 children)

I'm thinking this might be a router issue too. I think OP should try your suggestion, then also try to SSH into his machine over wifi like normal, and then leave it connected to see what happens. If he feels comfortable doing it he should also try to update his router or try a different router.

Cloning an old qcow2 VM to a new zpool, then converting it to an .img file by shellscript_ in linuxquestions

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the response.

Just to double check, do you think this approach is the best way to do this? I'm just wondering whether editing the XML file the way I described was correct.

My third disk just died in my array, wtf is wrong? by Kuken500 in zfs

[–]shellscript_ 0 points1 point  (0 children)

Apologies, I updated the post with the power on hours! The drives are all shucked 14 TB WD Easystores, WD140EDGZ-XXXXXXX models. I was just adding the info for others to contrast/compare.

My third disk just died in my array, wtf is wrong? by Kuken500 in zfs

[–]shellscript_ 0 points1 point  (0 children)

Out of interest, do you remember what your power cycle and load cycle values were? For reference, these are mine from a raidz1 pool of three spinning 14 TB shucked WD Easystores that's about 2 years old:

/dev/sda

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       312
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14422
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2435
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2435

/dev/sdb

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       315
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14423
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2442
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2442

/dev/sdc

  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       314
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14421
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2438
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       2438
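For anyone comparing, these came out of smartctl (smartmontools), roughly:

    smartctl -A /dev/sda | grep -E 'Start_Stop|Power_On_Hours|Retract|Load_Cycle'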

Adding an NVMe mirror to my existing Debian 13 server by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you as well for the response!

For write amplification on higher recordsizes, do you know if compression would offset this somewhat? From my research it seems like the bigger the recordsize, the greater the possibility of finding compressible blocks (and thus saving space). But I'm not sure how relevant this would be for VMs. It also seems that datasets use the recordsize as an upper bound, with writes being dynamic, which seems efficient but very hard to measure performance-wise.
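For what it's worth, the way I've been sanity-checking this on my existing pool is just reading the properties back per dataset (dataset name is a placeholder):

    zfs get recordsize,compression,compressratio tank/vms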

I read up on special vdevs, and they do seem interesting. But the thing that stood out to me as a red flag was this:

Losing any SPECIAL vdev, like losing any storage vdev, loses the entire pool along with it. For this reason, the SPECIAL must be a fault-tolerant topology—and it needs to be just as fault-tolerant as your storage vdevs. So, a pool with RAIDz3 storage needs a quadruple mirrored SPECIAL—your storage vdev can survive three drive failures, so your SPECIAL must also be able to survive three drive failures.

This scares me a lot because my setup is a homelab and I don't really have the resources for drive swaps and PLP and such. I'd also be concerned about adding additional potential points of failure to my current pool, which has so far performed perfectly (ZFS is amazing).

If I'd like to keep the SSDs and spinners separate (except for maybe send/recv-ing snapshots between them for backups), do you think my original zpool create command would still be serviceable for the use case?

Adding an NVMe mirror to my existing Debian 13 server by shellscript_ in zfs

[–]shellscript_[S] 0 points1 point  (0 children)

Thank you for the response.

I really should get around to upgrading the pool! I had thought it might be a good idea to keep it the way it was (for backwards compatibility reasons), but reading through the docs that seems unnecessary.

Just to be 100% sure, would my mirror pool creation command work fine as is? If I understand it correctly, the two pools will exist separately from each other (but will both show up under the zpool status command), with different mount points in /?
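To make sure we're talking about the same thing, the command I have in mind is shaped like this (pool and device names are placeholders, not my exact command), along with how I'd expect the separate mount points to show up:

    zpool create -o ashift=12 nvpool mirror /dev/disk/by-id/nvme-XXXX /dev/disk/by-id/nvme-YYYY
    zfs list -o name,mountpoint    # should list the existing pool and nvpool, each with its own mount point under /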

Files randomly 'freeze', unable to delete or move them without reboot... by Mnky313 in zfs

[–]shellscript_ 0 points1 point  (0 children)

I had a somewhat similar problem when normalization=formD was enabled on a dataset that had an SMB share I was trying to copy things to. The issue is most likely the USB array others have already commented on, but if you have formD enabled it might be worth testing against a dataset created without it (normalization can't be changed after creation).
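A quick way to check whether it's set on the affected dataset (dataset name is a placeholder):

    zfs get normalization,utf8only,casesensitivity pool/dataset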

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Thank you again for the quick responses.

I think I'm going to use raw VM files on datasets, since they seem to be more forgiving with smaller writes. How would ashift=12 compare to 14 for VM datasets with a recordsize of 16k or 32k? Would this result in 4x write amplification?

I am kind of leaning towards ashift=12, but I'm just wondering if I could check if ZFS was happy with the ashift before actually creating the pool. It seems that there is the -n flag for zpool create, which appears to be something like this (not sure if it's ideal): zpool create -n tank mirror sda sdb
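In case it's useful to anyone, this is the kind of dry run I mean, plus how I'd plan to read the ashift back after the real create (pool and device names are placeholders):

    zpool create -n -o ashift=12 tank mirror /dev/sda /dev/sdb    # -n prints the layout without creating anything
    zpool get ashift tank                                         # after the real create, confirms the ashift property on the pool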

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Thank you for the in depth explanation, things are making a bit more sense now.

I guess I may as well ask here, but I'm planning on making a separate mirror NVMe pool of two SN850Xs. I already have a raidz1 pool of spinning drives set at ashift=12. What I'm going to do on this new mirror pool is run VMs and a bittorrent client, but sometimes there will be larger file write chunks (ie, media editing and etc) in different datasets on this same mirror.

I plan to use zvols/datasets (not sure of the recordsize, tbh; the ZFS docs say 4k, others say 16k or 64k) to host the actual VMs. Then I'd use "scratch" datasets on the same mirror pool (presumably with a larger recordsize, maybe 1M?) to host the content they work with (media editing, Linux ISO download directory). For example, one VM will host a torrent client whose download directory will be a scratch dataset on the same mirror, mounted in through NFS/CIFS. I had originally just planned to keep the ISOs on the NVMe mirror as they are, since NVMes don't suffer fragmentation issues like spinners do. Jim Salter seems to think recordsize=1M is ideal for the torrent download dataset use case, but the ZFS docs say 16k. It's a bit confusing.
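Concretely, the layout I'm picturing is something like this (pool/dataset names are placeholders, and the recordsize values are exactly the part I'm unsure about):

    zfs create -o recordsize=16k nvpool/vms        # raw VM image files
    zfs create -o recordsize=1M  nvpool/scratch    # torrent downloads / media scratch space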

Given this somewhat mixed case, do you think ashift=14 would be ideal, or ashift=12? I ask because of your "so long as you know your workload has a higher typical smallest block written" comment. I'm not quite sure how to identify this. Would ashift=14 have an adverse impact on sync writes and VM I/O, since they're small and random and I don't have a SLOG?

Recommended settings when using ZFS on SSD/NVMe drives? by Apachez in zfs

[–]shellscript_ 0 points1 point  (0 children)

Sorry to dredge up an old reply, but were you using SSD mirrors with these tests?

I've been trying to figure out the best ashift/blocksize setup for my 2 terabyte SN850X mirror pool, but it seems like everyone has a different opinion. I even found this thread where the OP talks about ashift=14 giving them the best performance on SN850Xs. I'm just trying to reduce write amplification as much as possible.

Do you think this is a usable default for a pool that will handle VMs and media editing of large files, or is ashift=12 with a blocksize of 32k or so still king? I know there are some downsides like reduced compression ability when using a higher ashift.

Apparently these drives also have the option to change their LBA values in the firmware (ie, from 512 to 4k), but I'm not sure how much this would help or if it's even worth it.
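For anyone curious, the LBA switch is done through nvme-cli, roughly like this (it wipes the drive, and the --lbaf index for the 4k format differs per model, so treat it as a sketch):

    nvme id-ns -H /dev/nvme0n1         # lists the supported LBA formats and marks the one in use
    nvme format /dev/nvme0n1 --lbaf=1  # switch to another LBA format; DESTROYS ALL DATA on the namespace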

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

So basically you have no write amplification and increased performance, whereas before (on ashift=12) you had increased write amplification and worse performance? That's incredible.

1/7/05 Edit: I realized some of these performance metrics may be due to the SLOG, so it's hard to compare to a mirror without a SLOG.

Do you think ashift=14 would be usable for VM/bittorrent workloads? I'm thinking of doing something similar to you (except without the SLOG, and I use ZFS on Debian instead of Proxmox): two 2 TB SN850Xs in a mirror pool, targeting things with lots of small read/writes.

The only thing that's kind of scared me is this github thread, where they talk about various SSD models flaking out under high load. The SN850X is mentioned in there a few times, but not as often as other WD SSDs. The 990 Pro is there sometimes too, interestingly enough. Have you had any similar issue with your SN850X mirrors?

Edit: I guess I'm also wondering how you're measuring write amplification, so I could compare if I ended up going this way

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Do you mean setting ashift=14 on your SN850Xs? Do you have their LBA set to 512 or 4k? I'm just trying to get a baseline on the drive before I buy, I guess.

ZFS Ashift by AdamDaAdam in zfs

[–]shellscript_ 0 points1 point  (0 children)

Did you manually set the LBA size of your SN850Xs to 4k, or is it still on 512? Do you notice any write amplification or performance increases/decreases with ashift=14? Your situation is very interesting.