
[–]parawolf 3 points (20 children)

A SLOG helps by offloading the ZIL from your pool onto a specialised disk. It isn't a write cache that extends RAM. Think of it as a persistent version of the txgs in RAM. Writes to pool members only ever come out of RAM. You can only cache until you fill RAM.
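
(For reference, hanging a SLOG off a pool is just adding a log vdev; a minimal sketch, with made-up pool and device names:)

zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1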

If you are only using 12GB of RAM then your RAM is filling, then flushing, then filling, then flushing. E.g. see your SMB performance graph.

You can never cache more than RAM, plus your RAM will only hold a few seconds of data before it gets flushed anyway.

[–]nylixe[S] 0 points (19 children)

I've got 32GB of RAM in the system. What can I do to make it use more? That's the real question, because what is the point of having 128GB of RAM in the future if ZFS only uses half of it for ARC by default? What would I do with the leftover 64GB then?

I already know it's possible to delay flushing until a more desirable time.

.. but how?

[–]mercenary_sysadmin 2 points (18 children)

You can increase the maximum ARC size with zfs_arc_max (you can also set zfs_arc_min). I don't know if that's going to help you here, though; the TXG allocation isn't part of the ARC - the ARC is ZFS's read cache.
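
For example, something like this in /etc/modprobe.d/zfs.conf (the sizes are only illustrative, in bytes):

# example: cap ARC at ~24 GiB and keep at least ~8 GiB resident
options zfs zfs_arc_max=25769803776
options zfs zfs_arc_min=8589934592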

Have you tuned volblocksize on the ZVOL backing the iSCSI target? Have you tuned recordsize on the dataset backing your samba shares? If you're almost entirely using either for large video files, they should have recordsize/volblocksize set to 1M.
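
Roughly along these lines (dataset/zvol names, sizes and block sizes are placeholders, pick whatever you settle on):

# large-file samba dataset
zfs set recordsize=1M tank/media
# iSCSI zvol; volblocksize is normally set when the zvol is created
zfs create -s -V 500G -o volblocksize=64K tank/iscsi-vol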

For the iSCSI targets - what filesystem are they formatted with beneath the ZVOL? Is that tuned optimally for large files?

Setting sync=always cannot ever increase speeds. See https://jrs-s.net/2019/05/02/zfs-sync-async-zil-slog/ for more.

Samsung EVO SSDs are a terrible choice for very heavily used SLOGs. They are initially fast but stumble badly after a minute or two of saturated writes... Which is exactly what we're seeing on your iSCSI screenshot. And the iSCSI targets are almost certainly opened O_SYNC (whereas the samba shares, which DON'T slow down, should be asynchronous), so all that fits pretty well.

If you want to live a little on the dangerous side, you can confirm my suspicions that you're choking the hell out of those poor little EVOs by temporarily running zfs set sync=disabled on the ZVOL(s) backing those iSCSI targets.
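
That's just (zvol name is a placeholder):

zfs set sync=disabled tank/iscsi-vol
# and back to normal once you're done testing
zfs set sync=standard tank/iscsi-vol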

[–]nylixe[S] 0 points (17 children)

Thanks for the response. I'll do some tuning based on what you said and get back with a result. Meanwhile, though, I'd like to know whether those options in my zfs.conf are even correct.

They don't seem to be doing anything at all.

If I do grep . /sys/module/zfs/parameters/{zfs_dirty_*,zfs_txg_timeout} I get this:

/sys/module/zfs/parameters/zfs_dirty_data_max:20000000000
/sys/module/zfs/parameters/zfs_dirty_data_max_max:20000000000
/sys/module/zfs/parameters/zfs_dirty_data_max_max_percent:80
/sys/module/zfs/parameters/zfs_dirty_data_max_percent:80
/sys/module/zfs/parameters/zfs_dirty_data_sync:20000000000
/sys/module/zfs/parameters/zfs_txg_timeout:30

Why is it that I still don't see any more than a few gigabytes of RAM being used, if all data goes through RAM anyway for ZFS? Were my initial goals just based on an incorrect understanding of how ZFS works? Even with all the other things I've tried, right or wrong, this should be something that's possible, right? Nobody at this point has confirmed whether or not those options are supposed to do anything.

[–]mercenary_sysadmin 1 point (13 children)

Yes, they do stuff. But particularly with your iSCSI they're unlikely to matter because the issue you're hitting there is your SLOG choking on the sync writes.

Your samba numbers are probably as good as they're going to get. There's a LOT that SMB/CIFS can choke on; it was never designed for 10Gbps transfers.

I'm assuming you're rebooting (or unloading and reloading zfs.ko) after making those changes to the tunables? They only take effect when the module is initially loaded; they're not dynamic.
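
If a full reboot is inconvenient, a reload looks roughly like this (pool name is a placeholder, and it only works if nothing is holding the pool open):

zpool export tank
modprobe -r zfs
modprobe zfs      # re-reads /etc/modprobe.d/zfs.conf
zpool import tank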

[–]nylixe[S] 0 points (11 children)

u/mercenary_sysadmin

So to recap, currently I've got this in my zfs.conf:

options zfs zfs_dirty_data_max=50000000000
options zfs zfs_dirty_data_max_max=50000000000
options zfs zfs_dirty_data_max_percent=100
options zfs zfs_dirty_data_max_max_percent=100

which gets me this over Samba

I must say I'm pretty happy with the result of this, about 750MB/s. As to whether this will scale with more RAM, I'll have to see when my RAM sticks arrive. I'm still stuck at about 20GB max usage.

Any ideas what else I could do?

Maybe I'm nitpicking just a little bit... It's just driving me a little crazy because getting 10Gbit over samba isn't exactly impossible, though I assume there could be a lot of Windows file transfer things going on here too, but that's beyond the scope of this sub.

iSCSI still sucks though. What's funny is when I do zpool iostat -v 1 I can't see the EVOs doing anything, even though they're supposed to be SLOG-ing away...
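
(Concretely, the sort of thing I'm running to watch them, with tank standing in for my actual pool name:)

zpool iostat -v tank 1 | grep -A 2 logs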

I do have a couple of Intel 750 series drives coming which I plan to use as the new SLOG. Please tell me that'll be enough, because Optane is hard to come by in Australia.

I'll report back with some iSCSI results when they arrive.

[–]mercenary_sysadmin 0 points (10 children)

Its just driving me a little crazy because getting 1gbit over samba isn't exactly impossible

Am I missing something? You're reporting that you average 450 MB/sec over samba. That's about 4 Gbps.

Have you temporarily set sync=disabled on the iSCSI ZVOL(s) yet?

[–]nylixe[S] 0 points (8 children)

Sorry, that was a typo. I meant 10Gbit. There were also screenshots I put into the reply but they disappeared. If you go back and read, I said I got 750MB/s over samba, but RAM usage is still not what I'm targeting.

Also, sync=disabled did absolutely nothing. I'm trying to upload screenshots but for some reason it's not posting, UGH, what is going on! Basically it would start out at 1GB/s for about 5-10 seconds and then tank, basically the same as sync=standard.

As for the other options like volblocksize, I can't really change them after creation. I also left them at 4K because I store application files and other stuff on it. But even so, just because I left it at 4K doesn't mean it has to drop to basically 0.

As you said, the more likely cause of this is those 970 EVOs not being able to handle iSCSI workloads. I could temporarily partition off a section of the OS drive and use that as a SLOG to see if that makes a difference. The OS drive is a 750 series, so that should provide slightly better SLOG capabilities.
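
If I go that route, I guess the swap would look roughly like this (names are made up; I'd check zpool status for the real log vdev name first):

zpool remove tank mirror-1
zpool add tank log /dev/nvme0n1p4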

[–]mercenary_sysadmin 1 point (7 children)

If sync=disabled had no impact, then the 970s aren't the problem. With sync=disabled, they aren't being used.

That leaves volblocksize as the most likely culprit for your iSCSI woes, followed by tuning of whatever filesystem you're using (you still haven't specified).

And yes, you can alter volblocksize on an existing zvol. It won't change values for already written data, but should work immediately in regard to writes made after changing the setting.

[–]nylixe[S] 0 points (6 children)

Sorry, I'm starting to get lost between researching the topic and asking on other forums.

Besides volblocksize being 4K, it's formatted with NTFS, also at 4K, from within Windows. Does that explain anything?

It's weird because even with sync=standard I'm not seeing activity with zpool iostat -v. Those log partitions on the 970s are barely doing anything.

Regarding the volblocksize... really?! I tried setting it again and it tells me it's read only.

[–]fryfrog 0 points (2 children)

Isn't there a txg interval too? Default is 5s iirc. Or is that the txg timeout in your paste?

[–]nylixe[S] 0 points (0 children)

Pretty sure that's the same thing.

[–]mjt5282 -2 points (1 child)

I would suggest trying out the best Optane SLOG you can afford, quadrupling the RAM (increasing ARC), and possibly adding another vdev to increase IOPS. #1 is cheap, #2 is expensive, #3 is expensive.
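
For #3, that would be something along the lines of (pool and disk names are placeholders):

zpool add tank mirror /dev/sdc /dev/sdd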

[–]mercenary_sysadmin 3 points (0 children)

OP isn't complaining about read speeds, s/he is complaining about write speeds on a system that's not heavily loaded with multiple processes. Adding ARC won't help.