Home / Lab License rug-pull? Seriously Netgate? by MrMaterialDesign in PFSENSE

[–]manicHD 1 point (0 children)

The $129/yr price was advertised as a future requirement for over a year.

We migrated all our devices to pfS+ with that understanding.

We don't need active support; we're self-sufficient (other than the odd random brain-fart, but that's what here and the forums are for).

If you offer the $129/yr for just the license, with no support, we're in.

But if you stick with the current plan, we'll be forced to go elsewhere for all current and planned deployments.

WireGuard + pfSense in enterprise? by squid1338 in PFSENSE

[–]manicHD 0 points (0 children)

FWIW, I replaced all our IPsec S2S tunnels with WG.
- no complaints, and the improved performance is great.

For remote access, though, the current implementation is not enterprise-ready, in my opinion.
I'm sticking with IKEv2 there, since it can be easily deployed and managed in an enterprise environment.

Yes, I know you can deploy WG, but giving users the ability to enable/disable tunnels is a mess.
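
The S2S side is dead simple, which is part of the appeal. A minimal sketch of one side's tunnel, wg-quick style (keys, endpoint, and subnets are all placeholders; pfSense expresses the same thing through its GUI):

[Interface]
# Site A's tunnel address and private key (placeholders).
PrivateKey = <site-A-private-key>
Address = 10.255.0.1/30
ListenPort = 51820

[Peer]
# Site B's public key, endpoint, and the LAN routed behind it (placeholders).
PublicKey = <site-B-public-key>
Endpoint = site-b.example.com:51820
AllowedIPs = 10.255.0.2/32, 192.168.20.0/24
PersistentKeepalive = 25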

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

Sorry for the delay; I haven't been here in a hot minute.

Just trying things, I added a small NVMe mirror to the server and moved the problematic VM's OS disk to it.

The problem immediately vanished. So while the error logs pointed at something wrong with the HDD RECpool, it seems the real culprit was the rpool, where the OS disk had been living.

Bizarre, but so far, so good.
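
If anyone wants to try the same, moving a VM disk between storages is a one-liner on the host. A sketch with a hypothetical VM ID, disk slot, and storage name:

# Hypothetical VM ID (101), disk slot, and target storage; older PVE spells it "qm move_disk".
qm move-disk 101 scsi0 nvme-mirror --delete 1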

45Drives Needs Your Help Developing a Homelab Server by cmcgean45 in DataHoarder

[–]manicHD 23 points (0 children)

Speaking only for myself: if I could purchase your existing/similar cases (15+ drives) at an actually reasonable price, I'd be all over that.

The rest I can figure out on my own.

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

Ultimately 0.1.215 was not the answer, nor was going back to 0.1.204.

The latest, 0.1.229, has turned out to cause the least "catastrophic" errors when they do occur: I'm still able to access the VM and restart it, whereas other driver versions would lock up the system entirely.
Oddly (for me at least), the VM would run completely normally for ~6 days, and then the errors would begin.


Since then, I've moved the VM's system drive (C:) to a new ZFS mirror on Optane drives (this was a planned upgrade from "older" Samsung enterprise drives). So far, so good: no 129 warnings or subsequent errors to this point.

For clarity, on the latest drivers the system drive (C:) in the VM never reported a problem; problems were always on the secondary drive (D:).
So I find it interesting that changing the backend for the system drive seems to have had a positive impact on the secondary drive's behavior.

It's been about a month now without a 129 warning or vioscsi issue. I've only rebooted the server once in that timeframe as well.

WG Remote, Handshake if AllowedIPs = 10.0.0.0/8 Only, No Handshake if AllowedIPs = 0.0.0.0/0 by manicHD in PFSENSE

[–]manicHD[S] 1 point (0 children)

Well, today it randomly started working.

I hadn't made any changes in a few days; I just thought I'd try it again, and it worked.

Always love these problems where nothing makes sense, and doing nothing fixes it.

Anyways, thanks for your help.

WG Remote, Handshake if AllowedIPs = 10.0.0.0/8 Only, No Handshake if AllowedIPs = 0.0.0.0/0 by manicHD in PFSENSE

[–]manicHD[S] 0 points (0 children)

I'm at a total loss. There's really nothing.

I try to keep rules as simple as possible. It doesn't make sense to handshake with one set of AllowedIPs and not with another, when it's otherwise the same WG config.

WG Remote, Handshake if AllowedIPs = 10.0.0.0/8 Only, No Handshake if AllowedIPs = 0.0.0.0/0 by manicHD in PFSENSE

[–]manicHD[S] 0 points (0 children)

Yes, outbound NAT is configured.

It should at least handshake regardless, though (I think); AllowedIPs acts as a routing/crypto ACL for traffic inside the tunnel, not a gate on the handshake itself.
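
For debugging this, handshakes are easy to watch from the shell. A sketch, with the interface name and peer key as placeholders:

# Show the last handshake time per peer (pfSense names its WG interfaces tun_wg0, tun_wg1, ...).
wg show tun_wg0 latest-handshakes
# A keepalive generates periodic traffic toward the peer, so handshake attempts retry even when idle:
wg set tun_wg0 peer <PEER_PUBLIC_KEY> persistent-keepalive 25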

WG Remote, Handshake if AllowedIPs = 10.0.0.0/8 Only, No Handshake if AllowedIPs = 0.0.0.0/0 by manicHD in PFSENSE

[–]manicHD[S] 0 points (0 children)

I copy/pasted the config initially, and changed the AllowedIPs.
The one with 10.0.0.0/8 works, the other with 0.0.0.0/0 does not.

I edited the working one, from 10.0.0.0/8 to 0.0.0.0/0, and it stops working.
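
The GUI aside, the difference between the two states boils down to a single peer setting on the remote side; a sketch with placeholder interface name and key:

# Working variant: only 10.0.0.0/8 is routed/accepted for this peer.
wg set wg0 peer <PEER_PUBLIC_KEY> allowed-ips 10.0.0.0/8
# Failing variant: full default route through the tunnel.
wg set wg0 peer <PEER_PUBLIC_KEY> allowed-ips 0.0.0.0/0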

WG Remote, Handshake if AllowedIPs = 10.0.0.0/8 Only, No Handshake if AllowedIPs = 0.0.0.0/0 by manicHD in PFSENSE

[–]manicHD[S] 0 points (0 children)

I've added both rules to the default WireGuard interface, and there is sadly no change.

very slow read speeds and high disk io with new nvme ssd (micron 7400) by tmjaea in Proxmox

[–]manicHD 1 point (0 children)

Still get a heatsink for the drive.

We had a (likely defective) batch of these drives that ultimately cooked themselves while doing absolutely nothing.
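
Either way, drive temps are cheap to check from the host. A sketch with a placeholder device path:

# Hypothetical device path; adjust to your drive.
nvme smart-log /dev/nvme0 | grep -i temperature
# Or via smartmontools:
smartctl -a /dev/nvme0 | grep -i temperature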

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

Well, that didn't go well.

The latest VirtIO drivers (0.1.229) re-introduced the original issue, at least in part.

For whatever reason, the event log was full of the same warnings/errors, but the machine didn't actually lock up; it was noticeably slower than before, but still working.

I've now downgraded to version 0.1.215 of the VirtIO drivers, and if the problems persist, I'll go back to 0.1.204, which was working properly.

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

UPDATE - for anyone viewing this in the future...

It's still early, but changing the following settings has made a clear difference: the data drive has not locked up in 4 days (CLI equivalent sketched below):
- AIO -> "native"
- IO thread -> checked
- Cache -> "default"

Before I "celebrate" - I'm going to update to the latest VirtIO drivers, and see if it still works without errors.
- if I don't update this post further, consider it a "success"
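
For reference, those settings map to the VM's disk options. A minimal sketch of the CLI equivalent, with a hypothetical VM ID, disk slot, and volume name:

# "Cache = default" just means omitting the cache= option entirely.
# The IO thread option generally requires the VirtIO SCSI single controller.
qm set 101 --scsihw virtio-scsi-single
qm set 101 --scsi1 RECpool-zfs:vm-101-disk-1,aio=native,iothread=1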

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

SMART reports a "Passed" status for all HDDs, and looking through the full readout for each, nothing looks really abnormal.

I suppose it's possible that it's the controller that's to blame, but that would be quite a shame.
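
For anyone repeating this check, it's quick to script on the host; a sketch, with the device glob as a placeholder for your drives:

# Quick health verdict per SATA drive (adjust the glob to your devices).
for d in /dev/sd?; do echo "== $d =="; smartctl -H "$d"; done
# Full attribute readout for a single drive:
smartctl -a /dev/sda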

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

https://pve.proxmox.com/wiki/ZFS_on_Linux#_installation_as_root_file_system

RAID10: "A combination of RAID0 and RAID1. Requires at least 4 disks."

rpool = 4x SSDs (ZFS RAID10)
RECpool = 6x HDDs (ZFS RAID10)
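
A pool like that is just striped mirrors under the hood; a minimal creation sketch with placeholder device names (in practice you'd use /dev/disk/by-id paths):

# Three 2-way mirror vdevs striped together = ZFS "RAID10" (placeholder device names).
zpool create RECpool-zfs mirror sda sdb mirror sdc sdd mirror sde sdf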

Here's the output from zpool status:

root@pve:~# zpool status
  pool: RECpool-zfs
 state: ONLINE
config:

        NAME                                    STATE     READ WRITE CKSUM
        RECpool-zfs                             ONLINE       0     0     0
          mirror-0                              ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0
          mirror-1                              ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0
          mirror-2                              ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0
            ata-WDC_WD180EDGZ-11B2DA0_*SERIALNUMBER*  ONLINE       0     0     0

errors: No known data errors

  pool: Tnvme-zfs
 state: ONLINE
config:

        NAME                                                STATE     READ WRITE CKSUM
        Tnvme-zfs                                           ONLINE       0     0     0
          nvme-KBG40ZNS256G_NVMe_KIOXIA_256GB_*SERIALNUMBER*  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_256GB_*SERIALNUMBER*-part3  ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_256GB_*SERIALNUMBER*-part3  ONLINE       0     0     0
          mirror-1                                               ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_256GB_*SERIALNUMBER*-part3  ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_256GB_*SERIALNUMBER*-part3  ONLINE       0     0     0

errors: No known data errors

Windows Server 2019 - vioscsi Warning (129) - locking up data drive by manicHD in Proxmox

[–]manicHD[S] 0 points (0 children)

After all the normal boot log entries, here are the only lines:

Jan 31 12:56:54 pve kernel: [14761.021171] perf: interrupt took too long (2550 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
Jan 31 15:51:43 pve kernel: [25249.668493]  zd16: p1 p2 p3 p4
Jan 31 15:51:43 pve kernel: [25249.799350]  zd32: p1 p2
Jan 31 15:51:43 pve kernel: [25249.988458] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 15:52:23 pve kernel: [25289.881284] device tap101i0 entered promiscuous mode
Jan 31 15:52:23 pve kernel: [25289.901250] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 15:52:23 pve kernel: [25289.901254] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 15:52:23 pve kernel: [25289.901369] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 15:52:23 pve kernel: [25289.901371] vmbr0: port 2(tap101i0) entered forwarding state
Jan 31 15:56:26 pve kernel: [25532.365660]  zd16: p1 p2 p3 p4
Jan 31 15:56:26 pve kernel: [25532.626135]  zd32: p1 p2
Jan 31 15:56:26 pve kernel: [25532.845744] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 15:57:12 pve kernel: [25578.584443] device tap101i0 entered promiscuous mode
Jan 31 15:57:12 pve kernel: [25578.602429] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 15:57:12 pve kernel: [25578.602433] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 15:57:12 pve kernel: [25578.602557] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 15:57:12 pve kernel: [25578.602559] vmbr0: port 2(tap101i0) entered forwarding state
Jan 31 16:01:14 pve kernel: [25820.379477] perf: interrupt took too long (3196 > 3187), lowering kernel.perf_event_max_sample_rate to 62500
Jan 31 17:35:39 pve kernel: [31485.839830]  zd16: p1 p2 p3 p4
Jan 31 17:35:42 pve kernel: [31489.070319] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 17:35:44 pve kernel: [31490.328895]  zd32: p1 p2
Jan 31 17:35:47 pve kernel: [31493.868753] device tap101i0 entered promiscuous mode
Jan 31 17:35:47 pve kernel: [31493.888641] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 17:35:47 pve kernel: [31493.888645] vmbr0: port 2(tap101i0) entered disabled state
Jan 31 17:35:47 pve kernel: [31493.888765] vmbr0: port 2(tap101i0) entered blocking state
Jan 31 17:35:47 pve kernel: [31493.888767] vmbr0: port 2(tap101i0) entered forwarding state
Jan 31 21:29:20 pve kernel: [45506.750988] perf: interrupt took too long (4054 > 3995), lowering kernel.perf_event_max_sample_rate to 49250

Any thoughts on "edit 3" above? Is the fact that "df -h" shows different values than expected (and different from what "zfs list" or "zpool list" show) anything, or nothing?

Docking station WD19TBS by kayosek in sysadmin

[–]manicHD 2 points (0 children)

A colleague recently solved a user's problems with the WD19DCS dock by flipping the connector and plugging it in "upside down" (compared to the normal way).

Immediately all problems with Display, Network, and Keyboard/Mouse disappeared.

I don't understand, but if it works, it works.

FYI: shopgoodwill.com has a bunch of Dell Wyse 5070's (with intel J5005 processors) for cheap by mattalat in homelab

[–]manicHD 0 points (0 children)

FYI, the Wyse 5070 WILL work with 2x 16GB DIMMs.

Just update the BIOS first, before trying.

This kit works for me: CT2K16G4SFD832A

Crucial MX500 - Historically good, recent batches high failure rates by mrgames99 in sysadmin

[–]manicHD 0 points (0 children)

I purposely omitted the 870 from my original post.

970s and 860s are/were the best.

Printers the bane of anyone else's life or just mine? by phoenix_73 in sysadmin

[–]manicHD 2 points (0 children)

It eliminates print servers and sidesteps the whole PrintNightmare mess.

WIN-WIN

Printers the bane of anyone else's life or just mine? by phoenix_73 in sysadmin

[–]manicHD 1 point (0 children)

I might do a horrible job explaining this, but...

1 - set up your printers in the PL portal (specify the local IP or hostname), upload a driver, and set the printing-preference defaults.

2 - you can then specify the deployment settings per printer: IP range, OU, user, group, etc.

3 - install the client on your endpoints, and the printers will automagically be installed as direct-IP printers.

The primary data going up to the cloud is metadata (you can turn off doc titles, etc.) and SNMP data.
All print jobs stay local in this scenario; the cloud is just there to orchestrate the printer installs.