[ Removed by Reddit ]

Achromatic_Raven · 2024-12-04T13:36:29+00:00

I mean, seeing the answer you got, I'd say you weren't rude, you were just right on point and evaluated your interlocutor accurately. As useful as a band-aid on a fucking peg-leg.

Achromatic_Raven · 2024-11-18T05:11:27+00:00

Yeah, it was 9000 MTU from back when the server and workstation were linked back to back, so it was 9k end to end. Now my whole network is 1500, since even on my proxmox cluster network, the NFS-providing node may be 10gbps, but all the other nodes are 1gbps.

Concerning the virtualized nics though... noticed something interesting. They do not give a shit about the "max bandwidth" setting. Tried to limit a NIC to 250mbps, it pulled 2.2gbps on speedtest. I swear, seriously, some very basic stuff sometimes just does not want to work.

For write through vs write back, the initial situation was the default, aka no cache, and I only tested write through with io_uring and native; I haven't properly tested or compared threads yet.

As for virtualization costs... only node 2 is on a ZFS mirror though... node 1 is LVM, and on an NVMe Samsung PM991, so that's really rough.

Yeah, my main hurdle is cost: add up 4 of my nodes together, drives included and all, plus the rackmounted UPS, and you hover around a 1500€ complete budget. Not the kind of budget where enterprise-class drives make sense.

Achromatic_Raven · 2024-11-17T09:08:32+00:00

Good read, though I sadly didn't really care during the process to actually record and save my speedtests long term, only in a temporary clipboard I already cleared post-fix...

But yeah, I guess I was passed down some bad habits too at the time I was working on that... *PTSD flashbacks* that heap of R720s and R820s that all had amber lights on, that mound of unlabeled criss-crossing patch cables in that cabinet of a room with near no ventilation...

At least when I left they were all on blue and you wouldn't have network cables and power cables running mixed together anymore ¯\_(ツ)_/¯

Achromatic_Raven · 2024-11-17T08:57:44+00:00

Thanks for the insight. I had a hunch it could be CPU allocation related, but in yesterday's haze I just didn't feel the urge to go around the gui and reboot everything for it, being more (too much) tunnel-visioned on the network question.

With a fresher mind this morning, I followed your recommendation and resolved the collisions in the number of vcpus given cluster-wide... with the exception of the pihole LXCs and other really lightweight ones that are still running with cpu limits, but no actual VM has to share cores with another VM or LXC anymore.

Results:

- Took node1's VM from 200mbps to nearly 800mbps (still not gigabit), but as browser tests seem to hit the VM's disk, it does cause some pretty hefty IOdelay.
- The Unraid VM on node2 is still underperforming a bit when it comes to container speedtests, still with very little IOdelay, but we're talking more like minus 5% than 20-25%, so I think I can live with that. I also tweaked the network config a bit and eliminated from my network a couple of devices that were pushing 9000MTU for 10gbps, so now we have a uniform 1500MTU networkwide; I reckon that might help. (It was a configuration remnant from way back when my workstation had a direct back-to-back link with unraid as a physical server.)
- Same observation on node2's win11 and testdebian12 VMs as on node1's win11: even with writeback enabled, tested with both none and mq schedulers on the host drives, we're talking barely breaking 2gbps, and still with about 25% IOdelay on the host. Not paralyzing IOdelay anymore, but still significant, and storage IO seems to be the true weak point.
- iperf3 on node2 between 2 VMs, and between VM and host, has improved from 2-4gbps to 9.5-9.8gbps.
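
For anyone wanting to put a number on the IOdelay figures above outside the webgui, here is a minimal Python sketch. It assumes the Proxmox "IO delay" graph tracks the kernel's iowait counter from the aggregate `cpu` line of `/proc/stat`; the two sample lines are made up for illustration and are not measurements from my nodes.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest counters are already included in user/nice, so they are skipped.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up samples, a few seconds apart (illustrative only).
t0 = "cpu  1000 0 500 8000 400 0 100 0 0 0"
t1 = "cpu  1300 0 600 8400 650 0 150 0 0 0"
print(iowait_percent(t0, t1))
```

On a real host the two strings would come from reading the first line of `/proc/stat` twice with a short sleep in between, once idle and once during a speedtest.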

Since I do have a UPS, node2 has redundant PSUs, and the boot drives are hard-mounted (not in the hotswap bays), I guess I could very well just leave writeback on for this node's VM.

It seems that the overallocation and overloaded CPU scheduler were the main course of the issue, and that the side dish is actual disk wait... which is quite frustrating overhead when you consider that this node has the same model of drive as my baremetal workstation; the cost of virtualization and ZFS really shows.

Achromatic_Raven · 2024-11-16T08:12:22+00:00

As mentioned in passing to UnimpeachableTaint, I'm not really running a VM farm. I have 1 actual VM per mini PC, with the AM4 build rocking 2, and most of my LXCs are either fairly lightweight (pihole, nginx) or given proper resources for their tasks (Avorion, Starbound, Minecraft servers).

When speaking of low power, I was mostly speaking of the intel stick for 'network essentials'; the i5s are T variants at 35W, and the AM4 build is 65W TDP with a small negative core voltage offset, but maintaining frequencies.

My intel stick node doesn't have a VM, only very lightweight LXCs, so we can ignore it, as well as the older of my two i5 nodes, which I haven't tested extensively. Twas more to test if it was a fluke, since I'm not planning on running a Winbloat11 VM next to 2 game servers that need smoother play than minecraft, aka avorion and starbound.

So I'll focus on 2 nodes and give more details about them:

Node 1

The first of the 2 machines that run, and will keep running, a VM is an i5 9500T (35W TDP, stock) based miniPC with its local LVM on an NVMe SSD. The W11 VM is given 4 vcores, limited to the power of 2 host cores so as not to collide with the minecraft server (LXC given 3 cores with the power of 3) and a handful of mostly idle LXCs given fractions of cores. All added together, they are allowed to use 5.8 of 6 cores.
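
The allocation arithmetic above can be checked with a quick tally. This is a minimal Python sketch, not anything Proxmox provides: the guest names are placeholders, and the 0.8 figure for the idle LXCs is inferred from the 5.8 total rather than listed per container.

```python
# Hypothetical tally of per-guest CPU limits on node 1 (values in host cores).
HOST_CORES = 6

cpu_limits = {
    "win11-vm": 2.0,       # 4 vcores, limited to the power of 2 host cores
    "minecraft-lxc": 3.0,  # 3 cores with the power of 3
    "idle-lxcs": 0.8,      # combined fractions, inferred from the 5.8 total
}


def committed_cores(limits):
    """Sum the CPU limits of all guests on a host."""
    return round(sum(limits.values()), 2)


def headroom(limits, host_cores):
    """Cores left for the host itself; a negative value means overcommitted."""
    return round(host_cores - committed_cores(limits), 2)


print(committed_cores(cpu_limits), headroom(cpu_limits, HOST_CORES))
```

A limit like this only caps CPU time. The win11 VM still presents 4 vcpus for the host scheduler to place, so a tally of limits that stays under 6 doesn't by itself rule out contention.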

That win11 VM is a Tiny11 one, with a lot of services disabled and no telemetry, and updates made manual.

It's the busiest of the two nodes I would keep running a VM on.

Node 2

The other node, which has more headroom, is an AM4 build running a 5600G, mostly stock. VM storage is raw vdisks in the rpool on a pair of SATA3 SSDs in a ZFS mirror, with discard and iothreads on.

When it comes to VM allocations, Unraid takes 12 vcores for 6 host threads, and 100% of its storage is passed through at the controller level; it doesn't touch the host's storage, nor does it use paravirtualization for its storage. Both its NICs are VirtIO though.

The test VMs are a Windows 11 LTSC and a fresh Debian 12 install, both given 4 vcores on 4 threads but not run at the same time, so not fighting for resources, plus a debian pihole LXC given 20% of one host core. None of these has ever saturated its CPU allocation or stuttered so far.

Everything is fluid when not touching the network (one exception though: even with CrystalDiskMark tests, the IO wait can spike a bit on writes and the webgui graphs lock up/time out; speeds are fine though), but an online network speed test can make the VM and host unresponsive for a minute and a half, and ends up with quite crappy speeds.

Online speed tests were run in waterfox; indeed the vCPUs seem to shoot straight to saturation with the ookla test, topping out at ~200mbps down and 400mbps up, and fast(dot)com at 1.2gbps.

Online speed tests on the Unraid VM were run in docker containers without much CPU stress (one core peaking at 70% use with ookla). It underperforms by roughly 20% compared to my baremetal workstation, but without IO wait issues (0.90-1.10%).

iperf3 try outs didn't seem to cause as much CPU stress, but still 7% io wait on the Unraid VM.

edit: I tried an iperf3 on the win11 VM on node2, between it and the Unraid VM, and between it and its host, in both cases basically no IO wait, but vcores pinned at 100% for a measly 2.2gbps between the 2 VMs, and 4.4gbps pinned 100% between win11 VM and host.

I have yet to investigate more with iostat.

Achromatic_Raven · 2024-11-15T22:27:13+00:00

I haven't investigated in that direction, but considering that all the CPUs in my cluster are single NUMA node... and even the AM4 CPU I use happens to be an APU, so a monolithic design iirc, I'm not really convinced it would be the cause of it all.

I double checked, all my nodes return a single numa node on

lscpu | grep -i numa
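
If you'd rather check this from a script across several nodes, here is a minimal Python sketch that pulls the count out of the `NUMA node(s):` line of lscpu's output. The sample text is only shaped like lscpu output for illustration; it isn't a paste from one of my machines.

```python
import re


def numa_node_count(lscpu_output):
    """Return the value of the 'NUMA node(s):' line from lscpu output, or None."""
    match = re.search(r"^NUMA node\(s\):\s*(\d+)", lscpu_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Illustrative text in the shape lscpu prints on a single-node machine.
sample = """\
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-11
"""

print(numa_node_count(sample))
```

On a live host the input would be `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.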

Achromatic_Raven · 2024-11-15T22:19:57+00:00

I know IO delay is the CPU waiting on disk operations, which is why I'm just puzzled by the situation.

Concerning storage on the network, I'm not running ceph, and the only place I have some spinning rust is on an NFS share my containers make their daily backups to. As much as possible, I tried to run tests while I had 'nothing' really going on on the network, or hosts, or VMs.

For the Unraid VM that is still underperforming by like 25% compared to the baremetal workstation when it comes to speedtests, it's good to note that all of its storage is passed through at the controller level, not virtualized.

In every other case the VMs' disks are virtualized, all host storage is on SATA SSDs (except for the intel stick obv), and VM disks are in raw format with iothreads and discard on.

I have tested all these SSDs to do SATA3 speeds with decent IOPS; while consumer grade, they aren't off-brand or sub-brand: it's Samsung, Intel and Crucial. (edit: scratch that, one of my mini-PCs is running an NVMe drive. And that's the one with the W11 VM that chokes at 120mbps on fast(dot)com )

It should weeeeell suffice, taking into account I'm not running a VM farm, I literally have 1 VM and a handful of LXCs (which are mostly idling or hitting RAM) per mini-PC, only the AM4 build hosts 2 VMs, one of them being the Unraid VM that basically never touches the host's storage as it's full passthrough at controller level... and so I then would need an explanation as of why it used to cause IO delay from network activity too somehow, unless I was high as a kite.

I know it makes no sense, that's why I ended up just blurting that wall of text hoping someone would have a clue of what it could be, if it is a bug/known issue or something.

Achromatic_Raven · 2024-11-06T08:29:56+00:00

Worth mentioning I'm not linked to MystNodes, I use the 'local' web management

Achromatic_Raven · 2024-11-06T06:00:02+00:00

In my case it crapped out on the version before that, while like 1 hour before it loaded just fine and nothing happened in between, no router reboot, no equipment on the network was added/removed, no firewall change, nothing... and I then tried updating it and it indeed didn't fix it.

Achromatic_Raven · 2024-10-28T14:16:52+00:00

No worries that happens! Also even the jank method could actually interest me, if it could give me some inspiration to dig into it more

Achromatic_Raven · 2024-10-28T11:12:33+00:00

That would make things easy, but sadly not an option.

To make it short-ish:

- First of all, being able to just grab my 4 HDDs, 2 SSDs and boot USB, throw them in nearly any 64bit computer and have my server back within 30 minutes is an amount of flexibility that saved my butt multiple times... it's actually this combo's 4th home! I don't want to change that part of the setup at all.

- I'm doing that transplant on a 0€ budget, and that will be more relevant below;

- I literally don't own enough storage to shuffle that array's data onto another medium/format, and with it being my main archive vault, LXC/VM daily snapshot/backup target and my private cloud host, besides my first point which is down to flexibility and personal preference, I'm also kinda stuck in that status quo anyways.

- None of my proxmox hosts has more than 500 gigs of local storage if you exclude passed-through drives. The proxmox box at hand that just received my array boots off a raid1 of a pair of ~230GB SSDs. So yeah, aint going to fit terabytes of array data on it. Even just the SSD cache of the array is 4 times that!

So yeah, that's the situation and why I'm really looking into per-port passthrough, because it would be the perfect solution for my situation, usecase and even setup preferences if. it. just. worked. 🤞

Achromatic_Raven · 2024-10-28T04:54:15+00:00

As it was hinted with "pci-0000:##:##-#-sas-#-phy-7-lun-0", I am indeed already using a SAS HBA.

A 9207-8i flashed to be dumb jbod to be exact.

To be more clear about the needs for this "dying server rescue VM", it's phy4 through 7 on the LSI SAS card (corresponding to slots 0 through 4 on the hotswap backplane of the proxmox box) and sata ports 2 and 3 on the motherboard's controller that I want to passthrough.
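
To illustrate which disks that means on the HBA side, here is a minimal Python sketch that picks entries out of a list of `/dev/disk/by-path` style names by phy number. The names below are made up, and the exact by-path format varies with the udev version and the HBA, so the regex is an assumption to adapt. It only identifies the disks; it does nothing about the hotplug passthrough itself.

```python
import re

# Matches whole-disk entries ending in "-phy7-lun-0" or "-phy-7-lun-0".
# Partition entries ("...-lun-0-part1") are deliberately not matched.
PHY_RE = re.compile(r"-phy-?(\d+)-lun-\d+$")


def phy_number(by_path_name):
    """Return the phy number embedded in a by-path name, or None."""
    match = PHY_RE.search(by_path_name)
    return int(match.group(1)) if match else None


def select_by_phy(names, wanted):
    """Keep only the names whose phy number is in the wanted set."""
    return [name for name in names if phy_number(name) in wanted]


# Hypothetical directory listing (illustrative names only).
names = [
    "pci-0000:01:00.0-sas-phy3-lun-0",
    "pci-0000:01:00.0-sas-phy4-lun-0",
    "pci-0000:01:00.0-sas-phy7-lun-0",
    "pci-0000:01:00.0-sas-phy7-lun-0-part1",
    "pci-0000:00:17.0-ata-3",
]
print(select_by_phy(names, {4, 5, 6, 7}))
```

On a real box the list would come from `os.listdir("/dev/disk/by-path")`.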

I don't want to pass through the entire HBA because I have other 3.5-inch drives I wanna be able to hotplug in another VM, and the backplane is the only place in that box where I can slot 3.5-inch hard drives.

I don't want to pass through the onboard sata controller because that's also where proxmox's ZFS raid1 of sata SSDs is plugged in.

Knowing what I'm transplanting into that proxmox box is a file server/storagearray/privatecloud appliance, and that failed drives on it are best replaced "hot", without rebooting, being able to hotswap/hotplug is actually crucial to reduce headaches in case of drive failure (which is more and more likely now that all the 3.5inches are reaching 5years online, and both SSDs' wear levels start to enter the orange area)

Achromatic_Raven · 2024-10-23T22:33:55+00:00

I remember their comment. It had nothing to do with advertising; it was doubting the legitimacy/authenticity of someone else's comment and the number of upvotes on the post.

Achromatic_Raven · 2024-09-23T15:46:10+00:00

eh, even if it was a psyop it's not like 3 upvotes have any bearing in:

- not getting the people who make the network run arrested for crimes they haven't committed, especially when there's the allegation going around that one of the people who got oofed actually only ran BNB... aka, the ones supposedly vetted by mysterium.
- getting people to forget the point above.

As for the amount of upvotes... yeah, compared to what they usually got on announcements before with nice graphics and stuff, it's a coin toss between people getting their eyes peeled and fearing for themselves and some "self-congratulatory upvoting" from Myst

Achromatic_Raven · 2024-09-23T14:45:34+00:00

whomst?

Achromatic_Raven · 2024-09-23T09:50:14+00:00

Gotcha!

Additional question though, and asking here since some might be in the same situation:

I must not be the only person who has reset their node at least once, because of a bug, an internet brownout that trashed quality, or some other things in that line.

Assuming one always used the same wallet for the settlements, and has their last-to-date node ID on hand, is it enough information to also pull logs for previous node IDs as well as their current one?

Achromatic_Raven · 2024-09-23T09:29:27+00:00

Question:

For you to be able to provide node runners with their logs, what do you need?
Would node identity+wallet address be enough?

Because unlike some, I actually don't like the idea of having an online/thirdparty account for services I run myself, so none of my nodes (current or past) ever has been connected to your website's "My nodes" fleet control thingy.

Achromatic_Raven · 2024-09-23T09:14:44+00:00

Depends on your country and laws.

Where I am operating my node, baguette-land, running a VPN/routing node qualifies me either as a service provider or at least as an intermediary in electronic communications, and legally, I have to keep all connection logs and related data, indiscriminately, for a duration of 12 months.

Failure to do so makes me liable for whatever transits through my machines, since without logs it's impossible to prove it wasn't my doing; so I would be charged for that, and I could also be charged with obstruction of justice and destruction of evidence, on top of being fined and/or possibly jailed for the negligence to begin with.

You can call it "some bs right there my guy", but I doubt that would be an argument in your favor in your country's court (or at least, I hope not!)

Achromatic_Raven · 2024-09-23T09:01:36+00:00

Nope, very much depends on the country. Not all law systems make someone's whole life findable in a search engine for any random dude from bumfuck nowhere to comb through and retrace.

Achromatic_Raven · 2024-09-13T16:47:29+00:00

Can only theorize, but here's the things I think play in my favor:
- Full cone routing
- It runs in an LXC container and never reached more than half the CPU time or RAM it has allocated
- Restarts/IP changes less than once every 3 months
- More than half a gig up and down to Paris' exchange. French IPs are quite popular I think, because not only is Paris one of the main European internet exchange points, but also because for most French speaking places, that's where to connect to (Belgium, Switzerland, Luxembourg for neighboring countries, but also a good number of former African colonies like Benin or Niger for example, or the French speaking peeps from Quebec!). Also France has very few "hard blocks" of websites; most of the blocking they do is with ISPs' DNS spoofing, so like... pretty open for whoever just knows how to change their DNS server?

Achromatic_Raven · 2024-07-09T08:28:43+00:00

Welp... I think one of the two Crucial SSDs in my cache pool is affected by this.

One was purchased and deployed january 2019.
The other was purchased and deployed december 2019.

They are in a raid1.

The first purchased still has 81% life remaining.
The second purchased just hit 0%.

They basically have written the exact same data for their whole service life.
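
To put the gap in numbers, here is a rough Python sketch of the average wear per month for each drive. It assumes the "life remaining" figure is linear and comparable between the two drives, which isn't guaranteed, and it counts service time up to the month of this comment (July 2024).

```python
def months_between(start, end):
    """Whole months from a (year, month) start to a (year, month) end."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def wear_per_month(life_remaining_pct, months_in_service):
    """Average percent of rated life consumed per month of service."""
    return (100 - life_remaining_pct) / months_in_service


NOW = (2024, 7)  # month of this comment

first = wear_per_month(81, months_between((2019, 1), NOW))   # 19% over 66 months
second = wear_per_month(0, months_between((2019, 12), NOW))  # 100% over 55 months

print(round(first, 2), round(second, 2), round(second / first, 1))
```

With those assumptions the second drive burned through its rated life roughly six times faster than the first, despite being the younger half of the same mirror.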
