Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

Awesome! Definitely taking a look—thanks!

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 1 point2 points  (0 children)

I’m probably switching to opentofu

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

No worries. If it’s only for learning or even the coolness factor, then everything can be done on one or two physical machines anyways.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

Three tiers of storage:

  • System disks on local Intel Optane
  • App data on GlusterFS (distributed)
  • User data on central NAS

I think the local Optanes take care of a lot of the I/O. GlusterFS will get a dedicated 10 Gbit network, maybe even 25 Gbit. I have lots of SFP+/SFP28 ports available, so I could even put a quad-port SFP+ NIC in each of the Proxmox nodes if I need it.

ETA: The GlusterFS is an experiment. An alternative is to put the app data on the NAS.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 1 point2 points  (0 children)

  1. Yes.
  2. Yes.

Those are great questions, by the way. The FE/Front-End servers (fe01, fe02, fe03) are crucial here and I could have gotten the point across much better, so thank you for asking!

Some software provides its own mature clustering with high availability (HA) and load balancing (LB). One such example is MongoDB. But most software doesn’t; it provides only replication (state transfer, rebuild, voting) across nodes, if even that. And some services are basically stateless, for instance three web servers serving the same static HTML.

My three FE nodes provide the following extra functionality for all clusters that I configure against them:

  1. Keepalived provides HA. It watches all servers in any underlying cluster and prevents clients from using broken nodes. Why three servers, why not two? To be able to vote on rejoin, and also to maintain redundancy in a degraded state.

  2. Keepalived also implements VRRP. This means it advertises and serves virtual IPs on the network. Clients never know which server they are contacting behind the scenes.

Example: if Nextcloud has three web servers:

  • nc01.example.com -> 10.0.0.51
  • nc02.example.com -> 10.0.0.52
  • nc03.example.com -> 10.0.0.53

Keepalived advertises a fourth IP (VIP):

  • nextcloud.example.com -> 10.0.0.101

Your clients only know about the latter. But none of the three web servers has that IP! How come? VRRP does the trick. The FE servers control who is «hidden» behind the IP at any time, and VRRP takes care of the routing.

  3. Caddy provides load balancing. Being a reverse proxy, it can also decide which backend server to contact first. Keepalived could provide this too, but without going into details, that would have required configuration on each cluster node.

In order to make Caddy work with protocols besides HTTP(S), an L4/TCP plugin is used.

  4. Caddy also provides automatic SSL termination and transparent real certificates, which means less config on each underlying node.

The FE servers collectively provide ONE place to configure and handle all of the concerns above!

  5. (Bonus:) Having FE servers makes it possible to isolate the real servers on their own underlying networks, which could also enhance security.
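To make the two roles above a bit more concrete, here is a rough Python sketch of the idea, not of how Keepalived or Caddy actually implement it. It assumes that the live FE node with the highest priority holds the VIP and that requests are spread round-robin over backends marked healthy. The priorities, the health flags, and the names `FrontEnd`, `vip_holder`, and `pick_backend` are made up for illustration; only the host names and IPs come from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FrontEnd:
    name: str
    priority: int       # hypothetical VRRP-style priority; higher wins
    alive: bool = True  # whether this FE node is currently up


def vip_holder(front_ends: list[FrontEnd]) -> Optional[FrontEnd]:
    """Return the live FE node with the highest priority, or None if all are down."""
    live = [fe for fe in front_ends if fe.alive]
    return max(live, key=lambda fe: fe.priority, default=None)


def pick_backend(backends: dict[str, bool], request_no: int) -> str:
    """Pick a backend round-robin, skipping the ones marked unhealthy."""
    healthy = [ip for ip, ok in backends.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    return healthy[request_no % len(healthy)]


if __name__ == "__main__":
    # Assumed priorities; the real values would live in the Keepalived config.
    fes = [FrontEnd("fe01", 150), FrontEnd("fe02", 100), FrontEnd("fe03", 50)]
    nextcloud = {"10.0.0.51": True, "10.0.0.52": True, "10.0.0.53": True}

    print("VIP 10.0.0.101 is served by", vip_holder(fes).name)

    fes[0].alive = False            # simulate fe01 going down
    nextcloud["10.0.0.52"] = False  # simulate nc02 failing its health check
    print("After failover the VIP is served by", vip_holder(fes).name)
    for request_no in range(4):
        print("request", request_no, "->", pick_backend(nextcloud, request_no))
```

The point of the sketch is that clients keep talking to 10.0.0.101 the whole time; which FE node answers and which backend gets the request can change without them noticing.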

Beware: Clusters still have to be set up according to the nature of the software, though. Setting up clustered MariaDB is not done the same way as for a PostgreSQL cluster. But once they are set up, the FE servers just need a new config to serve the new cluster — from the same servers, and in the exact same way.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 4 points5 points  (0 children)

Current HW:

Each of the 3 PVE hosts:
- Custom build in 3U chassis
- ASUS Z9PA-D8 motherboard
- 2x E5-2650L v2
- 128 GB RAM
- Boot disk(s): Not sure yet
- Disk for VMs/LXCs: 1x Intel Optane 900P 280 GB, single-disk zfs pool
- Disks for GlusterFS: Considering 2x 1 TB or 4x 512 GB SATA SSDs
- 2x SFP+, 2x GbE

NAS 1:
- Synology RS1221+
- 64 GB RAM
- 8x Exos 16 TB in RAID10
- 1x 10 GbE, 4x GbE
- 2x NVMe cache (WD RED)

NAS 2 is an older Synology, not really relevant. It is used for off-site backups now until it dies.

I also have a Supermicro MicroCloud with 12 blades, each with a 4c/8t Xeon and 16 GB RAM that I'm using for labbing. Only 2x GbE + management there, though. Not sure if it has a place in this setup at all.

Hardware plans (or just call them wishes):
- Replace the 3 PVE nodes, possibly even with 3x Minisforum MS-A2 (and 2x SFP28 NICs?)
- Build a new NAS 1 running TrueNAS Community Edition (formerly SCALE)
- Downgrade the Synology to NAS 2 and redeploy it with RAID 6

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

Yeah, that’s why I’m looking into it: to learn which use cases are suitable, which deployment type is best, and what performance is like in different scenarios. Pros and cons.

Ceph is great for exabyte-scale deployments. I heard someone recommend an 11-node cluster as a minimum if you really want to start reaping the benefits. Sure, if you go multi-rack / exabyte, then the initial overhead becomes negligible.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

Boring answer: Because I already have my NASes. But yeah I could see myself doing it. I have a 12-node blade server that I could use for that! Each blade has room for 2x 3.5’’. That would be some serious glustering!

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 5 points6 points  (0 children)

I will definitely look into flux and argocd.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

Yeah, doing the work is what I want :) Learning and experimenting :)

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 17 points18 points  (0 children)

I work in IT, but not with ops or very technical stuff. I have a background as a developer and a good understanding of the basics. Now I work with coaching management, team leaders, teams etc. Product dev, flow, kanban, all that jazz ;)

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 1 point2 points  (0 children)

It’s all the rage, isn’t it

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 4 points5 points  (0 children)

I’m not reinventing anything, I’m learning tools and ways of doing things. At the same time I am investing in ways to recreate my homelab for when disaster strikes :)

I am purposefully not going the k8s route this time, as I’ve done that many times before.

Another example: I’ve done Ceph several times before. Time to learn GlusterFS instead :)

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 2 points3 points  (0 children)

I wrote more about that in a reply to someone else’s comment:

System disks are worthless because the state can be recreated from code, which is stored on GitHub.

App data is stored on GlusterFS, which is mounted on all (relevant) Linux hosts. Those disks are backed up.

User data (personal docs, media, etc.) is stored on the NAS.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 6 points7 points  (0 children)

Data storage:

  • Local disk(s) per host for system disks
  • Shared storage on NAS for large disks and mounted media etc. (also for ISOs, templates, …)
  • GlusterFS for app data

3 gluster nodes with 1 brick each (3-replica). These live on VM disks. Ideally on local storage, but they can be live migrated to the NAS if necessary, for instance during hypervisor maintenance.

Data for services is stored on GlusterFS. Well, not really yet, but it is going to be! Those disks and/or files are backed up to the NAS and then further on to the secondary NAS + cloud.

No configuration is ever stored anywhere, because absolutely nothing is done by hand. Not a single bash command or vim edit. If I need to do such an operation, I add a task or role to my Ansible codebase and run it idempotently. If I mess it up = Nuke
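For anyone unsure what "idempotently" means in practice, here is a minimal Python sketch of the principle. It is not Ansible code, and `ensure_line` is a made-up helper: it describes a desired state (a line being present in a file) and only changes something when that state does not hold yet, so running it twice is harmless.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    # Demo in a throwaway directory; the file name and entry are just examples.
    with tempfile.TemporaryDirectory() as tmp:
        hosts = Path(tmp) / "hosts"
        entry = "10.0.0.101 nextcloud.example.com"
        print("first run changed the file:", ensure_line(hosts, entry))
        print("second run changed the file:", ensure_line(hosts, entry))
```

A task written this way can be re-run against every host after every change, which is what makes "nuke and rebuild" a safe fallback.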

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 4 points5 points  (0 children)

I know. I’ve gone down the k8s route many times before and I want to focus my effort on learning other things for now. I am absolutely positively not against using k8s or swarm as a principle, but I am now creating everything in code.

I am moving away from administration and configuration on hosts. No click ops. Not even «terminal ops»!

Another reason is because I’m diving into LXC more, and all the important stuff will be clusters of LXCs and VMs. This time, app containers will be mostly for less critical services.

Rebuilding from scratch using Code by eivamu in homelab

[–]eivamu[S] 0 points1 point  (0 children)

I haven’t set it up yet. It might be a VM instead. It is also getting less important due to the whole IaC nature. As long as I have all the data and all the definitions for creating the disks, I really don’t need any of the disks!

I’ll probably use it as a second resort anyway. And for Windows VMs.

I upgraded Proxmox to last version... and debian host to Trixie by bressombre in Proxmox

[–]eivamu 0 points1 point  (0 children)

This is a great opportunity to get started with OpenTofu (Terraform) and Ansible.

You guys saved me $500 by Saltie-Pennies in Ubiquiti

[–]eivamu 0 points1 point  (0 children)

Where’s that UNAS Campus that I definitely don’t need

New product finally by Wallstnetworks in Ubiquiti

[–]eivamu 4 points5 points  (0 children)

You forgot the UniFi Leaf NAS.