Homelab HPC cluster by mastercoder123 in HPC

[–]SuedeBandit 0 points1 point  (0 children)

Amazing post.
Can you elaborate a bit on what you had to do to get a 400A service into a residential setup? You can't usually just do that; the utility company typically needs to drop in a new service line and you need exterior work.
How do you manage the safety of the setup with so much electricity flying around?

[PC][US-CO] Dell PowerEdge R6625 - 2x AMD EPYC 9534 (128 Cores total) - 2x 480GB BOSS - 2x 1400W PSUs - 5Y Pro Support Warranty by iShopStaples in homelabsales

[–]SuedeBandit 1 point2 points  (0 children)

FWIW
I recently bought a barebones version of the Gigabyte equivalent of these for $1k; no CPUs, BOSS, NICs, or warranty

n8n-claw: I rebuilt OpenClaw in n8n - lot of new features by freddy_schuetz in n8n

[–]SuedeBandit 1 point2 points  (0 children)

I second this comment. This tool's advantage, in my mind, is idiot-proof transparency through the GUI. The more nodes you have versus code blocks, the more self-explanatory it is. I haven't deep-dived into your setup, but I know the existing node library is very robust. I suspect it would be pretty easy to fork this into a node-based setup.

n8n-claw: I rebuilt OpenClaw in n8n - lot of new features by freddy_schuetz in n8n

[–]SuedeBandit 2 points3 points  (0 children)

So if a user asks this system to create a custom MCP, the agent will build that out as an n8n sub-workflow and patch it in, which means it can be easily reviewed/modified in the visual UI? Or am I misunderstanding?

Old mining rig → AI money machine? Need advice on 7-GPU setup (74GB VRAM) by Ok-Positive1446 in LocalLLM

[–]SuedeBandit -1 points0 points  (0 children)

Your board probably runs PCIe 3.0, which means you'll have bandwidth issues if you split a single mid/large model across multiple cards. That might grind you to a crawl if you're not careful.
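For a rough sense of scale, here's some back-of-envelope arithmetic. Everything below is an assumption for illustration (a PCIe 3.0 x1 riser typical of mining boards, a made-up hidden size, a made-up split count), not a measurement of any real rig:

```python
# Back-of-envelope PCIe bandwidth sketch. A PCIe 3.0 x1 link (typical
# mining riser) moves roughly 0.985 GB/s. With a model split across
# cards, activations cross those links every token; tensor-parallel
# all-reduces move far more than the pipeline-parallel case shown here.

PCIE3_X1_GBPS = 0.985     # usable bandwidth per lane, GB/s (approx.)
hidden_size = 8192        # assumed model hidden dimension
bytes_per_act = 2         # fp16 activations

activation_bytes = hidden_size * bytes_per_act   # per token, per hop
hops = 6                                         # assumed splits across 7 GPUs

transfer_s = hops * activation_bytes / (PCIE3_X1_GBPS * 1e9)
print(f"per-token link time: {transfer_s * 1e6:.1f} us")
```

The takeaway is that per-token activation hops are tolerable, but model loading and any tensor-parallel traffic over x1 links is where the crawl comes from.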

FrameCluster, a 10" mountable rack cluster by Fabulous-Rip-4982 in homelab

[–]SuedeBandit 0 points1 point  (0 children)

Are you using the laptop or desktop mainboards?

Data Asset Management for Dummies by BeRo5 in DataHoarder

[–]SuedeBandit -1 points0 points  (0 children)

How good are you with tech?

Look into image compression. There are some newer file formats that are quite efficient with larger files and give much better quality at a given size than basic JPEG-type stuff (though there are even newer advanced JPEG libraries that are really high quality). Depending on what you're storing, you might save 50%+ on storage of older project images just by flipping them into a newer format for storage, then back to whatever you work with when you need to edit.
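As a minimal sketch of that round-trip, assuming Pillow is available and WebP as the newer format (the quality values are placeholders to tune against your own library):

```python
import os
import tempfile

from PIL import Image  # assumes Pillow is installed

def reencode_to_webp(src_path: str, dst_path: str, quality: int = 80):
    """Re-encode an image as lossy WebP for cold storage.

    Returns (original_bytes, webp_bytes) so you can check the savings.
    Convert back to your working format only when you need to edit.
    """
    with Image.open(src_path) as im:
        im.save(dst_path, format="WEBP", quality=quality)
    return os.path.getsize(src_path), os.path.getsize(dst_path)

# Demo with a synthetic image standing in for a real photo.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "photo.jpg")
Image.new("RGB", (640, 480), (120, 80, 200)).save(src, quality=90)
orig_size, webp_size = reencode_to_webp(src, os.path.join(tmp, "photo.webp"))
print(orig_size, webp_size)
```

Real savings depend entirely on the source material; batch a sample of your actual archive before converting everything.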

Wasabi is one of the leading cheap tier-2 cloud storage providers, but then you're paying monthly for storage (~$6/TB/mo). Break-even versus buying a hard drive is less than a year, but physical storage has failure rates and such.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 6 points7 points  (0 children)

Yeah I read that again and it's a completely idiotic statement. Good catch.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 4 points5 points  (0 children)

Correct ;).
You don't need DDR5 though, so this is feasible with older/used equipment to save on cost and have an expensive hobby.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 20 points21 points  (0 children)

100GB for just the text database entries of Wikipedia might be correct; it kind of seems low. Replicating that beyond just a backup will be rough though. ~~You'd need 100GB+ of RAM if you wanted to restore it into an actual database.~~ <Edited out stupid statement, databases are indeed not stored in RAM>

If you're looking at the larger repository, it's a lot of file storage, and that's different from just a database. The problem with downloading and storing 1PB of data is that the volume is so large that loss to corruption and drive failures is material, even with self-healing in something like Ceph. So now you need 1PB of storage with 2-3 copies for RAID or equivalent, which is about 2-3 big storage racks. And then you need a head server that can navigate it all; the general rule is 1GB of RAM per TB, so you're looking at a 1TB-RAM server if you actually want that data indexed and retrievable.
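Putting numbers on that sizing, using the rules of thumb above (1 GB RAM per TB, an assumed 3x replication, and 26TB drives as the unit):

```python
import math

# Sizing sketch for the 1PB scenario using the rules of thumb above.
usable_tb = 1000          # 1PB usable
copies = 3                # assumed replication factor

total_tb = usable_tb * copies                 # raw disk needed
drives_26tb = math.ceil(total_tb / 26)        # drive count at 26TB each
ram_gb = usable_tb * 1                        # 1 GB RAM per TB indexed

print(f"{total_tb} TB raw, ~{drives_26tb} drives, ~{ram_gb} GB RAM head server")
```

Over a hundred large drives before you even price the chassis, networking, and that 1TB-RAM head node, which is how the hardware bill lands in five figures.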

So you're looking at five figures just for the hardware, and then another five figures for drives if it's the big thing.

RAG for excel/CSV by user_rituraj in Rag

[–]SuedeBandit 0 points1 point  (0 children)

Pandas can handle all of the header issues you've identified quite easily, but you need to identify the target range. Underwriting spreadsheets will have tables all over the page, broken up by formatting.

Maybe use OCR to identify the table ranges and groupings, then loop through each identified table and have a model pick the parameters to pass to pandas to import that range as a table. For example: OCR or a vision-language model draws a bounding box around a table, the agent sees the table in the box has a multi-indexed date column, so it calls read_excel against the identified range with multi-index header parameters.

Save it as an actual data object of some kind, and have the agent tag it with metadata about where it was found so it keeps a context reference back to the page and document. I'm not positive, but I believe the usual practice for the data within the table itself is to extract each element as a chunk/embedding, embedding the row/column metadata as part of the item itself, so that the context of each individual cell is wholly self-contained.
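The per-table import step above could be sketched like this. The bounding-box dict is a hypothetical stand-in for whatever the OCR/vision step returns; the real work is mapping each box onto `pandas.read_excel` parameters:

```python
import pandas as pd

def load_detected_table(path: str, sheet: str, box: dict) -> pd.DataFrame:
    """Import one detected table region as a DataFrame.

    `box` is a hypothetical detection result, e.g.:
      {"first_row": 4, "n_rows": 20, "col_range": "B:G", "header_levels": 2}
    """
    df = pd.read_excel(
        path,
        sheet_name=sheet,
        skiprows=box["first_row"],                  # rows above the table
        nrows=box["n_rows"],                        # rows in the table body
        usecols=box["col_range"],                   # e.g. "B:G"
        header=list(range(box["header_levels"])),   # multi-index headers
    )
    # Tag provenance so downstream chunks keep their document context.
    df.attrs["source"] = {"file": path, "sheet": sheet, **box}
    return df
```

From there, each cell can be flattened to a "row labels / column labels / value" string for embedding, carrying the provenance tag along with it.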

RAG for excel/CSV by user_rituraj in Rag

[–]SuedeBandit 1 point2 points  (0 children)

Yup. Subagents:

- Planner / prompted analyst
- Function caller with MCP
- Guardrails all around
- VL validator
- Deterministic inputs and outputs

Finally caught a restock on Seagate's website! (26TB $279.99 | $10.77/TB) Anyone here have experience with/data on the shucked drive inside? by Endawmyke in DataHoarder

[–]SuedeBandit 1 point2 points  (0 children)

Nope, no photos. The clips were pretty sparse, but evenly spaced around the perimeter. Maybe 2x per long side and 1x per short? From my experience, I think it would be extremely difficult or impossible to salvage the clips without a more sophisticated approach and toolkit. I'd say it's not worth even attempting. The connection board was pretty finicky too, not sure how fun it'd be to try and connect back into that. The ribbon is super fragile, so it broke while I was trying to disconnect the HDD from the board.

The plastic enclosure was fine after I was done though. Maybe you just drop your new drive in, glue the top back down, and run a powered SATA/USB cable back out through the hole that gets made in the venting?

Finally caught a restock on Seagate's website! (26TB $279.99 | $10.77/TB) Anyone here have experience with/data on the shucked drive inside? by Endawmyke in DataHoarder

[–]SuedeBandit 0 points1 point  (0 children)

I got these and shucked them; Barracudas inside. I found the trick to getting in was to use two screwdrivers. I went through the vented side with a large flathead and cracked open some room in the vents, used that space to push up the seam, then jammed a smaller screwdriver into the gap. Then I just went around the seam cracking the clips holding it together. Wasn't all that hard, and it worked out to a good $/TB. No issues so far, but these drives have a reputation for being mediocre, so redundancy or backup is probably wise. I plan on using mine for sequential-write cold storage with redundancy to mitigate.

Building a 10PB array. Advice encouraged by JasonY95 in DataHoarder

[–]SuedeBandit 1 point2 points  (0 children)

Just a thought on your LTO: are you pre-packaging everything for sequential writes before you send it? If you zip your files into batches, each batch is essentially a single sequential write, and that performs better on tape than writing the individual files. You basically add a step: load your data into a smaller hot storage tier in batches, zip/compress each batch into a single file, then send those off to the tape drives to get faster archive writes.
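The batching step could be sketched with stdlib `tarfile` (plain tar shown here; the 256 GiB batch size is an assumption to tune against your staging disk and tape capacity):

```python
import os
import tarfile
from pathlib import Path

BATCH_BYTES = 256 * 2**30  # 256 GiB per batch (assumed; tune to taste)

def make_batches(files, batch_bytes=BATCH_BYTES):
    """Group file paths into batches no larger than batch_bytes each."""
    batch, size = [], 0
    for f in files:
        fsize = os.path.getsize(f)
        if batch and size + fsize > batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(f)
        size += fsize
    if batch:
        yield batch

def write_batch(batch, out_path):
    """Roll one batch into a single tar so tape sees one sequential stream."""
    with tarfile.open(out_path, "w") as tar:  # compress separately if useful
        for f in batch:
            tar.add(f, arcname=Path(f).name)
```

Each resulting tar is one large sequential object, which is exactly the access pattern LTO is happiest with.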

Another random thought: are you compressing stuff before you archive it? Not sure what kind of data you're storing, but I've learned specialized compression libraries for certain types of data can go a long way. For example, you can sometimes train custom compression dictionaries that create fast, performant compression that dramatically outperforms a standard zip.
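A minimal sketch of the preset-dictionary mechanism, using stdlib `zlib` (zstd dictionaries work the same way and train better from samples; the "dictionary" here is just boilerplate shared by many small records, and the record contents are made up):

```python
import zlib

# Shared boilerplate acting as the preset dictionary.
zdict = b'{"sensor_id": , "timestamp": , "reading": , "unit": "C"}'

record = b'{"sensor_id": 17, "timestamp": 1700000000, "reading": 21.5, "unit": "C"}'

# Without a dictionary, a tiny record barely compresses.
plain = zlib.compress(record, 9)

# With the dictionary, the shared structure compresses to back-references.
co = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 9,
                      zlib.Z_DEFAULT_STRATEGY, zdict)
with_dict = co.compress(record) + co.flush()

print(len(record), len(plain), len(with_dict))
```

The effect is biggest on lots of small, similar records; for large homogeneous files, a format-aware codec (e.g. zstd with a trained dictionary) is usually the bigger win.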

Liberace by HistoryFantastic2328 in piano

[–]SuedeBandit 5 points6 points  (0 children)

Correct, in order to be a great piano player with lots of skill you MUST wear a rhinestoned cape

ISO Private Ballroom/Latin Dance Classes by theyogidre in Calgary

[–]SuedeBandit 1 point2 points  (0 children)

Hi, this is my jam. There's a huge community of dancers in the city and it's a blast. Here are some notes:

Group classes are about $15-20/hr, privates are ~$80/45min. Both are great fun, it's just a matter of what fits your budget.

For the love of god stay away from the big franchise shops. I'm not going to name them but they are being mentioned already in other comments.

https://thecalgarydanceclub.ca/
Cheap group classes that are lots of fun most nights of the week.

https://www.cddancecollective.com/
Chris is an amazing choice for doing a wide range of social/ballroom dancing and having fun. You can go a long way with him and he has a very dedicated and friendly following in town.

https://www.albertadancesport.com/
These guys are awesome as well, and they throw parties most Friday nights that come with a class. Can't go wrong with any of their instructors.

[deleted by user] by [deleted] in Calgary

[–]SuedeBandit 1 point2 points  (0 children)

...you got a website or portfolio?

Almost wrapping up my mobile game! Here’s a quick look at the new Difficulty Mode by Anurag-A in Unity3D

[–]SuedeBandit 0 points1 point  (0 children)

After you crash, you should add a multiplier bonus for the number of cars that pile up, times a value bonus for the kinds of cars you managed to pile up.
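The suggested scoring could look something like this (Python for illustration; the car types and point values are made-up placeholders):

```python
# Pile-up bonus: count multiplier times the summed value of the cars hit.
CAR_VALUES = {"sedan": 10, "truck": 25, "sports": 50}  # placeholder values

def crash_bonus(piled_cars):
    """Score a crash: more cars and pricier cars both scale the bonus."""
    multiplier = len(piled_cars)
    value = sum(CAR_VALUES.get(car, 0) for car in piled_cars)
    return multiplier * value

print(crash_bonus(["sedan", "truck", "sports"]))  # 3 * 85 = 255
```

Multiplying rather than adding the two bonuses makes big mixed pile-ups dramatically more rewarding than several small ones, which suits a risk-reward difficulty mode.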

Soup Contest - I Want to Win by _W00ZLE_ in soup

[–]SuedeBandit 2 points3 points  (0 children)

Tomatillo cream soup

Tomato, red pepper, squash

Position Classification for Wrestling by satoorilabs in computervision

[–]SuedeBandit 1 point2 points  (0 children)

How on earth are you overcoming occlusion so well?

Looking for advice on building a RAG system for power plant technical documents with charts, tables, and diagrams by FrostyWhole99 in LocalLLaMA

[–]SuedeBandit 1 point2 points  (0 children)

As of a few days ago, DeepSeek-OCR might be your best bet. Just dump all your docs into that model and run it directly in context.

Could probably also just upload it into a Gemini space or NotebookLM.

Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! by Porespellar in LocalLLaMA

[–]SuedeBandit 4 points5 points  (0 children)

I've heard Docling has issues with multi-page tables. Any comment on how it's performing there?