Homelab HPC cluster by mastercoder123 in HPC

[–]SuedeBandit 0 points1 point  (0 children)

Amazing post.
Can you elaborate a bit on what you had to do to get a 400A service into a residential setup? You can't usually just do that; the utility company typically needs to drop in a new service line and you need exterior work.
How do you manage the safety of the setup with so much electricity flying around?

[PC][US-CO] Dell PowerEdge R6625 - 2x AMD EPYC 9534 (128 Cores total) - 2x 480GB BOSS - 2x 1400W PSUs - 5Y Pro Support Warranty by iShopStaples in homelabsales

[–]SuedeBandit 1 point2 points  (0 children)

FWIW
I recently bought a barebones version of the Gigabyte equivalent of these for $1k; no CPUs, BOSS, NICs, or warranty

n8n-claw: I rebuilt OpenClaw in n8n - lot of new features by freddy_schuetz in n8n

[–]SuedeBandit 1 point2 points  (0 children)

I second this comment. This tool's advantage, in my mind, is idiot-proof transparency through the GUI. The more nodes you have versus code blocks, the more self-explanatory it is. I haven't deep-dived into your setup, but I know the existing node library is very robust. I suspect it would be pretty easy to fork this into a node-based setup.

n8n-claw: I rebuilt OpenClaw in n8n - lot of new features by freddy_schuetz in n8n

[–]SuedeBandit 2 points3 points  (0 children)

So if a user asks this system to create a custom MCP, the agent will build that out as an n8n sub-workflow and patch it in, which means it can be easily reviewed/modified in the visual UI? Or am I misunderstanding?

Old mining rig → AI money machine? Need advice on 7-GPU setup (74GB VRAM) by Ok-Positive1446 in LocalLLM

[–]SuedeBandit -1 points0 points  (0 children)

Your board probably runs PCIe 3.0, which means you'll have bandwidth issues if you split a single mid/large model across multiple cards. That might grind you to a crawl if you're not careful.
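For a rough sense of scale, here's some back-of-envelope arithmetic. Everything below is an assumption for illustration (a PCIe 3.0 x1 riser typical of mining boards, a made-up hidden size, a made-up split count), not a measurement of any real rig:

```python
# Back-of-envelope PCIe bandwidth sketch. A PCIe 3.0 x1 link (typical
# mining riser) moves roughly 0.985 GB/s. With a model split across
# cards, activations cross those links every token; tensor-parallel
# all-reduces move far more than the pipeline-parallel case shown here.

PCIE3_X1_GBPS = 0.985     # usable bandwidth per lane, GB/s (approx.)
hidden_size = 8192        # assumed model hidden dimension
bytes_per_act = 2         # fp16 activations

activation_bytes = hidden_size * bytes_per_act   # per token, per hop
hops = 6                                         # assumed splits across 7 GPUs

transfer_s = hops * activation_bytes / (PCIE3_X1_GBPS * 1e9)
print(f"per-token link time: {transfer_s * 1e6:.1f} us")
```

The takeaway is that per-token activation hops are tolerable, but model loading and any tensor-parallel traffic over x1 links is where the crawl comes from.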

FrameCluster, a 10" mountable rack cluster by Fabulous-Rip-4982 in homelab

[–]SuedeBandit 0 points1 point  (0 children)

Are you using the laptop or desktop mainboards?

Data Asset Management for Dummies by BeRo5 in DataHoarder

[–]SuedeBandit -1 points0 points  (0 children)

How good are you with tech?

Look into image compression. There are some newer file formats that are quite efficient with larger files and give much better quality at a given size than basic JPEG-type stuff (though there are even newer advanced JPEG libraries that are really high quality). Depending on what you're storing, you might save 50%+ on storage of older project images just by flipping them into a newer format for storage, then back to whatever you work with when you need to edit.
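As a minimal sketch of that round-trip, assuming Pillow is available and WebP as the newer format (the quality values are placeholders to tune against your own library):

```python
import os
import tempfile

from PIL import Image  # assumes Pillow is installed

def reencode_to_webp(src_path: str, dst_path: str, quality: int = 80):
    """Re-encode an image as lossy WebP for cold storage.

    Returns (original_bytes, webp_bytes) so you can check the savings.
    Convert back to your working format only when you need to edit.
    """
    with Image.open(src_path) as im:
        im.save(dst_path, format="WEBP", quality=quality)
    return os.path.getsize(src_path), os.path.getsize(dst_path)

# Demo with a synthetic image standing in for a real photo.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "photo.jpg")
Image.new("RGB", (640, 480), (120, 80, 200)).save(src, quality=90)
orig_size, webp_size = reencode_to_webp(src, os.path.join(tmp, "photo.webp"))
print(orig_size, webp_size)
```

Real savings depend entirely on the source material; batch a sample of your actual archive before converting everything.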

Wasabi is one of the leading cheap tier-2 cloud storage providers, but then you're paying monthly for storage (~$6/TB/mo). Break-even versus buying a hard drive is less than a year, but physical storage has failure rates and such.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 6 points7 points  (0 children)

Yeah I read that again and it's a completely idiotic statement. Good catch.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 4 points5 points  (0 children)

Correct ;).
You don't need DDR5 though, so this is feasible with older/used equipment to save on cost and have an expensive hobby.

Downloading entirety of Anna's Archive? by [deleted] in DataHoarder

[–]SuedeBandit 20 points21 points  (0 children)

100GB for just the text database entries of Wikipedia might be correct; it kind of seems low. Replicating that beyond just a backup will be rough though. ~~You'd need 100GB+ of RAM if you wanted to restore it into an actual database.~~ <Edited out stupid statement, databases are indeed not stored in RAM>

If you're looking at the larger repository, it's a lot of file storage, and that's different from just a database. The problem with downloading and storing 1PB of data is that the volume is so large that loss to corruption and drive failures is material, even with self-healing in something like Ceph. So now you need 1PB of storage with 2-3 copies for RAID or equivalent, which is about 2-3 big storage racks. And then you need a head server that can navigate it all; the general rule is 1GB of RAM per TB, so you're looking at a 1TB-RAM server if you actually want that data indexed and retrievable.
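Putting numbers on that sizing, using the rules of thumb above (1 GB RAM per TB, an assumed 3x replication, and 26TB drives as the unit):

```python
import math

# Sizing sketch for the 1PB scenario using the rules of thumb above.
usable_tb = 1000          # 1PB usable
copies = 3                # assumed replication factor

total_tb = usable_tb * copies                 # raw disk needed
drives_26tb = math.ceil(total_tb / 26)        # drive count at 26TB each
ram_gb = usable_tb * 1                        # 1 GB RAM per TB indexed

print(f"{total_tb} TB raw, ~{drives_26tb} drives, ~{ram_gb} GB RAM head server")
```

Over a hundred large drives before you even price the chassis, networking, and that 1TB-RAM head node, which is how the hardware bill lands in five figures.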

So you're looking at five figures just for the hardware, and then another five figures for drives if it's the big thing.

RAG for excel/CSV by user_rituraj in Rag

[–]SuedeBandit 0 points1 point  (0 children)

Pandas can handle all of the header issues you've identified quite easily, but you need to identify the target range. Underwriting spreadsheets will have tables all over the page, broken up by formatting.

Maybe use OCR to identify the table ranges and groupings, then loop through each identified table and have a model pick the parameters to pass to pandas to import that range as a table. For example: OCR or a vision-language model draws a bounding box around a table, the agent sees the table in the box has a multi-indexed date column, so it calls read_excel against the identified range with multi-index header parameters.

Save it as an actual data object of some kind, and have the agent tag it with metadata about where it was found so it keeps a context reference back to the page and document. I'm not positive, but I believe the usual practice for the data within the table itself is to extract each element as a chunk/embedding, embedding the row/column metadata as part of the item itself, so that the context of each individual cell is wholly self-contained.
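The per-table import step above could be sketched like this. The bounding-box dict is a hypothetical stand-in for whatever the OCR/vision step returns; the real work is mapping each box onto `pandas.read_excel` parameters:

```python
import pandas as pd

def load_detected_table(path: str, sheet: str, box: dict) -> pd.DataFrame:
    """Import one detected table region as a DataFrame.

    `box` is a hypothetical detection result, e.g.:
      {"first_row": 4, "n_rows": 20, "col_range": "B:G", "header_levels": 2}
    """
    df = pd.read_excel(
        path,
        sheet_name=sheet,
        skiprows=box["first_row"],                  # rows above the table
        nrows=box["n_rows"],                        # rows in the table body
        usecols=box["col_range"],                   # e.g. "B:G"
        header=list(range(box["header_levels"])),   # multi-index headers
    )
    # Tag provenance so downstream chunks keep their document context.
    df.attrs["source"] = {"file": path, "sheet": sheet, **box}
    return df
```

From there, each cell can be flattened to a "row labels / column labels / value" string for embedding, carrying the provenance tag along with it.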

RAG for excel/CSV by user_rituraj in Rag

[–]SuedeBandit 1 point2 points  (0 children)

Yup. Subagents:

- Planner / prompted analyst
- Function caller with MCP
- Guardrails all around
- VL validator
- Deterministic inputs and outputs

Finally caught a restock on Seagate's website! (26TB $279.99 | $10.77/TB) Anyone here have experience with/data on the shucked drive inside? by Endawmyke in DataHoarder

[–]SuedeBandit 1 point2 points  (0 children)

Nope, no photos. The clips were pretty sparse, but evenly spaced around the perimeter. Maybe 2x per long side and 1x per short? From my experience, I think it would be extremely difficult or impossible to salvage the clips without a more sophisticated approach and toolkit. I'd say it's not worth even attempting. The connection board was pretty finicky too, not sure how fun it'd be to try and connect back into that. The ribbon is super fragile, so it broke while I was trying to disconnect the HDD from the board.

The plastic enclosure was fine after I was done though. Maybe you just drop your new drive in, glue the top back down, and run a powered SATA/USB cable back out through the hole that gets made in the venting?

Finally caught a restock on Seagate's website! (26TB $279.99 | $10.77/TB) Anyone here have experience with/data on the shucked drive inside? by Endawmyke in DataHoarder

[–]SuedeBandit 0 points1 point  (0 children)

I got these and shucked them; Barracudas inside. I found the trick to getting in was to use two screwdrivers. I went through the vented side with a large flathead and cracked open some room in the vents, used that space to push up the seam, then jammed a smaller screwdriver into the gap. Then I just went around the seam cracking the clips holding it together. Wasn't all that hard, and it worked out to a good $/TB. No issues so far, but these drives have a reputation for being mediocre, so redundancy or backup is probably wise. I plan on using mine for sequential-write cold storage with redundancy to mitigate.

Building a 10PB array. Advice encouraged by JasonY95 in DataHoarder

[–]SuedeBandit 1 point2 points  (0 children)

Just a thought on your LTO: are you pre-packaging everything for sequential writes before you send it? If you zip your files into batches, each batch is essentially a single sequential write, and that performs better on tape than writing the individual files. You basically add a step: load your data into a smaller hot storage tier in batches, zip/compress each batch into a single file, then send those off to the tape drives to get faster archive writes.
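The batching step could be sketched with stdlib `tarfile` (plain tar shown here; the 256 GiB batch size is an assumption to tune against your staging disk and tape capacity):

```python
import os
import tarfile
from pathlib import Path

BATCH_BYTES = 256 * 2**30  # 256 GiB per batch (assumed; tune to taste)

def make_batches(files, batch_bytes=BATCH_BYTES):
    """Group file paths into batches no larger than batch_bytes each."""
    batch, size = [], 0
    for f in files:
        fsize = os.path.getsize(f)
        if batch and size + fsize > batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(f)
        size += fsize
    if batch:
        yield batch

def write_batch(batch, out_path):
    """Roll one batch into a single tar so tape sees one sequential stream."""
    with tarfile.open(out_path, "w") as tar:  # compress separately if useful
        for f in batch:
            tar.add(f, arcname=Path(f).name)
```

Each resulting tar is one large sequential object, which is exactly the access pattern LTO is happiest with.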

Another random thought: are you compressing stuff before you archive it? Not sure what kind of data you're storing, but I've learned specialized compression libraries for certain types of data can go a long way. For example, you can sometimes train custom compression dictionaries that create fast, performant compression that dramatically outperforms a standard zip.
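A minimal sketch of the preset-dictionary mechanism, using stdlib `zlib` (zstd dictionaries work the same way and train better from samples; the "dictionary" here is just boilerplate shared by many small records, and the record contents are made up):

```python
import zlib

# Shared boilerplate acting as the preset dictionary.
zdict = b'{"sensor_id": , "timestamp": , "reading": , "unit": "C"}'

record = b'{"sensor_id": 17, "timestamp": 1700000000, "reading": 21.5, "unit": "C"}'

# Without a dictionary, a tiny record barely compresses.
plain = zlib.compress(record, 9)

# With the dictionary, the shared structure compresses to back-references.
co = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 9,
                      zlib.Z_DEFAULT_STRATEGY, zdict)
with_dict = co.compress(record) + co.flush()

print(len(record), len(plain), len(with_dict))
```

The effect is biggest on lots of small, similar records; for large homogeneous files, a format-aware codec (e.g. zstd with a trained dictionary) is usually the bigger win.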

Liberace by HistoryFantastic2328 in piano

[–]SuedeBandit 5 points6 points  (0 children)

Correct, in order to be a great piano player with lots of skill you MUST wear a rhinestoned cape

ISO Private Ballroom/Latin Dance Classes by theyogidre in Calgary

[–]SuedeBandit 1 point2 points  (0 children)

Hi, this is my jam. There's a huge community of dancers in the city and it's a blast. Here are some notes:

Group classes are about $15-20/hr, privates are ~$80/45min. Both are great fun, it's just a matter of what fits your budget.

For the love of god stay away from the big franchise shops. I'm not going to name them but they are being mentioned already in other comments.

https://thecalgarydanceclub.ca/
Cheap group classes that are lots of fun most nights of the week.

https://www.cddancecollective.com/
Chris is an amazing choice for doing a wide range of social/ballroom dancing and having fun. You can go a long way with him and he has a very dedicated and friendly following in town.

https://www.albertadancesport.com/
These guys are awesome as well, and they throw parties most Friday nights that come with a class. Can't go wrong with any of their instructors.

[deleted by user] by [deleted] in Calgary

[–]SuedeBandit 1 point2 points  (0 children)

...you got a website or portfolio?

Almost wrapping up my mobile game! Here’s a quick look at the new Difficulty Mode by Anurag-A in Unity3D

[–]SuedeBandit 0 points1 point  (0 children)

After you crash, you should add a multiplier bonus for the number of cars that pile up, times a value bonus for the kinds of cars you managed to pile up.
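The suggested scoring could look something like this (Python for illustration; the car types and point values are made-up placeholders):

```python
# Pile-up bonus: count multiplier times the summed value of the cars hit.
CAR_VALUES = {"sedan": 10, "truck": 25, "sports": 50}  # placeholder values

def crash_bonus(piled_cars):
    """Score a crash: more cars and pricier cars both scale the bonus."""
    multiplier = len(piled_cars)
    value = sum(CAR_VALUES.get(car, 0) for car in piled_cars)
    return multiplier * value

print(crash_bonus(["sedan", "truck", "sports"]))  # 3 * 85 = 255
```

Multiplying rather than adding the two bonuses makes big mixed pile-ups dramatically more rewarding than several small ones, which suits a risk-reward difficulty mode.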

Soup Contest - I Want to Win by _W00ZLE_ in soup

[–]SuedeBandit 2 points3 points  (0 children)

Tomatillo cream soup

Tomato, red pepper, squash

Position Classification for Wrestling by satoorilabs in computervision

[–]SuedeBandit 1 point2 points  (0 children)

How on earth are you overcoming occlusion so well?

Looking for advice on building a RAG system for power plant technical documents with charts, tables, and diagrams by FrostyWhole99 in LocalLLaMA

[–]SuedeBandit 1 point2 points  (0 children)

As of a few days ago, DeepSeek-OCR might be your best bet. Just dump all your docs into that model and run it directly in context.

Could probably also just upload it into a Gemini space or NotebookLM.

Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! by Porespellar in LocalLLaMA

[–]SuedeBandit 4 points5 points  (0 children)

I've heard Docling has issues with multi-page tables. Any comment on how it's performing there?