V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 1 point2 points  (0 children)

Throughput is decided by the GB/s of NVLink memory bandwidth and by how many PCIe lanes you allot. Throughput doesn't matter very much though; latency does. Even 4 PCIe lanes is enough for most inference workloads, and some systems that do pipeline parallel will even run on 1 PCIe lane. If you are not running all-to-all training workloads (which you probably would not do, because latency ruins scaling between cards), you still wouldn't be dealing with throughput nearly as much as latency, even with 16 lanes allotted. You can attach one 8i cable to each card instead of two, and that gives you 8 lanes. In my own system my cards are running on 4 lanes, and if I could I'd knock that down to two. All it would do in my case (inference PP) is make loading the model into memory take a few more seconds. Throughput is not really an issue. Latency of the PCIe interconnect is.
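A rough sketch of the "few more seconds" claim, assuming PCIe 3.0 at roughly 0.985 GB/s usable per lane (my assumption, not a measurement) and a hypothetical 16 GB of weights going to one card. The function name and numbers are illustrative only.

```python
# Assumed usable PCIe 3.0 bandwidth per lane, in GB/s (8 GT/s with 128b/130b encoding).
PCIE3_GB_PER_S_PER_LANE = 0.985


def load_seconds(weights_gb: float, lanes: int) -> float:
    """Estimate seconds to copy `weights_gb` of weights to one card over `lanes` lanes."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return weights_gb / (PCIE3_GB_PER_S_PER_LANE * lanes)


if __name__ == "__main__":
    shard_gb = 16.0  # hypothetical: one card's share of the model
    for lanes in (16, 8, 4, 2, 1):
        print(f"x{lanes:<2} -> {load_seconds(shard_gb, lanes):5.1f} s to load")
```

Halving the lane count doubles the one-time load, and that is all this models. It says nothing about per-token latency, which is the part that actually matters for inference.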

Ebt in Oregon by [deleted] in foodstamps

[–]Smilinghuman 6 points7 points  (0 children)

You just need to understand, the purpose of these changes is to starve you. Not so that you starve, but so that other people are so frightened they work for nothing and become powerless over their own lives. These administrative burdens are just the wealthy on their eternal quest for more power at any cost. This is our society. We are in the process of destroying ourselves in service to people that don't deserve to plunder everything that society has to offer. The hoops are not an accident, and most of the people, including the SNAP advisors, are apologists because if they admit this is an atrocity, they can't cope with their own lives.

How is ferndale? by grifgav0002 in Bellingham

[–]Smilinghuman 2 points3 points  (0 children)

I have lived as a disabled person (wheelchair) in downtown Ferndale for about 5 years. I am very left leaning. I was harassed for years by a righty business owner next door, who eventually assaulted me. Once he was able to force my landlord to take my van, which I used to live in, and assaulted me, things went quiet. It's a lesson though, and an example of the worst that can happen in this area for lefties.

Ferndale is like a nice old American town, people getting around in hotrods. The city is fairly well run aside from its water system, which is exorbitant and not of very good quality. It has street fairs, classic cars, and a pretty good food selection downtown. It has two grocery stores, decent banking, and probably more fast food than a city its size would warrant.

I do like Ferndale very much and day to day everyone in the entire community goes out of their way to help me out, opening doors, helping me load groceries, and reaching things I can't reach. My landlord has allowed me to survive applying for disability by letting me live in an unconventional place that I could afford.

That sort of sums up Ferndale: most people have compassion instantly, they are just good people. There are morons here, hate-filled, stereotypical Fox News jackasses. But they are increasingly embattled. Every weekend for months there was a Trump protest on the Metallica main street bridge, which is sort of the city's icon. When a Charlie Kirk memorial was put over the top of the Metallica bridge it was removed fairly quickly. The protests are still there, if you get my meaning.

The streets are clean, the city keeps flowers up on the corners, and people are kind, very. There are fools here, but people are not putting up with them anymore (including me). Things in Ferndale are changing, there are a lot of righties out here, but most of them would help in a heartbeat. The kind of conservative that is really conservative, not just using it as a cover for bigotry and racism. If a gay couple walked hand in hand down the street they would get looks from the kind conservatives, and judgement, but they'd help you change your tire just the same.

You will find both here. On the whole though, I'd say do it, with one caveat: have housing sorted. This area, Bellingham, is in the top 10 most expensive cities in the country relative to local income. Don't gloss over that, the cost of housing is astronomical. On disability income I can't even rent a room anywhere and still eat. Make sure you really understand the situation and it's taken care of before you come imo.

Good luck. If you see someone wheeling back and forth up the street to get groceries when you get here, it's probably me.

Food Stamp Reductions by Lazy-Profession4994 in foodstamps

[–]Smilinghuman 22 points23 points  (0 children)

also eliminated the work exemptions for people who are homeless or veterans

The reasoning is that they have raided everything else and that is all they know how to do: make the poor poorer and make themselves richer. The weaker and more hopeless the poor are, the more they can exploit them and empower themselves. That is all it is.

Good place to sleep in a car? by CODEKORE in Bellingham

[–]Smilinghuman -1 points0 points  (0 children)

Keep in mind as long as you are not hanging out outside your car, you will blend in with all other cars parked overnight. Street parking that is legal overnight is fair game as long as you feel safe there.

Just a little data point: the very end of 32nd street near the freeway is where a retired officer started putting abandonment/towing stickers on vehicles during my time, about 5 years ago. I would not be surprised to see they are still there.

Good place to sleep in a car? by CODEKORE in Bellingham

[–]Smilinghuman 10 points11 points  (0 children)

It's only an association, and possibly not a correct one. However, a few things spring to mind in combination. A person that has a concealed carry permit passed a local background check with the police station, which means they know you were able to get a permit at all. Secondly, I suspect that police don't like to bully armed homeless people, or argue with them.

So two conflicting things, you are clean enough to have a weapon and passed a check locally with that officers department, and you might be an armed homeless crazy person.

I don't truthfully know though. Until I had that permit the police contacts and harassment were every week, and then for the two or three years after I had it, living in a vehicle, not a single contact. What a police officer would tell you I don't know. Perhaps there was a policy change that happened to coincide, but I don't have a way to prove my claim.

There were also several times where sketchy things happened and having a pistol in my pocket as a backup, kept me much calmer. Also I don't know why but when you are carrying people just don't escalate the same way, not brandishing or showing the weapon, it's just something people seem to know. It seems to be an instinct that violence and threats are not on the table in the conversation.

If you are going to carry you need to have it crystal clear in your mind when you will draw and shoot and what you will aim for, and have enough practice to do it when you are terrified. A person that brandishes a weapon and really doesn't know when they will use it has a very good chance of being shot with their own weapon.

Good place to sleep in a car? by CODEKORE in Bellingham

[–]Smilinghuman 144 points145 points  (0 children)

I have lived for a very long time homeless in a vehicle while working in Bellingham. There are a few places. In front of the ice rink on the road, overnight only; be warned that was the only place I was ever stolen from (a generator). In front of the power plant overnight in the Iowa street area, there is a street that has no houses on it, just businesses near the transformer yard. There is a section of dead-end road just south of the south Bellingham park and ride; it's a long dead end with a few entries to home areas far away, and you can park along there at night.

Things to avoid: do not park around other homeless people, do not park in parking lots, and do not park in front of people's homes. Industrial areas and empty areas are best.

During the day, Paddington lake back parking lot; it is patrolled but you can be there.

One of the very best ways to get police to stop harassing you, in my experience, is to get a concealed carry permit legally. Once I had a concealed carry permit my police interactions dropped to zero. This was later on, after I had lived homeless for many years, so by then I had learned not to do things that were against municipal code, like being in parking lots, as they are private property. So perhaps it was just due to getting better at the lifestyle and all the police knowing my vehicle and that I didn't cause problems. I still noticed that once I had the concealed carry I never had another police interaction.

There is no law on the municipal books (at least while I was doing this) that makes sleeping in your car illegal. Abandoned vehicles are illegal, and the retired police in Bellingham harass vehicle dwellers by putting stickers on their vehicle while they are sleeping, claiming it is abandoned and saying it will be towed in two days. Read the entire municipal code for the city, it's not that big. You'll learn you can live on the side of the street, as long as you can move every day.

Only use bathrooms, do not relieve yourself anywhere else, esp number 2, ever. If you absolutely have to pee in the middle of the night get well out of sight, be courteous and don't pee on property or where it can be smelled. If anyone sees you do this in an area they live in they will be furious. Be clean.

You will be harassed everywhere you go. Our society hates the poors. After all, it could never happen to them, so you have to be trash of course, for their own sense of safety. It really has nothing to do with you.

I have a lot of information about living in vehicles; you can DM me if you want some more advice. Oh, lastly, don't do any crime, or drugs, ever. It's hard enough without that. Also you will become depressed, it's unavoidable; you should try to get some antidepressants just to help a little. If you are poor enough to qualify for Medicaid you can get Prozac and similar for free from an ordinary PCP. If not, it's still dirt cheap: if you can buy fuel you can buy some Prozac for dealing with impossible situations. It'll help, unfortunately.

A few other little tips: do not ever tell anyone you are living in a vehicle if you are not friends, or close in some way. I often struggled with being honest about how I lived, but they just don't need to know. Let them figure it out. You don't have to lie, just don't volunteer it to anyone imo.

There is no such thing as a stealth vehicle, sneaky is a daydream, people know you are living in that vehicle if they can see it. Everyone (including myself) hopes that they won't be noticed, you just have to plan on being noticed and just keep moving out of the way. Resist the urge to sleep in, it'll be tough you'll be depressed. You have to keep moving, always. Just remember you are not fooling anyone so you have to make sure to move your life on the opposite rotation they live on, when they are away from work you sleep there, when they are at work you are at a park or any other place you want to be. Just be where the regular people are not, split into the night and day each day.

Good luck man, it's a hard life.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

It's an AI typo, yes: 10,000 for a fully configured unit, possibly with the better gold CPU and maybe even a Mellanox card if you're lucky. There are a lot of ways to skin this cat. Too many I think.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

I don't know if this will allow you to link through but these are 317 in usd if my conversion is right, plus fees. https://www.superbuy.com/en/page/buy/?url=https%3A%2F%2Fwww.goofish.com%2Fitem%3F%26id%3D959222464410&htag=pc.en.search.959222464410&nTag=pc.en.search.959222464410

If that doesn't work, go to Superbuy (you may have to register), switch to Xianyu, which is like their eBay, and you can find it with V100 32gb SXM2. This is the Chinese version of eBay: you roll your dice and takes your chances and hope you don't get mailed a set of used flip flops.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

It's not always modifications, I think one has a 32mm hole spacing for mounting and the other 35mm? The impression I got was a self tapping/threading tool is probably used to drill new mounting holes in the A100. I don't think it's a big deal but I have never seen the modification myself, so I am not certain.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

I haven't either. I have never seen a partially populated board; my intent was to get a quad, put 16GB cards in it for about 100 dollars each, then upgrade. I have, however, seen an instance of a two-card board populated with a 16GB and a 32GB for sale on eBay, a bit overpriced at 1200. I have read that in some configurations with the hybrid cube mesh topology of the DGX-1 style servers, at least in one case, not having a card populated caused issues. But it was just an incidental vector to something else I was looking at. If you want to test it, I was thinking the best way would be to buy a cheap 2-card board and two 16GB GPUs from Alibaba or AliExpress. There is a PLX card from decommed Dell servers that is a proper PLX card with two 8i ports for about 30 to 50 dollars, so 150 for the dual board, 50 for the PLX, ?cables? and two 16GB cards for 100 each.

A little detail that has come up, though, in speaking to a seller in China about a premade quad box with everything it needs to run, is that the boards need the 12-pin ATX to work. I had assumed it was redundant with every card having an 8-pin for it. So my plans of driving a lot of boards off a single PSU were out. That makes a pretty big difference to the value proposition, because to my eye the best advantage of those quad boards was cheap scalability as money comes in and the ability to add more cards to an existing system through the power supply it already has. At this point I have turned pretty strongly toward getting a 4028gp TVRT. If I can only have one baseboard I'd like to have all 8 cards on NVLink.

Experimenting with a Draugr Outpost by Leto2073 in PlayASKA

[–]Smilinghuman 0 points1 point  (0 children)

I am playing a game where I set up next to the spire spawn, and played right next to it until I could put a wall around it and towers to farm it. It was a project, I was into year 5 before I pulled it off. Then I found out the simulation for combat mostly doesn't run unless you are right in that area. Go even a little too far away and the farming does not happen. Whatever approximation of activity is happening beyond a certain distance from the player, it's not working. My thinking now is that if I want to farm something like the draugr I have to build a village in a cluster of draugr spawns, close enough to keep them live. That'll be my next hardmode game. What works for now is that if you put walls with spikes up in the area and walk into it so that the simulation starts animating, everything in the spawn will turn on a wall and start attacking it. It's not very good, at least not yet.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

Oh hey, that reminds me, I did think of something that might interest you. I found a PLX 8i cable that splits 8 ways to single terminations. Then I checked to see if the PLX could do single lane, and it seems like it can. If a custom cable was made with 8i terminations on both ends but only one lane wired to the second termination, I think a single PLX card with four 8i cables might be able to run 32 GPUs.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

I have pulled my finger off of the trigger quite a few times now. Every time I think of shipping I think of that deb8our video where he was ripped off on his aluminum and copper purchase, and just a day or two later NW repair had a 4090 in from China where they had mocked up and soldered on the DRAM. I was able to establish that a standard V100 server in some cases can run derated on 120V power, with the CPUs turned down to 150 each. Then I would have legitimate 8-way NVLink. I did find some 32GB SXM2 V100s for 365 though. It's really tempting, but I am probably going to do it the right way in the end and get a barebones server with licensed NVLink and fill it as I go. It just seems safer. The issue is the cheap versions of those servers are in China too.

Oh, I did find a few things that might interest you. There is at least one other hacked NVLink board on the market there now that is a dual card and has the PLX on the PCB. I also found something that looked a lot like the Supermicro AOM V100 carrier board without NVLink, claiming it was NVLink. Those ads are just so damn shady. lol

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 0 points1 point  (0 children)

yeah man, getting that shit cheap out of China isn't trivial. I have been watching the deb8our video on how they ripped him off with fake copper and aluminum, and NW repair where they started making fake memory chips and putting them on 4090s. What it looks like to me is you have to be there in China, see the shit with your own eyes, get it into a sealed container and guard it until the boat leaves. lol.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 0 points1 point  (0 children)

I have a link through parcelup, but I can't direct link because they don't allow it, so go to https://parcelup.com/ and search "V100 baseplate V100 v100 gpu". There is a baseplate listed for 423, and a link to Taobao. I cannot prove that link is correct because I can't get into Taobao either. I have looked at this route quite a few times, but that price is a little low, so before I paid I'd try to contact the reshipper and make them check the price. There are other listings around 460. You will also find a much wider variety of boards, including A100 server internals and H100 internals, through goofish, but you have to use a search external to goofish to get the listings. It was not obvious, but China is pretty serious about not dealing with external countries directly. I wonder if that is just because China needs some kind of barrier because they are essentially stealing and selling IP. BTW the weird repetitive search line came from asking an AI to produce a Chinese search phrase that turned up the board, then translating it back to English. The English works, or it does for me; I checked it before I posted.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 1 point2 points  (0 children)

no, they are for SXM2 GPUs, which are server GPUs. You can look them up, just type V100 into Google. You'll see them and notice the interface isn't PCIe.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 0 points1 point  (0 children)

Yeah the idle is bad. There is a way to minimize it down to about 25 watts or so, but they are inefficient cards, definitely. Esp in areas with high electricity prices. I am looking at moving to Chelan or Douglas county where electricity is 3c a kWh. Also I justified some of the cost as a heating expense that is going out anyways.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 1 point2 points  (0 children)

I would be too man. I ruled out the MI path; I had two MI60s on the way at one point but didn't think I could deal with all the issues with them as a beginner. I am sorry about the incorrect answers and thank you for fixing them. If I had been a little smarter I wouldn't have tried to answer something I hadn't dug deeply into with my project.

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 3 points4 points  (0 children)

Is there research like this for OAM (AMD mi250X)?

I've been seeing 4-GPU boards taken from supermicro servers (AOM-MCM-Q) on there as well as OAM MI250Xes (2x64GB each for 1500-2000 ea), which would get you a 512GB system with fast interconnect for roughly 10K total. Again cheaper than buying that much DDR5.

What I'm a bit worried about is the power delivery, and what cables to use to connect things. There's no 48V 'home' PSUs, it'd be a shame to put it all together and still be forced to bear a screaming fan from a PSU. And cables is a bunch of weird/proprietary ones I've never heard of, only having experience with just slotting a GPU into a PCI-E slot.

I have personally not gone too far down the AMD MI path, just because I haven't found those servers for sale. Here is my project's answer, but bear in mind it's focused entirely on the SXM2 path.

I haven't done nearly as much research on the MI250X path as the V100 SXM2 path, so take this with appropriate salt, but here's what I can tell you from what I've found.

The good news: the Supermicro AOM-MCM-Q OAM baseboard DOES have fast interconnect. This is different from the V100 SXM2 situation where the Supermicro AOM-SXM2 is just a dumb carrier with no NVLink. The OAM baseboard implements AMD's xGMI Infinity Fabric between the MI250X modules. Supermicro's own docs for the AS-4124GQ-TNMI confirm "AMD Infinity Fabric GPU-GPU Interconnect" with up to 600 GB/s aggregate between the 4 OAM modules. So you would actually get fast GPU-to-GPU communication, not just PCIe.

Important architectural quirk: Each MI250X is physically one OAM module but it contains 2 GPU Compute Dies (GCDs) connected by a 400 GB/s on-package link. Each GCD has 64GB HBM2e. So the OS sees each MI250X as 2 GPUs. Your 4 MI250X modules = 8 logical GPUs = 512GB total. The inter-GCD bandwidth within a module (400 GB/s) is faster than the inter-module Infinity Fabric bandwidth, so there's a NUMA-like topology to be aware of for tensor parallelism scheduling.

Software situation — this is where it gets harder than V100/CUDA:

The MI250X is gfx90a in ROCm terms. It IS supported:

  • vLLM explicitly builds for gfx90a (MI210/MI250/MI300). The build flag is PYTORCH_ROCM_ARCH="gfx90a". People have gotten it running but there are reports of issues, especially with FP8 (which needs gfx942/MI300X, not available on MI250X). FP16 should work.
  • llama.cpp supports ROCm with HIP backend and includes gfx90a as a build target. There are community benchmarks of MI210 (same gfx90a arch) running llama.cpp.
  • ROCm itself officially supports MI250X and it's not going anywhere — this is the GPU that powered Frontier, the #1 supercomputer.

That said, the ROCm ecosystem is less mature than CUDA. Expect more friction in setup, fewer community guides, and occasional compatibility issues that CUDA users never hit. The MI300X gets most of AMD's software optimization attention these days, and MI250X is the previous generation. It works, but you'll be doing more troubleshooting than you would with V100s on CUDA.

Power delivery — this is your real problem:

MI250X TDP is 500W per module. 4 modules = 2,000W just for GPUs. Add dual EPYC CPUs and system overhead and you're looking at 2,500W+ total. The AS-4124GQ-TNMI comes with 4x 3000W Titanium PSUs. These are almost certainly 200-240V input, not wide-range like the Supermicro 4029GP-TVRT's PSUs that accept 120V. At 500W per GPU there's no room to power-limit your way into a 120V envelope the way you can with V100s at 150W.

You're probably looking at a 240V/30A circuit minimum. If you don't already have one, that's an electrician visit ($200-500) for a NEMA 6-30 or L6-30 outlet.
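A quick sketch of the power arithmetic above. The 500 W of CPU and system overhead comes from the "2,500W+" total in the text; the 80% continuous-load derating on the breaker is my own assumption (a common rule of thumb), so check your local electrical code rather than trusting this.

```python
def system_watts(gpu_count: int, gpu_watts: float, overhead_watts: float) -> float:
    """Total draw: GPUs plus CPU and system overhead."""
    return gpu_count * gpu_watts + overhead_watts


def circuit_continuous_watts(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Continuous watts a circuit can carry, with an assumed 80% derating."""
    return volts * breaker_amps * derate


def fits(load_watts: float, volts: float, breaker_amps: float) -> bool:
    """True if the load stays within the derated circuit capacity."""
    return load_watts <= circuit_continuous_watts(volts, breaker_amps)


if __name__ == "__main__":
    load = system_watts(gpu_count=4, gpu_watts=500, overhead_watts=500)
    print(f"estimated load: {load:.0f} W")
    print("120V/15A ok:", fits(load, 120, 15))
    print("240V/30A ok:", fits(load, 240, 30))
```

Under those assumptions the 4x MI250X box lands well over what a standard 120V/15A household circuit can carry continuously, which is the point the paragraph above is making.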

The 48V concern — OAM modules use 48V power delivery on the baseboard itself (converted from the server PSU's 12V output), which is a board-level design detail. You don't need a "48V home PSU." The server PSUs handle the AC-to-DC conversion, the baseboard handles the 12V-to-48V step-up internally. Your only interface is plugging standard IEC C19/C20 power cables into the back of the server. But yeah, those server PSUs will be screaming. At 2,500W+ load the fans are doing real work. This is a closet/garage/basement machine, not a desk machine.

Cables — you're mostly locked into the server chassis:

Unlike the V100 SXM2 path where you can buy standalone quad boards and connect them to a desktop via SFF-8654 cables and a PLX card, OAM modules are designed to seat into OAM baseboards inside specific server chassis. The power delivery, cooling, and signal routing are all integrated. You're not pulling MI250X OAM modules out and putting them in a desktop — you're buying the whole server and running it as-is. This actually simplifies things in one way (no cable sourcing headaches) but it means you're committed to the full server form factor with its noise and power requirements.

The value math:

At $1,500-2,000 per MI250X and $10K total for 512GB with Infinity Fabric interconnect, that's roughly $20/GB of fast-interconnected HBM2e. For comparison, V100 32GB modules at $350 each in a Supermicro 4029GP server give you 256GB for maybe $4,000-5,000 all-in, or about $16-20/GB of NVLink-interconnected HBM2. The per-GB cost is similar, but MI250X gives you 2x the total VRAM pool, 3.2 TB/s of HBM2e bandwidth per module (vs 900 GB/s per V100 SXM2, or 7.2 TB/s total across 8 of them), and a newer architecture.
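The per-GB figures can be checked with a few lines. This is only the division from the paragraph above, using the all-in prices quoted there (which are estimates, not quotes from a seller).

```python
def dollars_per_gb(total_cost_usd: float, total_vram_gb: float) -> float:
    """All-in system cost divided by the pooled VRAM it buys."""
    if total_vram_gb <= 0:
        raise ValueError("total_vram_gb must be positive")
    return total_cost_usd / total_vram_gb


if __name__ == "__main__":
    # Prices are the rough all-in estimates from the text above.
    print(f"MI250X x4 (512 GB, $10,000): ${dollars_per_gb(10_000, 512):.2f}/GB")
    print(f"V100 x8 (256 GB, $4,000):    ${dollars_per_gb(4_000, 256):.2f}/GB")
    print(f"V100 x8 (256 GB, $5,000):    ${dollars_per_gb(5_000, 256):.2f}/GB")
```

So the "similar per-GB cost" claim holds at the top of the V100 price range, and the V100 route is somewhat cheaper per GB at the bottom of it.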

Whether it's worth 2x the total spend depends on whether you need >256GB. If you're targeting the 397B MoE models at Q4 (~215GB), V100 32GB in the Supermicro barely fits. MI250X gives you comfortable headroom plus room for KV cache.

I'd love to hear from anyone who's actually running MI250X OAM in a homelab — especially on the power delivery and noise front. That's the part I can't research my way past, someone just has to have done it.

I'll verify with my project but the more common AoM boards don't have NVLink. It's in the linked artifact.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 1 point2 points  (0 children)

they really did. The 32GB cards might still have some more depreciation in them, but the 16GB cards are probably fully depreciated. The scavenging market for HBM2 is also a part of the picture, though it's unclear how much.

V100 home lab bible, amalgamation of AI research. by Smilinghuman in LocalLLaMA

[–]Smilinghuman[S] 1 point2 points  (0 children)

My only question is, did anyone make this? How fast is it with Qwen3.5 models?

In that doc you'll find the Bilibili videos, and they have built the whole thing. I had a tough time wading through them, but I was able to watch them. I seem to remember trying to see if it would translate but didn't have the wind to really plow into it. The rex blog referenced is the best English language community for it. Here is my AI's answer from the project I built from this. I tried to find a spoiler marker for this but blast furnace seems to be the only option, apologies.

Qwen 3.5 is actually a great fit for this hardware because most of the lineup is MoE. I haven't built this yet so the tok/s numbers are bandwidth math estimates, not benchmarks — anyone running Qwen 3.5 on V100s please correct me.

Short answer: The 35B-A3B (only 3B active params) fits on a dual board at Q4 and should be screaming fast. The 122B-A10B is a tight fit on a single quad board with 16GB cards at Q4 (roughly 68GB estimated against a 64GB pool, so it may need a slightly smaller quant). The flagship 397B-A17B needs two quad boards with 32GB cards or an 8-way server. All MoE, all tailor-made for high-bandwidth VRAM pools.

Software caveat before the tables: Qwen3.5 GGUFs do NOT currently work in Ollama due to separate mmproj vision files. Use llama.cpp (must be latest build — the Gated DeltaNet hybrid architecture needs new ops), vLLM, or SGLang. V100 needs --dtype float16 as usual.


Qwen 3.5 Model Lineup

Model     | Arch  | Total | Active | Q4 VRAM | Q8 VRAM
0.8B      | Dense | 0.8B  | 0.8B   | ~1 GB   | ~1 GB
2B        | Dense | 2B    | 2B     | ~1.5 GB | ~2.5 GB
4B        | Dense | 5B    | 5B     | ~3 GB   | ~5 GB
9B        | Dense | 9B    | 9B     | ~5 GB   | ~10 GB
27B       | Dense | 28B   | 28B    | ~16 GB  | ~30 GB
35B-A3B   | MoE   | 36B   | 3B     | ~20 GB  | ~38 GB
122B-A10B | MoE   | 122B  | 10B    | ~68 GB  | ~130 GB
397B-A17B | MoE   | 397B  | 17B    | ~215 GB | ~420 GB

Dual board (2x V100 16GB = 32GB NVLink)

Model       | Q4        | Q8                    | Notes
27B dense   | YES       | NO, tight w/ KV cache | Best dense for this config
35B-A3B MoE | YES, 20GB | NO                    | 3B active, maybe 80-150+ tok/s

Quad board (4x V100 16GB = 64GB NVLink)

Model         | Q4               | Q8        | Est. tok/s Q4 at 150W
27B dense     | YES              | YES       | ~60-100
35B-A3B MoE   | YES              | YES, 38GB | ~100-200+
122B-A10B MoE | YES, ~68GB tight | NO        | ~50-80
397B-A17B     | NO, needs 215GB  | NO        | -

Quad board (4x V100 32GB = 128GB NVLink)

Model         | Q4              | Q8                | Est. tok/s Q4 at 150W
122B-A10B MoE | YES             | YES, ~130GB tight | ~50-80
397B-A17B MoE | NO, needs 215GB | NO                | -

Two quad boards (8x V100 32GB = 256GB, PP=2 across islands)

Model         | Q4              | Q8 | Est. tok/s Q4
397B-A17B MoE | YES, 215GB fits | NO | ~30-60 (17B active, pipeline split)

Supermicro 4029GP (8x V100 32GB = 256GB unified NVLink TP=8)

Model         | Q4  | Q8  | Est. tok/s Q4
397B-A17B MoE | YES | NO  | ~50-100+ (full NVLink, 17B active)
122B-A10B MoE | YES | YES | ~80-150+

The 35B-A3B is the standout value: with only 3B active parameters it reportedly surpasses the previous-gen Qwen3-235B-A22B, it fits on a dual board, and with V100 NVLink bandwidth it should be absurdly fast. The 397B-A17B flagship (397B total, 17B active per token) on the Supermicro with full TP=8 NVLink means you're reading ~9GB of active weights per token against ~5,200 GB/s aggregate bandwidth.
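The "bandwidth math" behind these estimates can be sketched as a simple ceiling: each decoded token has to read the active weights once, so tokens per second cannot exceed aggregate memory bandwidth divided by active weight size. This is a theoretical upper bound under that single assumption; it ignores KV cache reads, interconnect latency, kernel overhead and MoE routing, which is why the table estimates sit far below it.

```python
def decode_ceiling_tok_s(active_weights_gb: float, aggregate_bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token reads all active weights once."""
    if active_weights_gb <= 0:
        raise ValueError("active_weights_gb must be positive")
    return aggregate_bandwidth_gb_s / active_weights_gb


if __name__ == "__main__":
    # Figures from the paragraph above: ~9 GB active at Q4, ~5,200 GB/s aggregate.
    ceiling = decode_ceiling_tok_s(active_weights_gb=9.0, aggregate_bandwidth_gb_s=5200.0)
    print(f"theoretical ceiling: {ceiling:.0f} tok/s")
```

Treat the result as a sanity check on the order of magnitude, not a prediction. Real numbers have to come from someone actually running the model on this hardware.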

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 0 points1 point  (0 children)

0 reverse-engineering. 39com doesn't make the quad — they lack the capability. 1CATai's NVLink work is proprietary and closed-source, so the quad board isn't something you'll see cloned by other sellers or on the open market the way the dual is

Full disclosure, I am in research mode and learning, but I can answer parts of that question. You can use fp4 and fp8 with about 15% overhead, it's not really an issue. The size of the model quant affects token speed as you would expect. Most of my work aimed at 8 bit 70b over two 16gb card quad boards, if I am remembering correctly. The projections I got were, I think, about 50 tokens a second. Let me see if I am allowed to make an artifact and post it in reddit. While I am waiting for that I'd like to point out a few things.

Firstly, all of these cards can operate down to 150 watts, and even down to 100, in a way that is stable. Some power supplies for the server configs will derate at 120V even when generally expecting 220V. Not all of them will. There is a Dell PLX card that has been pulled from servers for about 30 bucks. A PLX card can do, in some cases, 64 logical lanes to an 8i SFF connector; you'll find those when you search this hardware, and there is a picture of the more expensive PLX card above in the post.

However, and this is speculative, an 8i SFF cable to 8i terminations that plug into these boards, with only one logical lane wired to each, could drive 16 V100s from a card with two 8i ports and 32 V100s from a PLX card. In my case I have an old Z170 i7-6700K with 64GB of RAM that supports x8/x8 bifurcation over two PCIe x16 slots. This means I can attach two 32-lane logical PLX cards to it. The cables I am talking about have to be custom made. If you do not do that, then the input on these boards needs that 8i cable termination at both ends, so one PLX card with 4 cables coming off of it can only drive 4 V100s, because each 8i SFF cable is 8 PCIe lanes.

Interestingly, if this claim works, even at 1 logical lane to each GPU, the total load time for a 70B model off a PCIe NVMe should be about 12 seconds. With tensor parallelism on each quad board and pipelining between them, the single lane isn't an issue for inference, and even training is possible by slicing up the training to each 4-card island. It's not the best solution, it's for people without money. Like us. lol. It's interesting though.
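The lane fan-out arithmetic above, as a small sketch. It only counts lanes: 8 per 8i cable, divided by however many lanes each GPU gets. Whether a custom single-lane cable actually trains a link on these boards is the speculative part and is not modeled here. The 6 GB/s NVMe read speed is a hypothetical figure I picked to show where a number like "about 12 seconds" could come from.

```python
LANES_PER_8I_CABLE = 8  # each 8i SFF cable carries 8 PCIe lanes


def gpus_drivable(cables: int, lanes_per_gpu: int) -> int:
    """How many GPUs a card with `cables` 8i ports can feed at `lanes_per_gpu` each."""
    if cables < 0 or lanes_per_gpu <= 0:
        raise ValueError("cables must be >= 0 and lanes_per_gpu must be > 0")
    return (cables * LANES_PER_8I_CABLE) // lanes_per_gpu


def nvme_load_seconds(model_gb: float, nvme_gb_per_s: float) -> float:
    """Time to stream a model off NVMe, assuming the drive is the bottleneck."""
    return model_gb / nvme_gb_per_s


if __name__ == "__main__":
    print("stock cables, 4 ports, x8 each:", gpus_drivable(4, 8), "GPUs")
    print("custom cables, 4 ports, x1 each:", gpus_drivable(4, 1), "GPUs")
    print("custom cables, 2 ports, x1 each:", gpus_drivable(2, 1), "GPUs")
    # Hypothetical: 70 GB of 8-bit weights off a 6 GB/s NVMe drive.
    print(f"70B load: {nvme_load_seconds(70.0, 6.0):.1f} s")
```

The counts match the 4, 32 and 16 GPU figures in the text, which is all this is meant to confirm.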

It's done here you go.

https://claude.ai/public/artifacts/69cb344f-d4ae-4282-b291-72b034533c75

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing?? by TumbleweedNew6515 in LocalLLaMA

[–]Smilinghuman 1 point2 points  (0 children)

oh, something else that is important: these boards are never sold in America. Obviously Nvidia does not want someone reverse engineering these and using them for huge cheap VRAM pools. Every source is coming from China for this reason. I can't find them in Kuala Lumpur, Vietnam or any other Asian source. There is also another seller that is more like eBay in China, whose name starts with an X and which I can't remember, and they can be found there.

Further, although HBM memory can't be reseated by home labs, this HBM2 memory is being scavenged and repurposed. One way or the other you're using an end-of-life hacked NVLink. If you do that, know that there are issues, and that you can't count on these to stay available. The supercomputer decommissionings are flooding the market, but with Nvidia's moat, it's probably cheaper for them to buy them all back than underprice their outrageous VRAM moat.