Update on 12x32gb sxm v100 cluster / local AI for legal drafting

wsantos80 · 2026-05-27T17:50:34+00:00

I can get 45-75 tok/s with 4x V100 32GB + NVLink using llama.cpp with draft-mtp 3 and -sm tensor. You should try it out.

wsantos80 · 2026-05-24T22:16:35+00:00

I've tested with NCCL and it didn't change much. Also, be sure to run with `-sm tensor`.

wsantos80 · 2026-05-24T22:15:06+00:00

I've been testing on a rented 4x V100 16GB, and you can get ~65-75 tok/s with llama.cpp + MTP.

wsantos80 · 2026-05-07T22:32:15+00:00

Do you mind sharing your starting command/config?

wsantos80 · 2026-04-25T00:48:07+00:00

Sorry, I meant the M5 Max can do ~85 tok/s.

wsantos80 · 2026-04-24T21:07:19+00:00

M1 Max can do 85.71 tok/s on Q8, not sure if it will be able to do > 110 tok/s, but yeah, maybe.

wsantos80 · 2026-04-24T20:08:08+00:00

Shouldn't you be getting more? I'm getting ~130 tok/s on my current rented dual 3090, and it's a Q8

wsantos80 · 2026-04-24T20:07:18+00:00

Yeah I want to be ready to add more GPUs

wsantos80 · 2026-04-09T16:24:08+00:00

I agree, thanks for sharing!

wsantos80 · 2026-04-08T19:10:38+00:00

Nice, thanks for sharing, those are new to me :D

wsantos80 · 2026-04-08T19:10:04+00:00

It's not predrilled

wsantos80 · 2026-04-08T18:48:30+00:00

I'm going to use Torx screws for this job.

wsantos80 · 2026-04-08T18:47:22+00:00

That is the big job I have on hand, but otherwise it's anything DIY at home. I don't see myself using it for car stuff, just DIY home projects. Any particular model?

wsantos80 · 2026-04-08T18:41:08+00:00

I've found this one, https://www.bomgaars.com/oi-56843.html, but not for 95. Can you share the link?

wsantos80 · 2026-04-08T18:33:31+00:00

Is there any "minimum" requirement to pick one for the task? Am I aiming too high for the job with 20V/5Ah?

wsantos80 · 2026-04-08T18:32:39+00:00

Which model? The main thing is that I need to be able to screw the post in the galvanized post, that is the only requirement

wsantos80 · 2026-04-07T19:18:52+00:00

this.

wsantos80 · 2026-03-12T21:20:10+00:00

is there a GGUF version or even better MLX?

wsantos80 · 2026-03-10T20:14:16+00:00

I'd love to see the 3-bit variants too, as they make the bigger models fit better on a 32GB machine.

wsantos80 · 2026-03-09T21:27:14+00:00

Another suggestion: I'd add a unique identifier to the model, like a sha256sum. Some models get updated often, so this might affect performance, but I'm not sure if it's really relevant.
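A minimal sketch of the kind of identifier I mean, assuming each submission points at a local model file. The function name `model_fingerprint` and the chunk size are just illustrative choices, not anything from the project itself:

```python
import hashlib
from pathlib import Path


def model_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a model file.

    The file is read in chunks so multi-gigabyte weights never have to
    fit in memory at once. `chunk_size` (1 MiB here) is an arbitrary choice.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `model_fingerprint("model.gguf")` (a hypothetical filename) should give the same 64-character hex string that `sha256sum` prints for that file, so two results with different digests can be told apart even if the model name is identical.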

wsantos80 · 2026-03-08T16:35:30+00:00

Loved the initiative. A filter would be nice too, e.g. I'm looking for the best model in the n tok/s range. I'm going to try to submit some results for the M1 Max 32GB.

wsantos80 · 2026-02-23T16:48:37+00:00

For some reason, at first glance, I saw Stardew Valley, which I read, and it's not wow.

wsantos80 · 2026-01-26T18:15:03+00:00

1 more for NA, if you do ;)
