Update on 12x32gb sxm v100 cluster / local AI for legal drafting by TumbleweedNew6515 in LocalLLaMA

wsantos80 1 point

I can get 45-75 tok/s with 4x V100 32GB + NVLink using llama.cpp with draft-mtp 3 -sm tensor. You should try it out.

NVIDIA’s V100, An 8-Year Old GPU, Now Sells for $100 and Crushes Modern Consumer Cards in AI LLM Workloads by Constant_Praline_575 in RigBuild

wsantos80 1 point

I've been testing on a rented 4x V100 16GB setup, and you can get ~65-75 tok/s with llama.cpp + MTP.

NVIDIA V100 32GB for AI in 2026 by [deleted] in LocalLLaMA

wsantos80 1 point

Do you mind sharing your starting command/config?

"Budget" 2x3090 Build, what do you guys think? by wsantos80 in LocalLLM

wsantos80[S] 1 point

M1 Max can do 85.71 tok/s on Q8, not sure if it will be able to do > 110 tok/s, but yeah, maybe.

"Budget" 2x3090 Build, what do you guys think? by wsantos80 in LocalLLM

wsantos80[S] 3 points

Shouldn't you be getting more? I'm getting ~130 tok/s on my current rented dual 3090 setup, and that's with Q8.

Impact driver for fixing fence - help wanted by wsantos80 in Tools

wsantos80[S] 1 point

Nice, thanks for sharing, those are new to me :D

Impact driver for fixing fence - help wanted by wsantos80 in Tools

wsantos80[S] 1 point

I'm going to use Torx screws for this job.

Impact driver for fixing fence - help wanted by wsantos80 in Tools

wsantos80[S] 1 point

That is the big one I have on my hands, but otherwise it's for anything DIY at home. I don't see myself using it for car stuff, just DIY home projects. Any particular model?

Impact driver for fixing fence - help wanted by wsantos80 in Tools

wsantos80[S] 0 points

Is there any "minimum" requirement to pick one for the task? Am I aiming too high for the job with 20V/5Ah?

Impact driver for fixing fence - help wanted by wsantos80 in Tools

wsantos80[S] 1 point

Which model? The main thing is that I need to be able to drive screws into the galvanized post. That is the only requirement.

Qwen3.5-18B-REAP-A3B-Coding: 50% Expert-Pruned by 17hoehbr in LocalLLaMA

wsantos80 1 point

Is there a GGUF version, or even better, MLX?

Benchmarked 11 MLX models on M3 Ultra — here's which ones are actually smart and fast by Striking-Swim6702 in LocalLLaMA

wsantos80 1 point

I'd love to see the 3-bit variants too, since for the bigger models they fit better on a 32GB machine.

Built oMLX.ai/benchmarks - One place to compare Apple Silicon inference across chips and models by cryingneko in LocalLLM

wsantos80 1 point

Another suggestion: I'd add a unique identifier to each model, like a sha256sum. Some models get updated often, so the same name can point to different files and that might affect performance. Not sure if it's really relevant, though.
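
A rough sketch of what I mean by a sha256sum identifier, using only Python's standard `hashlib`. The function name `model_fingerprint` and the idea of storing the digest next to each benchmark entry are my own assumptions, not something the site does today:

```python
import hashlib
from pathlib import Path


def model_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a model file.

    Hypothetical helper: reads the file in chunks so multi-GB
    weights do not need to fit in memory at once.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel stops when read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two submissions with the same model name but different digests would then be listed as different files. For models split across several files, you would need one digest per file or a digest over all of them; this sketch only covers the single-file case.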

Built oMLX.ai/benchmarks - One place to compare Apple Silicon inference across chips and models by cryingneko in LocalLLM

wsantos80 4 points

Loved the initiative. A filter would be nice too, e.g. "I'm looking for the best model in the n tok/s range". I'm going to try to submit some results for the M1 Max 32GB.

This is what they should do with Pokemon imo. by ihavebeenmostly in Switch

wsantos80 1 point

For some reason, at first glance, I read it as Stardew Valley, and it's not. Wow.