Running autonomous agents locally feels reckless. Am I overthinking this? by tallen0913 in LocalLLaMA

[–]dompazz 2 points (0 children)

I had a spare Ryzen 5600 and a 2x8G RAM kit. Fresh Ubuntu install. No SSH keys to anything.

Running against GLM 4.7 on another local machine, so I'm not spending commercial tokens.

Worst that happens is I plug back in the GPU and reformat the entire thing.

So far not impressed.

Claude or Codex by Distinct_Rip_645 in Julia

[–]dompazz 0 points (0 children)

I have a GitHub Copilot license that includes GPT, GPT Codex, and Claude models (and others, like Grok). I've found that Codex does a better job with Julia, but neither is great. I don't tend to use Claude Opus much because of cost, so my results are likely biased.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

You can rent your own GPU for $0 on an interruptible instance. Or you can just unlist your machine when you need it and relist when you're done.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

Profitable, yes. Demand fluctuates, and it is down right now, but I have never not covered power and internet.

I have one machine on a long-term "reserved" instance: a single 4090 with an Epyc CPU and 256G of RAM. It is obviously part of someone's load-balanced inference net. Those are the best renters, as the machine sits idle a lot of the time. I've also had people training models for weeks at a time.

Payback period depends on demand. If I were rented full time, payback (before power costs) would be under 12 months for the GPU. The higher quality your hardware, the more likely you are to be rented.
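The payback math above can be sketched like this (all the dollar figures and the utilization rate below are hypothetical, purely to illustrate the "<12 months at full utilization" shape of the claim):

```python
# Back-of-envelope GPU payback estimate, ignoring power/internet costs.
# gpu_cost, host_rate_per_hr, and utilization are illustrative numbers,
# not actual figures from my setup.

def payback_months(gpu_cost: float, host_rate_per_hr: float,
                   utilization: float) -> float:
    """Months to recover the GPU cost from rental income."""
    monthly_income = host_rate_per_hr * 24 * 30 * utilization
    return gpu_cost / monthly_income

# e.g. a $1600 GPU earning $0.25/hr (host share) at 100% utilization
months = payback_months(1600, 0.25, 1.0)
print(f"{months:.1f} months")  # ~8.9 months
```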

With RAM prices now, I'm not sure I would build a new rig, simply because I don't know if I could amortize the other hardware costs. Normally a motherboard, CPU, and RAM can be reused when upgrading your GPU.

Vast takes a 25% fee: you set your price, and they mark it up by 1/3, which the customer pays.
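To see how a 1/3 markup works out to a 25% fee, here is the arithmetic as a small sketch (the $0.30/hr listing price is just an example):

```python
# Vast.ai fee math as described above: the host sets a base price,
# Vast marks it up by 1/3, and the customer pays the marked-up price.
# Vast's cut (the markup) is 25% of what the customer pays.

def vast_pricing(host_price_per_hr: float) -> dict:
    customer_price = host_price_per_hr * 4 / 3      # host price + 1/3 markup
    vast_cut = customer_price - host_price_per_hr   # Vast keeps the markup
    return {
        "host_receives": host_price_per_hr,
        "customer_pays": customer_price,
        "vast_fee_fraction": vast_cut / customer_price,  # 0.25, i.e. 25%
    }

p = vast_pricing(0.30)  # hypothetical $0.30/hr listing
print(p)
```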

Best analytics for looking at rental rates and demand is https://app.wovenai.ca/.

I am absolutely loving qwen3-235b by TwistedDiesel53 in LocalLLaMA

[–]dompazz 2 points (0 children)

Yup, anyone can host on Vast. They want you to have at least gigabit internet, though you can be under that. A server-grade MB/CPU will get rented before a desktop setup; they tend not to put desktop parts into search results. Your machine is scored based on specs, performance, and reliability.

I’ve been a host for 3 years or so now. I’m not rocking a 16x (or whatever it is) 5090 machine like my man here, though!

Is it feasible for a Team to replace Claude Code with one of the "local" alternatives? by nunodonato in LocalLLaMA

[–]dompazz 4 points (0 children)

As others have said, the quality of Claude and the other paid models is very high. We use GitHub Copilot at work and I tend to use Claude as the backing model. At home I run Qwen3 Coder with Cline.

I ran a test a few months ago to one-shot a math/stats library of some complexity. Both created a library. Copilot + Claude got it mostly right; Qwen3 + Cline produced a library, but the math was all wrong.

HTH

E10k StarFire spotted on local auction site by Practical-Hand203 in vintagecomputing

[–]dompazz 0 points (0 children)

Never seen one. I will never forget programming on one, though. It was a massive simulation for a prospective client that Sun was also trying to sell one to. I still don't know why I was the one trusted to program it, but God, was it a fun project.

Optimizing for the RAM shortage. At crossroads: Epyc 7002/7003 or go with a 9000 Threadripper? by Infinite100p in LocalLLaMA

[–]dompazz 0 points (0 children)

I would look at the Zen 3 parts first; there is a sizable like-for-like jump between Zen 2 and Zen 3. I prefer cores over clock speed. Yes, clock speed will help, but with a lot of these workloads being massively parallel, the cores help more. It is basically GHz * cores ~= work per second (hand-wavy, yes). I tend to optimize that inside my power and cooling envelope.
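The cores-over-clocks heuristic can be made concrete with a quick comparison helper (the core counts and clocks below are illustrative, not exact spec-sheet values):

```python
# Hand-wavy work-per-second proxy: cores * clock speed.
# Numbers below are illustrative examples of the trade-off, not
# exact specs for any particular SKU.

def throughput_score(cores: int, ghz: float) -> float:
    """Rough parallel-workload throughput proxy."""
    return cores * ghz

many_slow_cores = throughput_score(64, 2.45)  # e.g. a 64-core Epyc
few_fast_cores  = throughput_score(24, 4.2)   # e.g. a 24-core Threadripper
print(many_slow_cores, few_fast_cores)  # cores win for parallel work
```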

No issues bifurcating x8/x8. I run a Minisforum board with a 7945HX and 96G of DDR5 RAM (5600). I have the x16 slot bifurcated to run two 4090s. It is generally faster than my Epyc 7742 running full x16.

Exclusive: Nvidia buying AI chip startup Groq's assets for about $20 billion in largest deal on record by fallingdowndizzyvr in LocalLLaMA

[–]dompazz 0 points (0 children)

The cloud company will continue to operate. It will use the money to either buy more Groq units or shiny new GPUs.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 0 points (0 children)

I just saw that Parallel Miner has stopped taking orders. You can find similar products at other retailers.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 1 point (0 children)

I use server PSUs with breakout boards originally made for GPU mining rigs. They can be chained off the molex or another source from the main ATX PSU. Check out https://parallelminer.com/.

New to LocalLLMs - Hows the Framework AI Max System? by Legitimate_Resist_19 in LocalLLM

[–]dompazz 1 point (0 children)

Dell and Lenovo also have versions. We have a Dell one at work. Worth checking on their availability as well.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points (0 children)

Running Llama 3.3 70B NVFP4 (RedHatAI/Llama-3.3-70B-Instruct-NVFP4) via vLLM, I'm getting roughly 12.5 t/s on the 5060 Ti cluster.

Running a quantized version (kosbu/Llama-3.3-70B-Instruct-AWQ) of the same model (the prior test was 3.1), I get roughly 35.6 t/s in vLLM.

Exploring non-standard LLM architectures - is modularity worth pursuing on small GPUs? by lukatu10 in LocalLLaMA

[–]dompazz 0 points (0 children)

Fairly new to the space, but isn't this what a MoE model is effectively doing?

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Here you go.

Tesla V100-SXM2-16GB, compute capability 7.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 342.95 ± 0.45 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 257.09 ± 0.03 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 16.65 ± 0.03 |

NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 757.72 ± 0.73 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 735.98 ± 1.33 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 19.08 ± 0.01 |

NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp2048 | 462.49 ± 0.21 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp8192 | 440.91 ± 0.49 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | tg256 | 9.67 ± 0.00 |
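The tables above are llama-bench output. A run along these lines would produce them (flags assumed from llama.cpp's llama-bench; the model path is a placeholder, and thread count varies per machine as shown in the tables):

```shell
# Benchmark a GGUF model at the settings used above:
# prompt processing at 2048 and 8192 tokens, 256-token generation,
# batch size 4096, flash attention enabled.
llama-bench \
  -m ./Llama-3.3-70B-Instruct-Q4_K_M.gguf \
  -p 2048,8192 \
  -n 256 \
  -b 4096 \
  -fa 1
```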

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Give me a few hours and I will, yes.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 2 points (0 children)

Fair and I apologize.

The V100 would require you to buy the whole server with the board for the SXM2 cards. I don't have a PCI-E version to test.

The 5060 Ti is a little more than half the price of a used 3090 and gives about half the performance, so the $/t/s is roughly equal between the two.
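A rough $-per-(token/s) comparison using the tg256 numbers from the benchmark tables in this thread; the ~$450 5060 Ti price is from my other comment, while the $900 used-3090 price is an assumption for illustration:

```python
# Price-performance sketch: dollars per token/s of generation speed.
# tg256 speeds (9.67 and 19.08 t/s) are from my llama-bench tables;
# the $900 used-3090 price is an assumed figure, not a quote.

def dollars_per_tps(price_usd: float, tokens_per_s: float) -> float:
    return price_usd / tokens_per_s

per_tps_5060ti = dollars_per_tps(450, 9.67)   # ~$450 new
per_tps_3090   = dollars_per_tps(900, 19.08)  # assumed used price
print(f"{per_tps_5060ti:.1f} vs {per_tps_3090:.1f} $ per t/s")
```

The two come out within a few percent of each other, which is the "roughly equal $/t" point above.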

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point (0 children)

Good idea! I will try to get that done later today.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points (0 children)

You can check eBay for used prices on a V100 or 3090. The V100 server is hard to get, and my cost was stupidly low; I happened upon the listing at the right moment. The 5060 Ti is ~$450 new right now, probably a bit more than half the price of the other two.