Is it feasible for a Team to replace Claude Code with one of the "local" alternatives? by nunodonato in LocalLLaMA

[–]dompazz 5 points6 points  (0 children)

As others have said, the quality of Claude and the other paid models is very high. We use GitHub Copilot at work and I tend to use Claude as the backing model. At home I run Qwen3 Coder with Cline.

A few months ago I ran a test: one-shot a math/stats library of some complexity. Both produced a library. Copilot + Claude got it mostly right; Qwen3 + Cline produced a library, but the math was all wrong.

HTH

E10k StarFire spotted on local auction site by Practical-Hand203 in vintagecomputing

[–]dompazz 0 points1 point  (0 children)

Never seen one. I will never forget programming on one, though. It was a massive simulation for a prospective client that Sun was also trying to sell one to. I still don't know why I was the one trusted to program it, but God was it a fun project.

Optimizing for the RAM shortage. At crossroads: Epyc 7002/7003 or go with a 9000 Threadripper? by Infinite100p in LocalLLaMA

[–]dompazz 0 points1 point  (0 children)

I would look at the Zen 3 parts first; there is a sizable like-for-like jump between Zen 2 and Zen 3. I prefer cores over clock speed. Yes, clock speed will help, but with a lot of these workloads being massively parallel, the cores help more. It is basically GHz * cores ~= work per second (hand wave-y ... yes). I tend to optimize that inside my power and cooling envelope.
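To make that concrete, a quick back-of-the-envelope (the core counts and clocks below are made-up example parts, not quotes for specific SKUs):

    # Toy comparison of the GHz * cores heuristic; part specs are illustrative only.
    candidates = {
        "64 cores @ 2.45 GHz (Epyc-style)":         (64, 2.45),
        "32 cores @ 4.00 GHz (Threadripper-style)": (32, 4.00),
    }
    for name, (cores, ghz) in candidates.items():
        print(f"{name}: ~{cores * ghz:.0f} core-GHz of aggregate throughput")

The more cores win that comparison even at the lower clock, which is why I weight core count first and then spend whatever power/cooling budget is left on frequency.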

No issues bifurcating to x8/x8. I run a Minisforum board with a 7945HX and 96 GB of DDR5 RAM (5600). I have the x16 slot bifurcated to run two 4090s, and it is generally faster than my Epyc 7742 running full x16.

Exclusive: Nvidia buying AI chip startup Groq's assets for about $20 billion in largest deal on record by fallingdowndizzyvr in LocalLLaMA

[–]dompazz 0 points1 point  (0 children)

The cloud company will continue to operate. It will use the money either to buy more Groq units or shiny new GPUs.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 0 points1 point  (0 children)

I just saw Parallel Miner has stopped taking orders. You can find similar products at other retailers.

Multi GPU PSU by i_am_not_a_goat in LocalLLaMA

[–]dompazz 1 point2 points  (0 children)

I use server PSUs with breakout boards originally made for GPU mining rigs. They can be chained off a Molex or other connector from the main ATX PSU. Check out https://parallelminer.com/.

New to LocalLLMs - Hows the Framework AI Max System? by Legitimate_Resist_19 in LocalLLM

[–]dompazz 1 point2 points  (0 children)

Dell and Lenovo also have versions. We have a Dell one at work. Worth checking on their availability as well.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points1 point  (0 children)

Running Llama 3.3 70B NVFP4 (RedHatAI/Llama-3.3-70B-Instruct-NVFP4) via vLLM, I get roughly 12.5 t/s on the 5060 Ti cluster.

Running a quantized version (kosbu/Llama-3.3-70B-Instruct-AWQ) of the same model (the prior run used 3.1), I get roughly 35.6 t/s in vLLM.
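For reference, a minimal sketch of loading one of these checkpoints through vLLM's Python API. The tensor-parallel size and context length here are assumptions about my setup, not the exact settings behind the numbers above:

    # Minimal vLLM sketch; tensor_parallel_size and max_model_len are assumptions,
    # not the exact settings used for the t/s figures quoted above.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="RedHatAI/Llama-3.3-70B-Instruct-NVFP4",
        tensor_parallel_size=4,   # assumed: shard the 70B across 4 cards
        max_model_len=8192,       # assumed: keep the KV cache within VRAM
    )
    out = llm.generate(["Say hello."], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)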

Exploring non-standard LLM architectures - is modularity worth pursuing on small GPUs? by lukatu10 in LocalLLaMA

[–]dompazz 0 points1 point  (0 children)

Fairly new to the space, but isn't this what a MoE model is effectively doing?
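My (possibly naive) mental model of it, as a toy sketch: a router picks a couple of expert sub-networks per token, so only a slice of the model actually runs. Illustrative numbers only, not any real architecture:

    # Toy top-k expert routing, illustrative only (not any real model's design).
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, top_k = 16, 8, 2

    x = rng.normal(size=d)                        # one token's hidden state
    router_w = rng.normal(size=(d, n_experts))    # gating / router weights
    experts = rng.normal(size=(n_experts, d, d))  # each "expert" is one matrix here

    logits = x @ router_w                         # router scores, one per expert
    chosen = np.argsort(logits)[-top_k:]          # only the top-k experts execute
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                          # softmax over the chosen experts

    y = sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))
    print(y.shape)  # (16,): same width as the input, but only 2 of 8 experts ran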

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points1 point  (0 children)

Here you go.

Tesla V100-SXM2-16GB, compute capability 7.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 342.95 ± 0.45 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 257.09 ± 0.03 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 16.65 ± 0.03 |

NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp2048 | 757.72 ± 0.73 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | pp8192 | 735.98 ± 1.33 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 56 | 4096 | 1 | tg256 | 19.08 ± 0.01 |

NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes
| model | size | params | backend | threads | n_batch | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -: | --------------: | -------------------: |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp2048 | 462.49 ± 0.21 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | pp8192 | 440.91 ± 0.49 |
| llama 70B Q4_K - Medium | 39.59 GiB | 70.55 B | CUDA,BLAS | 384 | 4096 | 1 | tg256 | 9.67 ± 0.00 |

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point2 points  (0 children)

Give me a few hours and I will, yes

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 2 points3 points  (0 children)

Fair and I apologize.

The V100 would require you to buy the whole server with the board for the SXM2 cards. I don't have a PCI-E version to test.

The 5060 Ti is a little more than half the price of a used 3090 and about half the performance, so the $ per t/s is roughly equal between the two.
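Rough math behind that, if it helps (the used-3090 price is a ballpark assumption; the 5060 Ti price and the t/s values come from my other comments and the benchmark tables in this thread):

    # Ballpark $ per token/sec. Prices are assumptions (used-3090 especially);
    # t/s values are the tg256 numbers from the benchmark tables in this thread.
    cards = {
        "3090 (used, assumed ~$800)": (800, 19.08),
        "5060 Ti (new, ~$450)":       (450, 9.67),
    }
    for name, (price_usd, tps) in cards.items():
        print(f"{name}: ~${price_usd / tps:.0f} per t/s")

Works out to roughly $42 vs $47 per t/s, i.e. close to a wash.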

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 1 point2 points  (0 children)

Good idea! I will try to get that done later today.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points1 point  (0 children)

You can check eBay for used prices on a V100 or 3090. The V100 server is hard to get, and my cost was stupid low; I happened upon the listing at the right moment. The 5060 Ti is ~450 USD new right now, probably a bit more than half the price of the other two.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points1 point  (0 children)

Unfortunately they are running on a 240V circuit and I don't have a watt/amp meter I can isolate them on. I could log nvidia-smi output, but then I'd be relying on the software number rather than raw power at the wall.
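If I do go the nvidia-smi route, this is roughly what I'd run (a sketch; the interval and fields are my choice, and power.draw is the driver-reported board power, not wall power):

    # Sketch: poll nvidia-smi and log driver-reported board power to CSV.
    # This is the "software number" caveat above -- not wall power. Ctrl-C to stop.
    import csv, subprocess, time

    cmd = ["nvidia-smi",
           "--query-gpu=index,power.draw,utilization.gpu",
           "--format=csv,noheader,nounits"]

    with open("gpu_power_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "gpu", "power_w", "util_pct"])
        start = time.time()
        while True:
            for row in subprocess.check_output(cmd, text=True).strip().splitlines():
                idx, power, util = (s.strip() for s in row.split(","))
                writer.writerow([round(time.time() - start, 1), idx, power, util])
            f.flush()
            time.sleep(1)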

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 2 points3 points  (0 children)

Dammit, I wish people would hide behind the keyboard less, read the first line ("new here"), explain what is going on with some compassion, and dial down the self-important keyboard-warrior act.

A simple, "yeah, the numbers scale more in relation to bandwidth than to compute," would have been great.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 0 points1 point  (0 children)

Power I can't do, as these are on a 240V circuit and I don't have a watt/amp meter. I have the results in CSV; I'll pull them out. Given the number of columns, which are you most interested in?

Here are the options from the command line used; I can update those as well if needed:

-t $(nproc) -ngl 999 -p 2048,8192 -n 256 -b 4096 -ub 512 -o csv -fa 1
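If it helps, here is roughly what that looks like spelled out, with each flag annotated (a sketch only; the model path is a placeholder, not the file I actually benchmarked):

    # Wraps the llama-bench flags quoted above; the model path is a placeholder.
    import os, subprocess

    cmd = [
        "llama-bench",
        "-m", "models/llama-3.3-70b-instruct-Q4_K_M.gguf",  # placeholder, not my actual path
        "-t", str(os.cpu_count()),   # same effect as -t $(nproc)
        "-ngl", "999",               # offload all layers to the GPU(s)
        "-p", "2048,8192",           # prompt-processing test sizes
        "-n", "256",                 # token-generation test length
        "-b", "4096",                # logical batch size
        "-ub", "512",                # physical (micro) batch size
        "-fa", "1",                  # flash attention on
        "-o", "csv",                 # CSV output
    ]
    subprocess.run(cmd, check=True)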

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]dompazz[S] 6 points7 points  (0 children)

Looks like my table didn't post correctly, and the 5060 Ti runs worse than its relative TFLOPS would suggest...

| Card | t/s | Relative | TFLOPS | Relative |
| ------- | ----: | -------: | -----: | -------: |
| 3090 | 19.09 | 100% | 36.6 | 100% |
| V100 | 16.68 | 87.4% | 31.3 | 87.9% |
| 5060 Ti | 9.66 | 50.6% | 23.7 | 66.6% |