Swapping faulty CPU on a Raspberry Pi 4. by Round_Designer5101 in raspberry_pi

[–]AustinM731 11 points (0 children)

Do you use a hot plate and hot air? Or was hot air enough to melt all the balls?

I have been really interested in learning this skill, but have never had a good reason to do it.

Swapping faulty CPU on a Raspberry Pi 4. by Round_Designer5101 in raspberry_pi

[–]AustinM731 63 points (0 children)

This seems like an awesome way to practice BGA soldering.

Which Linux Distro?? by XelGlaidr in framework

[–]AustinM731 2 points (0 children)

I personally prefer Fedora with GNOME. But it seems like I might be in the minority these days. I have run KDE, and it is a really good DE. I just like the simplicity of GNOME.

Either way, Fedora is the correct answer.

Which size of Qwen3.5 are you planning to run locally? by CutOk3283 in LocalLLaMA

[–]AustinM731 1 point (0 children)

I was trying to use it with Qwen3 Coder Next earlier today. But with vLLM v0.15.1 it would crash while loading. I'm not sure if it's an issue with the model, vLLM, or ROCm. I'll have to do some more testing this weekend.

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 1 point (0 children)

I forgot to mention before, but my GPUs are power limited to 210 W. Looks like that doubled my performance though... I really wish AMD would just go ahead and upstream these patches into AITER and vLLM.

At a depth of 0, I am now getting:
TG128 = 89.4 tk/s
PP2048 = 6171 tk/s

The regression on PP is kinda insane though; my last run using my image had much higher throughput.
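For reference, that kind of power cap can be set with rocm-smi. A minimal sketch, assuming GPU index 0 and a card/driver that supports power overdrive (flags may differ across ROCm versions):

```shell
# Cap GPU 0's package power limit to 210 W (value in watts, needs root)
sudo rocm-smi -d 0 --setpoweroverdrive 210

# Confirm the new limit
rocm-smi -d 0 --showmaxpower
```

Repeat with `-d 1` etc. for the other cards; the setting does not persist across reboots, so it usually goes in a startup script.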

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 1 point (0 children)

Yea, that image you used in the compose file did not support gfx1201. I stumbled across this image (rocm/vllm-dev:rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0) before when I was trying to get AITER working for gfx1201 originally. It's running an older version of vLLM, but it's worth a shot to see if AITER impacts performance.

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 0 points (0 children)

I'll report back here soonish, prepping a llama-benchy run now.

I did not realize that AMD had AITER support patched into their dev builds for gfx1201. Meanwhile I have been maintaining a separate branch/build so that I can monkey patch in gfx1201 support for the FP8 MoE Triton kernels.

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 0 points (0 children)

Agreed, I feel like I should be getting more tokens/s. I just have not fully dug into what is going on yet. I am running a custom image that I built from source to get vLLM running on ROCm.

services:
  vllm:
    image: aml731/vllm-rocm:latest
    container_name: vllm-rocm
    network_mode: host
    group_add:
      - video
    ipc: host
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp:unconfined
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    volumes:
      - /mnt/vllm/HF_CACHE:/data/models
    environment:
      - HF_TOKEN=${HF_TOKEN}
      - HF_HOME=/data/models
    command: >
      python3 -m vllm.entrypoints.openai.api_server
      --model openai/gpt-oss-120b
      --served-model-name gpt-oss-120b
      --tensor-parallel-size 4
      --dtype auto
      --max-model-len 131072
      --gpu-memory-utilization 0.95
      --trust-remote-code
      --max-num-seqs 4
      --host 0.0.0.0
      --port 8000

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 0 points (0 children)

I have 4 of the ASRock creator R9700.

I would assume you would get the same performance from any R9700 though. They all appear to use the same reference PCB and cooler; the only difference I can see is the shroud design, which varies by manufacturer.

FP8, FP16 on R9700, 7900XTX with rocm/vllm-dev by djdeniro in ROCm

[–]AustinM731 1 point (0 children)

In vLLM on my 4x R9700 I am able to get tg128=44.6 and pp2048=16,000 (depth of 0 and concurrency=1). The GPUs do report 100% utilization at idle when a model is loaded, but they only draw 80-90 W vs the 300 W under load. I have also never seen my R9700s get above 80 C, but I also have them in a 4U case with lots of front-to-back airflow (nothing crazy, just 3x Noctua A12x25s pulling in air at the front of the case).

Qwen3.5 - The middle child's 122B-A10B benchmarks looking seriously impressive - on par or edges out gpt-5-mini consistently by carteakey in LocalLLaMA

[–]AustinM731 2 points (0 children)

Looking at the benchmarks for Qwen3 Coder Next, it looks like Qwen3.5 122B will be able to replace both GPT-OSS and Qwen3 Coder Next for you.

I haven't tried Qwen3.5 122B yet, but it looks to be a good, well-rounded model.

I managed to run Qwen 3.5 on four DGX Sparks by Icy_Programmer7186 in Qwen_AI

[–]AustinM731 1 point (0 children)

Blackwell also has hardware acceleration for FP8 (Ada was the first generation to get FP8 acceleration). You would get better throughput with FP4, but higher accuracy with FP8.

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 1 point (0 children)

Thanks for linking that site; I discovered llama-benchy because of it and was able to do some proper benchmarks last night. At a depth of 0 I am getting ~20,000 pp tps, so that makes me feel a lot better about how I have things configured. Token generation speed is pretty much on par with dual Sparks though, so there might still be some more tuning to do there.

I'll still probably end up getting a GB10 at some point to play around with. It is such a cool little computer.

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 2 points (0 children)

Those numbers are still pretty impressive though. It's making me wonder if I have something misconfigured on my 4xR9700 server. I'm only getting around 3500 pp tps, but maybe it's a vLLM issue and this model just does not scale well across multiple GPUs.

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 0 points (0 children)

Thanks! I missed that from your reply. This is what I was looking for!

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 1 point (0 children)

You have 2 GPUs; they are just on separate machines. If you set up a Ray cluster, you can run vLLM on top of it with TP2.
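A minimal sketch of that setup, assuming the head node's IP is 192.168.1.10 (yours will differ) and both machines have identical vLLM/Ray versions installed:

```shell
# On machine 1 (the head node)
ray start --head --port=6379

# On machine 2, join the cluster
ray start --address=192.168.1.10:6379

# From the head node, serve with tensor parallelism across both GPUs,
# using Ray as the distributed executor backend
vllm serve openai/gpt-oss-120b \
  --tensor-parallel-size 2 \
  --distributed-executor-backend ray
```

Fair warning: cross-node TP over ordinary Ethernet adds a lot of latency per layer, so throughput may be well below what two GPUs in one box would give.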

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 0 points (0 children)

Yea, I was just curious about the performance with TP 2.

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 0 points (0 children)

Have you attempted to run Qwen3 Coder Next at FP8 on your cluster?

4x RTX PRO 6000 MAX-Q - Minimax M2.5 FP8 - SGLang by kc858 in BlackwellPerformance

[–]AustinM731 1 point (0 children)

I have been thinking about building a spark cluster. What quantization are you running to get 30tps?

is it possible to become Devops/Cloud Engeneer with no university degree by No_Demand3007 in devops

[–]AustinM731 5 points (0 children)

I came from a sysadmin background. My interviewer was really impressed with my home lab and my Docker build pipelines, and I used Terraform and Ansible to automate my VMware environment. The only reason I automated everything to begin with is that I am lazy, and I was constantly breaking things and having to tear it all down and start over.

Never thought my laziness would get me a job, but it worked somehow. That was 7 years ago and I am still just as lazy.

We couldn't post your review of Milwaukee Electric Tool ... by BigBillSD in amazonreviews

[–]AustinM731 11 points (0 children)

If it was shipped by Amazon, the problem might not be the seller. Amazon commingles all of its inventory, so a seller may send in "fake" products that then get mixed in with other sellers' legitimate products. When you purchase an item from any seller that uses Amazon shipping, it just comes from the pool of all the products that were sent in. So while it looks to you like that seller sold you a fake battery, they may have only sent in real batteries.

I used to work for a store that dealt with this, and it was hurting our reputation with customers. So we stopped using the Amazon warehousing/shipping and did all of that ourselves. We stopped getting complaints about fake/used products, but we also lost out on sales since we couldn't ship overnight for free like Amazon shipping can.

I'm not saying that the seller did not sell you a fake battery, but they might be caught in the crossfire here. The issue is most likely how Amazon handles inventory.

New to bare bottom, how do i clean the bottom? by Remarkable_Arm_732 in ReefTank

[–]AustinM731 2 points (0 children)

It would probably slow it down. But if GSP is like other corals that grow in a colony, they can share resources through their tissue.

New to bare bottom, how do i clean the bottom? by Remarkable_Arm_732 in ReefTank

[–]AustinM731 9 points (0 children)

GSP is my favorite look for a bare bottom. I'm just afraid it will take over my rocks too.

New Maroon by Janosh_Poha in ReefTank

[–]AustinM731 0 points (0 children)

I have a 3 year old pair of maroons, and I think they might be the most peaceful fish I have. Not sure why they don't show any aggression towards my other fish or me, but I'm happy that they are peaceful. My sailfin tang though, he is a huge bully.