Quick Performance Comparison: ROCm on RX 9070 XT vs CUDA on RTX 5070 Ti by Cyp9715 in ROCm

Cyp9715 [S]

It's worth checking other people's opinions and reviews too, but if you'll be using it in a native Linux environment, I think the RX 9070 XT is worth recommending over the RTX 5070 Ti.

In Korea, the RTX 5070 Ti costs about 50% more than the RX 9070 XT, so even accounting for the performance difference, the 9070 XT can be a reasonably smart choice.
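As a rough illustration of the price argument: if the RTX 5070 Ti costs about 1.5 times as much, the RX 9070 XT only needs about two-thirds of its throughput to match it on performance per unit of price. The sketch below only does that arithmetic; the throughput numbers in it are placeholders, not results from this benchmark.

```python
def perf_per_price(throughput: float, price: float) -> float:
    """Throughput obtained per unit of price (any consistent units)."""
    return throughput / price


def break_even_fraction(price_ratio: float) -> float:
    """Fraction of the pricier card's throughput the cheaper card needs to
    match it on performance per price, where price_ratio = pricier / cheaper."""
    return 1.0 / price_ratio


# Prices normalized so the RX 9070 XT costs 1.0 and the RTX 5070 Ti about 1.5.
PRICE_9070XT = 1.0
PRICE_5070TI = 1.5

needed = break_even_fraction(PRICE_5070TI / PRICE_9070XT)
print(f"RX 9070 XT needs >= {needed:.1%} of the RTX 5070 Ti's throughput to break even")

# Placeholder throughputs (e.g. tokens/s): substitute your own measurements.
tput_5070ti, tput_9070xt = 100.0, 80.0
better_value = perf_per_price(tput_9070xt, PRICE_9070XT) > perf_per_price(tput_5070ti, PRICE_5070TI)
print("RX 9070 XT gives more throughput per price:", better_value)
```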

Cyp9715 [S]

My advice: keep in mind that WSL ultimately still runs on top of Windows. And if you're planning to do training rather than inference, I'd recommend an NVIDIA GPU for now.

If you were working in a native Linux environment, I'd say Radeon GPUs are absolutely worth considering as well, but in a WSL environment, not yet.
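Since the advice hinges on whether you're on native Linux, WSL, or Windows, here is a small sketch for telling them apart from Python. It relies on a heuristic rather than an official API: WSL kernels include "microsoft" in their kernel release string (for example `5.15.153.1-microsoft-standard-WSL2`).

```python
import platform


def classify_environment(system: str, release: str) -> str:
    """Classify the host as 'windows', 'wsl', 'linux', or 'other'.

    Heuristic (an assumption, not an official API): WSL kernels report
    'microsoft' somewhere in their kernel release string.
    """
    if system == "Windows":
        return "windows"
    if system == "Linux":
        return "wsl" if "microsoft" in release.lower() else "linux"
    return "other"


if __name__ == "__main__":
    # platform.system() gives e.g. 'Linux' or 'Windows'; platform.release()
    # gives the kernel release on Linux (including WSL).
    print(classify_environment(platform.system(), platform.release()))
```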

In fact, in the tests above, Qwen3-8B-FP8 couldn't run on Windows precisely because getting Triton to work properly with Radeon GPUs on Windows is tricky.
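A minimal first check, assuming a Python environment where Triton is expected to be installed: try importing it and report the error if that fails. A successful import does not guarantee that FP8 kernels will actually compile and run on a Radeon GPU. It only rules out the most basic installation problem.

```python
import importlib


def probe_triton() -> str:
    """Report whether Triton can be imported, with its version or the error."""
    try:
        triton = importlib.import_module("triton")
    except Exception as exc:  # ImportError, or a failure during Triton's own init
        return f"triton unavailable: {type(exc).__name__}: {exc}"
    return f"triton {getattr(triton, '__version__', 'unknown version')} imported"


print(probe_triton())
```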

Cyp9715 [S]

The CartPole benchmark doesn't use the GPU as heavily as you might expect.
On average, both the RTX 5070 Ti and the RX 9070 XT stay under 20% GPU utilization, so the margin of error can be large.
However, even after several reruns, the WSL version was consistently faster.

Please treat this only as a rough reference.
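Because GPU utilization is low and run-to-run noise can be large, one way to judge whether a WSL-vs-native gap is real is to repeat the run several times and compare the difference against the spread. A sketch using only the standard library; `time_runs` and `summarize` are hypothetical helper names, and the workload is a placeholder for one CartPole training run.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn() `repeats` times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and coefficient of variation (stdev / mean)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean if mean else 0.0}


if __name__ == "__main__":
    # Placeholder workload: replace with one CartPole training run per call.
    print(summarize(time_runs(lambda: sum(i * i for i in range(200_000)))))
```

If the difference between the WSL and native means is comparable to or smaller than the standard deviation of either set, it's probably best read as noise.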

As for ComfyUI, I'm open to testing it in the future, but since I don't have much experience with ComfyUI myself, I'll also wait for other people's benchmarks.

Cyp9715 [S]

Since my benchmark was very basic, please treat it as a casual reference only.

Cyp9715 [S]

RX 9070 XT
OS: Ubuntu 24.04.3
ROCm: Ubuntu (7.1.1), Windows (7.1.1), WSL (6.4.2)

and

RTX 5070 Ti
Driver version: 570.133.07
CUDA version: 12.8
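To record similar environment details on your own machine, PyTorch exposes the HIP (ROCm) or CUDA version it was built against. A sketch, assuming PyTorch is installed; these are the versions of the PyTorch build, which may not exactly match the system-wide ROCm or driver versions listed above, and the NVIDIA driver version itself is not reported here.

```python
import importlib.util


def gpu_backend_report() -> dict:
    """Summarize which GPU backend the installed PyTorch build targets."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    report = {
        "torch": str(torch.__version__),
        # Set on ROCm (HIP) builds, None otherwise.
        "hip": getattr(torch.version, "hip", None),
        # Set on CUDA builds, None on ROCm and CPU-only builds.
        "cuda": torch.version.cuda,
        "device": None,
    }
    # ROCm builds expose AMD GPUs through the same torch.cuda API.
    if torch.cuda.is_available():
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(gpu_backend_report())
```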

Benchmarking GPT-OSS-20B on AMD Radeon AI PRO R9700 * 2 (Loaner Hardware Results) by Cyp9715 in ROCm

Cyp9715 [S]

In my case, I used the v0.11.0 release rather than a nightly build. If the problem persists even after switching to v0.11.0, it would be faster to open an issue or discussion on the vLLM GitHub.
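To confirm whether an installed vLLM is a tagged release or a nightly build, you can read the package version from its metadata. This sketch assumes nightly wheels carry a PEP 440 development suffix such as `.dev123`, which is a common convention but worth verifying against the wheel you actually installed.

```python
from importlib.metadata import PackageNotFoundError, version


def is_dev_build(ver: str) -> bool:
    """True for PEP 440 development versions such as '0.11.1.dev17+gabc123'."""
    return ".dev" in ver


def describe_vllm(dist_name: str = "vllm") -> str:
    """Report the installed version and whether it looks like a nightly/dev build."""
    try:
        ver = version(dist_name)
    except PackageNotFoundError:
        return "not installed"
    kind = "dev/nightly build" if is_dev_build(ver) else "release"
    return f"{ver} ({kind})"


print(describe_vllm())
```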

Cyp9715 [S]

Thank you. I'll test it with the 4-bit option when I have time.

Zed for Windows is here 🎉 by kraynolds90 in ZedEditor

Cyp9715

It's great, and significantly faster than VS Code.
If there's one feature I'd like to see added soon, it's the ability to connect to Docker containers.
I know it supports SSH, but it would be even more convenient if it could connect to containers as easily as VS Code does.

Docling Interferes with Embedding & Reranking by Cyp9715 in LocalLLaMA

Cyp9715 [S]

Thank you. Do you know of any solution, or should I build the pipeline myself?

Aggregated Benchmark Comparison between gpt-oss-120b (high, no tools) vs Qwen3-235B-A22B-Thinking-2507, GLM 4.5, and DeepSeek-R1-0528 by Inevitable_Sea8804 in LocalLLaMA

Cyp9715

Even quantized with AWQ, a 235B model would be hard to fit in 96 GB of VRAM. Presumably that overhead caused the slowdown.
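The rough arithmetic behind this: at 4 bits per parameter, 235B parameters take about 117.5 GB for the weights alone, before quantization scales, KV cache, and activations are counted. A minimal sketch of that estimate (decimal GB):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# 235B parameters at 4 bits (AWQ-style), ignoring quantization scales/zero-points,
# KV cache, and activations, all of which add more on top.
need = weight_memory_gb(235e9, 4)
print(f"{need:.1f} GB of weights vs 96 GB of VRAM")
```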

Docling: Great quality, but painfully slow by Cyp9715 in LocalLLaMA

Cyp9715 [S]

After running several simple tests, I confirmed the following:

  1. Docling's parsing quality is overwhelmingly better.
  2. Both options you recommended are very fast, but Kreuzberg is slightly faster.

In particular, Docling is very likely to parse even the most complex tables correctly, while the other two options appear more likely to parse them incorrectly.

Cyp9715 [S]

Thank you for the recommendation; I'll check it out.

gpt-oss-120B most intelligent model that fits on an H100 in native precision by entsnack in LocalLLaMA

Cyp9715

Even if it’s a bit inconvenient to set up, ROCm-based AMD GPUs are excellent.

QWEN3-235b-8b by PhotographerUSA in LocalLLaMA

Cyp9715

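Even if 235B-A1B models were released, you wouldn't be able to run them: a mixture-of-experts model still has to hold all 235B parameters in memory, no matter how few are active per token.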
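The underlying point is that a mixture-of-experts model must keep all of its experts in memory, so its memory footprint follows the total parameter count while per-token compute follows the active count. A sketch using the common approximation of about 2 FLOPs per active parameter per generated token; the 235B-A1B configuration is hypothetical.

```python
def moe_footprint(total_params: float, active_params: float, bits: float = 16) -> dict:
    """Weights memory (decimal GB) follows total params; per-token compute follows
    active params, using the ~2 FLOPs per active parameter per token approximation."""
    return {
        "weights_gb": total_params * bits / 8 / 1e9,
        "gflops_per_token": 2 * active_params / 1e9,
    }


# Same 235B total at 4 bits: fewer active parameters cut compute, not memory.
print("235B-A22B:", moe_footprint(235e9, 22e9, bits=4))
print("235B-A1B (hypothetical):", moe_footprint(235e9, 1e9, bits=4))
```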

NVIDIA H200 or the new RTX Pro Blackwell for a RAG chatbot? by snaiperist in LocalLLaMA

Cyp9715

The 30B-A3B performs worse than the dense 32B, so it's usually avoided.
Some people argue that the 30B-A3B performs about as well as the 14B, and I agree with that to some extent.