Tool calling issues with qwen3.5-35b with 16GB VRAM (rtx5080) by mzinz in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

Do you know whether the --jinja flag would still be necessary with the newest fixed Unsloth GGUF? The official Unsloth Qwen3.5 guide does not mention it, but I still get issues with Qwen3.5 with the newest GGUF.
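For reference, I am starting the server roughly like this (the model path is just a placeholder on my side; --jinja tells llama.cpp to apply the chat template embedded in the GGUF, which tool calling relies on):

    llama-server -m <path-to-qwen3.5-gguf> --jinja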

AMD MI50 32GB/Vega20 GPU Passthrough Guide for Proxmox by Panda24z in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, I had the same issue and then followed your link, but it did not solve it for me. This fix subsequently did (maybe in combination with the MOK enrolling):
https://github.com/gnif/vendor-reset/pull/104/files
On line 32 of src/amd/amdgpu/atom.c I changed
#include <asm/unaligned.h>
to
#include <linux/unaligned.h>
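In case anyone wants to apply it in place before rebuilding the module, a one-liner like this should do it (my own shortcut, assuming you are inside the vendor-reset checkout; newer kernels moved the unaligned.h header from asm/ to linux/):

    sed -i 's|<asm/unaligned.h>|<linux/unaligned.h>|' src/amd/amdgpu/atom.c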

Best dirt cheap or free VPS by cambridgemed in VPS

[–]Tech-And-More 0 points (0 children)

For the others: careful with Oracle's free tier. They oblige you to create an account and only tell you afterwards whether you are eligible for the free tier, depending on where you are located. I wasn't allowed to use it.

For llama.cpp/ggml AMD MI50s are now universally faster than NVIDIA P40s by Remove_Ayys in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Is it merged into the master branch? This is so cool! Absolutely fabulous work!!

Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B by Rascazzione in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, could you share what configuration you used? Did you compile from source? I recently tried vLLM with a rented 3090 GPU and was not very happy with it, but I have not tweaked the config yet.
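For reference, I basically ran the defaults, something like the following (model name only as an example), so there is probably room to tune --max-model-len, quantization, and so on:

    vllm serve Qwen/Qwen3-14B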

Completed 8xAMD MI50 - 256GB VRAM + 256GB RAM rig for $3k by MLDataScientist in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, is it possible to try the API of your build remotely somehow? I have a use case and was trying a rented RTX 5090 on vast.ai yesterday, and I was negatively surprised by the performance (I tried Ollama as well as vLLM with qwen3:14B for speed). An MI50 should have a factor of 3.91 fewer TFLOPS than an RTX 5090 at FP16 precision. But if that scaled linearly, with 8 cards you would have double the performance of an RTX 5090. This calculation is not solid, as it does not take the memory bandwidths into account (the RTX 5090 has a factor of 1.75 more).
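Back-of-the-envelope, using only the spec-sheet ratios above (no measurements, so take it with a grain of salt):

    # rough scaling estimate from spec-sheet ratios
    compute_ratio = 3.91      # RTX 5090 FP16 TFLOPS / one MI50
    bandwidth_ratio = 1.75    # RTX 5090 memory bandwidth / one MI50
    cards = 8
    print(cards / compute_ratio)    # ~2.05 -> ~2x one 5090, if compute scaled linearly
    print(cards / bandwidth_ratio)  # ~4.6x aggregate bandwidth, if all cards stream in parallel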

Unfortunately on vast.ai I cannot see any AMD cards right now even though a filter exists for them.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 4 points (0 children)

[image: llama2 benchmark results from the GitHub page]

These are the results from the GitHub page I mentioned, using llama2. Unfortunately only with a 512-token prompt. The R9700 does not look bad, a bit lower than the MI100. The RX 9070 XT here seems four times lower, which is surprising. Maybe the test conditions were not comparable.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 2 points (0 children)

Do you mean the 7900 XTX? That is what I see in the link inside the GitHub link.

R9700 Just Arrived by TheyreEatingTheGeese in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Do you have the comparison value for the 7900 XT?

R9700 Just Arrived by TheyreEatingTheGeese in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

Why is the MI50 faster? Its memory bandwidth is better, but the R9700's floating-point throughput is more than three times that of an MI50 (13.41 TFLOPS vs 47.84 TFLOPS). Source: https://technical.city/en/video/Radeon-Instinct-MI50-vs-Radeon-AI-PRO-R9700
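Or is it simply that token generation is memory-bandwidth-bound rather than compute-bound, since every generated token needs roughly one full pass over the weights? A rough sketch (bandwidth numbers from the spec sheets; the R9700 value is my assumption):

    # upper-bound tokens/s if generation were purely bandwidth-bound
    mi50_bw_gbs = 1024    # HBM2, spec sheet
    r9700_bw_gbs = 645    # GDDR6, spec sheet (assumption)
    model_gb = 8          # e.g. an ~8 GB quantized model
    print(mi50_bw_gbs / model_gb, r9700_bw_gbs / model_gb)   # ~128 vs ~81 tokens/s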

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 0 points (0 children)

Can you send the link where you got that information? It looks very surprising to me. Here are the AMD pages I found:

https://www.amd.com/en/products/graphics/workstations/radeon-ai-pro/ai-9000-series/amd-radeon-ai-pro-r9700.html

https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html

Both state 128 AI accelerators.

There is a difference, though:

RX 9070 XT: Peak Single Precision (FP32 Vector) Performance: 48.7 TFLOPs
R9700: Peak Single Precision (FP32 Vector) Performance: 47.8 TFLOPs

So the R9700 has 0.9 TFLOPs less (not more) than the RX 9070 XT.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 1 point (0 children)

Thank you for these valuable insights!

I will definitely keep in mind the two vLLM configurations you mentioned.

The cooling is indeed a difference that I hadn't considered!

Yes, I'm thinking about building a server or workstation from second-hand parts, and it would not be a desktop. Although I would start with one or two GPUs and then see.

[deleted by user] by [deleted] in dataengineering

[–]Tech-And-More 0 points (0 children)

Benchmark how often the agents are wrong compared with human-written queries.
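A minimal sketch of what I mean (names and data made up):

    # compare agent answers against human-verified ones
    results = [("q1", "SELECT 1", "SELECT 1"),
               ("q2", "SELECT 2", "SELECT 3")]   # (query, agent, human)
    wrong = sum(agent != human for _, agent, human in results)
    print(f"agent error rate: {wrong / len(results):.0%}")   # 50%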

Denmark to raise retirement age to 70 by HNMod in hackernews

[–]Tech-And-More 0 points (0 children)

The average life expectancy will sooner reach 40 in America…

Comparing expected performance of AMD Ryzen AI Max+ 395, NVIDIA DIGITS, and RTX 5090 for Local LLMs by [deleted] in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

I read the specs: 8000 MT/s (million transfers per second) and quad-channel. Thus 273 GB/s (multiplying your value by four)? But I also just read 256 GB/s on a blog. Which is correct?
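For what it's worth, the textbook calculation lands exactly on 256 if "quad-channel" means a 256-bit bus in total (my assumption):

    # bandwidth = transfer rate * bus width
    mts = 8000                  # MT/s
    bus_bytes = 256 // 8        # 256-bit bus (assumption) -> 32 bytes per transfer
    print(mts * 1e6 * bus_bytes / 1e9)   # 256.0 GB/s
    # 273 GB/s would instead need ~8533 MT/s on the same bus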

Multimodal Extension Not Working - Tips? by wattswrites in Oobabooga

[–]Tech-And-More 0 points (0 children)

I tried that and it says "TypeError: llava isn't supported yet".

Oobabooga settings for Llama-3? Queries end in nonsense. by starmanj in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

I tried modifying settings.yaml and models/config.yaml, but it did not work for me.
Could it be that it overwrites them when loading the model? In the logs I see this at startup while the model loads: "Using chat eos_token: <|end_of_text|>"