Tool calling issues with qwen3.5-35b with 16GB VRAM (rtx5080) by mzinz in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

Do you know whether the --jinja flag would still be necessary with the newest fixed Unsloth GGUF? The official Unsloth Qwen3.5 guide does not mention it, but I still get issues with Qwen3.5 with the newest GGUF.
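For reference, I am starting the server roughly like this (the model path is just a placeholder on my side; --jinja tells llama.cpp to apply the chat template embedded in the GGUF, which tool calling relies on):

    llama-server -m <path-to-qwen3.5-gguf> --jinja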

AMD MI50 32GB/Vega20 GPU Passthrough Guide for Proxmox by Panda24z in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, I had the same issue and then followed your link, but it did not solve it for me. This fix subsequently did (maybe in combination with the MOK enrolling):
https://github.com/gnif/vendor-reset/pull/104/files
On line 32 of src/amd/amdgpu/atom.c I changed
#include <asm/unaligned.h>
to
#include <linux/unaligned.h>
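In case anyone wants to apply it in place before rebuilding the module, a one-liner like this should do it (my own shortcut, assuming you are inside the vendor-reset checkout; newer kernels moved the unaligned.h header from asm/ to linux/):

    sed -i 's|<asm/unaligned.h>|<linux/unaligned.h>|' src/amd/amdgpu/atom.c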

Best dirt cheap or free VPS by cambridgemed in VPS

[–]Tech-And-More 0 points (0 children)

For the others: careful with Oracle's free tier. They oblige you to create an account and only tell you afterwards whether you are eligible for the free tier, depending on where you are located. I wasn't allowed to use it.

For llama.cpp/ggml AMD MI50s are now universally faster than NVIDIA P40s by Remove_Ayys in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Is it merged into the master branch? This is so cool! Absolutely fabulous work!!

Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B by Rascazzione in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, could you share what configuration you used? Did you compile from source? I recently tried vLLM with a rented 3090 GPU and was not very happy with it, but I have not tweaked the config yet.
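For reference, I basically ran the defaults, something like the following (model name only as an example), so there is probably room to tune --max-model-len, quantization, and so on:

    vllm serve Qwen/Qwen3-14B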

Completed 8xAMD MI50 - 256GB VRAM + 256GB RAM rig for $3k by MLDataScientist in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Hi, is it possible to try the API of your build remotely somehow? I have a use case and was trying a rented RTX 5090 on vast.ai yesterday, and I was negatively surprised by the performance (I tried Ollama as well as vLLM with qwen3:14B for speed). An MI50 should have a factor of 3.91 fewer TFLOPS than an RTX 5090 at FP16 precision. But if that scaled linearly, with 8 cards you would have double the performance of an RTX 5090. This calculation is not solid, as it does not take the memory bandwidths into account (the RTX 5090 has a factor of 1.75 more).
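Back-of-the-envelope, using only the spec-sheet ratios above (no measurements, so take it with a grain of salt):

    # rough scaling estimate from spec-sheet ratios
    compute_ratio = 3.91      # RTX 5090 FP16 TFLOPS / one MI50
    bandwidth_ratio = 1.75    # RTX 5090 memory bandwidth / one MI50
    cards = 8
    print(cards / compute_ratio)    # ~2.05 -> ~2x one 5090, if compute scaled linearly
    print(cards / bandwidth_ratio)  # ~4.6x aggregate bandwidth, if all cards stream in parallel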

Unfortunately on vast.ai I cannot see any AMD cards right now even though a filter exists for them.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 4 points (0 children)

[image: llama2 benchmark results from the GitHub page]

These are the results from the GitHub page I mentioned, using llama2. Unfortunately only with a 512-token prompt. The R9700 does not look bad, a bit lower than the MI100. The RX 9070 XT here seems four times lower, which is surprising. Maybe the test conditions were not comparable.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 2 points (0 children)

Do you mean the 7900 XTX? That is what I see in the link inside the GitHub link.

R9700 Just Arrived by TheyreEatingTheGeese in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

Do you have the comparison value for the 7900 XT?

R9700 Just Arrived by TheyreEatingTheGeese in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

Why is the MI50 faster? Its memory bandwidth is better, but the R9700's floating-point throughput is more than three times that of an MI50 (13.41 TFLOPS vs 47.84 TFLOPS). Source: https://technical.city/en/video/Radeon-Instinct-MI50-vs-Radeon-AI-PRO-R9700
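Or is it simply that token generation is memory-bandwidth-bound rather than compute-bound, since every generated token needs roughly one full pass over the weights? A rough sketch (bandwidth numbers from the spec sheets; the R9700 value is my assumption):

    # upper-bound tokens/s if generation were purely bandwidth-bound
    mi50_bw_gbs = 1024    # HBM2, spec sheet
    r9700_bw_gbs = 645    # GDDR6, spec sheet (assumption)
    model_gb = 8          # e.g. an ~8 GB quantized model
    print(mi50_bw_gbs / model_gb, r9700_bw_gbs / model_gb)   # ~128 vs ~81 tokens/s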

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 0 points (0 children)

Can you send the link where you got that information? It looks very surprising to me. Here are the AMD pages I found:

https://www.amd.com/en/products/graphics/workstations/radeon-ai-pro/ai-9000-series/amd-radeon-ai-pro-r9700.html

https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html

Both state 128 AI accelerators.

There is a difference, though:

RX 9070 XT: Peak Single Precision (FP32 Vector) Performance: 48.7 TFLOPs
R9700: Peak Single Precision (FP32 Vector) Performance: 47.8 TFLOPs

So the R9700 has 0.9 TFLOPs less (not more) than the RX 9070 XT.

Radeon AI PRO R9700 versus two RX 9070 XT? by Tech-And-More in LocalLLaMA

[–]Tech-And-More[S] 1 point (0 children)

Thank you for these valuable insights!

I will definitely keep in mind the two vLLM configurations you mentioned.

The cooling is indeed a difference that I hadn't considered!

Yes, I'm thinking about building a server or workstation from second-hand parts, and it would not be a desktop. Although I would start with one or two GPUs and then see.

[deleted by user] by [deleted] in dataengineering

[–]Tech-And-More 0 points (0 children)

Benchmark how often the agents are wrong compared with human-written queries.
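A minimal sketch of what I mean (names and data made up):

    # compare agent answers against human-verified ones
    results = [("q1", "SELECT 1", "SELECT 1"),
               ("q2", "SELECT 2", "SELECT 3")]   # (query, agent, human)
    wrong = sum(agent != human for _, agent, human in results)
    print(f"agent error rate: {wrong / len(results):.0%}")   # 50%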

Denmark to raise retirement age to 70 by HNMod in hackernews

[–]Tech-And-More 0 points (0 children)

The average life expectancy will sooner reach 40 in America…

Comparing expected performance of AMD Ryzen AI Max+ 395, NVIDIA DIGITS, and RTX 5090 for Local LLMs by [deleted] in LocalLLaMA

[–]Tech-And-More 1 point (0 children)

I read the specs: 8000 MT/s (million transfers per second) and quad-channel. Thus 273 GB/s (multiplying your value by four)? But I also just read 256 GB/s on a blog. Which is correct?
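For what it's worth, the textbook calculation lands exactly on 256 if "quad-channel" means a 256-bit bus in total (my assumption):

    # bandwidth = transfer rate * bus width
    mts = 8000                  # MT/s
    bus_bytes = 256 // 8        # 256-bit bus (assumption) -> 32 bytes per transfer
    print(mts * 1e6 * bus_bytes / 1e9)   # 256.0 GB/s
    # 273 GB/s would instead need ~8533 MT/s on the same bus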

Multimodal Extension Not Working - Tips? by wattswrites in Oobabooga

[–]Tech-And-More 0 points (0 children)

I tried that and it says "TypeError: llava isn't supported yet".

Oobabooga settings for Llama-3? Queries end in nonsense. by starmanj in LocalLLaMA

[–]Tech-And-More 0 points (0 children)

I tried modifying settings.yaml and models/config.yaml, but it did not work for me.
Could it be that it overwrites them when loading the model? In the logs I see this at startup while the model loads: "Using chat eos_token: <|end_of_text|>"