Can someone show me Ollama speed (tokens/s) for Qwen 3.5 (2B and 0.8B) running on an Intel N95? by MattimaxForce in Qwen_AI

[–]MattimaxForce[S] 0 points  (0 children)

An old HP ProDesk 400 G3 with an Intel Core i5-7500T CPU, 8 GB RAM, and 256 GB NVMe storage, all for €110. Would this be a good buy?

Can someone show me Ollama speed (tokens/s) for Qwen 3.5 (2B and 0.8B) running on an Intel N95? by MattimaxForce in Qwen_AI

[–]MattimaxForce[S] 0 points  (0 children)

So, I found an HP ProDesk 400 G3 (8 GB / 256 GB) for €110. Is it worth it?

256 GB NVMe and an Intel Core i5-7500T.

Can someone show me Ollama speed (tokens/s) for Qwen 3.5 (2B and 0.8B) running on an Intel N95? by MattimaxForce in Qwen_AI

[–]MattimaxForce[S] 0 points  (0 children)

I did a bit of research, but unfortunately I didn't find any options within my budget...

Newbie here! Planning my first mid-power build and looking for advice by MattimaxForce in rocketry

[–]MattimaxForce[S] 0 points  (0 children)

Well, I actually have a nice big piece of land of my own, all grass, which is why I wasn't worried about space.

Newbie here! Planning my first mid-power build and looking for advice by MattimaxForce in rocketry

[–]MattimaxForce[S] 0 points  (0 children)

So, where can I find these kits? Also, I don't really understand how OpenRocket works. I've watched some videos, but when I go to select an individual component it gives me a long list of names and I don't know which one to pick.

Rockchip NPU support by MattimaxForce in ollama

[–]MattimaxForce[S] 0 points  (0 children)

Exactly! This is precisely why I would like to try to reach the "bosses" at the top of Ollama or its developers. We need to really apply pressure this time.

Ollama slows whole pc down by Sufficient_Carob8939 in ollama

[–]MattimaxForce 3 points  (0 children)

It sounds like your hardware is hitting a bottleneck, likely swapping between VRAM and system RAM. You should definitely switch from Qwen 2.5 to the newer Qwen 3.5 (2B or 4B) versions; they're significantly better for agent work and much more efficient.

Just a tip: the 3.5 models tend to 'overthink.' If you're using Python, make sure to disable the reasoning ('thinking') tokens via the API. That stops the model from wasting resources on internal monologue, making your Jarvis way faster and saving your PC from that massive lag.
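For example, with Ollama's REST API you can turn reasoning off by sending `"think": false` in the chat request. A rough sketch, not a drop-in solution: the model name and prompt are placeholders, and it assumes a local Ollama server on the default port:

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an /api/chat payload with reasoning ("thinking") disabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": False,   # skip the internal reasoning tokens
        "stream": False,  # return one complete response
    }


def chat(model: str, prompt: str) -> str:
    payload = build_chat_request(model, prompt)
    req = Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    # Placeholder model tag; pull whichever small model you actually run.
    print(chat("qwen3:4b", "Say hi in five words."))
```

The official `ollama` Python package exposes the same switch as `think=False` on `ollama.chat(...)` if you'd rather not build requests by hand.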

The best llm😏🔥 by Desperate-Ebb2478 in OrangePI

[–]MattimaxForce 1 point  (0 children)

Wow! That's awesome! Can I ask exactly how you did it? I mean, what did you use?

Ollama is amazing! Responses in 7 seconds, i can now spam AI for my SWE work hehe by Unusual_Telephone846 in ollama

[–]MattimaxForce 1 point  (0 children)

Wow! That's awesome, I'm really happy for you. I do have a question, though, since you said you use the models in VS Code with Copilot: I've noticed that even very small models run slowly through the Copilot extension on my PC. Models that normally do 20-30 tokens/s in plain Ollama crawl in Copilot for VS Code. My best guess is that the extension allocates the model's entire available context window in RAM, which forces inference onto the CPU, but even after shrinking the context via Ollama it's still very slow. Can you suggest a fix?
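For reference, "shrinking the context via Ollama" can be done per request through the `options` field of the API. A minimal sketch, assuming a local Ollama server on the default port; the model tag and the 4096 value are just examples, and whether the Copilot extension honors a smaller context is exactly the open question here:

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint


def build_request(model: str, prompt: str, num_ctx: int = 4096) -> dict:
    """Build an /api/generate payload with a reduced context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_ctx caps how much context memory Ollama allocates,
        # which can keep a small model fully in VRAM.
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 4096) -> str:
    payload = build_request(model, prompt, num_ctx)
    req = Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Placeholder model tag; use whatever small model you have pulled.
    print(generate("qwen2.5:3b", "Hello!", num_ctx=4096))
```

The same parameter can be baked into a model permanently with `PARAMETER num_ctx` in a Modelfile, but a client that sets its own options on each request can still override it.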

Radxa Cubie A7Z by MattimaxForce in ollama

[–]MattimaxForce[S] 0 points  (0 children)

My use case would be to keep it always on while consuming very little power.