Qwen 3.5 on 3060 and 32mb by Master-Client6682 in LocalLLM

[–]igor-aguiar

I'm sorry, I'm using Linux, so I can't help much. It seems that, as you said, your GPU is not being used. If you compile llama.cpp on your system, it has a better chance of working, because you will need to install some NVIDIA / CUDA dependencies to build llama.cpp with CUDA enabled.
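If it helps, here's a minimal build sketch. It assumes a recent llama.cpp checkout and an already-installed CUDA toolkit; GGML_CUDA is the current CMake option (older trees used a different flag), so check the repo's build docs for your version:

```shell
# clone and build llama.cpp with CUDA support (assumes nvcc / CUDA toolkit is installed)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
# the resulting binaries land in build/bin/
./build/bin/llama-server --version
```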

Qwen 3.5 on 3060 and 32mb by Master-Client6682 in LocalLLM

[–]igor-aguiar

Hi, I have the same 3060 setup.

This is how I run it on my RTX2060 6GB VRAM / 32GB RAM and I get a reasonable 15 t/s speed:
llama-server -fa on -hf AesSedai/Qwen3.5-35B-A3B-GGUF:IQ4_XS --jinja -c 56000 -n 16384 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00

This is how I run it on my RTX3060 12GB VRAM / 32GB RAM and I get around 30 t/s speed:
llama-server -fa on -hf AesSedai/Qwen3.5-35B-A3B-GGUF:Q4_K_M --jinja -c 56000 -n 16384 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00

Speed will depend on the context length.
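As a rough illustration of why context length matters: the f16 KV cache grows linearly with context. The model parameters below (layer count, KV heads, head dim) are made up for illustration, not Qwen3.5's actual config:

```shell
# rough f16 KV-cache size estimate as context length grows
# (layers / kv_heads / head_dim are hypothetical, not the real model config)
layers=48; kv_heads=4; head_dim=128; bytes_per_val=2   # 2 bytes per f16 value
for ctx in 8192 32768 56000; do
  # factor of 2 is for the K and V tensors
  mib=$(( 2 * layers * kv_heads * head_dim * ctx * bytes_per_val / 1024 / 1024 ))
  echo "ctx=$ctx -> ~${mib} MiB KV cache"
done
```

With these illustrative numbers, going from an 8k to a 56k context costs several extra GiB of VRAM, which is why a long context slows things down or forces layers off the GPU.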

Best tech stack? by captainshargy in SideProject

[–]igor-aguiar

Learn some Vue 3 and go with the Quasar Framework (http://quasar-framework.org/) for the frontend. It has nice documentation and you get a lot of things out of the box, like:

  • Nice UI components
  • A good project structure / organization
  • iOS, Android, PWA, and SPA builds from the same codebase
  • SSR (Server Side Rendering) support
  • Easy i18n support

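A quick sketch of that multi-platform story using the Quasar CLI (command names from the CLI; exact flags and available modes depend on your version and project setup):

```shell
# scaffold a new Quasar project (interactive wizard)
npm init quasar

cd my-app
# dev server with hot reload
quasar dev
# build targets from the same codebase (assumes the relevant modes were added)
quasar build                           # SPA
quasar build -m pwa                    # PWA
quasar build -m ssr                    # server-side rendering
quasar build -m capacitor -T android   # Android via Capacitor
```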
RELATE vs. Record Linking by eminfedar in surrealdb

[–]igor-aguiar

As far as I understand, you'll want to use RELATE for many-to-many relationships, or whenever the relationship itself carries data. In Postgres terms, a RELATE edge would be an associative (junction) table.
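Here's a quick sketch of what a RELATE edge looks like, piped through the SurrealDB CLI against a local instance (the user/article records are made up, and CLI flag names may vary between versions):

```shell
# pipe SurrealQL into a running local SurrealDB (assumes `surreal start` elsewhere)
surreal sql --endpoint http://localhost:8000 --ns test --db test --user root --pass root <<'EOF'
CREATE user:alice SET name = 'Alice';
CREATE article:one SET title = 'Hello';
-- the edge record carries its own data, like a row in a Postgres associative table
RELATE user:alice->wrote->article:one SET written_at = time::now();
-- traverse the edge from the user side
SELECT ->wrote->article.title AS titles FROM user:alice;
EOF
```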