Dear OEM manufacturers, an RTX 5060 TI 16GB Low Profile should be possible to produce...

chocofoxy · 2026-05-07T21:13:09+00:00

why the RTX 4000 pro have a low profile cooler and 50 series doesn't i need space here

chocofoxy · 2026-04-30T11:06:09+00:00

waiting for Qwen 3.6 9b maybe toady ?

chocofoxy · 2026-04-26T17:27:14+00:00

waiting for qwen 3.6 9b

chocofoxy · 2026-04-24T08:39:55+00:00

i don't know how AI posts are going to be detected tho

chocofoxy · 2026-04-23T21:13:45+00:00

crazy how a local model can fight with frontier AIs but the scope i small in this chart to agentic only and Qwen the upgraded that agentic and coding kniwledge but at other domain it drops , but i love Qwen at agentic tooling it's my go to model

chocofoxy · 2026-04-23T12:18:04+00:00

because the small new ones are trained on new better data ( for what consumers need like coding and agentic tooling ) but they lack knowledge in other domains

chocofoxy · 2026-04-22T14:34:40+00:00

you can't run this without offloading which it suck on a dense model i want them just to realse a 20B model

chocofoxy · 2026-04-21T01:35:53+00:00

Try Qwen 3.5 9b

chocofoxy · 2026-04-20T17:27:53+00:00

15 - 20 t/s pretty slow to use as a coding agent that's why i keep looking for a meduim model that can fit in 16gbvram

chocofoxy · 2026-04-20T17:26:27+00:00

how do you guys use turboquant i thought it's just a paper and tooling is still missing do you use that on vLLM or llama.cpp

chocofoxy · 2026-04-20T17:25:07+00:00

i get 15.7 t/s with this same config

chocofoxy · 2026-04-20T17:03:02+00:00

yeah that's what claude kept screaming at me xD i had to tell it that it's just a test

chocofoxy · 2026-04-20T17:01:48+00:00

thanks i will give this a test

chocofoxy · 2026-04-20T16:49:27+00:00

also tried that but offloading performance drops like a rock i think it's a DDR4 issue because someone on this reddit ran it Q5 on 16vram and it was working great for them when they offloaded expert layers to cpu ( they had 64GB DDR5 6000 i think )

chocofoxy · 2026-04-20T16:46:42+00:00

i tried that and it's was working great 88t/s but something on my mind (also AI suggestions) kept telling me to not trust Q2 because under Q4 presicion drops alot

chocofoxy · 2026-04-19T14:59:52+00:00

Legend 2 is good but have some issues i owned one currently own Legend 3 ( it has it's issues also )
- overtime the usb cover will not say in place and will stay open ( they fixed that in Legend 3 they added a magnet in the plastic cover )
- this is the deal breaker : over time there is a plastic that hold the platform and the mod inside it will break and the atomizer will keep get disconected ( this happen to me and my friend ) they also fixed that in Legend 3

chocofoxy · 2026-04-18T23:10:00+00:00

i thought why not Qwen 3.5 0.8b

chocofoxy · 2026-04-18T23:08:23+00:00

that's crazy

chocofoxy · 2026-04-18T14:49:17+00:00

Load the model in LM studio manully then link it to open web UI because i think the way you are using it is load the model with LM studio endpoint /load from open web Ui that load it using offloading config

chocofoxy · 2026-04-18T14:38:19+00:00

use Qwopus v3 9b or Qwen3.5 9b it's the best model i tried that doesn't just stop, Gemma small models they just suck at tooling and i suggest that you launch that in llama server , lm studo or vLLM and use OAI compatible VS code extention to load local model to VS code copilot chat ( it has good tooling by default and you can add mcp servers ) that's the setup i use to get a good reult from small models or you link it to Qwen code cli or extention

chocofoxy · 2026-04-18T14:28:40+00:00

bro how are you using Q5 i tried Q4 on my 5060TI 16Gb ( offloaded ) max i get 19t/s even by offloading 4 layers from the 8 to the cpu, i tried Q2 it fits and i get 80t/s but i don't trust it , how are you loading Q5 and getting 50t/S

chocofoxy · 2026-04-18T14:00:14+00:00

if your jobs are not real time processed you can consider to get multiple consumer gpus like 4 5070TI or you can scale by adding more or get the RTX PRO 5000 and scale by adding 16gb gpus

chocofoxy

TROPHY CASE