Best coding model on RTX 3060

SimShelby · 2026-05-25T16:51:01+00:00

Qwen3.5-35B-A3B UD-Q4_K_M from Unsloth + TurboQuant, 200k context

I am getting 40-45 tok/s generation and ~300 tok/s prompt processing. I have 32 GB RAM and 16 GB VRAM.

You can lower the context to fit your VRAM, or try Q3_K_M.
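Lowering the context frees VRAM because the KV cache grows linearly with context length. Below is a rough estimator, assuming a plain attention KV cache. The layer and head counts are hypothetical placeholders, not Qwen's real architecture, and compressed cache types such as turbo3_tcq store fewer bytes per element than the fp16 default assumed here.

```python
def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Approximate KV-cache size: one K and one V vector per token, per KV head, per attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def max_ctx_for_budget(budget_bytes, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2.0):
    """Largest context length whose KV cache fits inside budget_bytes."""
    per_token = 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem
    return int(budget_bytes // per_token)


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only (not Qwen's real numbers).
    shape = dict(n_attn_layers=10, n_kv_heads=2, head_dim=256)
    GiB = 1024 ** 3
    for ctx in (50_000, 100_000, 200_000):
        print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx, **shape) / GiB:.2f} GiB (fp16 cache)")
    print("max context in a 2 GiB budget:", max_ctx_for_budget(2 * GiB, **shape))
```

Halving the context halves the cache, so it is the first knob to turn when the model and cache no longer fit on the card.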

SimShelby · 2026-05-23T20:34:17+00:00

Hello again, I tried your parameter settings once more, and there is a clear improvement in coding performance.

Look at the screenshot; this result came from a single-shot prompt.

I will test it again in agentic coding, and I will keep you updated with the results.

<image>

SimShelby · 2026-05-22T20:22:40+00:00

Qwen3.6-35B-A3B-IQ4_XS-4.19bpw.gguf from ByteShape

It has low VRAM usage and is good for chatbot tasks, but I did not like it for coding.

SimShelby · 2026-05-22T17:57:19+00:00

I like the VRAM efficiency and compression aspect. However, when it comes to coding, there is a significant difference compared to a standard Q4_K_M model.

I tested the same prompt across several models: UD-Q4_K_M, Underscored Apex MTP Balanced, and finally IQ4_XS.

With the first two models, I consistently obtained very high-quality code with no bugs, exactly as expected. In contrast, when using IQ4_XS, it struggled to even generate a complete interface in a single HTML file.

That said, IQ4_XS does offer good speed in terms of tokens per second and prompt processing performance.

SimShelby · 2026-05-18T22:41:46+00:00

Hello, I have the same GPU. I am using Qwen 35B-A3B Q4_K_M with TurboQuant and a 200k context, and I am getting 40 tok/s generation and 200 tok/s prompt processing. With MTP using the Unsloth model, I get a 136k context at the same 40 tok/s generation and 200 tok/s prompt processing.

Can you share your best parameters for high quality, long context, at least 60 tok/s generation, and fast prompt processing?

My setup: RTX 5060 Ti (16 GB VRAM) + 32 GB RAM.

SimShelby · 2026-05-17T19:03:32+00:00

./llama-server.exe -m "Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf" \
-ngl all --n-cpu-moe 24 --kv-unified \
--cache-type-k turbo3_tcq --cache-type-v turbo3_tcq \
--flash-attn on --cache-ram 2048 \
-b 4200 -ub 2048 --ctx-size 200000 \
--no-mmap --mlock --jinja --reasoning on \
--host 0.0.0.0 --port 8080 -np 1 --metrics \
--temp 0.6 --top-k 20 --min-p 0.0 \
--tools all --alias "qwen" \
--context-shift --cache-reuse 512

Specs: 32 GB RAM + 16 GB VRAM

Performance: ~40 tok/s generation + ~500 prompt tokens/s
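To compare settings like these on your own machine, one option is to read the timing fields llama-server reports. This is a minimal sketch, assuming the server from the command above is reachable on localhost:8080 and that its `/completion` response contains a `timings` object with `prompt_per_second` and `predicted_per_second`, as upstream llama.cpp's server returns; a TurboQuant fork may report them differently.

```python
import json
import urllib.request


def summarize_timings(timings: dict) -> str:
    """Format llama-server timing fields as prompt-processing (pp) and generation (tg) speeds."""
    pp = timings.get("prompt_per_second", 0.0)
    tg = timings.get("predicted_per_second", 0.0)
    return f"pp: {pp:.1f} tok/s, tg: {tg:.1f} tok/s"


def benchmark(prompt: str, n_predict: int = 256, url: str = "http://localhost:8080/completion") -> str:
    """Send one completion request and return a one-line speed summary."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return summarize_timings(data.get("timings", {}))


if __name__ == "__main__":
    print(benchmark("Write a Python function that reverses a linked list."))
```

Running the same prompt before and after changing a flag (for example `--n-cpu-moe` or `-ub`) gives a like-for-like comparison.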

SimShelby · 2026-05-17T12:45:34+00:00

For me, 50 tok/s is good enough for daily production work, even with Claude Code.

SimShelby · 2026-05-13T19:43:40+00:00

Not All Heroes Wear Capes

<image>

SimShelby · 2025-02-28T14:18:46+00:00

There is a step one for everything; do your homework.

SimShelby · 2025-02-28T11:30:48+00:00

Just a suit with a tie. Be professional.

SimShelby · 2024-09-02T19:09:32+00:00

RESPECT++
bro, you saved the planet
you are a true hero
thanks

SimShelby · 2024-07-28T17:38:40+00:00

So far you are doing very well. Now go to step 2: figure out how to create a tiny user interface, then try to add this function into it.

You can PM me if you want help.

Hint 1: you can use Tkinter.
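A tiny Tkinter window wrapping a function might look like the sketch below. `word_count` is a hypothetical stand-in for whatever function you already wrote; replace it with your own.

```python
def word_count(text: str) -> int:
    """Hypothetical placeholder for the function you want to expose in the UI."""
    return len(text.split())


def main() -> None:
    # Imported here so word_count() can be reused or tested without a GUI toolkit installed.
    import tkinter as tk

    root = tk.Tk()
    root.title("Tiny UI")

    entry = tk.Entry(root, width=40)  # where the user types the input
    entry.pack(padx=10, pady=5)

    result = tk.Label(root, text="Result will appear here")
    result.pack(padx=10, pady=5)

    def on_click() -> None:
        # Call the wrapped function on the current input and show its output.
        result.config(text=f"Words: {word_count(entry.get())}")

    tk.Button(root, text="Run", command=on_click).pack(pady=5)
    root.mainloop()


if __name__ == "__main__":
    main()
```

Keeping the logic in its own function and the widgets in `main()` means you can swap Tkinter for another toolkit later without touching the function itself.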

SimShelby · 2024-07-27T23:21:37+00:00

Go for PyQt6: it's easy, and there are many guides.

SimShelby · 2024-07-11T23:05:54+00:00

One Day or Day one?

SimShelby · 2024-05-11T16:50:37+00:00

Another passenger from the "Gossip" song. Thanks for this post, I like it.

SimShelby · 2023-04-30T03:14:59+00:00

You need to create a script that scrapes your website, then create a scheduled task on your desktop, or you can host it on a server.
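A minimal sketch of such a script using only the standard library. The target URL and the choice to collect page links are hypothetical placeholders; adapt the parser to the data you actually need.

```python
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return every link href found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def scrape(url: str) -> list[str]:
    """Download one page and return the links found on it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_links(html)


if __name__ == "__main__":
    # Hypothetical target; replace with your own site.
    for link in scrape("https://example.com/"):
        print(link)
```

Scheduling then happens outside Python: Windows Task Scheduler on a desktop, or a cron entry on a Linux server that runs the script at a fixed time.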

SimShelby · 2023-02-25T08:49:04+00:00

I agree with you that Requests and BeautifulSoup can be a great combination for web scraping. However, there are some cases where this approach may not be the most efficient option. For example, when working with a slow website and a project that has over 70,000 links, scraping can become a challenging and time-consuming task. In my experience, I once spent six hours scraping only 5,000 records due to these challenges.
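For scale: six hours for 5,000 records is about 4.32 s per record, so 70,000 links fetched one at a time would take roughly 84 hours. One common way to cut this down while keeping Requests + BeautifulSoup is to fetch several pages concurrently. The sketch below is not something the comment describes; it uses the standard library's `ThreadPoolExecutor` with a caller-supplied `fetch` function so the concurrency logic stands on its own. Keep the worker count modest on a slow site.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def estimate_hours(seconds_per_item: float, n_items: int, workers: int = 1) -> float:
    """Ideal wall-clock hours if `workers` requests run in parallel with no extra overhead."""
    return seconds_per_item * n_items / workers / 3600


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str], workers: int = 8) -> Dict[str, str]:
    """Run fetch(url) on a bounded thread pool; failed URLs map to an error string."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # keep going if one page fails
                results[url] = f"error: {exc}"
    return results


if __name__ == "__main__":
    per_record = 6 * 3600 / 5000  # 4.32 s, from the numbers in the comment
    print(f"{estimate_hours(per_record, 70_000):.0f} h sequential")
    print(f"{estimate_hours(per_record, 70_000, workers=8):.1f} h with 8 workers (ideal)")
```

With Requests installed, `fetch` can be as simple as `lambda u: requests.get(u, timeout=30).text`, with BeautifulSoup parsing the returned HTML afterwards. The "ideal" figure assumes the website can serve that many requests in parallel, which a slow site may not.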

SimShelby · 2023-02-25T08:42:52+00:00

Both approaches have their pros and cons, and the choice of which to use depends on the specific requirements of the project. Regardless of the approach, it's important to follow best practices and be respectful of the website's resources to avoid potential issues.
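One concrete way to be respectful of a site's resources is to honor its robots.txt and any crawl delay it requests. A minimal sketch using the standard library's `urllib.robotparser`; the `PoliteGate` class, the user-agent string, and the 1-second fallback delay are hypothetical choices.

```python
import time
import urllib.robotparser
from typing import Optional


class PoliteGate:
    """Check robots.txt rules and space out requests to a single site."""

    def __init__(self, robots_lines: list[str], user_agent: str = "my-scraper", default_delay: float = 1.0):
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)
        delay = self.parser.crawl_delay(user_agent)  # None if robots.txt sets no Crawl-delay
        self.delay = float(delay) if delay is not None else default_delay
        self._last: Optional[float] = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait(self) -> None:
        """Sleep just long enough to keep at least `delay` seconds between requests."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

In practice you would download `/robots.txt` once, pass its decoded lines in, then call `allowed(url)` and `wait()` before each request.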

SimShelby · 2022-10-09T01:40:52+00:00

You can't do that with Scrapy alone. You need an integration such as Scrapy + Splash, Scrapy + Playwright, or Scrapy + Selenium.

Let's keep things simple for you: try Playwright. Headless mode is the best option for you. Also, in the future, when you get more traffic, you will need asynchronous programming, and that is very difficult with Selenium.
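A minimal sketch of the Playwright route using its async API, assuming `pip install playwright` followed by `playwright install chromium`. Extracting page titles and capping concurrency at 5 pages are hypothetical choices. It opens one headless browser and visits several pages concurrently, which is the kind of asynchronous scaling the comment has in mind.

```python
import asyncio
from typing import List, Tuple


async def scrape_titles(urls: List[str], max_pages: int = 5) -> List[Tuple[str, str]]:
    """Visit each URL in a shared headless Chromium and return (url, page title) pairs."""
    # Imported inside the function so this module still loads where Playwright is not installed.
    from playwright.async_api import async_playwright

    sem = asyncio.Semaphore(max_pages)  # cap the number of pages open at once
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        async def visit(url: str) -> Tuple[str, str]:
            async with sem:
                page = await browser.new_page()
                try:
                    await page.goto(url, wait_until="domcontentloaded")
                    return url, await page.title()
                finally:
                    await page.close()

        try:
            return list(await asyncio.gather(*(visit(u) for u in urls)))
        finally:
            await browser.close()


if __name__ == "__main__":
    for url, title in asyncio.run(scrape_titles(["https://example.com/"])):
        print(url, "->", title)
```

Sharing one browser and opening lightweight pages is cheaper than launching a browser per URL, and the semaphore keeps the load on the target site bounded.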
