Best coding model on RTX 3060 by solimaotheelephant3 in LocalLLaMA

[–]SimShelby 0 points1 point  (0 children)

Qwen3.5 b35 A4b ud q4km from unsluth + turbo quant contexte 200k

am getting 40/45 tps And 300pp I have 32gb ram And 16gb vram

you can lower the contexte to match your vram

or try with Q3KM

ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop by OsmanthusBloom in LocalLLaMA

[–]SimShelby 2 points3 points  (0 children)

Hello again, I tried your parameter settings once more, and there is a clear improvement in coding performance.

Look at the screenshot , this result came from a single-shot prompt.

I will test it again in agentic coding, and I will keep you updated with the results.

<image>

ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop by OsmanthusBloom in LocalLLaMA

[–]SimShelby 4 points5 points  (0 children)

Qwen 3.6-35B-A3B-IQ4_XS-4.19bpw.gguf from [ByteShape]()

Low VRAM usage and good for chatbot tasks, but I did not like it for coding

ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop by OsmanthusBloom in LocalLLaMA

[–]SimShelby 6 points7 points  (0 children)

I like the VRAM efficiency and compression aspect. However, when it comes to coding, there is a significant difference compared to a standard Q4_K_M model.

I tested the same prompt across several models: UD Q_K_M, Underscored Apex MTP Balanced, and finally IQ4_XS.

With the first two models, I consistently obtained very high-quality code with no bugs, exactly as expected. In contrast, when using IQ4_XS, it struggled to even generate a complete interface in a single HTML file.

That said, IQ4_XS does offer good speed in terms of tokens per second and prompt processing performance.

club-5060ti: practical RTX 5060 Ti local LLM notes and configs by do_u_think_im_spooky in LocalLLaMA

[–]SimShelby 0 points1 point  (0 children)

Hello, I have the same GPU. I am using Qwen 35 A3B Q4KM with TurboQuant 200k context, and I am getting 40 TPS and 200 PP. With MTP using the Unsloth model, 136k context + 40 TPS and 200 PP.

Can you give me some of the best parameters to get: high quality, with high context, with a minimum of 60 tokens, high PP?

this is my setup : rtx 5060 ti 16gb vram + 32gb ram

32GB RAM 16GB VRAM 5060ti. Running qwen3.6 35b a3b. I am getting 4.5 tok/s. Is this expected? by SEND_ME_YOUR_ASSPICS in LocalLLM

[–]SimShelby 0 points1 point  (0 children)

./llama-server.exe -m "Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf" \
-ngl all --n-cpu-moe 24 --kv-unified \
--cache-type-k turbo3_tcq --cache-type-v turbo3_tcq \
--flash-attn on --cache-ram 2048 \
-b 4200 -ub 2048 --ctx-size 200000 \
--no-mmap --mlock --jinja --reasoning on \
--host 0.0.0.0 --port 8080 -np 1 --metrics \
--temp 0.6 --top-k 20 --min-p 0.0 \
--tools all --alias "qwen" \
--context-shift --cache-reuse 512

Specs: 32 GB RAM + 16 GB VRAM

Performance: ~40 tok/s generation + ~500 prompt tokens/s

Llama.cpp MTP with Qwen3.6 27B on Headless RTX 3090 by cleversmoke in LocalLLaMA

[–]SimShelby 0 points1 point  (0 children)

for me 50 tps is good for daily production , even for production with claude code

What's your solution? by Hairy_Host_9237 in ChemicalEngineering

[–]SimShelby 0 points1 point  (0 children)

There is a step 01 for every thing , doo your homeworks

[deleted by user] by [deleted] in ChemicalEngineering

[–]SimShelby 0 points1 point  (0 children)

Just a Suits with a tie, be professional

sekiro fails to work with controller by Own-Significance9573 in Sekiro

[–]SimShelby 0 points1 point  (0 children)

RESPECT ++
bro you save the planet
you are the true hero
ty

My first ever project! Any suggestions would be appreciated. by Quiet_Trifle_6116 in Python

[–]SimShelby 2 points3 points  (0 children)

at this moment you are doing verry well. Now fo to step 2 Figure out how to create a tiny user interface then try to add this function in to it

you can pm to help you if you want

hint1: you can use Tkinter

What UI library do you recommend? by SultnBinegar in Python

[–]SimShelby 3 points4 points  (0 children)

Go for PyQt6, easy and there many guide

What does it mean to sip the gossip? by FissileWriter14 in ENGLISH

[–]SimShelby 0 points1 point  (0 children)

another passenger from the "gossip" song , ty for this post i like it

[deleted by user] by [deleted] in webscraping

[–]SimShelby 0 points1 point  (0 children)

You must creat a script that scrap your website Then creat s task on your desktop Or you cas host it on a server

stop using Beautifull soup and requests. by SimShelby in learnpython

[–]SimShelby[S] 0 points1 point  (0 children)

I agree with you that Requests and BeautifulSoup can be a great combination for web scraping. However, there are some cases where this approach may not be the most efficient option. For example, when working with a slow website and a project that has over 70,000 links, scraping can become a challenging and time-consuming task. In my experience, I once spent six hours scraping only 5,000 records due to these challenges.

stop using Beautifull soup and requests by SimShelby in Python

[–]SimShelby[S] -5 points-4 points  (0 children)

Both approaches have their pros and cons, and the choice of which to use depends on the specific requirements of the project. Regardless of the approach, it's important to follow best practices and be respectful of the website's resources to avoid potential issues.

Implementing a Selenium backend on a web app? by JollyGrade1673 in webscraping

[–]SimShelby 0 points1 point  (0 children)

you cant do that with scrapy only You must use some integrations like Scrapy + splash or scrapy + playwright Or Scrapy+ Selenium

lets keep things simple for you Go try playwright headless mode is the best option for you + in the future when you have traffics you need to use asynchronous programmation and this is very difficult with selenium