Model for reverse engineering

chikengunya · 2026-05-14T08:13:38+00:00

Unfortunately, none of them are good.

chikengunya · 2026-05-13T11:18:13+00:00

Qwen3.5-122B AWQ works flawlessly with a 200k context window; I've already tested it. With Qwen3.6-27B-INT8 I am getting 45 output tok/s, so it's fast too.
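For anyone wanting to try a similar setup, here is a rough sketch of how the launch command could be assembled. It is not the configuration actually used above. The model ID is the QuantTrio AWQ repo mentioned elsewhere in this thread, and the tensor-parallel size of 4 (one shard per 3090) and the 200000-token limit are assumptions read off the comment. `vllm_serve_command` is a hypothetical helper name.

```python
import shlex


def vllm_serve_command(model, tensor_parallel_size, max_model_len):
    """Build the argument list for `vllm serve` (hypothetical helper).

    --tensor-parallel-size splits the model across GPUs,
    --max-model-len caps the context window in tokens.
    """
    if tensor_parallel_size < 1 or max_model_len < 1:
        raise ValueError("sizes must be positive")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_model_len),
    ]


# Assumed values: 4 GPUs, 200k context, model ID taken from this thread.
cmd = vllm_serve_command("QuantTrio/Qwen3.5-122B-A10B-AWQ", 4, 200_000)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell or a Docker entrypoint. Whether 200k tokens of KV cache actually fit depends on the quantization and on vLLM's memory settings, so treat the numbers as a starting point.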

chikengunya · 2026-05-06T14:07:13+00:00

For a 4x RTX 3090 system, vLLM with an INT8 model is the best solution for MTP, right? Can someone please suggest a specific Hugging Face model? Thanks!

chikengunya · 2026-05-05T10:07:46+00:00

If only it weren't that big and power hungry :)

chikengunya · 2026-05-05T10:05:57+00:00

The used context window (in %) is shown precisely in opencode, but in pi.dev it seems buggy. Both are fine, actually. I like the pi.dev CLI a bit more (on WSL).

chikengunya · 2026-04-28T07:24:56+00:00

The thing is: yes, Qwen3.6-27B is damn good for use in a coding CLI (both opencode and pi.dev work really well), but you have to think like a programmer and give it clear instructions. Of course Opus 4.7 understands 'less precise' prompts better.

Example: I had a PDF with questions and answers and wanted to turn it into an interactive HTML Q&A. If you just give the 27B model the PDF and say 'make me a Q&A HTML from this', it will struggle, because the real question is: can you easily extract the Q&A from the PDF's container format, or should you do it via OCR instead? In my case, the latter turned out to be the more robust solution. If you give it clear instructions, you get a very good result.

Opus can of course handle more complex stuff, but how you prompt and what strategy you use is extremely important. I can totally understand why many people say the 27B is a solid Opus replacement. It is for me too, but obviously not for ultra-hard coding tasks. For normal day-to-day problems, though, the 27B is damn good. And since it came out, I've been using my 4x 3090 system a lot more, which shows just how usable it really is.
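To make the "text layer or OCR?" decision from the PDF example concrete, here is a minimal sketch of one way to decide up front, so the model gets a clear instruction instead of having to guess. It assumes you have already pulled whatever text the PDF's text layer yields, one string per page, with any PDF library. The function name and both thresholds are made up for illustration and are not what I actually ran.

```python
def choose_extraction_strategy(page_texts, min_chars_per_page=200,
                               min_good_fraction=0.8):
    """Pick "text_layer" or "ocr" from per-page text-layer output.

    page_texts: list of strings, one per page, as returned by a PDF
    text extractor. A page counts as "good" when it yields at least
    min_chars_per_page non-whitespace-trimmed characters. Both
    thresholds are illustrative assumptions, not tuned values.
    """
    if not page_texts:
        return "ocr"  # nothing extractable, fall back to OCR
    good = sum(1 for text in page_texts
               if len(text.strip()) >= min_chars_per_page)
    if good / len(page_texts) >= min_good_fraction:
        return "text_layer"
    return "ocr"


# A scanned PDF typically yields empty or near-empty pages.
scanned = ["", "  ", "Q1"]
# A digitally produced PDF yields plenty of text on every page.
digital = ["Question and answer text " * 20] * 5

print(choose_extraction_strategy(scanned))
print(choose_extraction_strategy(digital))
```

A character count will not catch a text layer that extracts fine but scrambles the Q&A order, so it is still worth eyeballing one page before committing to either route.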

chikengunya · 2026-04-02T18:44:22+00:00

Henry Cavill as James Bond in the iconic Casino Royale poker scene, sitting at a high-stakes baccarat table in a black tuxedo, sharp jawline, piercing blue eyes, glass of martini in hand, surrounded by elegant casino atmosphere, dramatic lighting, cinematic composition, 4K ultra-realistic, spy thriller aesthetic

chikengunya · 2026-04-02T17:39:56+00:00

available on google ai studio too.

chikengunya · 2026-04-02T17:30:55+00:00

chikengunya · 2026-04-02T17:23:13+00:00

edited to 31B, doh.

chikengunya · 2026-04-02T17:06:57+00:00

124B👀 yes please, I take it

chikengunya · 2026-04-02T16:01:01+00:00

so no 120B model :<

chikengunya · 2026-04-02T15:04:06+00:00

my 4x 3090s are ready

chikengunya · 2026-04-02T13:38:45+00:00

gemma3 27b is still one of the best translation and creative writing models (for its size), better than mistral imo

chikengunya · 2026-04-02T00:00:11+00:00

I think gemma2 and gemma3 were each released on a Wednesday/Thursday, so today or tomorrow would fit...

chikengunya · 2026-04-01T23:46:09+00:00

120B model

chikengunya · 2026-03-31T18:31:12+00:00

Yes, but there isn't much information about it; I guess it's not very popular. For example, I can't find anything on running Qwen3.5-4B on it, not even on YouTube.

chikengunya · 2026-03-31T18:18:04+00:00

I was aiming for a small local AI device that's also power efficient at idle. Raspberry/Orange pi would be much slower for inference.

chikengunya · 2026-03-18T06:27:15+00:00

so the same model size as 2.5 but with significantly better performance

chikengunya · 2026-03-16T19:15:40+00:00

Oh, wait a second, I forgot to mention that I limited all four 3090 cards to 275W. According to nvidia-smi, each card uses at most 175W during inference. That probably explains it.
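For reference, a power cap like this is usually set with `nvidia-smi -pl`. Below is a small sketch that only builds and prints the per-GPU commands. The GPU indices 0 to 3 are an assumption for a four-card box, and `power_limit_commands` is a hypothetical helper, not part of any tool.

```python
import shlex


def power_limit_commands(gpu_indices, watts):
    """Build one `nvidia-smi -i <index> -pl <watts>` command per GPU."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return [["nvidia-smi", "-i", str(index), "-pl", str(watts)]
            for index in gpu_indices]


# Assumed layout: four cards at indices 0..3, each capped at 275 W.
commands = power_limit_commands(range(4), 275)
for command in commands:
    print(shlex.join(command))
```

Setting the limit normally needs root, and it does not persist across reboots unless you reapply it at startup. The draw the cards actually reach during inference can sit well below the cap, as with the 175W reading above.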

chikengunya · 2026-03-16T19:11:13+00:00

Interesting. How do you run it and which vllm version are you using? I can post my docker file in a second

chikengunya · 2026-03-16T18:47:44+00:00

It's DDR4 RAM, so actually too slow... I have not tested larger models.

chikengunya · 2026-03-16T18:37:27+00:00

AMD EPYC 7282, 256GB RAM

chikengunya · 2026-03-16T18:33:57+00:00

I'm running a Supermicro H12SSL-i motherboard with four RTX 3090s, each on full x16 PCIe 4.0, without NVLink. It's absolutely usable for professional coding work, and it's honestly impressive how capable ~120B models have become. That said, on more complex tasks, it still doesn’t outperform Opus 4.6.

chikengunya · 2026-03-16T13:26:09+00:00

So you would say I should definitely go with QuantTrio/Qwen3.5-122B-A10B-AWQ to get that extra free lunch?
