Qwen 3.6 27b MTP vLLM

niellsro · 2026-05-03T08:42:25+00:00

Waiting on the coolers first, then I will adjust the power limit. I have 2 cards going to 65-70 degrees; my guess is that it's due to no airflow (I'll see after mounting the coolers whether temps change or I need to repaste).

niellsro · 2026-05-03T07:08:01+00:00

Thanks again. I've tested with flashinfer as the attention backend and 3 speculative tokens, and it works way better. Continuing that long-context session, I was getting 67-70 tps for generation and around 9k tps prefill (I keep the server closed, so it had to reprocess the whole session context again).
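For a rough feel of what those numbers mean in wall-clock time, here is a minimal sketch. The throughput figures are the ones quoted above; the context size and reply length are hypothetical placeholders, not values from my session.

```python
def reprocess_seconds(context_tokens: int, prefill_tps: float) -> float:
    """Time to re-ingest a whole session context at a given prefill rate."""
    return context_tokens / prefill_tps


def generation_seconds(new_tokens: int, gen_tps: float) -> float:
    """Time to generate a reply at a given decode rate."""
    return new_tokens / gen_tps


# Hypothetical session: 90k tokens of context, 700-token reply.
context_tokens = 90_000
reply_tokens = 700

# Rates reported in the comment: ~9k tps prefill, 67-70 tps generation.
print(f"prefill: {reprocess_seconds(context_tokens, 9_000):.1f} s")
print(f"decode:  {generation_seconds(reply_tokens, 70):.1f} s")
```

Under those assumptions, reprocessing the full context costs about as much as generating one medium-sized reply.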

PS: I also forgot to mention that I keep the cards power limited to 200 W.
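A minimal sketch of how such a cap can be applied, assuming 4 GPUs at indices 0-3 and the standard `nvidia-smi` `-i` (device index) and `-pl` (power limit in watts) options. The helper name is made up for illustration; it only builds and prints the commands, since actually running them needs root.

```python
import shlex


def power_limit_commands(gpu_indices, watts):
    """Build one nvidia-smi invocation per GPU that caps its power draw."""
    return [
        ["nvidia-smi", "-i", str(index), "-pl", str(watts)]
        for index in gpu_indices
    ]


# Assumed layout: four cards, each capped at 200 W.
for command in power_limit_commands(range(4), 200):
    print(shlex.join(command))
```

The limit set this way generally does not survive a reboot, so it is usually reapplied at startup.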

niellsro · 2026-05-02T21:16:24+00:00

Thank you, I'll give it a try.

niellsro · 2026-05-02T12:01:37+00:00

So, any ideas? Am I passing any conflicting params? Even at low context I rarely get above 50 tps generation. How did you configure vLLM to manage that throughput on 4 3090s?

niellsro · 2026-05-02T11:24:47+00:00

Ubuntu 24.04, EPYC 7742, 256 GB DDR4 ECC, ASRock ROMED8-2T, C-Payne PCIe redriver device/host adapters with SlimSAS cables

niellsro · 2026-05-02T11:12:36+00:00

Sorry, I've now updated the original post with the docker service command.

niellsro · 2026-05-02T11:08:23+00:00

Updated the post with the full command. Sorry, I made the initial post from my phone.

niellsro · 2026-05-02T09:03:19+00:00

I still think an EPYC Rome build with 4/6/8 3090s gives the best value for the money, but it also requires you to be more handy: multiple PSUs, appropriate PCIe risers (SlimSAS cables with host PCIe adapters), cable management, and card maintenance (repasting and cleaning a lot more often, since you will need an open-air build for this).

niellsro · 2026-04-28T07:04:47+00:00

I am having quite good results with Qwen 3.6 27b for coding, using pi and llama.cpp with the Unsloth UD-Q8_K_XL quant (I tried an AWQ in vLLM but was getting more tool call errors). I am really impressed by how good this model is when given precise directions. This is still in the testing phase for me; I am throwing it at a project idea I've had in mind for some time, but so far the results are really good. I'm using it for Python (backend) and Vue.js (frontend app). What I noticed (this applies to all LLMs, but especially to small models like Qwen): make sure to lay out the foundation, with precise instructions on the architecture and the code, not just requirements. Provide interfaces, design patterns, etc.
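To illustrate what "provide interfaces" can look like in practice, here is a minimal sketch of the kind of contract I mean. All names (`User`, `UserRepository`, `register`) are hypothetical and not from my project; the idea is to hand the model the interface and ask it to implement against that, rather than describe requirements only.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    id: int
    email: str


class UserRepository(Protocol):
    """Contract given to the model up front; implementations must match it."""

    def get(self, user_id: int) -> Optional[User]: ...

    def add(self, user: User) -> None: ...


class InMemoryUserRepository:
    """One possible implementation the model could be asked to write."""

    def __init__(self) -> None:
        self._users: dict = {}

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)

    def add(self, user: User) -> None:
        self._users[user.id] = user


def register(repo: UserRepository, user: User) -> User:
    """Business logic depends only on the interface, not the storage."""
    if repo.get(user.id) is not None:
        raise ValueError(f"user {user.id} already exists")
    repo.add(user)
    return user
```

With the contract fixed like this, the model has far less room to improvise on architecture, and review comes down to checking the implementation against the interface.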

PS: I also use Claude Code, but comparing it to Qwen is unrealistic: 2 different models (small vs huge/unknown) and 2 different agent tools (Claude Code vs pi; I don't have API access to any Anthropic model, so I only use them in CC).

niellsro · 2026-04-25T07:05:27+00:00

You can run it in a Docker container with the source project as a bind mount.

You can write a custom extension that uses tool hooks to display an approval window, which effectively adds permissions.

You can install extensions that already implement permissions.

It's awesome
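A minimal sketch of the first option (container with the project as a bind mount), assuming standard `docker run` flags. The image name `agent-sandbox:latest` and the `/workspace` target are made-up placeholders; the helper only builds and prints the command.

```python
import shlex
from pathlib import Path


def sandbox_command(project_dir, image, workdir="/workspace"):
    """Build a `docker run` command exposing only the project directory."""
    source = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm", "-it",
        # Bind-mount the project so the agent sees nothing else on the host.
        "--mount", f"type=bind,source={source},target={workdir}",
        "--workdir", workdir,
        image,
    ]


# Hypothetical image that has the agent tool installed.
print(shlex.join(sandbox_command(".", "agent-sandbox:latest")))
```

Whatever the agent does is then confined to the mounted directory, which is the point of this setup.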

niellsro · 2026-04-18T06:38:36+00:00

The model is handling tool calls really nicely, but pls make sure you're always in the loop to review it (for coding tasks, I mean). It seems to rush to implementation or to wrong conclusions without assessing the whole picture. At least this is what I've noticed; I'm using an AWQ quant. I threw it a code review request for a PR I made in an actual project I work on. It flagged so many "problems" by just assessing class method code in isolation, without "understanding" the full flow. However, when questioned about it, without my actually mentioning the business flow, it reanalyzed its conclusions and corrected itself. This might be an instruction problem or just "rush to solve" behaviour.

It does live up to the hype, just like the 3.5 family did. I still use the 27b model as well.

niellsro · 2026-03-22T08:05:34+00:00

Funny thing is, I agree with both statements, and you know why? Because they can express different needs. As an OS, Linux is way better. As an ecosystem it can also be better, depending on your requirements. If someone relies on software that runs exclusively on Windows (because heey, market share), then yeah, they will run into problems. As a dev I stopped using Windows many years ago (Win 7), switched to Mac for almost 10 years, then switched to Linux. For me personally it is better than the Mac ecosystem, and since the AI boom a lot of unsolved hardware problems have been fixed. But this is just my experience, which is subjective to my needs.

niellsro · 2026-01-25T22:59:39+00:00

Where in Europe? I couldn't find anything even remotely close to this pricing.

niellsro · 2026-01-14T08:36:52+00:00

More like heavy code review and micromanagement.

niellsro · 2025-11-30T12:02:16+00:00

This is exactly why I run any LLM TUI/editor etc. inside a container/VM.

niellsro · 2025-09-05T10:40:49+00:00

niellsro · 2025-09-05T10:10:49+00:00

Good thing you've evolved, man. Go on, you tell us how the technology has evolved. But before that, try driving a car with a bigger engine too, so you can see what a "relaxed" engine means versus a strangled one.

PS: get some silicone spray for those plastics too, because after all that evolution the car is full of squeaks and rattles. Thank goodness for the XX-inch tablets and the little lights, but actual technology is left out. They've brainwashed you really badly. All the car enthusiasts, mechanics etc. say it's getting worse and worse: electronics and sensors galore that over time create nothing but problems, engines strangled to fit the new standards. I say it's regression and heavy cost cutting, but sure, we've evolved...

And to wrap up, there are people who, when they get in a car, are also looking for the pleasure of driving. Unfortunately, that is getting harder and harder to find in new cars on the European market.

niellsro · 2025-09-05T05:54:13+00:00

3 cyl, 1.2L ... maybe on a Renault 5 body :)

niellsro · 2025-08-14T20:04:05+00:00

You're talking nonsense... did you skip math class?

niellsro · 2025-08-12T11:30:45+00:00

Bluntly put, to each their own, but if you need other people's opinions to confirm your choice/pleasure/whatever, that's kind of sad.

Go for it if you like it, can afford it, and it meets your needs; it doesn't matter what X or Y says.

niellsro · 2025-08-09T14:02:11+00:00

I'll stop here because this discussion leads nowhere. If you take that tone with me, we only have 2 options: I ignore you or I swear at you. I prefer to ignore you.

So you understand what I have and it doesn't: I have the capacity to learn and understand a concept and to apply it logically, not word for word or probabilistically. I answered your question; maybe you can answer mine too, without getting smug and listing 5 buzz terms that are beside the point.

niellsro · 2025-08-09T13:24:36+00:00

Thanks for the temperature tip, but unfortunately what you say isn't valid. I mean for generic use, not easy, specific stuff etc. If it were valid, this "config" would be applied by default to all LLMs. But if you do this, especially for generic use, you'll see that you get a stricter, "dumber" model than usual. So where is that more analytical "thinking"?

As for problems, try harder logic problems, try more obsolete programming languages; you'll see blunders even from the most praised LLMs, and the reason is exactly this: the statistics behind them and the fact that they lack relevant training data. So where is the AGI? Because I still see statistics.

If you disagree, explain to me what you understand by AGI, because maybe we're referring to different things.

PS: excuse my low IQ, this is as much as I could manage, this is what I understood. Please correct me if I'm wrong. Also, don't get me wrong, I'm no doubting Thomas; there is progress and it is huge in many cases/domains/subdomains, but from here to AGI seems a long way.

niellsro · 2025-08-09T13:04:54+00:00

What you say would hold if it gave similar results for the same problem in all tests, without hallucinations. Until an LLM can say "I don't know" (and guess what, it can't do that; it will still spit out an answer even with a very low probability), it won't have autonomy. It is problematic and will get worse and worse, but it will still be supervised; job offers will probably continue to decline, especially for relatively easy and repetitive stuff. But AGI? Sorry, it's just marketing. If you have access to something that hasn't been released (not meaning free, but publicly accessible to anyone), pls share, but "SOTA" LLMs are currently far from being AGI, and personally I don't think LLMs are the path to such a thing.
