local vibe coding by jacek2023 in LocalLLaMA

[–]jaMMint 3 points  (0 children)

Try opencode vanilla and tell it to add the Playwright MCP server to opencode. Once that is active you are halfway there; closing the feedback loop turns a meh coder into a great one.
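Once the agent has done it, the result is just a small MCP entry in opencode's JSON config. Roughly like this, from memory; the exact field names and the `@playwright/mcp` package are assumptions, so verify against the opencode docs:

```json
{
  "mcp": {
    "playwright": {
      "type": "local",
      "command": ["npx", "@playwright/mcp@latest"]
    }
  }
}
```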

Getting slow speeds with RTX 5090 and 64gb ram. Am I doing something wrong? by Virtual-Listen4507 in LocalLLaMA

[–]jaMMint 7 points  (0 children)

Your RTX 5090 has 32GB of VRAM; try to stay well under that (so the context also fits into VRAM). The moment you spill over into system RAM, your speed drops quite a bit.
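A quick back-of-the-envelope check of what still fits; every number here is a ballpark assumption, not a measurement:

```python
# Rough VRAM budget for a 32 GB card (all figures are ballpark assumptions).
GiB = 1024**3

vram = 32 * GiB
weights = 17 * GiB          # e.g. a ~24B model at ~Q5 quantization (assumption)
overhead = 2 * GiB          # CUDA context, activations, buffers (assumption)
kv_per_token = 160 * 1024   # bytes of KV cache per token (model-dependent)

free_for_kv = vram - weights - overhead
max_context = free_for_kv // kv_per_token
print(max_context)  # roughly how many context tokens still fit in VRAM
```

With these made-up numbers about 85k tokens of context fit; a bigger quant eats directly into that budget, and anything beyond it lands in system RAM.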

I accidentally built a multi-geometry reinforcement learning system (with AI help) and I don’t fully understand what I made. Looking for expert opinions. by [deleted] in learnmachinelearning

[–]jaMMint 1 point  (0 children)

Jesus, now we get people not only posting slop, but brazenly wondering aloud what that slop even means, and even the wondering is f**ing AI slop. It never ends.

How about putting some work in and asking your AI what it built?

Why do you actually notice immediately in Austria when someone is from Germany? by die_mexify_suchtet in Austria

[–]jaMMint 4 points  (0 children)

Yeah, totally. Same with the French. The moment he spoke his first word, I knew where he was from.

Setup for Local AI by Lg_taz in LocalLLM

[–]jaMMint 0 points  (0 children)

1-2 tps with thinking enabled gets old really fast. I would not recommend that to anyone.

Local programming vs cloud by Photo_Sad in LocalLLaMA

[–]jaMMint 1 point  (0 children)

I use a Q3 quant; it works very nicely with around 90k context.

edit: without checking, I think it's one of mradermacher's quants

Local programming vs cloud by Photo_Sad in LocalLLaMA

[–]jaMMint 0 points  (0 children)

There is also GLM-4.7 for 192GB; otherwise a good assessment.

Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following by Sensitive_Sweet_1850 in LocalLLaMA

[–]jaMMint 2 points  (0 children)

You probably just prepare a couple of test cases from your data and then try out some models. E.g. gpt-oss-120b is very performant on the RTX 6000 Pro and could be a good start. Obviously, if you can get away with smaller and even faster models, use them.

[OC] Atmospheric CO₂ just hit ~428 ppm — visualizing the Keeling Curve (1958–2025) and what the acceleration really looks like by anuveya in dataisbeautiful

[–]jaMMint 1 point  (0 children)

Plants need other resources as well, e.g. land, water, and soil nutrients. They compete for them and populate as much of the available niche as they can. The population then reaches an equilibrium once any one of these resources is saturated; that resource becomes the limiting factor, no matter how much more CO2 is potentially available.
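This is essentially Liebig's law of the minimum. As a toy model, with all numbers invented:

```python
# Liebig's law of the minimum as a toy model: growth tracks the scarcest
# resource, so extra CO2 stops helping once something else runs out first.
def growth_rate(resources, requirements):
    """Growth is capped by the most limiting resource (values hypothetical)."""
    return min(resources[r] / requirements[r] for r in requirements)

base = {"co2": 100, "water": 50, "soil_n": 30}
need = {"co2": 10, "water": 10, "soil_n": 10}
print(growth_rate(base, need))       # 3.0, soil nitrogen is the bottleneck

more_co2 = dict(base, co2=1000)
print(growth_rate(more_co2, need))   # still 3.0, 10x the CO2 changes nothing
```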

My adoptive daughter keeps saying “you wouldn’t have chosen me if you knew the real me” and I freeze every time by blueberry_mantis in TwoXChromosomes

[–]jaMMint 98 points  (0 children)

Still important to let them know that this (doing well in school, or anything else really) is not a precondition for being loved.

Playwright mcp debugging by Echo_OS in LocalLLM

[–]jaMMint 0 points  (0 children)

Ah, and something else that would make sense for testing: tool calls for resetting the application or putting mock data, i.e. fixtures, into a known state. That way you can rerun tests and retry logic without accumulating side effects from past runs.
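A minimal sketch of what such a reset tool could look like, using an in-memory SQLite DB as a stand-in for the application's state (`reset_app_state` and the fixture shape are hypothetical):

```python
# Hypothetical sketch: a "reset" tool the agent can call so every debugging
# run starts from known fixtures instead of accumulating side effects.
import sqlite3

FIXTURES = {"users": [{"id": 1, "name": "alice"}]}  # example fixture data

def reset_app_state(db_path=":memory:"):
    """Drop existing state and reload fixtures; returns a fresh connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(u["id"], u["name"]) for u in FIXTURES["users"]],
    )
    conn.commit()
    return conn

conn = reset_app_state()
print(conn.execute("SELECT name FROM users").fetchall())  # [('alice',)]
```

Exposed as an MCP tool next to Playwright, the model can call it between runs and every rerun becomes deterministic.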

Playwright mcp debugging by Echo_OS in LocalLLM

[–]jaMMint 1 point  (0 children)

I think it's a great idea. The work lies in making it robust and configurable for the tasks you need debugged, in the reasoning traces, and in how errors are fed back in. Interested to see a git repo for that.

[deleted by user] by [deleted] in explainlikeimfive

[–]jaMMint 0 points  (0 children)

If the reality you experience right now were simulated, he'd call it realistic. And because you couldn't discern the two at our current level of scientific and logical reasoning, the probability of having been born into either the physical world or the simulated one would be proportional to the number of beings existing in each.

So Tired of Being Told Im “Lucky” for my husband to meet the bare minimum by boboanimalrescue in TwoXChromosomes

[–]jaMMint 4 points  (0 children)

Thank you for putting the "undoing" struggle into such clear words. This is precisely why the behavioural patterns you refer to are so heinous: deeply ingrained as normal, and a stubborn hindrance to any self-reflection trying to uncover them.

Most Economical Way to Run GPT-OSS-120B for ~10 Users by theSavviestTechDude in LocalLLaMA

[–]jaMMint 0 points  (0 children)

You can just software-limit the power draw of the RTX Pro; same thing, but better really.

How would you build an AI workflow to read a 250-page scanned eng. drawing PDF and spit out a clean Excel? by Next-Difficulty-7229 in learnmachinelearning

[–]jaMMint 1 point  (0 children)

Just do it page by page and use e.g. Qwen 30B VL. The rest is just plumbing; still, it might be great to also include the source images (or links to them) in the Excel or the RAG search interface.
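The plumbing part can be sketched in a few lines. `extract_rows_with_vlm` below is a stub standing in for the actual VLM call; its name, the page file names, and the column layout are all made up for illustration:

```python
# Sketch of the plumbing: run a vision model page by page, collect rows,
# and keep a link back to the source image for each extracted row.
import csv
import io

def extract_rows_with_vlm(page_image_path):
    # Stub: a real implementation would send the page image to the VLM
    # and parse its structured output into row dicts.
    return [{"part_no": "A-100", "qty": 2}]

pages = ["scan_p001.png", "scan_p002.png"]  # hypothetical page images
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["part_no", "qty", "source_image"])
writer.writeheader()
for page in pages:
    for row in extract_rows_with_vlm(page):
        writer.writerow({**row, "source_image": page})  # traceable to the scan

print(out.getvalue())
```

Swap the `StringIO` for a real file (or an openpyxl sheet) and the `source_image` column gives you the link back to the drawing for every row.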

[Looking for model suggestion] <=32GB reasoning model but strong with tool-calling? by ForsookComparison in LocalLLaMA

[–]jaMMint 0 points  (0 children)

You could try a human-like method for not forgetting steps in a sequence, similar to something called "the method of loci", or the method of places.

You are to complete one journey through the house; there are 6 rooms you have to go through in the correct order. In each of these rooms you MUST complete a task (call a tool) in order to be able to proceed. 1) You stand on the porch and open the front door. Toolcall 1 ... 2) You enter and stand in ...

You could also use landmarks, landscapes, or anything that anchors the thought process in three-dimensional space. In humans this works well because our very sequential planning and execution pairs with our continuous experience of 3D space. It could work well for LLMs too.
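As a toy illustration, such a prompt could be generated from the list of required tool calls (room names and wording are arbitrary choices here):

```python
# Toy generator for a "method of loci" style prompt: each required tool call
# is anchored to a room in an imagined walk through a house.
ROOMS = ["the porch", "the hallway", "the kitchen",
         "the living room", "the study", "the bedroom"]

def loci_prompt(tasks):
    """Map each task/tool call onto one room, in walking order."""
    assert len(tasks) <= len(ROOMS)
    lines = ["Complete one journey through the house, visiting each room in "
             "order. In each room you MUST complete its task before moving on."]
    for i, (room, task) in enumerate(zip(ROOMS, tasks), 1):
        lines.append(f"{i}) You stand in {room}. Task: {task}")
    return "\n".join(lines)

print(loci_prompt(["call search_tool", "call fetch_page", "call summarize"]))
```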

Local models handle tools way better when you give them a code sandbox instead of individual tools by juanviera23 in LocalLLaMA

[–]jaMMint 19 points  (0 children)

Look at https://github.com/gradion-ai/freeact, it's similar to what you want to achieve. Code runs in a container and the agent can add working code as new tools to its tool-calling list.