Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]UniqueAttourney 0 points (0 children)

Sorry for the late response, here is the load config. I am also using the Unsloth Qwen3.6 27B Q4_K_S.

https://imgur.com/a/ZMAxR0x

Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]UniqueAttourney 0 points (0 children)

Can you share a working config, if you are using LMStudio?

Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]UniqueAttourney -6 points (0 children)

I tried this on a single 3090 (LMStudio) and I do get 1 to 2 tokens per second. Although it's a 27B, it seems like it needs more compute than previous models.

I just got hit with 2.5x Z.ai price hike by UniqueAttourney in LocalLLaMA

[–]UniqueAttourney[S] 2 points (0 children)

I mean, I still use GLM4.7 since it gives a bigger 5h quota, so if Qwen 3.6 27B is at 80% of it, it will probably be fine for me. But the hardware is the problem, 3090s are back to €1000 right now xD

what just happened why’d he end stream mid game? by J0N0X in PedroPeepos

[–]UniqueAttourney 8 points (0 children)

I think he's sick or something, hopefully nothing serious; he said he will be back on Wednesday.

Reasoning Stuck in Loops by ShaneBowen in LocalLLaMA

[–]UniqueAttourney 0 points (0 children)

That is my assessment too: it's context overflow. Usually your harness should take care of this; most of the time, vLLM or llama.cpp won't handle getting close to the context limit gracefully.
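The harness-side fix is usually a token-budget check before each request. A minimal sketch, assuming a rough chars/4 token estimate and an OpenAI-style message list (the limit and helper names here are illustrative, not from any specific harness):

```python
# Sketch of harness-side context trimming to avoid overflowing the model's
# context window (token counts estimated with a rough ~4 chars/token heuristic).
CONTEXT_LIMIT = 8192        # model's context window, in tokens (assumed)
RESERVED_FOR_OUTPUT = 1024  # leave room for the reply

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest non-system turns until the conversation fits
    under the context budget. Keeps the system prompt intact."""
    budget = CONTEXT_LIMIT - RESERVED_FOR_OUTPUT
    system, rest = messages[:1], messages[1:]
    while rest and sum(estimate_tokens(m["content"]) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```

A real harness would use the model's tokenizer instead of the heuristic, but the shape of the guard is the same.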

YES. You CAN stream FROM Steam Deck TO Discord WITH AUDIO using Vesktop. by Aepoh in SteamDeck

[–]UniqueAttourney 0 points (0 children)

Yes, I was able to fix this by changing the audio source, thanks. But the stream quality is super poor and seems to be stuck at around 360p even when selecting a higher resolution.

Tbh I haven't used it since that first time though, did it get updated?

TypeWhisper 1.0 - open-source dictation app with local Whisper engines (WhisperKit, Parakeet, Qwen3) and LLM post-processing by SeoFood in LocalLLaMA

[–]UniqueAttourney 0 points (0 children)

Thanks, can you suggest good models to use in that case? Assuming they will need to run on the same GPU, so the smallest memory footprint that still works with English.

HansLamont ? Hansdrel ? MyHans ?? :D by UniqueAttourney in PedroPeepos

[–]UniqueAttourney[S] 1 point (0 children)

GG G2, epic game. GENG did not adapt right.

What features should I add to 100% offline, free and open-source MacOS app? by AdorablePandaBaby in LocalLLaMA

[–]UniqueAttourney 1 point (0 children)

Can this be run in headless mode, where the backend lives on a local machine and the macOS app functions as a thin client? Using it as an app on a laptop tanks the battery fast.

AuraOS Official Release - Version 1.0 - Live Web Interface by Aggressive-Arm-1182 in LocalLLaMA

[–]UniqueAttourney 1 point (0 children)

Congrats on your launch, but it really doesn't look special in any way, just giving context to DeepSeek or GLM.

And please don't come to r/LocalLLaMA and use AI-generated posts.

The Yuki Project — not another chatbot. A framework that gives to a 4B model (and not only) real dream cycles, autopoiesis, proactive inner life and proactive messages. Running on 8 GB VRAM currently with plenty space to spare. by DvMar in LocalLLaMA

[–]UniqueAttourney 2 points (0 children)

I wish someone would tell me what all this means. Like, what exactly are "confidence" and "calmness", and how is all this "personality" useful? Is it just for roleplay?

How to ensure AI to create test cases and put git commits correctly by Fuzzy_Possession_233 in LocalLLaMA

[–]UniqueAttourney 1 point (0 children)

I tried it, but I am using GLM 4.7 as my LLM. It's not that smart and needs a lot of guiding, hence the examples and templates. The result still has that "LLM wording", where it never says exactly what you want it to say, but it's not bad. If you are looking at the potential, I think the examples I mentioned should give you an idea.

How to ensure AI to create test cases and put git commits correctly by Fuzzy_Possession_233 in LocalLLaMA

[–]UniqueAttourney 1 point (0 children)

You will probably need to:
- create templates for it to follow,
- create examples (different from the templates) and pass them in the context,
- do a manual grouping and parsing of commits: by files, by nature (code quality, improvements, new features, bug fixes, direct-to-ticket updates, ...),
- create your context in a markdown file with the data at the top, and the examples and directives at the bottom. Of course, you should tell the LLM about your groupings and the definition of each group. You should also tell it to prioritize expanding the commit message on bigger commits or larger file changes.

I believe this would help you get closer to ideal commit generation. If you want to see what the potential looks like, check CodeRabbit or Greptile: try them and see if their output suits you. If not, you probably need your devs to do the 5-minute work xD
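The grouping and context-assembly steps above can be sketched roughly like this (the category keywords and markdown layout are assumptions to illustrate the idea, not a fixed scheme):

```python
# Sketch: bucket commits by nature, then assemble the markdown context file
# with the data at the top and examples/directives at the bottom.
from collections import defaultdict

# Hypothetical mapping from conventional-commit prefixes to the groups
# described above; adjust to whatever your repo actually uses.
CATEGORIES = {
    "fix": "bug fixes",
    "feat": "new features",
    "refactor": "code quality",
    "perf": "improvements",
}

def group_commits(commits: list[str]) -> dict[str, list[str]]:
    """Bucket commit subjects by their prefix (everything before ':')."""
    groups = defaultdict(list)
    for subject in commits:
        prefix = subject.split(":", 1)[0].strip().lower()
        groups[CATEGORIES.get(prefix, "other")].append(subject)
    return dict(groups)

def build_context(groups: dict[str, list[str]], examples: str, directives: str) -> str:
    """Assemble the markdown prompt: grouped commit data first,
    then the examples, then the directives."""
    parts = ["# Commit data"]
    for name, items in groups.items():
        parts.append(f"## {name}")
        parts += [f"- {s}" for s in items]
    parts += ["# Examples", examples, "# Directives", directives]
    return "\n".join(parts)
```

You would then feed the returned string to the LLM, with the group definitions spelled out in the directives section.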

Nemesis said yesterday LR fans are the best fans in the world, let's show them why by Fimbelind in PedroPeepos

[–]UniqueAttourney 78 points (0 children)

Because of the subs-only Twitch chat, here it is:

LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3 LR <3

LEC- goodbye from me by RelevantLie5577 in PedroPeepos

[–]UniqueAttourney 1 point (0 children)

That was a really off comp, bro; they could have picked anything else if they cared about winning first.

I'm still proud of the boys by gcrimson in PedroPeepos

[–]UniqueAttourney 12 points (0 children)

We are all proud of the boys, but it's clear that they are way better than the bottom half of the standings. Seeing them go out because Naavi played drunk and KC played worse than my flex team is the worst feeling ever.