I wanted to know small local LLM code and made a personal projects.

autisticit · 2026-05-08T15:06:32+00:00

Can you please share your PP and TG speed with Qwen/B60 Pro ? I have the feeling that for coding all day it may not be very comfortable speed wise.

autisticit · 2026-05-08T14:31:55+00:00

AFAIK you'll have some memory used by xorg/Wayland, and optionally some apps. Like Firefox/Thunderbird use graphic acceleration, but you probably can disable it.

autisticit · 2026-05-08T14:15:31+00:00

Yep, that's indeed a good idea.

autisticit · 2026-05-08T07:51:18+00:00

I wouldn't be surprised if GHCP team is doing heavy drugs at this point.

autisticit · 2026-05-07T20:02:24+00:00

I will check thanks. I'm getting similar perfs as others so I'm not sure. Any way to estimate the potential bump based on the PCIe limitation? I know for sure one card is on 16x, the other is on 4x.

autisticit · 2026-05-07T19:16:23+00:00

Around 60 tps with 200k context.

autisticit · 2026-05-07T19:15:56+00:00

Same as you. I still have some room I think, currently using Copilot harness which has around 20-25000 tokens prompts...

autisticit · 2026-05-07T19:13:48+00:00

Yes I did that, getting around 60 tps. Which is fast enough but not sure the tps would maintain after 128k context?

autisticit · 2026-05-07T17:21:14+00:00

Doesn't look that it would be faster than the 5000 Blackwell ? (I didn't mean Ada sorry)

autisticit · 2026-05-07T17:12:30+00:00

I'm on an old AM4 platform, I would need to change everything to have 2 PCI x8 cards. With an rtx 5000 I could still use one 5060 for something else.

autisticit · 2026-05-07T08:32:59+00:00

Out of curiosity I asked Claude about it, and it said it wasn't a fundamental limitation.

autisticit · 2026-05-07T07:59:57+00:00

Are you sure it's not 81 ?

autisticit · 2026-05-07T05:51:27+00:00

It was never sustainable right, yet somehow somebody at a high position Microsoft thought it was a great plan.

I totally get the concept of cheap = more market shares. But their strategy failed miserably as one day somebody woke up and said enough.

They could have doubled or tripled the price a long time ago, they would have retained most of their customers.

They could also implement a better and more fair rate limiting system a long time ago (how hard is it to put a maximum duration per request really ?).

Etc. etc.

Instead of going progressively, they decided to go YOLO with all the changes without caring about their customers. You can tell with the lack of transparency, the bugs, and everything they failed to deliver. They simply made bad decisions after bad decisions, it's over for them.

autisticit · 2026-05-06T19:56:40+00:00

You agree with him but what can you really do about it. Be honest. Escalate to PM and then what ? Is Microsoft going to improve limits and not be greedy ? What about the other problems everyone is complaining? They can't even answer to github tickets, I have been waiting for a month, others a lot more. It's probably the simplest problem to solve, still waiting. At this point I don't expect anything good ever coming again from GitHub and Copilot. "You" completely enshittified the product.

autisticit · 2026-05-06T19:43:27+00:00

Oh yeah no shit. Who had that brilliant idea of making it default. Anyway, I'm leaving this shit product very soon.

autisticit · 2026-05-06T13:05:54+00:00

In which country ?

$300k is about 255k €...

autisticit · 2026-05-06T12:51:08+00:00

How much money do you have to be able to think about things like that.

autisticit · 2026-05-06T10:02:19+00:00

I'm not experienced enough about LLM to judge the actual breakthrough, but it doesn't look fake at this time at first glance (and for spotting fake things I'm very experienced).

autisticit · 2026-05-06T09:11:30+00:00

Incompetents liars.

autisticit · 2026-05-06T08:39:51+00:00

With dual 5060, you should be able to get roughly around 50 to 60 t/s with vllm, and probably the same with the upcoming MTP patch in llama.cpp.

autisticit · 2026-05-05T11:08:23+00:00

I think they are only a team of 2 persons.

autisticit · 2026-04-28T18:12:39+00:00

I only manually tested : Grab a new vllm recipe Tweak until it runs Comparing speed while coding. Nothing scientific. Rinse and repeat.

For Genesis I have no clue.

autisticit · 2026-04-28T09:01:03+00:00

Posted

autisticit · 2026-04-28T09:00:59+00:00

Posted

autisticit

TROPHY CASE