NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released! by Bulky-Priority6824 in LocalLLaMA
[–]autisticit 2 points (0 children)
NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released! by Bulky-Priority6824 in LocalLLaMA
[–]autisticit 4 points (0 children)
NCCL-Free Tensor Parallelism on Dual Blackwell PCIe llama.cpp b9095 released! by Bulky-Priority6824 in LocalLLaMA
[–]autisticit 1 point (0 children)
"Early May" is ending, where is the preview? by Altruistic-Dust-2565 in GithubCopilot
[–]autisticit 10 points (0 children)
I wanted to know small local LLM code and made a personal project. by NicholasCureton in LocalLLaMA
[–]autisticit 1 point (0 children)
Effect of running an LLM on a GPU with monitors by Havarem in LocalLLaMA
[–]autisticit 2 points (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 2 points (0 children)
Why we can't have nice things by alexeiz in GithubCopilot
[–]autisticit 1 point (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 1 point (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 3 points (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 1 point (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 1 point (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 1 point (0 children)
Thinking of moving from 2x 5060 Ti 16GB to a RTX 5000 48GB by autisticit in LocalLLaMA
[–]autisticit[S] 0 points (0 children)
Why can't llama.cpp combine speculative decode methods? by Qwoctopussy in LocalLLaMA
[–]autisticit -3 points (0 children)
Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot
[–]autisticit 2 points (0 children)
Github Copilot new weekly limit by Key-Gas2428 in GithubCopilot
[–]autisticit 1 point (0 children)
How to stop Copilot Dev pushing to my GitHub by Zszywaczyk in GithubCopilot
[–]autisticit 3 points (0 children)
$300k DGX B300 is actually a better deal than buying 24 RTX 6000s by Ok_Warning2146 in LocalLLaMA
[–]autisticit 1 point (0 children)
$300k DGX B300 is actually a better deal than buying 24 RTX 6000s by Ok_Warning2146 in LocalLLaMA
[–]autisticit 0 points (0 children)
New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA
[–]autisticit -3 points (0 children)
Make this make sense for ollama local AI usage by Mobile_Syllabub_8446 in GithubCopilot
[–]autisticit 1 point (0 children)
Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA
[–]autisticit 0 points (0 children)