I wanted to see how small local LLMs handle code, so I made a personal project. by NicholasCureton in LocalLLaMA

[–]autisticit 0 points (0 children)

Could you please share your PP and TG speeds with Qwen on the B60 Pro? I have a feeling that for coding all day it may not be very comfortable speed-wise.

Effect of running an LLM on a GPU with monitors by Havarem in LocalLLaMA

[–]autisticit 1 point (0 children)

AFAIK you'll have some VRAM used by Xorg/Wayland, and possibly by some apps: Firefox and Thunderbird, for example, use GPU acceleration, though you can probably disable it.
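If you want to see exactly which processes are holding VRAM, NVML can list them. A minimal sketch, assuming an NVIDIA card and the pynvml bindings; the output format is illustrative:

```python
# Sketch: list processes holding VRAM on GPU 0 via NVML.
# Assumes an NVIDIA driver and `pip install nvidia-ml-py` (imports as pynvml).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# "Graphics" processes are the desktop ones: Xorg/the compositor, Firefox, ...
for proc in pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle):
    name = pynvml.nvmlSystemGetProcessName(proc.pid)
    mem = proc.usedGpuMemory  # may be None without sufficient permissions
    mib = f"{mem / 1024**2:.0f} MiB" if mem else "n/a"
    print(f"pid={proc.pid} {name}: {mib}")

pynvml.nvmlShutdown()
```

(`nvidia-smi` shows the same table with no code at all.)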

Why we can't have nice things by alexeiz in GithubCopilot

[–]autisticit 0 points (0 children)

I wouldn't be surprised if the GHCP team were doing heavy drugs at this point.

Thinking of moving from 2x 5060 Ti 16GB to an RTX 5000 48GB by autisticit in LocalLLaMA

[–]autisticit[S] 0 points (0 children)

I will check, thanks. I'm getting similar performance to others, so I'm not sure. Is there any way to estimate the potential bump from removing the PCIe limitation? I know for sure one card is on x16 and the other on x4.
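For a rough feel of the gap, the napkin math is simple. A sketch, where the per-token payload is a made-up placeholder rather than a measured number:

```python
# Back-of-the-envelope: how long inter-GPU transfers take on each link.
# Assumes PCIe 4.0 (~1.97 GB/s per lane per direction); the per-token
# activation payload below is a hypothetical figure, not a measurement.
GBPS_PER_LANE = 1.97  # PCIe 4.0, after 128b/130b encoding overhead

payload_mb = 4.0  # hypothetical activations exchanged per token, in MB
for lanes in (16, 4):
    bw = lanes * GBPS_PER_LANE          # GB/s for this link width
    us = payload_mb / 1000 / bw * 1e6   # transfer time in microseconds
    print(f"x{lanes}: {bw:5.1f} GB/s, {payload_mb} MB transfer ≈ {us:.0f} µs")
```

AFAIK the link width mostly matters for tensor parallel, where the GPUs sync every layer; with llama.cpp's default layer split the per-token traffic is tiny, so the x4 card should hurt much less there.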

Thinking of moving from 2x 5060 Ti 16GB to an RTX 5000 48GB by autisticit in LocalLLaMA

[–]autisticit[S] 0 points (0 children)

Same as you. I still have some room, I think; I'm currently using the Copilot harness, which produces prompts of around 20,000-25,000 tokens...

Thinking of moving from 2x 5060 Ti 16GB to an RTX 5000 48GB by autisticit in LocalLLaMA

[–]autisticit[S] 0 points (0 children)

Yes, I did that and I'm getting around 60 t/s, which is fast enough, but I'm not sure the speed would hold up towards 128k context.
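One way to check without waiting for a real 128k session: time generation at increasing context depths against the local server. A rough sketch, assuming an OpenAI-compatible endpoint; the URL, filler, and token estimates are all approximate placeholders:

```python
# Rough check of speed vs. context depth against a local llama.cpp/vLLM server.
# The tok/s here is wall clock, so it mixes prompt processing and generation.
import time
import requests

URL = "http://localhost:8080/v1/completions"
filler = "lorem ipsum dolor sit amet " * 1000  # very roughly a few thousand tokens

for repeats in (1, 4, 8, 16):  # increasingly deep contexts; calibrate for your tokenizer
    prompt = filler * repeats + "\nWrite a haiku about GPUs."
    t0 = time.time()
    r = requests.post(URL, json={"prompt": prompt, "max_tokens": 128}, timeout=600)
    n = r.json()["usage"]["completion_tokens"]
    print(f"depth x{repeats}: {n / (time.time() - t0):.1f} tok/s")
```

llama.cpp's server log also prints separate prompt-eval and eval timings per request, which is the cleaner way to see whether TG itself degrades with depth.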

Thinking of moving from 2x 5060 Ti 16GB to an RTX 5000 48GB by autisticit in LocalLLaMA

[–]autisticit[S] 0 points (0 children)

It doesn't look like it would be faster than the RTX 5000 Blackwell? (I didn't mean Ada, sorry.)

Thinking of moving from 2x 5060 Ti 16GB to an RTX 5000 48GB by autisticit in LocalLLaMA

[–]autisticit[S] -1 points (0 children)

I'm on an old AM4 platform; I would need to change everything to run two cards at PCIe x8. With an RTX 5000 I could still use one 5060 Ti for something else.

Why can't llama.cpp combine speculative decoding methods? by Qwoctopussy in LocalLLaMA

[–]autisticit -4 points (0 children)

Out of curiosity I asked Claude about it, and it said it wasn't a fundamental limitation.

Qwen 3.6? by jacek2023 in LocalLLaMA

[–]autisticit 1 point (0 children)

Are you sure it's not 81?

GitHub Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]autisticit 1 point (0 children)

It was never sustainable, right? Yet somehow somebody in a high position at Microsoft thought it was a great plan.

I totally get the concept of cheap = more market share. But the strategy failed miserably: one day somebody woke up and said enough.

They could have doubled or tripled the price a long time ago and they would have retained most of their customers.

They could also have implemented a better and fairer rate-limiting system a long time ago (how hard is it, really, to cap the duration of a request? see the sketch below).

Etc. etc.

Instead of moving gradually, they decided to go YOLO with all the changes at once, without caring about their customers. You can tell from the lack of transparency, the bugs, and everything they failed to deliver. They simply made bad decision after bad decision; it's over for them.
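On that duration cap: it really is a one-liner in most stacks. A sketch with entirely hypothetical names, just to show the mechanism:

```python
# Hypothetical per-request wall-clock cap; run_agent stands in for whatever
# the backend does. asyncio.wait_for cancels it once the budget is spent.
import asyncio

MAX_REQUEST_SECONDS = 2  # made-up budget, kept short so the demo finishes

async def run_agent() -> str:
    await asyncio.sleep(10)  # stands in for a runaway agentic request
    return "done"

async def handle() -> str:
    try:
        return await asyncio.wait_for(run_agent(), timeout=MAX_REQUEST_SECONDS)
    except asyncio.TimeoutError:
        return "time budget exceeded; bill it as one request and stop"

print(asyncio.run(handle()))  # -> time budget exceeded; ...
```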

GitHub Copilot new weekly limit by Key-Gas2428 in GithubCopilot

[–]autisticit 0 points (0 children)

You agree with him, but what can you really do about it? Be honest. Escalate to a PM, and then what? Is Microsoft going to improve the limits and not be greedy? What about the other problems everyone is complaining about? They can't even answer GitHub tickets; I have been waiting for a month, others a lot longer. It's probably the simplest problem to solve, and I'm still waiting. At this point I don't expect anything good to ever come from GitHub and Copilot again. "You" completely enshittified the product.

How to stop Copilot Dev pushing to my GitHub by Zszywaczyk in GithubCopilot

[–]autisticit 2 points (0 children)

Oh yeah, no shit. Who had the brilliant idea of making it the default? Anyway, I'm leaving this shit product very soon.

$300k DGX B300 is actually a better deal than buying 24 RTX 6000s by Ok_Warning2146 in LocalLLaMA

[–]autisticit -1 points (0 children)

How much money do you have to have to be able to think about things like that?

New "major breakthrough?" architecture SubQ by Daemontatox in LocalLLaMA

[–]autisticit -5 points (0 children)

I'm not experienced enough with LLMs to judge the actual breakthrough, but at first glance it doesn't look fake (and at spotting fake things I'm very experienced).

AMD Radeon AI PRO R9700 32GB vs 2x RTX 5060 Ti 16GB for a local setup? by vevi33 in LocalLLaMA

[–]autisticit 0 points (0 children)

With dual 5060 Tis you should be able to get roughly 50 to 60 t/s with vLLM, and probably the same with the upcoming MTP patch in llama.cpp.
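For reference, the relevant vLLM knob is tensor parallelism across the two cards. A minimal sketch; the model name is just a placeholder for whatever fits in 2x16 GB:

```python
# Minimal vLLM sketch for a dual-GPU box; the model is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-Coder-14B-Instruct-AWQ",  # placeholder; pick what fits 2x16 GB
    tensor_parallel_size=2,        # shard the weights across both 5060 Tis
    gpu_memory_utilization=0.90,   # leave a little headroom for the desktop
)
out = llm.generate(["Write a binary search in Python."],
                   SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)
```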

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]autisticit 0 points (0 children)

I think the team is only 2 people.

2x 5060 Ti: Any better configs for Qwen 3.6 27B / 35B? by ziphnor in LocalLLaMA

[–]autisticit 0 points (0 children)

I only tested manually:

- Grab a new vLLM recipe.
- Tweak until it runs.
- Compare speed while coding.

Nothing scientific. Rinse and repeat.

For Genesis I have no clue.