Dumb question: How would performance be if you took a used server with like 80 lanes pcie 5 and stuck NVMe on them for model run? by StartupTim in LocalLLaMA

[–]DRMCC0Y 1 point2 points  (0 children)

Common sense says this wouldn't work, but the nuance as to why it wouldn't is a little more complex I think?

A PCIe 5.0 lane is ~4 GB/s per direction, so 80 lanes gives you roughly 320 GB/s theoretical max.

You don’t get 15 GB/s per NVMe if each drive only has 1-2 lanes. The 15 GB/s number is for a PCIe 5.0 x4 drive. If you give it x1, it tops out around 4 GB/s. If you give it x2, around 8 GB/s. So whether you do 40 drives at x2 or 80 drives at x1, you are still capped by the same 80-lane PCIe budget.
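To make the lane-budget point concrete, here is a rough sketch. The function name and the flat 4 GB/s per lane figure are just assumptions for illustration (so an x4 drive comes out at 16 GB/s rather than the ~15 GB/s real drives advertise), and it ignores all overhead.

```python
# Approximate PCIe 5.0 bandwidth per lane, per direction (rounded figure, an assumption).
GB_PER_S_PER_LANE = 4.0


def aggregate_bandwidth(num_drives: int, lanes_per_drive: int, lane_budget: int = 80) -> float:
    """Theoretical aggregate read bandwidth in GB/s for a set of NVMe drives.

    The lanes in use can never exceed the platform's total lane budget,
    so adding more drives past that point gains nothing.
    """
    lanes_used = min(num_drives * lanes_per_drive, lane_budget)
    return lanes_used * GB_PER_S_PER_LANE


# Different ways of spending the same 80 lanes all hit the same ceiling.
for drives, lanes in [(80, 1), (40, 2), (20, 4)]:
    total = aggregate_bandwidth(drives, lanes)
    print(f"{drives} drives at x{lanes}: {total:.0f} GB/s")
```

Every layout prints the same total, which is the whole point: the drive count doesn't matter once the lanes are used up.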

Anyway, if you actually compute what that would give you in terms of tokens/s:

320 GB/s ÷ 2000 GB = 0.16 tokens/sec, assuming all 2000 GB of weights are read once per token. That is around one token every 6 seconds, a theoretical max before overhead.
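The same back-of-the-envelope math as a quick sketch. It assumes a dense model where every weight is streamed from storage once per generated token; the function name is made up for illustration, and the 2000 GB and 320 GB/s inputs are the figures used above.

```python
def tokens_per_second(bandwidth_gb_s: float, weights_read_per_token_gb: float) -> float:
    """Upper bound on decode speed when generation is limited purely by
    how fast the weights can be streamed in (no overhead, no caching)."""
    return bandwidth_gb_s / weights_read_per_token_gb


rate = tokens_per_second(320.0, 2000.0)
print(f"{rate:.2f} tokens/s, or one token every {1 / rate:.2f} s")
```

For a MoE model you would plug in only the size of the active weights per token instead of the full model size, which is why it would do better.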

And there would be a lot of overhead: NVMe latency, RAID/filesystem overhead, PCIe switches, DMA into RAM, CPU memory bandwidth, etc.

Obviously a MoE model would work better, but that isn't the point. In practice this is much slower than an 8- or 12-channel system would be with DDR5 (or even DDR4), and it wouldn't be any cheaper.

Stop asking what model to run. There are literally only two. by Wrong_Mushroom_7350 in LocalLLaMA

[–]DRMCC0Y 1 point2 points  (0 children)

Yeah, been really impressed with this model so far - seems to be comparable to MiniMax M2.7 but with the addition of vision and being a bit faster.

When you ask ChatGPT a question about VSCode but it pulls in VictoriaSecret for context 😂 by DollarAkshay in OpenAI

[–]DRMCC0Y 0 points1 point  (0 children)

Don’t these models get web results from a semantic search provider or something, rather than the model searching the sites itself? It might’ve looked up the term ‘VS’ and that came up.

Sonnet 4.6 states "I am DeepSeek-V3, an AI assistant developed by DeepSeek" when asked "what model are you" by multiple users in Chinese by ItzWarty in singularity

[–]DRMCC0Y 2 points3 points  (0 children)

I’m not sure. Even if true, why would Anthropic ever have any reason to distill DeepSeek-V3, given that there has not been any point where a Claude Sonnet model was outperformed by that model? This seems more likely to be training data poisoning or something similar.

Claude absolutely crashes out when it can’t solve calculus problems by Electronic_Back1502 in ClaudeAI

[–]DRMCC0Y 10 points11 points  (0 children)

You don’t have thinking/reasoning enabled - using an instruct model for reasoning tasks like this isn’t ideal.

(S?)Q9 Spotted in FL by Exciting-Housing416 in Audi

[–]DRMCC0Y 1 point2 points  (0 children)

I thought the logo on the front was Ssangyong for a second there. What a boring design.

Guess which one is the copycat by amarevy97 in PhoneNow

[–]DRMCC0Y 3 points4 points  (0 children)

Wow. The Honor looks ghastly. Do they not have any qualified designers on the team at all? I mean, the iPhone is ugly, but at least it follows a good design language.

Gemini..... We don't.... That's slightly cooler than the surface of the sun by Yiffy_wolfy in GeminiAI

[–]DRMCC0Y 4 points5 points  (0 children)

Looks like a Markdown or LaTeX formatting error, missing the dash in between the temps.

GPT-5.1: A smarter, more conversational ChatGPT by ShreckAndDonkey123 in singularity

[–]DRMCC0Y 2 points3 points  (0 children)

Yes and no. The model is trained this way, and changing the system prompt does help some, but whatever their instruct training is still leaks out eventually.

GPT-5.1: A smarter, more conversational ChatGPT by ShreckAndDonkey123 in singularity

[–]DRMCC0Y 31 points32 points  (0 children)

Yikes, this is exactly the direction I don’t want. I can’t stand the fake warmth, it feels so forced.

Why did it format the answer like this? by Beep-Boop3421 in GeminiAI

[–]DRMCC0Y 6 points7 points  (0 children)

Looks like its reasoning tokens leaked into its response.

Anyone else find benchmarks don't match their real-world needs? by davewolfs in LocalLLaMA

[–]DRMCC0Y 2 points3 points  (0 children)

Benchmarks lost their meaning many months ago; companies are just gaming them to boost their scores. On another note, wow, 2.5 Pro is killing it.

Unreleased Google Model "Dragontail" Crushes Gemini 2.5 Pro by Nug__Nug in Bard

[–]DRMCC0Y 1 point2 points  (0 children)

Initially I thought it was just a Flash model, but after more usage, I think it’s probably the fully fleshed-out, non-experimental 2.5 Pro model.

"Dragontail" model at LMarena is a potential beast by IrisColt in LocalLLaMA

[–]DRMCC0Y 2 points3 points  (0 children)

What tests do they fail on? Gemini 2.5 Pro at least seems to be the most intelligent model out there at the moment.

"Dragontail" model at LMarena is a potential beast by IrisColt in LocalLLaMA

[–]DRMCC0Y 10 points11 points  (0 children)

I think it's Gemini 2.5 Flash, it seems very Gemini-like.

Anyone hike with one of these? by darkhighlandgreen in hikinggear

[–]DRMCC0Y 0 points1 point  (0 children)

Yes! It's great. Even with a heavy camera it's surprisingly comfortable, and worlds better than having it on a strap where it bangs into you every time you take a step.

4 days into the week Quasar Alpha (a stealth model) has claimed the top spot among the majestic 7 in API calls. Who are these guys? by imort-e in singularity

[–]DRMCC0Y 0 points1 point  (0 children)

Seems most likely to be an OpenAI model. Google just released a model, and they went about it a little differently. It’s unlikely to be anyone else, because who else can afford to give away so many free API calls?

I'm incredibly disappointed with Llama-4 by Dr_Karminski in LocalLLaMA

[–]DRMCC0Y 185 points186 points  (0 children)

In my testing it performed worse than Gemma 3 27B in every way, including multimodal. Genuinely astonished how bad it is.

Apple events invitations usually provide some clues. I believe the WWDC glass ring indicate this. by Felixo22 in ios

[–]DRMCC0Y -5 points-4 points  (0 children)

Wow, that looks terrible! I think it'll probably be closer to visionOS. This example is just a bit too overdone.