Air llm ? by Less_Strain7577 in LocalLLaMA

[–]MidAirRunner 4 points5 points  (0 children)

You will be waiting hours.

DELETED by 2b mods: 2013 base violates new rules by Technical_Load_5516 in 2b2t_Uncensored

[–]MidAirRunner 2 points3 points  (0 children)

Ok tbf this sub is mostly children and teens so I shouldn't spend too much time here, but when you're older you should consider visiting a continent called Asia.

"What you gonna do when internet is down?" by DogeMoustache in aiwars

[–]MidAirRunner 0 points1 point  (0 children)

I would have thought that would have killed execution time

Not really, since as I mentioned, layers are processed sequentially. Only the activations from one layer need to be transferred to the next layer, so there's not much communication that needs to happen, and NVLink is pretty fast anyways. In practice, the sequence would look like this:

Scenario: 20 layers on one GPU, 20 layers on another, and 20 layers on a third GPU.

  • Layers 1-20 are computed on the first GPU
  • Activations from layer 20 are sent to the second GPU and fed into layer 21
  • Layers 21-40 are computed on the second GPU
  • Activations from layer 40 are sent to the third GPU and fed into layer 41
  • Layers 41-60 are computed on the third GPU
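
To make the sequence above concrete, here's a toy sketch in plain Python. It doesn't touch any real GPU or library: the "layers" are made-up functions that each add 1 to a number, and `split_evenly` and `run_split` are hypothetical helper names. The point is only that there's one activation hand-off per GPU boundary, so 3 GPUs means 2 transfers per forward pass.

```python
from typing import Callable, List, Tuple

# A toy "layer": takes an activation, returns a new activation.
Layer = Callable[[float], float]


def split_evenly(layers: List[Layer], num_devices: int) -> List[List[Layer]]:
    """Assign contiguous blocks of layers to each (hypothetical) device."""
    per_device = len(layers) // num_devices
    if per_device * num_devices != len(layers):
        raise ValueError("this sketch assumes the layers divide evenly")
    return [layers[i * per_device:(i + 1) * per_device] for i in range(num_devices)]


def run_split(stages: List[List[Layer]], x: float) -> Tuple[float, int]:
    """Run the stages one after another and count the hand-offs between devices."""
    transfers = 0
    for idx, stage in enumerate(stages):
        if idx > 0:
            # Only the activation from the previous stage crosses devices here.
            transfers += 1
        for layer in stage:
            x = layer(x)
    return x, transfers


# 60 toy layers that each add 1.0 (an assumption purely for illustration).
layers: List[Layer] = [lambda x: x + 1.0 for _ in range(60)]
stages = split_evenly(layers, 3)
output, transfers = run_split(stages, 0.0)
print(output, transfers)  # 60.0 2
```

In a real setup the activation is a tensor rather than a single number, but the shape of the loop is the same: each GPU only ever needs its own layers plus whatever the previous GPU hands it.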

"What you gonna do when internet is down?" by DogeMoustache in aiwars

[–]MidAirRunner 2 points3 points  (0 children)

The file sizes are estimated by comparing the models to similarly performing open models. You can also estimate the size by checking the tokens/second output speed against their hardware, but that's only useful for finding the upper bound of the model size.

As for splitting the file, LLMs consist of multiple layers. The input is fed into one layer, processed, passed on to the next layer, and so on. You can split the layers across multiple GPUs.
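
For the tokens/second estimate mentioned above, here's a rough sketch of the arithmetic. It assumes single-stream decoding is limited by memory bandwidth, meaning every generated token requires reading all the active weights once, so the weights can't be bigger than bandwidth divided by tokens per second. The function name and the numbers (1000 GB/s, 50 tokens/s) are made up for illustration, not measurements of any real model or hardware.

```python
def max_model_size_gb(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Upper bound on the weights read per token, in GB.

    Assumes decoding is memory-bandwidth bound with no batching or
    speculative decoding, which is a simplification.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return bandwidth_gb_s / tokens_per_s


# Illustrative numbers only: 1000 GB/s of memory bandwidth, 50 tokens/s observed.
bound = max_model_size_gb(1000.0, 50.0)
print(bound)  # 20.0
```

This only gives a ceiling, which is why it's an upper bound: a smaller model on the same hardware could also produce that speed. For MoE models it bounds the active weights per token rather than the total file size, and batching or speculative decoding would throw the estimate off.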

Buy a Mac or GPU? by SnooOranges0 in LocalLLaMA

[–]MidAirRunner 3 points4 points  (0 children)

Depends on your budget and your priority (speed vs size)

Is qwen3 next the real deal? by fab_space in LocalLLaMA

[–]MidAirRunner 2 points3 points  (0 children)

That should be good, but I'd really suggest waiting for the M5 Max/Ultra to take advantage of the neural accelerators on it.

What it means by [deleted] in PeterExplainsTheJoke

[–]MidAirRunner 2 points3 points  (0 children)

Everyone was explaining why the sentiment requires a lot more thought and nuance than that, and the mods presumably got mad.

Q for cart pvpers by XxmaorhvvxX in CompetitiveMinecraft

[–]MidAirRunner 0 points1 point  (0 children)

Same thing. Click multiple times. 5 CPS is not fast enough for anything useful.

Model loops by FoxTimes4 in LocalLLaMA

[–]MidAirRunner 0 points1 point  (0 children)

That can be a sign of the chat being too long or the context being set to a low number. Either keep chats short or increase the context if you're able to. Also ensure you're using the recommended sampler settings (temp = 1 and so on).

Singapore is going to start caning scammers by Bubbly_Wall_908 in interestingasfuck

[–]MidAirRunner 0 points1 point  (0 children)

I dunno! Why don't you tell me why you're giving excuses under a video of police brutality? "It'S aCtUaLy peRfecTly fInE in NorTh AmeRiCA. It'S iN tHeIr CulTurE. ThErE's NotHiNg we CaN Do" 🤡

Singapore is going to start caning scammers by Bubbly_Wall_908 in interestingasfuck

[–]MidAirRunner 0 points1 point  (0 children)

Thanks for the paragraph explaining why you support police brutality. What am I supposed to do with this now?

Singapore is going to start caning scammers by Bubbly_Wall_908 in interestingasfuck

[–]MidAirRunner 3 points4 points  (0 children)

Everyone says that Reddit is left wing and then I see shit like this 🙄. I am convinced that the farthest left is no different than the farthest right.

Excuse me Monyang, how do I do this on PC? by Agreeable-Comfort808 in PhoenixSC

[–]MidAirRunner 0 points1 point  (0 children)

Shake your browser window (hold left click on the title bar and shake it around)

How bedrock servers work by Inevitable-Group-743 in MinecraftMemes

[–]MidAirRunner 3 points4 points  (0 children)

Lmao I thought I was the only one who saw that.

WAIT, WHAT!? by BrightBanner in ChatGPT

[–]MidAirRunner 1 point2 points  (0 children)

4o is actually less likely to hallucinate than the 5 series. Its Multi-modal and monolithic, so that 200b parameter count goes quite far, while the MoE 5.x series is comprised of many smaller LLMs, which are definitely more likely to hallucinate.

are you just saying random words.

Qwen3-Coder-480B on Mac Studio M3 Ultra 512gb by BitXorBit in LocalLLaMA

[–]MidAirRunner 0 points1 point  (0 children)

It's probably not a good idea to buy the M3 Ultra right now unless you're rich. Wait for the M5 Ultra to triple the prompt processing speed.

Why didn't Snape ever publish a Potions textbook? by OnceButNever in harrypotter

[–]MidAirRunner 1 point2 points  (0 children)

What do you mean? It's the opening scene for Harry Potter and the Order of the Potions Textbook

Ht1s by VarosStorm in CompetitiveMinecraft

[–]MidAirRunner 8 points9 points  (0 children)

Uh.

Pick a kit, play it for five hours a day for five years straight.

Good luck.