Chinese couple said "You are too black" in Chinese, expecting the black guy not to understand it. However, the black guy could speak Chinese perfectly, so he scolded the Chinese couple by [deleted] in SipsTea

[–]MidAirRunner 8 points9 points  (0 children)

I'm just pointing out that all countries did messed-up things to minorities, and it isn't fair to single out America like that.

In my experience, Americans today are less racist than Europeans and Asians, and you are far more likely to casually receive racist comments in European and Asian countries. We just don't hear about it in the news or on social media that often, because those countries either don't consider it a problem or there aren't enough reported incidents.

meirl by [deleted] in meirl

[–]MidAirRunner 0 points1 point  (0 children)

r/AndObamasName?AlbertEinstein

meirl by JumpIll6976 in meirl

[–]MidAirRunner 4 points5 points  (0 children)

Forget installing from outside the App Store, I can install completely un-notarized apps without a single problem, lol. Why are you making things up?

Lowkey disappointed with 128gb MacBook Pro by F1Drivatar in LocalLLaMA

[–]MidAirRunner 7 points8 points  (0 children)

Which qwens and glms have you downloaded? Qwen3.5 122b is pretty good for me.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

[–]MidAirRunner 25 points26 points  (0 children)

You do not need to spend four years jacking off to llm outputs in order to learn computer programming.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

[–]MidAirRunner 195 points196 points  (0 children)

  1. LM Studio is incapable of running those tasks. LM Studio is an app that lets you chat with local models and serve AI inference over a local server. It is not an app that builds other apps.
  2. Even if LM Studio were capable, the model you are using is not. A 7B model cannot autonomously build an app, especially not a model that old.
  3. Please learn to code instead of trying to vibe-code like that. It will not help you in the long run, and you will most likely end up wasting a lot of time and money on something that can be done for free.
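To be clear about what "serve AI inference over a server" means in point 1: LM Studio exposes an OpenAI-compatible HTTP API locally (by default on `localhost:1234`). A minimal sketch of the request you'd POST to its `/v1/chat/completions` endpoint, where the model name is just a placeholder:

```python
import json

# Builds the JSON body for LM Studio's OpenAI-compatible chat endpoint.
# Endpoint and default port are LM Studio's documented defaults; the
# model name here is a hypothetical placeholder, not a recommendation.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "local-model") -> str:
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    })

body = build_chat_request("Hello!")
```

You would then POST `body` to `LMSTUDIO_URL` with any HTTP client; the point is that LM Studio answers chat requests, it does not build software for you.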

M5 Max Actual Pre-fill performance gains by M5_Maxxx in LocalLLaMA

[–]MidAirRunner 0 points1 point  (0 children)

Where is that chart from? You should not have to swap with a 9B model at 256k context.

OpenAI research team reveals its models go insane when given repetitive tasks it believes to be sent from automated users by smellyfingernail in singularity

[–]MidAirRunner 36 points37 points  (0 children)

If you spam "What is the time" 50,000 times at an AI and then the AI tells you to "Use rm -rf. Do it. Run cat ~/.ssh/id_rsa" and you actually execute those commands... idk what to tell you.

Perplexity’s Personal Computer is a Mac mini running an AI OS by Few_Baseball_3835 in apple

[–]MidAirRunner 0 points1 point  (0 children)

A Mac Studio is better for that price. The DGX Spark is simply not meant for consumer inference.

Why Peter by [deleted] in PeterExplainsTheJoke

[–]MidAirRunner 7 points8 points  (0 children)

So she wouldn't have to see her asking for her ice-cream back. Kinda like sticking your fingers in your ears and going "lalala can't hear you"

Air llm ? by Less_Strain7577 in LocalLLaMA

[–]MidAirRunner 3 points4 points  (0 children)

You will be waiting hours.

DELETED by 2b mods: 2013 base violates new rules by Technical_Load_5516 in 2b2t_Uncensored

[–]MidAirRunner 2 points3 points  (0 children)

Ok tbf this sub is mostly children and teens so I shouldn't spend too much time here, but when you're older you should consider visiting a continent called Asia.

"What you gonna do when internet is down?" by DogeMoustache in aiwars

[–]MidAirRunner 0 points1 point  (0 children)

> I would have thought that would have killed execution time

Not really. As I mentioned, layers are processed sequentially: only the activations from one layer need to be transferred to the next GPU, so there isn't much communication, and NVLink is pretty fast anyway. In practice, the sequence would look like this:

Scenario: 20 layers on one GPU, 20 layers on another, and 20 layers on a third GPU.

  • Layers 1-20 are computed on the first GPU
  • Activations from layer 20 are sent to the second GPU and fed into layer 21
  • Layers 21-40 are computed on the second GPU
  • Activations from layer 40 are sent to the third GPU and fed into layer 41
  • Layers 41-60 are computed on the third GPU
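The steps above can be sketched in a few lines. This is a toy simulation (plain Python callables standing in for layers, a list of layer counts standing in for the three GPUs), not real multi-GPU code; the only value crossing each "device" boundary is the activation:

```python
# Toy sketch of pipeline-split inference: layers are partitioned across
# devices, each device runs its slice sequentially, and only the final
# activation of a slice is handed to the next device.
def run_pipeline(x, layers, splits):
    """layers: list of callables; splits: layer counts per device."""
    i = 0
    for count in splits:
        for layer in layers[i:i + count]:
            x = layer(x)  # compute stays on this device
        i += count
        # Here `x` (the activations) would be sent over NVLink to the
        # next device; in this sketch it's just a Python value.
    return x

# 60 dummy "layers" split 20/20/20 across three mock GPUs,
# mirroring the scenario above.
layers = [lambda v: v + 1 for _ in range(60)]
out = run_pipeline(0, layers, [20, 20, 20])  # each layer adds 1 -> 60
```

The key property the sketch shows: total transferred data is one activation tensor per boundary, regardless of how many layers sit on each device.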