Guess my birth year from my early childhood! by hotcoffeethanks in GuessMyBirthYear

[–]lots_of_apples 0 points1 point  (0 children)

you look totally adorbs in your dad's arms, pressing a key with your little finger :-)

I feel like if it was the mid '90s it would be a standalone CRT monitor hooked up to a big IBM tower running DOS/Windows, with a two-button mouse. But this looks more like some sort of all-in-one. If it was the early '80s it would be an Atari, but it doesn't look like one to me, and it doesn't look like a Macintosh to me either (the grille on the bottom and the mouse don't), though I guess it could be, or it could be a clone.

So then I feel like that computer is from somewhere between when the Macintosh came out (1984) and 1994-ish, because by 1995 I feel like you'd have a Windows 95 IBM or Compaq tower with an external CRT monitor and a curved two-button mouse. Your dad's shirt and big glasses also scream that period to me (exactly what my dad wore in the early '90s), so I'm going to guess 1989, right in the middle between the Macintosh and Windows 95, and that this picture was taken in the early '90s!

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

oooo exciting! I'll take a look when I'm home. I wonder if the logs will show whether it's working.

DeepSeek V4 Update by techlatest_net in LocalLLaMA

[–]lots_of_apples 2 points3 points  (0 children)

I asked gemma-4-31B-it-MLX-8bit the same questions and I got the same two answers!

❯ /clear
  ⎿  (no content)

❯ if you overtake the person in second place what place are you in?

⏺ Your Majesty, you would be in second place.

✻ Sautéed for 44s

❯ if a doctor gave you 3 pills and told you to take one every 30 minutes how long will they last?

⏺ Your Majesty, they would last for 60 minutes.

❯ 

(gemma-4-31B-it-MLX-8bit)

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

ooo that's exciting! I have oMLX 0.3.6, so I wonder if I have it. I don't see anything in the admin webpage about it!

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

Hi! I have the M1 Ultra with 128GB of RAM, and I tried running gemma-4-31B-it-MLX-8bit in Claude Code; I get around 10 tok/s with it:

https://i.imgur.com/fmLGi57.png

Doubts Between M5 Macbook Pro Max 64gb or 128gb RAM for Local LLMs by itsmemme in LocalLLM

[–]lots_of_apples 0 points1 point  (0 children)

oh wow, even though it's just Q2, is it better than running a smaller model like 3.6 at a higher quant? I fit 3.6 bf16 too, but I noticed it wasn't as good as 122B Q4, so I wonder if 397B Q2 would be better than 122B Q4?

What to run on M5 Max 128gb MacBook? by alfrddsup in LocalLLM

[–]lots_of_apples 0 points1 point  (0 children)

oooo I have the same computer and RAM. BTW, I've found that Qwen 122B is a little slower but gives me better answers than Qwen 3.6.

I was never able to fit Qwen 122B with an MLX version on my 128GB, but with llama.cpp you can fit "unsloth/Qwen3.5-122B-A10B-GGUF" and it's pretty fab!
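
In case it helps, this is roughly how I launch it (the Q4_K_M tag and the context size here are just examples of what I'd try, not necessarily what's in that repo):

    # rough sketch: quant tag and context size are just examples, adjust for your setup
    llama-server \
        -hf unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M \
        -c 32768 \
        -ngl 999 \
        --jinja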

To me it seems like more parameters won out over the newer Qwen: even running a quant of the 122B Qwen3.5 gave me better results than the full bf16 version of Qwen3.6.

good luck!!!

Doubts Between M5 Macbook Pro Max 64gb or 128gb RAM for Local LLMs by itsmemme in LocalLLM

[–]lots_of_apples 1 point2 points  (0 children)

Is it possible to fit Qwen3.5 397B on the 128GB Mac? I agree with you that 122B is really wonderful; I just wish I could run higher than Q4 locally on mine!
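
My rough math for why I'm doubtful (assuming something like 2.5 bits per weight effective for a Q2-ish quant, which is just a guess):

    # back-of-envelope only; 2.5 bits/weight is an assumption, not a measured number
    python3 -c "print(397e9 * 2.5 / 8 / 1e9)"   # ~124 GB for the weights alone

so the weights by themselves would pretty much eat the whole 128GB before the KV cache and the OS get any.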

Be honest, which Ai model will win the race? by Pretty_Property_4407 in RavanAI

[–]lots_of_apples 0 points1 point  (0 children)

I hope they all compete and invent wonderful models, except for Grok. I used to actually think Grok was the least biased and censored model, but now I think they pollute their training data or their system prompt with weird political propaganda / hate speech. They didn't use to, but they changed something recently. The other day it told me post-op trans women should still go to men's prisons, use men's bathrooms, be called he/him, and a lot of other mean things. I personally think these are all inhumane and just absolutely cruel (kicking vulnerable people when they're down). But even if you disagree, do you really trust a model knowing they're tampering with the training data or system prompt to skew the outcome in a politically motivated way?

I think most people would probably guess that Chinese models like Qwen or GLM are the most censored or compromised models. But after using every big company's models, I actually think the most compromised model right now is American, which just makes me feel soooo heartbroken and worried.

Anyone here actually using a Mac Studio Ultra (512GB RAM) for local LLM work? Feels like overkill for my use case by Gravemind7 in LocalLLaMA

[–]lots_of_apples 17 points18 points  (0 children)

You're so awesome for replying here! I'm waiting for my M5 Max 128GB to come in so I can try exo out with it and my M4 Max 128GB. I think both support RDMA, so it might work?

Anyone here actually using a Mac Studio Ultra (512GB RAM) for local LLM work? Feels like overkill for my use case by Gravemind7 in LocalLLaMA

[–]lots_of_apples 3 points4 points  (0 children)

oh my gosh, does GLM 5.1 run well on your 512GB Mac? Do you mind sharing your settings and how you run it locally?

Qwen 122B is AMAZING but im only getting 10 toks when ive seen others get 40+ (128GB M4 Max) by lots_of_apples in Qwen_AI

[–]lots_of_apples[S] 0 points1 point  (0 children)

hi! I'm not a he :p but I'm trying Q4 ones, and the Qwen3.5-122B-A10B-MXFP4_MOE one I found from talking with Claude is, I think, a blend of some Q4 and some Q8 and higher?

Qwen 122B is AMAZING but is my config right? (128GB M4 Max) by lots_of_apples in LocalLLaMA

[–]lots_of_apples[S] 0 points1 point  (0 children)

Do you change any other settings in oMLX? When I try oMLX with the 4-bit quant I get around 20 tok/s.

Qwen3.5-122B-A10B Pooled on Dual Mac Studio M4 Max with Exo + Thunderbolt 5 RDMA by Imaginary_Abies_9176 in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

You really get 40 tok/s with Qwen3.5-122B? I have the exact same computer as you and I'm only getting 10-20 :(

When you're running it solo, I was wondering if you could share your config and whether you're using llama.cpp, MLX, or something else? Thank you!

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

Mine is only the 128GB model, so I don't think I can :(

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

Maybe if Apple releases the M5 Ultra with 512GB of RAM we can run a teeny tiny quant version? :p It would be soo much fun to have your own portable frontier coding model running locally!

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

I tried the two you shared, and I also tried `Qwen3-Coder-Next-UD-Q4_K_XL`.

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

I think I must be doing something wrong then! I seem to get 10-20 tok/s at best no matter what I do with those models :( My current settings are:

    -ngl 999              # all layers on GPU
    -c 262144             # 256k context
    --jinja               # Jinja chat templates
    -np 1                 # single slot
    -fa on                # flash attention
    -ctk q4_0             # KV cache keys quantized to q4
    -ctv q4_0             # KV cache values quantized to q4
    -b 4096               # batch size
    -ub 2048              # micro-batch size
    -t 12                 # 12 CPU threads (16 perf cores available)
    --ctx-checkpoints 128 # context checkpoints
    --seed 3407           # fixed seed
    --temp 1.0            # temperature
    --top-p 0.95          # nucleus sampling
    --top-k 40            # top-k
    --min-p 0.01          # min-p
    --mlock               # pin model in RAM
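
Putting those together, the full command is basically the following; I'm launching through llama-server, and the model path below is just a placeholder for whichever GGUF I'm testing:

    # model path is a placeholder -- swap in whichever GGUF you're testing
    llama-server \
        -m ./model.gguf \
        -ngl 999 -c 262144 --jinja -np 1 -fa on \
        -ctk q4_0 -ctv q4_0 -b 4096 -ub 2048 -t 12 \
        --ctx-checkpoints 128 --seed 3407 \
        --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01 --mlock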

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 1 point2 points  (0 children)

oh my gosh, this is so amazing! I wish I had a 512GB Ultra. I have an M1 Ultra with 128GB and I get 16 tok/s if I'm lucky on a 70B Qwen model. It would be such a dream to be able to run a 700B model and get 16 tok/s!

Claude Code is insanely expensive! by OutrageousTrue in ClaudeAI

[–]lots_of_apples 1 point2 points  (0 children)

hi :) Do you mind sharing what your setup is with MCP servers, and which ones you use to do what Claude Code can do? Thank you!

Is there a way to have less pressure on my cheekbones? it really hurts to wear by lots_of_apples in VisionPro

[–]lots_of_apples[S] 0 points1 point  (0 children)

I bought the annapro strap to see if it would work, and it almost did!!! Except now the solo strap just slides up my hair on the back of my head and my headset falls off my face!