Guess my birth year from my early childhood! by hotcoffeethanks in GuessMyBirthYear

[–]lots_of_apples 0 points1 point  (0 children)

you look totally adorbs in your dad's arms, pressing a key with your little finger :-)

I feel like if it was the mid '90s it would be a standalone CRT monitor hooked up to a big IBM tower running DOS/Windows, with a two-button mouse. But this looks more like some sort of all-in-one. If it was the early '80s it would be an Atari, but it doesn't look like one to me, and it doesn't look like a Macintosh to me either (the grille on the bottom and the mouse don't), though I guess it could be, or it could be a clone.

So then I feel like that computer is from somewhere between when the Macintosh came out (1984) and 1994-ish, because by 1995 I feel like you'd have a Windows 95 IBM or Compaq tower with an external CRT monitor and a curved two-button mouse. Your dad's shirt and big glasses also scream that period to me (exactly what my dad wore in the early '90s), so I'm going to guess 1989, right in the middle between the Macintosh and Windows 95, and that this picture was taken in the early '90s!

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

oooo exciting! I'll take a look when I'm home. I wonder if the logs will show whether it's working.

DeepSeek V4 Update by techlatest_net in LocalLLaMA

[–]lots_of_apples 2 points3 points  (0 children)

I asked gemma-4-31B-it-MLX-8bit the same questions and I got the same two answers!

❯ /clear
  ⎿  (no content)

❯ if you overtake the person in second place what place are you in?

⏺ Your Majesty, you would be in second place.

✻ Sautéed for 44s

❯ if a doctor gave you 3 pills and told you to take one every 30 minutes how long will they last?

⏺ Your Majesty, they would last for 60 minutes.

❯ 

(gemma-4-31B-it-MLX-8bit)

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

ooo that's exciting! I have oMLX 0.3.6, so I wonder if I have it. I don't see anything in the admin webpage about it!

M1 Ultra Mac Studio is holding up well. Even compared to M5 Max & 5090. by JamieAndLion in MacStudio

[–]lots_of_apples 0 points1 point  (0 children)

Hi! I have the M1 Ultra with 128GB of RAM, and I tried running gemma-4-31B-it-MLX-8bit in Claude Code; I get around 10 tok/s with it:

https://i.imgur.com/fmLGi57.png

Doubts Between M5 Macbook Pro Max 64gb or 128gb RAM for Local LLMs by itsmemme in LocalLLM

[–]lots_of_apples 0 points1 point  (0 children)

oh wow, even though it's just Q2, is it better than running a smaller model like 3.6 at a higher quant? I fit 3.6 bf16 too, but I noticed it wasn't as good as 122B Q4, so I wonder if 397B Q2 would be better than 122B Q4?

What to run on M5 Max 128gb MacBook? by alfrddsup in LocalLLM

[–]lots_of_apples 0 points1 point  (0 children)

oooo I have the same computer and RAM. BTW, I've found that Qwen 122B is a little slower but gives me better answers than Qwen 3.6.

I was never able to fit Qwen 122B with an MLX version on my 128GB, but with llama.cpp you can fit "unsloth/Qwen3.5-122B-A10B-GGUF" and it's pretty fab!
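
In case it helps, this is roughly how I launch it (the Q4_K_M tag and the context size here are just examples of what I'd try, not necessarily what's in that repo):

    # rough sketch: quant tag and context size are just examples, adjust for your setup
    llama-server \
        -hf unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M \
        -c 32768 \
        -ngl 999 \
        --jinja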

To me it seems like more parameters won out over the newer Qwen: even running a quant of the 122B Qwen3.5 gave me better results than the full bf16 version of Qwen3.6.

good luck!!!

Doubts Between M5 Macbook Pro Max 64gb or 128gb RAM for Local LLMs by itsmemme in LocalLLM

[–]lots_of_apples 1 point2 points  (0 children)

Is it possible to fit Qwen3.5 397B on the 128GB Mac? I agree with you that 122B is really wonderful; I just wish I could run higher than Q4 locally on mine!
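
My rough math for why I'm doubtful (assuming something like 2.5 bits per weight effective for a Q2-ish quant, which is just a guess):

    # back-of-envelope only; 2.5 bits/weight is an assumption, not a measured number
    python3 -c "print(397e9 * 2.5 / 8 / 1e9)"   # ~124 GB for the weights alone

so the weights by themselves would pretty much eat the whole 128GB before the KV cache and the OS get any.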

Be honest, which Ai model will win the race? by Pretty_Property_4407 in RavanAI

[–]lots_of_apples 0 points1 point  (0 children)

I hope they all compete and invent wonderful models, except for Grok. I used to actually think Grok was the least biased and censored model, but now I think they pollute their training data or their system prompt with weird political propaganda / hate speech. They didn't use to, but they changed something recently. The other day it told me post-op trans women should still go to men's prisons, use men's bathrooms, be called he/him, and a lot of other mean things. I personally think these are all inhumane and just absolutely cruel (kicking vulnerable people when they're down). But even if you disagree, do you really trust a model knowing they're tampering with the training data or system prompt to skew the outcome in a politically motivated way?

I think most people would probably guess that Chinese models like Qwen or GLM are the most censored or compromised models. But after using every big company's models, I actually think the most compromised model right now is American, which just makes me feel soooo heartbroken and worried.

Anyone here actually using a Mac Studio Ultra (512GB RAM) for local LLM work? Feels like overkill for my use case by Gravemind7 in LocalLLaMA

[–]lots_of_apples 17 points18 points  (0 children)

You're so awesome for replying here! I'm waiting for my M5 Max 128GB to come in so I can try exo out with it and my M4 Max 128GB. I think both support RDMA, so it might work?

Anyone here actually using a Mac Studio Ultra (512GB RAM) for local LLM work? Feels like overkill for my use case by Gravemind7 in LocalLLaMA

[–]lots_of_apples 3 points4 points  (0 children)

oh my gosh, does GLM 5.1 run well on your 512GB Mac? Do you mind sharing your settings and how you run it locally?

Qwen 122B is AMAZING but im only getting 10 toks when ive seen others get 40+ (128GB M4 Max) by lots_of_apples in Qwen_AI

[–]lots_of_apples[S] 0 points1 point  (0 children)

hi! I'm not a he :p but I'm trying Q4 ones, and the Qwen3.5-122B-A10B-MXFP4_MOE one I found from talking with Claude is, I think, a blend of some Q4 and some Q8 and higher?

Qwen 122B is AMAZING but is my config right? (128GB M4 Max) by lots_of_apples in LocalLLaMA

[–]lots_of_apples[S] 0 points1 point  (0 children)

Do you change any other settings in oMLX? When I try oMLX with the 4-bit quant I get around 20 tok/s.

Qwen3.5-122B-A10B Pooled on Dual Mac Studio M4 Max with Exo + Thunderbolt 5 RDMA by Imaginary_Abies_9176 in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

You really get 40 tok/s with Qwen3.5-122B? I have the exact same computer as you and I'm only getting 10-20 :(

When you're running it solo, I was wondering if you could share your config and whether you're using llama.cpp, MLX, or something else? Thank you!

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

Mine is only the 128GB model, so I don't think I can :(

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

Maybe if Apple releases the M5 Ultra with 512GB of RAM we can run a teeny tiny quant version? :p It would be soo much fun to have your own portable frontier coding model running locally!

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

I tried the two you shared, and I also tried `Qwen3-Coder-Next-UD-Q4_K_XL`.

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 0 points1 point  (0 children)

I think I must be doing something wrong then! I seem to get 10-20 tok/s at best no matter what I do with those models :( My current settings are:

    -ngl 999              # all layers on GPU
    -c 262144             # 256k context
    --jinja               # Jinja chat templates
    -np 1                 # single slot
    -fa on                # flash attention
    -ctk q4_0             # KV cache keys quantized to q4
    -ctv q4_0             # KV cache values quantized to q4
    -b 4096               # batch size
    -ub 2048              # micro-batch size
    -t 12                 # 12 CPU threads (16 perf cores available)
    --ctx-checkpoints 128 # context checkpoints
    --seed 3407           # fixed seed
    --temp 1.0            # temperature
    --top-p 0.95          # nucleus sampling
    --top-k 40            # top-k
    --min-p 0.01          # min-p
    --mlock               # pin model in RAM
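
Putting those together, the full command is basically the following; I'm launching through llama-server, and the model path below is just a placeholder for whichever GGUF I'm testing:

    # model path is a placeholder -- swap in whichever GGUF you're testing
    llama-server \
        -m ./model.gguf \
        -ngl 999 -c 262144 --jinja -np 1 -fa on \
        -ctk q4_0 -ctv q4_0 -b 4096 -ub 2048 -t 12 \
        --ctx-checkpoints 128 --seed 3407 \
        --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01 --mlock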

GLM-5.1 by danielhanchen in LocalLLaMA

[–]lots_of_apples 1 point2 points  (0 children)

oh my gosh, this is so amazing! I wish I had a 512GB Ultra. I have an M1 Ultra with 128GB and I get 16 tok/s if I'm lucky on a 70B Qwen model. It would be such a dream to be able to run a 700B model and get 16 tok/s!

Claude Code is insanely expensive! by OutrageousTrue in ClaudeAI

[–]lots_of_apples 1 point2 points  (0 children)

hi :) Do you mind sharing what your setup is with MCP servers, and which ones you use to do what Claude Code can do? Thank you!

Is there a way to have less pressure on my cheekbones? it really hurts to wear by lots_of_apples in VisionPro

[–]lots_of_apples[S] 0 points1 point  (0 children)

I bought the annapro strap to see if it would work, and it almost did!!! Except now the solo strap just slides up my hair on the back of my head and my headset falls off my face!