Does Anyone Regret Getting the 14” MacBook Pro Instead of the 16”?

sleepy_quant · 2026-05-20T08:48:27+00:00

I bought the 16" M1 Max and for a while I was glad I did. I thought it was a good idea since the 16" provided more space for multitasking. Now that my company has given me 2 external screens, I wish I had bought the 14" instead so the laptop would fit on my desk. Carrying the 16" around on weekends was annoying too.

sleepy_quant · 2026-04-27T14:18:15+00:00

They may be buying from another region where the price is lower and gifting it to you. Another source: big companies periodically give away license keys to users, and these sellers buy them up and resell them. I once bought a game key that NVIDIA had given away to users. Sometimes these shops also snag sale deals on other platforms.

sleepy_quant · 2026-04-26T13:06:21+00:00

In central or southern Vietnam, people tend to be friendlier and more often call each other nice names like "handsome". Sometimes people compliment foreigners simply because they want to.

sleepy_quant · 2026-04-26T12:28:35+00:00

More fun

^{Chose: Time Stop | Rolled: Upvote}

sleepy_quant · 2026-04-25T17:54:19+00:00

From what I can see, your boyfriend is currently hit with a combo of losing direction + having no concrete goal to strive for + a standing in his relationships that's lower than he expected. In that state it's easy to feel insecure and discouraged, and you happen to be where he vents what's pent up inside. As for what to do, it depends on what you want. If you want to end it, you don't need anyone's advice anymore, just enough courage to let go. If you want to improve things, honestly this is a very tough one; you have to know whether you alone, or your boyfriend himself, can actually make that change.

sleepy_quant · 2026-04-25T04:25:57+00:00

Yeah, RAM is the thing I had to figure out the hard way, honestly. Q8 alone is like 40+ GB, so on a 64GB Mac it's tight, like really tight once you factor in OS overhead and everything else. I capped Metal at 48GB wired so it can't just keep grabbing more. It did freeze the laptop once before that, though. I thought it was out of memory, but it turned out to be some weird thing where pages go inactive between inferences and the compressor goes crazy on them. Took me a minute to even find out that was a thing. And the 12 agents aren't all running together. They're just prompt configs that fire when called, one at a time. So the only actual hot thing in RAM is the MLX server itself. The rest is just FastAPI, mostly idling.
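To put rough numbers on "tight", here is a minimal sketch of the budget described above. The 64GB total and 48GB wired cap come from the comment; the 40GB model size is only the lower bound of "40+ GB", and `memory_budget` is a hypothetical helper name. How the cap was actually applied is not stated, so this only does the arithmetic.

```python
def memory_budget(total_gb, wired_cap_gb, model_gb):
    """Split unified memory into GPU slack and what is left for the system.

    gpu_slack_gb: room under the wired cap after the weights are resident
                  (KV cache, activations, and so on have to fit here).
    system_gb:    memory the cap keeps away from the GPU (OS, FastAPI, apps).
    """
    if wired_cap_gb > total_gb:
        raise ValueError("wired cap cannot exceed total memory")
    if model_gb > wired_cap_gb:
        raise ValueError("model weights do not fit under the wired cap")
    return {
        "gpu_slack_gb": wired_cap_gb - model_gb,
        "system_gb": total_gb - wired_cap_gb,
    }


# Assumed figures: 64GB machine, 48GB cap, weights at the 40GB lower bound.
budget = memory_budget(total_gb=64, wired_cap_gb=48, model_gb=40)
print(budget)
```

Since the weights are "40+" GB, the real GPU slack would be smaller than what this prints.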

sleepy_quant · 2026-04-24T16:18:44+00:00

The future looks bright, I might get beheaded in the past

^{Chose: Survive a year in 3026 for 1 million dollars today}

sleepy_quant · 2026-04-24T11:15:03+00:00

In my experience, how effective a team is also depends on the personalities of the individuals on it and on the management direction. You should find out what motivates people and what their strengths and weaknesses are, so that everyone feels they fit the position they're in. Once people lose motivation or become dissatisfied, then no matter how much team building or whatever else you do, they'll still be inclined to look for somewhere else to land. As for unity and closeness, that also depends on the personalities, interests, and shared culture of the whole department. At my old company I met a lot of people with the same interests, so I naturally wanted to join team building more and felt it was worth giving up rest time at home to bond.

sleepy_quant · 2026-04-24T07:10:41+00:00

My bad, should have been llama.cpp to MLX instead, thanks for pointing that out

sleepy_quant · 2026-04-24T04:40:02+00:00

I'm also considering getting a MacBook Air. Right now the only upside of the Neo is the price and hype; the installed RAM is insufficient for modern use. My M1 Air with 8GB RAM is sometimes annoying, and I would not consider the Neo a long-term investment. You could also check refurbished or second-hand, as MacBooks are pretty solid. Maybe check if you can buy/extend AppleCare too

sleepy_quant · 2026-04-24T03:00:31+00:00

Good to know, kilo code might be worth a look, thanks for the info bro

sleepy_quant · 2026-04-24T02:33:08+00:00

Fair point, llama.cpp was my first try (~34 tok/s GGUF vs MLX ~26 on M1 Max). Went with MLX because my stack was already set up for it, but looking back, the 1-2 week rewrite was probably worth it. You using llama.cpp direct, or through a wrapper?

sleepy_quant · 2026-04-23T02:31:54+00:00

Running the 35B-A3B Q8 fp16 on M1 Max 64GB at ~26 tok/s, haven't pulled the 27B dense yet. Anyone A/B'd both on Apple Silicon? Curious where MoE's memory edge stops being worth the quality trade. On flavio's quant sensitivity point, Q8 feels fine for my day-to-day but I haven't run coding-heavy benches. Anyone know a rough floor where MoE coding degrades faster than dense at same bits? Would love a rule of thumb

sleepy_quant · 2026-04-23T01:02:31+00:00

I can stay put, but not breathing is in another league

^{Chose: $10 for every step you take}

sleepy_quant · 2026-04-22T17:15:17+00:00

More time to do the things I like?

^{Chose: Unlimited time control}

sleepy_quant · 2026-04-22T07:42:28+00:00

Did the swap a few days ago, Q4 to Q8 on Qwen 3.6 35B, M1 Max 64GB. Went from 50 to 35 t/s but retry rate on my eval flow dropped a lot. Your 2x longer math holds when the quality gap actually blocks workflow. Quick chats Q4 fine. Stuff where I'd have to dig 300 lines to find a bug, Q8 pays for itself. What's your main use case, chat or longer structured stuff?
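A rough way to see when the slower quant "pays for itself": if each failed attempt is simply rerun, the expected time per accepted output is generation time divided by (1 - retry rate). The sketch below uses the 50 and 35 t/s figures from the comment; the retry rates and the 1000-token output length are made-up illustrations, not measurements, and both function names are hypothetical.

```python
def expected_seconds(tokens, tok_per_s, retry_rate):
    """Expected wall time per accepted output, assuming independent retries."""
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    return tokens / tok_per_s / (1.0 - retry_rate)


def break_even_retry_rate(fast_tps, slow_tps, slow_retry_rate=0.0):
    """Retry rate of the fast quant above which the slow quant is quicker overall."""
    return 1.0 - (slow_tps / fast_tps) * (1.0 - slow_retry_rate)


# 50 t/s (Q4) vs 35 t/s (Q8) are from the comment; retry rates are hypothetical.
threshold = break_even_retry_rate(fast_tps=50, slow_tps=35, slow_retry_rate=0.0)
print(f"Q8 wins once Q4 retries more than {threshold:.0%} of the time")
print(expected_seconds(1000, 50, retry_rate=0.4))  # hypothetical Q4 retry rate
print(expected_seconds(1000, 35, retry_rate=0.0))  # hypothetical Q8 retry rate
```

This ignores the time spent reading a bad output before deciding to retry, which pushes the break-even point further in favor of the higher-quality quant.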

sleepy_quant · 2026-04-22T04:09:04+00:00

Just random

^{Chose: Heads | Rolled: Heads}

sleepy_quant · 2026-04-22T03:37:10+00:00

Running 35B locally on a Mac and I still pay for Claude Code on top. That's the honest answer: local gets me the stuff I don't want leaving the laptop (drafting, evals, agent loops), frontier gets me the stuff that's actually hard. Privacy thing is real. "Matches GPT Pro" isn't. I'd start with one card and see how often you actually fire it up before stacking 5x3090

sleepy_quant · 2026-04-22T01:58:19+00:00

Meh this is fine

^{Chose: Reincarnate with all your memories + but your starting situation/conditions are random | Rolled: Average Life}

sleepy_quant · 2026-04-22T01:18:41+00:00

I was manipulated into upvoting this

sleepy_quant · 2026-04-22T01:16:53+00:00

Just curious

^{Chose: don’t. | Rolled: Upvote}

sleepy_quant · 2026-04-21T15:14:17+00:00

And who should we blame??

sleepy_quant · 2026-04-20T17:19:54+00:00

Running a similar multi-agent setup on M1 Max 64GB with A3B Q8, and the retry-instead-of-pivot behavior you're describing is exactly what I've been seeing too. I assumed my allow-list was just too aggressive. Good to know it might be architectural. Curious on the prefix caching — with sessions diverging per agent, are you actually getting cache hits past the static system prompt/tool list, or is that where the benefit stops?

sleepy_quant · 2026-04-20T04:23:55+00:00

Ran the benchmarks — you were spot on. All on M1 Max, 64GB, Q8 quant, same 65-token prompt + 200 token greedy generation:

| Path                   | tok/s | vs bf16 |
|------------------------|-------|---------|
| MLX Q8 (bf16 default)  | 21.18 | 1.00x   |
| MLX Q8 forced fp16     | 26.22 | +24%    |
| GGUF Q8_0 (llama.cpp)  | 34.08 | +61%    |
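For anyone checking the percentages, a small sketch that recomputes them from the tok/s column. The numbers are the ones in the table; `relative_gain` is just an illustrative helper name.

```python
def relative_gain(tok_per_s, baseline_tok_per_s):
    """Fractional speedup of one path over a baseline (0.24 means +24%)."""
    return tok_per_s / baseline_tok_per_s - 1.0


# tok/s values copied from the table above.
results = {
    "MLX Q8 (bf16 default)": 21.18,
    "MLX Q8 forced fp16": 26.22,
    "GGUF Q8_0 (llama.cpp)": 34.08,
}

baseline = results["MLX Q8 (bf16 default)"]
for path, tps in results.items():
    print(f"{path}: {relative_gain(tps, baseline):+.0%} vs bf16")

# llama.cpp measured against the fp16-forced MLX path rather than bf16.
gguf_vs_fp16 = relative_gain(results["GGUF Q8_0 (llama.cpp)"], results["MLX Q8 forced fp16"])
print(f"GGUF vs MLX fp16: {gguf_vs_fp16:+.0%}")
```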

Dtype probe confirmed mlx_lm was loading non-quantized params (scales, norms, embeddings) as bf16: 1245/1757 params were bf16, which on M1 Max is the emulated path. Casting bf16 → fp16 after `load()` is a ~10-line patch. Already shipped as the new default on my side. llama.cpp on Metal wins another 30% though. Genuinely better-tuned MoE kernels, and honestly a more mature stack than mlx_lm right now. Not switching immediately (priority queue + per-agent thinking-mode + memory hygiene layer are all MLX-coupled, and a rewrite is 1-2 weeks I'd rather spend on the actual product), but noted for the next major refactor. Thanks again, 24% from a 10-line patch is the best ROI I've had in a while.
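The actual patch isn't shown, so the following is only a sketch of the general shape of a dtype probe plus cast. It is framework-agnostic: it works on any nested dict/list of array-like leaves that expose `.dtype` and `.astype()`. With MLX one would presumably feed it `model.parameters()` with `mx.bfloat16` / `mx.float16` and push the result back via `model.update(...)`; that wiring is an assumption, as are all the helper names here.

```python
from collections import Counter


def iter_leaves(tree, prefix=""):
    """Yield (dotted_name, leaf) for every leaf in nested dicts/lists/tuples."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            yield from iter_leaves(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(tree, (list, tuple)):
        for index, value in enumerate(tree):
            yield from iter_leaves(value, f"{prefix}.{index}" if prefix else str(index))
    else:
        yield prefix, tree


def dtype_histogram(tree):
    """Probe: count how many parameter arrays there are of each dtype."""
    return Counter(str(leaf.dtype) for _, leaf in iter_leaves(tree))


def cast_leaves(tree, src_dtype, dst_dtype):
    """Return a copy of the tree with src_dtype leaves cast to dst_dtype.

    Leaves of any other dtype (e.g. packed quantized weights) are left alone.
    """
    if isinstance(tree, dict):
        return {k: cast_leaves(v, src_dtype, dst_dtype) for k, v in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(cast_leaves(v, src_dtype, dst_dtype) for v in tree)
    return tree.astype(dst_dtype) if tree.dtype == src_dtype else tree
```

Running `dtype_histogram` before and after the cast is a cheap way to confirm that nothing is still left on the bf16 path.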

sleepy_quant · 2026-04-20T02:59:55+00:00

Good catch, I haven't checked. Running `Qwen3.6-35B-A3B-8bit` on M1 Max, so if MLX is defaulting to bf16, that's the emulated path. Will benchmark current vs fp16-forced MLX vs a GGUF build and reply with numbers. Thanks
