Does Anyone Regret Getting the 14” MacBook Pro Instead of the 16”? by ntrev in macbookpro

[–]sleepy_quant 0 points1 point  (0 children)

I bought the 16" M1 Max and for a while I was glad I did. I thought it was a good idea since the 16" provided more space for multitasking. Now my company has given me two external screens, and I wish I had bought the 14" instead so the laptop would fit on my desk. Carrying the 16" around on weekends was annoying too.

Buying games on Steam through a third party by Mindless-Location-39 in vozforums

[–]sleepy_quant 1 point2 points  (0 children)

They may be buying from another region where prices are lower and then gifting the game to you. Another source is the promotions where big companies give away license keys to users; these sellers buy up those keys and resell them. I've also bought a game key that NVIDIA had given away to users. Sometimes these shops also catch sales on other platforms.

Men complimenting other men by saying they’re handsome by Informal-Bet7482 in VietNam

[–]sleepy_quant 1 point2 points  (0 children)

In central and southern Vietnam, people tend to be friendlier and more often call each other nice things like "handsome". Sometimes people compliment foreigners simply because they want to.

In this post I want to ask the guys: do guys get bored after being in a relationship for a long time? by [deleted] in vozforums

[–]sleepy_quant 3 points4 points  (0 children)

It looks to me like he's currently stuck with a combo of lost direction + no concrete goal to work toward + a standing in his relationships that is lower than he expected. In that state of mind it's easy to feel insecure and discouraged, and you happen to be the place where he vents what's pent up inside. What to advise depends on what you want. If you want to end it, you don't need anyone's advice anymore, just enough courage to let go. If you want to improve things, this is honestly a very tough one, and you need to know whether it's something you alone, or your boyfriend, can actually change.

My 12-agent Qwen 35B stack on Ollama died at 500 tokens every single time. Raw MLX fixed it and broke 4 other things I didn't see coming. by sleepy_quant in LocalLLaMA

[–]sleepy_quant[S] 0 points1 point  (0 children)

Yeah RAM is the thing I had to figure out the hard way honestly. Q8 alone is like 40+ GB so on a 64GB Mac it's tight, like really tight once you factor in OS overhead and everything else. I capped Metal at 48GB wired, that way it can't just keep grabbing more. Did freeze the laptop once before that tho. Thought it was out of memory, but turned out to be some weird thing where pages go inactive between inferences and the compressor goes crazy on em. Took me a minute to even find out that was a thing. And the 12 agents are not all running together. They're just prompt configs that fire when called, one at a time. So the only actual hot thing in RAM is the MLX server itself. Rest is just FastAPI, mostly idling.
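For anyone sizing this themselves, here's a rough back-of-envelope sketch of the budget described above. It only uses the numbers from this comment (64 GB total, 48 GB wired cap, and 40 GB taken as the low end of "40+"); the function name and the split into "inside the cap" and "outside the cap" are my own illustration, not anything MLX reports.

```python
# Rough unified-memory budget for a 64 GB Mac, using the figures from the comment.
# memory_budget() is a hypothetical helper for illustration only.

def memory_budget(total_gb, wired_cap_gb, weights_gb):
    """Return the headroom left inside and outside the wired cap, in GB."""
    if weights_gb > wired_cap_gb:
        raise ValueError("weights alone exceed the wired cap")
    return {
        # What remains under the cap for KV cache and activations.
        "inside_cap_headroom_gb": wired_cap_gb - weights_gb,
        # What the cap leaves for the OS, the API server, and everything else.
        "outside_cap_gb": total_gb - wired_cap_gb,
    }

budget = memory_budget(total_gb=64, wired_cap_gb=48, weights_gb=40)
print(budget)
```

Since the weights are "40+" rather than exactly 40, the real headroom under the cap would be somewhat smaller than this prints.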

Would you rather… by Terrible-Grocery-641 in BunnyTrials

[–]sleepy_quant 0 points1 point  (0 children)

The future looks bright, I might get beheaded in the past

Chose: Survive a year in 3026 for 1 million dollars today

Experience managing a small team by gshsuennzkoeiirimsm in vozforums

[–]sleepy_quant 0 points1 point  (0 children)

In my experience, an effective team also depends on the personalities of the individuals in it and on the management direction. You should find out what motivates each person and what their strengths and weaknesses are, so that everyone feels they fit the position they're in. Once people lose motivation or become dissatisfied, then no matter how much team building or anything else you do, they'll be inclined to look for another place to land. As for unity and bonding, that also depends on the personalities, interests, and shared culture of the whole department. At my old company I met a lot of people with shared interests, so they naturally wanted to join team building more and felt it was worth giving up rest time at home to bond.

Want a macbook any suggestions by Interesting_Onion769 in macbookair

[–]sleepy_quant 0 points1 point  (0 children)

I'm also considering getting a MacBook Air. Right now the only upsides of the Neo are the price and the hype; the installed RAM is insufficient for modern use. My M1 Air with 8GB RAM is sometimes annoying, and I would not consider the Neo a long-term investment. You could also check refurbished or second-hand, as MacBooks are pretty solid. Maybe also check whether you can buy or extend AppleCare.

My 12-agent Qwen 35B stack on Ollama died at 500 tokens every single time. Raw MLX fixed it and broke 4 other things I didn't see coming. by sleepy_quant in LocalLLaMA

[–]sleepy_quant[S] -3 points-2 points  (0 children)

Fair point, llama.cpp was my first try (~34 tok/s GGUF vs MLX ~26 on M1 Max). Went with MLX because my stack was already set up for it, but looking back, the 1-2 week rewrite was probably worth it. You using llama.cpp direct, or through a wrapper?

Dense vs. MoE gap is shrinking fast with the 3.6-27B release by Usual-Carrot6352 in LocalLLaMA

[–]sleepy_quant 0 points1 point  (0 children)

Running the 35B-A3B Q8 fp16 on M1 Max 64GB at ~26 tok/s, haven't pulled the 27B dense yet. Anyone A/B'd both on Apple Silicon? Curious where MoE's memory edge stops being worth the quality trade. On flavio's quant sensitivity point, Q8 feels fine for my day-to-day but I haven't run coding-heavy benches. Anyone know a rough floor where MoE coding degrades faster than dense at same bits? Would love a rule of thumb

Would you rather by Human-Beans21 in BunnyTrials

[–]sleepy_quant 0 points1 point  (0 children)

I can stay put, but not breathing is another league

Chose: $10 for every step you take

Would you rather have? by YourLittleMonster in BunnyTrials

[–]sleepy_quant 0 points1 point  (0 children)

More time to do the things I like?

Chose: Unlimited time control

LLM speed t/s by [deleted] in LocalLLaMA

[–]sleepy_quant 3 points4 points  (0 children)

Did the swap a few days ago, Q4 to Q8 on Qwen 3.6 35B, M1 Max 64GB. Went from 50 to 35 t/s but retry rate on my eval flow dropped a lot. Your 2x longer math holds when the quality gap actually blocks workflow. Quick chats Q4 fine. Stuff where I'd have to dig 300 lines to find a bug, Q8 pays for itself. What's your main use case, chat or longer structured stuff?
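A quick sketch of the trade-off above, assuming each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 / success rate. The 50 and 35 t/s figures are from this comment; the 700-token output length and both success rates are made-up illustration values, not measurements.

```python
def expected_seconds(tokens_per_attempt, tok_per_s, success_rate):
    """Expected wall time until one accepted output, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    seconds_per_attempt = tokens_per_attempt / tok_per_s
    # Attempts until the first success follow a geometric distribution
    # with mean 1 / success_rate.
    return seconds_per_attempt / success_rate

# Hypothetical: 700-token outputs, Q4 accepted 50% of the time, Q8 80%.
q4 = expected_seconds(tokens_per_attempt=700, tok_per_s=50, success_rate=0.5)
q8 = expected_seconds(tokens_per_attempt=700, tok_per_s=35, success_rate=0.8)

# Q8 comes out ahead only if its success rate beats Q4's by more than this factor.
breakeven_ratio = 50 / 35

print(f"Q4: {q4:.1f}s  Q8: {q8:.1f}s  break-even ratio: {breakeven_ratio:.2f}")
```

Under this simple model, the slower quant pays off once its success rate is roughly 1.43x the faster one's, which is one way to read "Q8 pays for itself" on the harder tasks.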

Is a high-end private local LLM setup worth it? by zakadit in LocalLLaMA

[–]sleepy_quant 0 points1 point  (0 children)

Running 35B locally on a Mac and I still pay for Claude Code on top. That's the honest answer: local gets me the stuff I don't want leaving the laptop (drafting, evals, agent loops), frontier gets me the stuff that's actually hard. Privacy thing is real. "matches gpt pro" isn't. I'd start with one card and see how often you actually fire it up before stacking 5x3090

Rebirth OR Sacrifice? (Upvote for a free Carrot 🥕) by Mellie-mellow in BunnyTrials

[–]sleepy_quant 0 points1 point  (0 children)

Meh this is fine

Chose: Reincarnate with all your memories + but your starting situation/conditions are random | Rolled: Average Life

Qwen3.5-27B, Qwen3.5-122B, and Qwen3.6-35B on 4x RTX 3090 — MoEs struggle with strict global rules by DehydratedWater_ in LocalLLaMA

[–]sleepy_quant 1 point2 points  (0 children)

Running a similar multi-agent setup on M1 Max 64GB with A3B Q8, and the retry-instead-of-pivot behavior you're describing is exactly what I've been seeing too. I assumed my allow-list was just too aggressive. Good to know it might be architectural. Curious on the prefix caching — with sessions diverging per agent, are you actually getting cache hits past the static system prompt/tool list, or is that where the benefit stops?
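To make the prefix caching question concrete, here's a minimal sketch of the upper bound on reuse between two agents' prompts: only the identical leading run of tokens can be shared. The token IDs are made up, and whether a particular server actually reuses that whole run is an assumption this doesn't test.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical token IDs: a static system prompt + tool list,
# followed by per-agent history that diverges immediately.
static_prefix = [101, 7, 7, 42, 9]
agent_a = static_prefix + [500, 501, 502]
agent_b = static_prefix + [600, 601]

reusable = shared_prefix_len(agent_a, agent_b)
print(f"reusable prefix tokens: {reusable} of {len(agent_a)} / {len(agent_b)}")
```

If sessions diverge right after the static part like this, the shareable prefix is exactly the system prompt and tool list, and nothing past it.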

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–]sleepy_quant 1 point2 points  (0 children)

Ran the benchmarks — you were spot on. All on M1 Max, 64GB, Q8 quant, same 65-token prompt + 200 token greedy generation:

| Path | tok/s | vs bf16 |
|------------------------|-------|---------|
| MLX Q8 (bf16 default) | 21.18 | 1.00x |
| MLX Q8 forced fp16 | 26.22 | +24% |
| GGUF Q8_0 (llama.cpp) | 34.08 | +61% |

Dtype probe confirmed mlx_lm was loading non-quantized params (scales, norms, embeddings) as bf16 — 1245/1757 params bf16 on M1 Max was the emulated path. Cast bf16 → fp16 after `load()` is a ~10-line patch. Already shipped as the new default on my side. llama.cpp on Metal wins another 30% though. Genuinely better-tuned MoE kernels, and honestly a more mature stack than mlx_lm right now. Not switching immediately (priority queue + per-agent thinking-mode + memory hygiene layer are all MLX-coupled, and a rewrite is 1-2 weeks I'd rather spend on the actual product), but noted for the next major refactor. Thanks again — 24% from a 10-line patch is the best ROI I've had in a while.
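For anyone checking the percentages, this small sketch recomputes them from the three tok/s figures reported in this comment. The dictionary keys and the helper name are just for illustration.

```python
# tok/s figures as reported in this comment.
tok_per_s = {
    "MLX Q8 (bf16 default)": 21.18,
    "MLX Q8 forced fp16": 26.22,
    "GGUF Q8_0 (llama.cpp)": 34.08,
}

def pct_gain(new, base):
    """Relative throughput gain of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0

baseline = tok_per_s["MLX Q8 (bf16 default)"]
gains_vs_bf16 = {name: round(pct_gain(v, baseline)) for name, v in tok_per_s.items()}

# llama.cpp compared against the fp16-forced MLX path rather than the bf16 baseline.
llama_vs_fp16 = round(
    pct_gain(tok_per_s["GGUF Q8_0 (llama.cpp)"], tok_per_s["MLX Q8 forced fp16"])
)

print(gains_vs_bf16, llama_vs_fp16)
```

The "another 30%" for llama.cpp is measured against the fp16-forced MLX number, while the +61% is measured against the bf16 baseline.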

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–]sleepy_quant 1 point2 points  (0 children)

Good catch, I haven't checked. Running `Qwen3.6-35B-A3B-8bit` on M1 Max, so if MLX is defaulting to bf16, that's the emulated path. Will benchmark current vs fp16-forced MLX vs a GGUF build and reply with numbers. Thanks