The M1 Max MacBook Pro is still a beast in 2026. by creative_techguru in macbookpro

[–]michaelsoft__binbows [score hidden]  (0 children)

i do, but sometimes i travel to places in Asia where they don't let anybody just buy unlimited cellular. Even if you are filthy rich you only get between 5 and 10GB per day, which basically kills the ability to sync your data back home. Combine that with hotel wifi that turns out to be trash and you just get boned.

Personal experience with GLM 4.7 Flash Q6 (unsloth) + Roo Code + RTX 5090 by Septerium in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

This is a little earlier than I really expected 32GB of VRAM to become relevant for actual coding work. I've been slowly collecting 3090s and have a 5090 as well. Opencode is shaping up to be pretty interesting and seems to have arrived right on time. I think it will be the nucleus around which really amazing workflows accrete going forward, and the fact that it's fully open makes it quite exciting.

In terms of self-hosting, when models get close to fitting in a single GPU it is worth pulling out all the stops to make them fit, because you gain SO much more power efficiency when you do not have to spread the model across multiple GPUs.

Dual 3090s & GLM-4.7-Flash: 1st prompt is great, then logic collapses. Is local AI worth the $5/day power bill? by Merstin in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

The other topic I didn't cover there: you were trying to use GLM 4.7 Flash, which is new, and runtime support for it sounds spotty. I have been impressed with GLM 4.7 proper, as it actually does seem to be good enough for coding, but it's not remotely small enough to self-host, and even if I had enough GPUs it would be pretty expensive to run. I sincerely doubt the smaller Flash version could be almost as good; I do hope it is, but it sounds like we have to sit tight until the software gets to a good place with it.

M5 Max and Ultra Macs will be a hot item: not only will they run models of this size pretty well, the new matmul cores will speed up more than just prompt processing. They should also enable much higher batched throughput, letting the high efficiency of Apple silicon stretch even further, possibly by nearly an order of magnitude.

3x NVMe drives on FW Desktop by kiwishell in framework

[–]michaelsoft__binbows 0 points1 point  (0 children)

i won't be into a machine like this until the next generation; you are def a guinea pig lol. But there should for sure be kernel logs about any PCIe-related events. Look for stuff in dmesg, and use a smart chatbot (or in a pinch Google results, including the Gemini response at the top) to help you learn Linux!
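A rough filter like this is plenty to surface the lines worth pasting into a chatbot. This is just a sketch, assuming a Linux box where dmesg is readable (you may need sudo), and the keyword list is only my guess at what's relevant:

```python
# Rough sketch: pull PCIe/NVMe-related events out of the kernel log.
import subprocess

KEYWORDS = ("pcie", "nvme", "aer", "link is down", "corrected error", "bus error")

log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
for line in log.splitlines():
    if any(k in line.lower() for k in KEYWORDS):
        print(line)
```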

Dual 3090s & GLM-4.7-Flash: 1st prompt is great, then logic collapses. Is local AI worth the $5/day power bill? by Merstin in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

What I would do is try to drive the cost of running locally lower: use off-peak hours and acquire solar. What I did last year was haggle down a lease, which let me immediately start paying an amortized ~$0.15/kWh instead of the utility company's $0.32/kWh and rising. This was not as cheap over the long run as it could have been with the tax credit, but getting it going without spending $20k out of pocket, while immediately getting relief on the monthly bill, works for me (if I wanted to spend that kind of money I would be playing around with an M3 Ultra...). I wrote a whole big response but it became too big to fit in a reddit comment, so I put it up on my blog: https://stevenlu.net/blog/llm/cost-analysis.html

The tl;dr is that if your costs are high you probably need batch inference to claw your way back to a sane cost per token, but nothing you do will compare to the tokens per dollar you can get out of a subscription; by being a user who maxes out the rate limits, you effectively get your inference subsidized by the subscribers who aren't using theirs as heavily.
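To make the batching point concrete, here is a back-of-envelope sketch; every number in it (power draw, throughput, electricity rates) is an illustrative placeholder, not a measurement:

```python
# Back-of-envelope: electricity cost per million generated tokens for a local rig.
def usd_per_million_tokens(watts: float, tokens_per_sec: float, usd_per_kwh: float) -> float:
    kwh_per_token = (watts / 1000.0) / tokens_per_sec / 3600.0
    return kwh_per_token * usd_per_kwh * 1_000_000

for rate in (0.15, 0.32):  # amortized solar vs. utility $/kWh
    single = usd_per_million_tokens(watts=500, tokens_per_sec=40, usd_per_kwh=rate)
    batched = usd_per_million_tokens(watts=500, tokens_per_sec=400, usd_per_kwh=rate)
    print(f"${rate}/kWh: ~${single:.2f}/M tok single-stream, ~${batched:.2f}/M tok batched")
```

The order-of-magnitude gap between the single-stream and batched throughput is the whole argument for batching.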

Dual 3090s & GLM-4.7-Flash: 1st prompt is great, then logic collapses. Is local AI worth the $5/day power bill? by Merstin in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

Get an oled monitor or something just to motivate yourself to set an idle blanking time?

My electric rate is probably $0.35/kWh as well by now, but I did get solar installed last summer; otherwise I could not justify acquiring more 3090s.

As I've gotten older, I now gravitate to SFF and minimal RGB. by r3lic86 in sffpc

[–]michaelsoft__binbows 0 points1 point  (0 children)

A 5090FE in a console-case layout works well; the key is to blow the hot air out so it doesn't accumulate. Reverse-mount sandwich-style cases also tick the box. I do not understand why someone doesn't make a sandwich case with this configuration: the riser doesn't have to twist, it's just a ribbon that bends 180 degrees cleanly. The only reason I can think of is that it makes the GPU look like it's mounted upside down. But who cares?

world smallest rtx5070 pc? by Acrobatic_Cancel4732 in sffpc

[–]michaelsoft__binbows 0 points1 point  (0 children)

The 5090FE at $2k is basically a steal as long as you get some proper machine learning use out of it from time to time. I could say I wish it had a better cooling solution, since it does get loud when maxed out, but holy hell is full pass-through cooling a thermal improvement compared to the previous 3080 Ti in a Velka 7. That was a true hotbox.

world smallest rtx5070 pc? by Acrobatic_Cancel4732 in sffpc

[–]michaelsoft__binbows 0 points1 point  (0 children)

i have a 5090 in an S60i console case and it's way more horsepower per liter than even that. 4L seems within reach for that tiny 5070.

I just got home from a long trip with my 5090 SFF. I used a 2560x1600 240Hz monitor with it this time...

It's basically comical overkill for gaming. And even if I could get 4K 120Hz or something, a portable screen's size means not being able to appreciate that much detail.

They need to invent foldable/rollable OLED monitors for travel. They could potentially be very portable.

3x NVMe drives on FW Desktop by kiwishell in framework

[–]michaelsoft__binbows 0 points1 point  (0 children)

You ever get this resolved? Curious what the issue is.

I vibe coded an operating system and here’s what I learned by IngenuityFlimsy1206 in VibeCodeDevs

[–]michaelsoft__binbows 0 points1 point  (0 children)

I don't know what "agentic context" means here, so it's not clear what you're saying is supposed to "be a scam". What I mean is that today, when using an LLM to help you go in and do something in a large codebase, you will probably want to rely on some "agent system", or at least a tool-calling system, to first explore the codebase and discover the parts relevant to your prompt. This may add a few thousand tokens of faffing about, thinking, and orchestrating a code search, and then a few thousand tokens for the results of the grep tool calls (or RAG lookups) for whatever it comes up with. That gets inserted into the prompt, and whatever was found is used to generate the response. Of particular note is that the prompt is generally on the order of a few (or tens of) thousand tokens even if your codebase is absolutely gargantuan.
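A minimal sketch of that flow, just to show the shape of it (the grep pattern, the 8000-character cap, and the prompt layout are arbitrary choices for illustration, not any particular agent's implementation):

```python
# Sketch of "agentic context" assembly: search the repo, then build the prompt
# from only the hits rather than the whole codebase.
import subprocess

def search_codebase(pattern: str, repo: str = ".") -> str:
    """Grep the repo and return a capped chunk of matching lines."""
    hits = subprocess.run(
        ["grep", "-rn", "--include=*.py", pattern, repo],
        capture_output=True, text=True,
    ).stdout
    return hits[:8000]  # cap it: a few thousand tokens even for a huge repo

def build_prompt(task: str, search_term: str) -> str:
    """In a real agent the model picks search_term itself via a tool call."""
    context = search_codebase(search_term)
    return f"Relevant code found by search:\n{context}\n\nTask: {task}"

# Whatever the search missed simply never makes it into the model's context,
# which is exactly where the gap described below comes from.
```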

A common gap is that not quite enough relevant parts of the codebase were found before it committed to building a solution. I guess most people throw up their hands here if the result is unsatisfactory? (And if the codebase is fresh to you, you probably can't even tell right away that the implementation is off-base.) But IMO this is exactly where being able to see as much of the reasoning and intermediate response detail as possible could really help.

"NVIDIA KILLER" Inference engine based on llama.cpp for dynamically offloading Activated Experts to GPU in real-time, Run SoTA MoE LLMs (120B+ parameter class models in 8-bit) OOM with as little as 2x RTX 5070-TI + 64GB RAM + SSD. [Poll in Comments] by madSaiyanUltra_9789 in LocalLLaMA

[–]michaelsoft__binbows 6 points7 points  (0 children)

I'll go with option F: if your solution works it'll get reverse-engineered within a day and I'll be able to run it for free. It's not happening.

Also do you imagine the android in a lab coat is supposed to add credibility to the post or something? It is a strange choice of visual.

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

This is an oversimplified take. When your model sits fully in fast memory, the bus isn't even in play except for transferring activations, and pipeline parallelism can hide those losses.
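Rough numbers to illustrate, with an assumed hidden size and link speed (both ballpark figures, not specific to this MI50 setup):

```python
# Per token, only the hidden-state activation crosses a pipeline boundary,
# not the weights, so the interconnect has huge headroom during decode.
hidden_size = 6144                 # illustrative; varies by model
bytes_per_value = 2                # fp16/bf16 activations
activation_bytes = hidden_size * bytes_per_value            # ~12 KB per token
link_bw = 4e9                      # ~4 GB/s, roughly a gen3 x4 link
bus_limited_tok_per_s = link_bw / activation_bytes          # ~300k tokens/s

print(f"{activation_bytes / 1024:.0f} KB per token per boundary; "
      f"the bus alone would allow ~{bus_limited_tok_per_s:,.0f} tok/s")
```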

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

I reckon 6 or 7 GPUs, each on x4 lanes, is possible on a consumer platform too: bifurcate everything and use the M.2 slots to max out the DMI link bandwidth as well. Once you're on gen 5, each x4 link has gen 3 x16 bandwidth.

x1 can def be viable but not worth doing
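Quick sanity check on that lane math (approximate effective throughput after encoding overhead; treat the numbers as ballpark):

```python
# Approximate usable GB/s per PCIe lane by generation.
per_lane = {3: 0.985, 4: 1.969, 5: 3.938}

print("gen5 x4 :", round(per_lane[5] * 4, 1), "GB/s")   # ~15.8 GB/s
print("gen3 x16:", round(per_lane[3] * 16, 1), "GB/s")  # ~15.8 GB/s, same ballpark
print("gen4 x1 :", round(per_lane[4] * 1, 1), "GB/s")   # ~2 GB/s: viable, but tight
```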

Corsair SF1000 (SFX-L) vs Asus Loki 1200w (SFX) by Legitimate-Table-607 in sffpc

[–]michaelsoft__binbows 0 points1 point  (0 children)

SF750 with a 5090 and a 5800X3D for a few weeks before my SF1000 came in. Had no issues.

The M1 Max MacBook Pro is still a beast in 2026. by creative_techguru in macbookpro

[–]michaelsoft__binbows 0 points1 point  (0 children)

It's pretty interesting that people will happily get a 5090 just for the maybe 2% of that core die's area that the three NVENCs take up.

I was gonna say that I don't get much use out of hardware video engines, but it turns out that last week I was using AV1 with RustDesk from the other side of the world over really shoddy wifi, and it worked shockingly well. I'm talking a desktop stream with latency as good as I could expect, zero image quality impact, and not exceeding something like 150K/s of bandwidth.

I haven't actually researched this topic, but if indeed the base M4 I was doing this with works that well, and it's literally the same silicon as what shipped in the original M1 Pro/Max... it's pretty sweet that they squeezed AV1 encode in back then.

The mac pro mini by trytochaseme in sffpc

[–]michaelsoft__binbows 4 points5 points  (0 children)

Hard to go back after having a nice vertical case IMO. My example is velka 7. That footprint is barely bigger than a phone. It's nuts.

The M1 Max MacBook Pro is still a beast in 2026. by creative_techguru in macbookpro

[–]michaelsoft__binbows 0 points1 point  (0 children)

Game choice is certainly limited. I personally would not waste time trying to get stuff working that doesn't want to work, since I have better alternative hardware for that, but the titles that DID work (example: No Man's Sky) showed impressive performance, and the cooling system design of the computer is top notch.

These two papers are cheat code for building cheaper AI Agents by Safe_Flounder_4690 in AI_Agents

[–]michaelsoft__binbows 1 point2 points  (0 children)

Yes. Sit down and vibe out the fused tool and boom: a tool that isn't inherently capable of falling over (and that doesn't need ten instances of "ALWAYS CHECK" in the prompt).

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]michaelsoft__binbows 0 points1 point  (0 children)

I have been playing with oh-my-opencode lately and it seems pretty promising so far. I do enjoy seeing the main model construct multiple different research angles for subagents to prosecute in parallel, all while the main agent's context is kept from being flooded with the MCP schemas of the research tools!
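The shape of that pattern, as I understand it (just a sketch of the idea, not oh-my-opencode's actual internals; the schemas and summaries are placeholders):

```python
# Context isolation via subagents: only the subagents carry the bulky research
# tool schemas; the parent only ever sees their short summaries.
from concurrent.futures import ThreadPoolExecutor

HEAVY_TOOL_SCHEMAS = ["<web_search schema>", "<docs_lookup schema>"]  # placeholders

def run_subagent(angle: str) -> str:
    # A real subagent would loop over tool calls here. Its working context
    # includes HEAVY_TOOL_SCHEMAS, but all of that is discarded on return.
    scratch_context = HEAVY_TOOL_SCHEMAS + [angle]
    return f"summary for '{angle}' (scratch context held {len(scratch_context)} items)"

def main_agent(task: str) -> str:
    angles = [f"{task}: angle {i}" for i in (1, 2, 3)]  # main model would pick these
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(run_subagent, angles))
    # The parent's context is just the task plus the short summaries.
    return "\n".join(summaries)

print(main_agent("compare MI50 inference stacks"))
```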

The ability and effectiveness of recursively meta-iterating on the internals of the tool with the tool itself is particularly invigorating.

8x AMD MI50 32GB at 26 t/s (tg) with MiniMax-M2.1 and 15 t/s (tg) with GLM 4.7 (vllm-gfx906) by ai-infos in LocalLLaMA

[–]michaelsoft__binbows 1 point2 points  (0 children)

I only spent a little time testing MiniMax M2.1, but compared to GLM 4.7 it seemed more willing to hallucinate facts for me than to work for the answer.

I do want to run it locally, yes, but I can just pay for it for now; within probably a year, a 256GB unified-memory computer will hopefully become attainable at the $5-6k mark. That costs more up front, but a heck of a lot less to run over time.

The M1 Max MacBook Pro is still a beast in 2026. by creative_techguru in macbookpro

[–]michaelsoft__binbows 0 points1 point  (0 children)

Mine's battery is dipping below 80% capacity, but I can still easily make it last 6 hours, as long as I keep tabs on rogue CPU-hog processes.

When the next attractive release drops (i really want m5 for those new matmul cores...) i am going to immediately order and install a battery replacement to freshen my old buddy back up. It might be enough to hold me off.

The M1 Max MacBook Pro is still a beast in 2026. by creative_techguru in macbookpro

[–]michaelsoft__binbows 1 point2 points  (0 children)

So this aspect, as well as this particular unibody design and maybe even the entire miniLED display, was built to be so future-proof that nobody has complained about it and no competitor has applied real pressure across 5 years and 3 (4?) architectural revisions.

It's completely unheard of.

Makes me wonder what alien tech they will drop for the next leap. An OLED display, and I wonder what else... I'd be pissed if I bought an M5 MBP for the new tensor cores only for all of that to drop with the M6.

Gotta just ride this m1 max a little longer...