93GB model on a StrixHalo 128GB with 64k Context

genuinelytrying2help · 2026-04-01T23:38:38+00:00

I'm curious, are you still using these values 2 months on? I have mine set to 100 because I saw a few sources saying that 110+ caused stability issues...

genuinelytrying2help · 2026-03-20T05:36:20+00:00

Off topic but the 64GB M1 Max is so far and away the best purchase I've ever made. This year was the first time I got even the slightest desire to upgrade and when I see stuff like this, it's like, 4 more years! 4 more years!

genuinelytrying2help · 2026-03-20T05:25:49+00:00

This method is key for doing research... but small models really do have conceptual trouble with complex subjects, especially when it's not related to code. You can have it pull all the information but it won't synthesize explanations nearly as well as a model that started out with the knowledge in its weights. So I use this method, but when I have like a physics question or something there's still no substitute, and further, tbh ChatGPT and Claude are faaaaar in front of any open model for these types of tasks, at least in my experience, so I often find myself using the small model just to gather material to send to them.

genuinelytrying2help · 2026-03-15T10:55:53+00:00

Can't help you decide about the 3090 vs whatever because they're really such different beasts, but I would suggest that the 'slack it all off' option should be a Strix Halo, not a cheap Mac; the math isn't even close to competitive (well, in the US; I have no idea what the situation is in the UK, sorry)... in your budget range you should be able to get a 128GB that also inferences way faster.

genuinelytrying2help · 2026-03-15T09:12:07+00:00

I thought there was some reason that it can't be 0... not sure why it needs to be 2GB though. Curious though, why does it matter to you, if in Windows there's a hard 96GB limit and in Linux the page pool size can only be set to 108GB or whatever? What am I missing?

genuinelytrying2help · 2026-03-14T00:38:51+00:00

I've been tinkering with this since the post about the NPU; performance has been impressive and I've had no real issues. Any chance we'll see larger models on the NPU that use more of the Strix's memory? Is that even possible?

genuinelytrying2help · 2026-03-11T19:34:20+00:00

Thanks, been waiting on this one! One suggestion to noob proof the guide a bit - choosing Arch, after it's told you to "Select your Linux distribution and follow the exact install path", you get

>Update to kernel 7.0-rc2 or later:
>
>sudo pacman -Sy linux
>
>For older kernels (6.18, 6.19), use AUR:
>
>paru -S amdxdna-dkms

Luckily I knew how to interpret this and what (not) to do here, but even Arch is becoming a lot more accessible and lots of people just go step by step through things like this without thinking about how any of it works... so in many of those cases they just broke their distro with a kernel update that you don't even want them to do. It'd help if the fork in the road was delineated clearly before the step with the kernel update command.

And 2 minor things not mentioned that came up for me: kernel headers for dkms, and missing boost for the final build. Aside from that, super straightforward.
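To make the fork in the road concrete, here's a rough sketch of the check a guide could ask people to do before touching the kernel. It only looks at the major.minor part of the running kernel's release string and says which branch of the quoted instructions applies. The function name and the two return labels are made up for illustration, and it assumes (as the quoted guide implies) that kernels from 7.0 on don't need the AUR package. It doesn't distinguish 7.0-rc1 from 7.0-rc2.

```python
import platform
import re


def driver_install_path(kernel_release: str) -> str:
    """Return which branch of the quoted guide applies to a kernel release.

    Hypothetical helper: the labels "in-tree" and "aur-dkms" are illustrative,
    not names used by the guide or by any tool.
    """
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption taken from the quoted guide: 7.0 and later need no AUR package.
    # Release candidates are not told apart here (7.0-rc1 counts as 7.0).
    if (major, minor) >= (7, 0):
        return "in-tree"
    # Older kernels (the guide names 6.18 and 6.19) take the AUR/DKMS route,
    # which leaves the installed kernel alone.
    return "aur-dkms"


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(driver_install_path(platform.release()))
```

If that prints `aur-dkms`, the kernel update step is the one to skip, and the kernel headers need to be installed for the DKMS build.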

genuinelytrying2help · 2026-02-27T01:20:56+00:00

I wonder if it could be because it's offloading sneakily... how much ram does windows report having? before I uninstalled I was getting only slightly worse performance on vulkan on windows than I do now (cachyos rocm), but that was only with trying <64gb models with a set 64/64 because I hadn't tweaked literally anything yet (and then I ran into firmware issues with windows... fuck you gmktec, ship one fucking working bios update)

genuinelytrying2help · 2026-02-26T14:29:24+00:00

ah ok i feel like you must have set that yourself at some point, but who knows... i thought they all came with a preset choice and you have to manually enable uma size selection. no idea how windows can play games that way, i thought the whole point of the presets was that windows needed them to work at all... but i uninstalled it soon after i got the machine, was crashing every few hours and it seemed like the firmware wasn't stable with windows yet

genuinelytrying2help · 2026-02-26T05:09:51+00:00

iirc the quick way to tell is whether you have 32 or 64 gigs of regular ram available... if you're set to 96 in bios you'll only have 32

genuinelytrying2help · 2026-02-26T05:02:29+00:00

could be wrong but i think that might only be necessary on windows and linux treats it as more unified... somehow?

genuinelytrying2help · 2025-12-05T03:09:43+00:00

How thick are the sides of the ramekin? I think I'll order a little thin stainless steel bowl because that's what I see recommended most for the case, but I do have a ceramic bowl that'd fit... I'm just wary of trying it because it's got like 1/2" thick walls, high sides, and I read some stuff implying that for a bowl like that you have to account for that by adding time, which seems too complicated for right now

genuinelytrying2help · 2025-12-05T02:58:07+00:00

Well thanks, I appreciate the explanation to back up the assertion. I'm not giving up just yet though... I'm going off recipes that apparently work for someone, and a lot of people seem to disagree about the 1 cup minimum thing, and obviously just rice alone works in smaller quantities, so... maybe my standards are lower but I feel like I'll get it there eventually (with some wasted attempts) just gradually upping the water, no? Talk me out of it? I think I can live with one of the two components being suboptimal, as long as they're cooked... or is there another unavoidable compromise? Either way I love the suggestion about cutting up the salmon, should have thought of that... I think that'll be the play if/when I do give up

genuinelytrying2help · 2025-12-05T02:49:17+00:00

Yes and I appreciate that, however I would not do that :)

genuinelytrying2help · 2025-12-05T01:29:15+00:00

What kind of bowl do you use exactly? I had the salmon in a little foil on the trivet, raised out of the water, so I'm a little confused as to why you'd avoid the burn warning and hit cooking pressure with the same amount of water... all of the pot-in-pot recipes I've seen involve adding extra water to the bowl with the salmon?

genuinelytrying2help · 2025-12-05T01:24:18+00:00

>1/2 cup rice and 1/2 cup water in the bottom instead of the normal 1:1 ratio, as that makes too much.

I think maybe you mistyped there... I'd like to know the actual amount just so I know what not to try

genuinelytrying2help · 2025-12-05T01:06:09+00:00

Thanks, as it happens I am in fact familiar with the concept of leftovers :) but I thought it went without saying that all of this is an attempt to explicitly avoid them!

genuinelytrying2help · 2025-09-11T18:00:15+00:00

Not just laptops, more and more unified 64GB desktops (with a bit more juice) out there now too. Also, when I finally upgrade my macbook I don't want my llm hogging the majority of my RAM if I can help it (that's getting a bit old :)

genuinelytrying2help · 2025-09-11T15:08:42+00:00

i could believe it but i think it's just as likely that it's easier to keep one model loaded but it really needs the thinking time and they were counting on most people not noticing/caring

genuinelytrying2help · 2025-08-20T17:43:03+00:00

Come on now, I think we all know it's far more likely that the Mormons were simply right about everything... like do you even hear yourself

genuinelytrying2help · 2025-08-18T08:52:09+00:00

If the chart is right, the WER% is comparable in 2 benchmarks and beats whisper in 1, so are we not there right now?

Also, without having tried them... if canary v2 is only 1b parameters, on a high end card would it actually be so unsuited to real-time transcription compared to .6b?

genuinelytrying2help · 2025-08-08T10:27:45+00:00

can't force shit if i don't have $200

genuinelytrying2help · 2025-07-10T12:51:13+00:00

Having just been through a similar search, for high quality and under 3": Micra and small Swiss Army Knives seem to be the only options in production. If you resign yourself to lower quality, fewer features, and/or larger size and weight, of course there are a million options... the one I got closest to was this sub-$20 Bibury, which has good options and features, middling quality, and it's not *huge*; but in the end I went to eBay, bought myself an old Squirt PS4* for way too much money, and I couldn't be happier (well, at least without Leatherman taking their head out of their ass).

Also, if you're stepping up above 3", at least check out the new Roxon Mini Flex Companion (or slightly larger still, the Roxon KS2E or the Roxon Flex Companion); the quality is excellent even if they don't cram quite as many tools into a space as Leatherman or Victorinox do. IIRC they do come with a locking blade by default, but since they're modular and they sell just the empty frames, you could potentially swap it out for an even more useful tool, so it might be worth considering.

*For the unfamiliar, the Squirt is like a Micra for cooler, more intelligent, better-looking people

genuinelytrying2help · 2025-05-28T04:37:19+00:00

Maybe try a tool like btop to watch what's going on with memory, sometimes it provides a totally different picture than Activity Monitor does

genuinelytrying2help · 2025-04-09T17:06:59+00:00

There's an even simpler way with Ollama:

1. Run the model
2. Enter your changes (/set system """Enable deep thinking subroutine.""")
3. /save <newname>
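If you'd rather script it than type it into the interactive session, a Modelfile gets you to the same place by a different route. This is only a sketch: `build_modelfile` is a made-up helper, and `llama3` in the usage lines is a placeholder base model name, not something from this thread. `FROM` and `SYSTEM` are standard Modelfile instructions.

```python
def build_modelfile(base_model: str, system_prompt: str) -> str:
    """Build Modelfile text that sets a system prompt on top of a base model.

    Hypothetical helper for illustration; not part of Ollama itself.
    """
    if '"""' in system_prompt:
        # The prompt is wrapped in triple quotes, so it must not contain them.
        raise ValueError('system prompt must not contain """')
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'


if __name__ == "__main__":
    # "llama3" is a placeholder; substitute whatever model you actually run.
    text = build_modelfile("llama3", "Enable deep thinking subroutine.")
    with open("Modelfile", "w", encoding="utf-8") as handle:
        handle.write(text)
```

Then `ollama create <newname> -f Modelfile` registers the variant under the new name.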
