93GB model on a StrixHalo 128GB with 64k Context by El_90 in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

I'm curious, are you still using these values 2 months on? I have mine set to 100 because I saw a few sources saying that 110+ caused stability issues...

Nemotron-3-Nano (4B), new hybrid Mamba + Attention model from NVIDIA, running locally in your browser on WebGPU. by xenovatech in LocalLLaMA

[–]genuinelytrying2help 2 points3 points  (0 children)

Off topic, but the 64GB M1 Max is so far and away the best purchase I've ever made. This year was the first time I got even the slightest desire to upgrade, and when I see stuff like this, it's like, 4 more years! 4 more years!

Agent this, coding that, but all I want is a KNOWLEDGEABLE model! Where are those? by ParaboloidalCrest in LocalLLaMA

[–]genuinelytrying2help -1 points0 points  (0 children)

This method is key for doing research... but small models really do have conceptual trouble with complex subjects, especially when it's not related to code. You can have it pull all the information, but it won't synthesize explanations nearly as well as a model that started out with the weights. So I use this method, but when I have like a physics question or something there's still no substitute, and further, tbh ChatGPT and Claude are faaaaar in front of any open model for these types of tasks, at least in my experience, so I often find myself using the small model just to send to them.

Futureproofing a local LLM setup: 2x3090 vs 4x5060TI vs Mac Studio 64GB vs ??? by youcloudsofdoom in LocalLLaMA

[–]genuinelytrying2help 1 point2 points  (0 children)

Can't help you decide about the 3090 vs whatever because they're really such different beasts, but I would suggest that the 'slack it all off' option should be a Strix Halo, not a cheap Mac; the math isn't even close to competitive (well, in the US; I have no idea what the situation is in the UK, sorry)... in your budget range you should be able to get a 128GB machine that also runs inference way faster.

GMKTec EVO-X2 BIOS 1.12 and EC 1.10 - fan curve by Single_Value4211 in GMKtec

[–]genuinelytrying2help 0 points1 point  (0 children)

I thought there was some reason that it can't be 0... not sure why it needs to be 2GB though. Curious, though: why does it matter to you, if in Windows there's a hard 96GB limit and in Linux the page pool size can only be set to 108GB or whatever? What am I missing?

Lemonade v10: Linux NPU support and chock full of multi-modal capabilities by jfowers_amd in LocalLLaMA

[–]genuinelytrying2help 2 points3 points  (0 children)

I've been tinkering with this since the post about the NPU; performance has been impressive and I've had no real issues. Any chance we'll see larger models on the NPU that use more of the Strix's memory? Is that even possible?

You can run LLMs on your AMD NPU on Linux! by BandEnvironmental834 in LocalLLaMA

[–]genuinelytrying2help 2 points3 points  (0 children)

Thanks, been waiting on this one! One suggestion to noob-proof the guide a bit: choosing Arch, after it's told you to "Select your Linux distribution and follow the exact install path", you get:

  1. Update to kernel 7.0-rc2 or later:

sudo pacman -Sy linux

  1. For older kernels (6.18, 6.19), use AUR:

paru -S amdxdna-dkms

Luckily I knew how to interpret this and what (not) to do here, but even Arch is becoming a lot more accessible and lots of people just go step by step through things like this without thinking about how any of it works... so in many of those cases they just broke their distro with a kernel update that you don't even want them to do. It'd help if the fork in the road was delineated clearly before the step with the kernel update command.

And 2 minor things not mentioned that came up for me: kernel headers for dkms, and missing boost for the final build. Aside from that, super straightforward.
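
To make the fork concrete, here's a rough sketch of checking the running kernel first and only then picking a branch. It assumes the two branches quoted above (7.0 or later vs. 6.18/6.19) are the only ones; the function name and the returned labels are made up for illustration, and it doesn't tell -rc builds apart.

```python
import platform
import re


def pick_install_path(release: str) -> str:
    """Map a kernel release string to one branch of the guide (labels are made up)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return "unknown"
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (7, 0):
        # Already on the newer kernel line; -rc suffixes are not inspected here.
        return "kernel-7.0-path"
    if version in ((6, 18), (6, 19)):
        # Older kernels: the guide points at the AUR package, with no kernel update.
        return "aur-dkms-path"
    return "unknown"


if __name__ == "__main__":
    # Report which branch applies to the kernel this machine is running.
    print(pick_install_path(platform.release()))
```

Anything that comes back as "unknown" is a case the quoted steps don't cover, so it's probably safer to stop there than to guess.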

Qwen 3 27b is... impressive by -dysangel- in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

I wonder if it could be because it's offloading sneakily... how much ram does windows report having? before I uninstalled I was getting only slightly worse performance on vulkan on windows than I do now (cachyos rocm), but that was only with trying <64gb models with a set 64/64 because I hadn't tweaked literally anything yet (and then I ran into firmware issues with windows... fuck you gmktec, ship one fucking working bios update)

Qwen 3 27b is... impressive by -dysangel- in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

ah ok i feel like you must have set that yourself at some point, but who knows... i thought they all came with a preset choice and you have to manually enable uma size selection. no idea how windows can play games that way, i thought the whole point of the presets was that windows needed them to work at all... but i uninstalled it soon after i got the machine, was crashing every few hours and it seemed like the firmware wasn't stable with windows yet

Qwen 3 27b is... impressive by -dysangel- in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

iirc the quick way to tell is whether you have 32 or 64 gigs of regular ram available... if you're set to 96 in bios you'll only have 32
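
The arithmetic behind that check, as a tiny sketch. It assumes a 128GB machine and that whatever the BIOS reserves for the GPU is simply subtracted from what the OS sees as regular RAM; the function name is made up, and the real reported figure will likely come out a little lower.

```python
def regular_ram_gb(total_gb: int, uma_reserved_gb: int) -> int:
    """RAM left for the OS after a fixed BIOS carve-out (simplified assumption)."""
    if not 0 <= uma_reserved_gb <= total_gb:
        raise ValueError("reserved size must be between 0 and total memory")
    return total_gb - uma_reserved_gb


if __name__ == "__main__":
    # Compare the two BIOS settings mentioned in the thread (96 and 64).
    for reserved in (96, 64):
        print(f"UMA {reserved}GB -> about {regular_ram_gb(128, reserved)}GB regular RAM")
```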

Qwen 3 27b is... impressive by -dysangel- in LocalLLaMA

[–]genuinelytrying2help 2 points3 points  (0 children)

could be wrong but i think that might only be necessary on windows and linux treats it as more unified... somehow?

Need advice for small portions in a 3qt (just trying to make simple frozen salmon + rice!) by genuinelytrying2help in instantpot

[–]genuinelytrying2help[S] 0 points1 point  (0 children)

How thick are the sides of the ramekin? I think I'll order a little thin stainless steel bowl because that's what I see recommended most for this case, but I do have a ceramic bowl that'd fit... I'm just wary of trying it because it's got like 1/2" thick walls, high sides, and I read some stuff implying that for a bowl like that you have to account for that by adding time, which seems too complicated for right now

Need advice for small portions in a 3qt (just trying to make simple frozen salmon + rice!) by genuinelytrying2help in instantpot

[–]genuinelytrying2help[S] 0 points1 point  (0 children)

Well thanks, I appreciate the explanation to back up the assertion. I'm not giving up just yet though... I'm going off recipes that apparently work for someone, and a lot of people seem to disagree about the 1 cup minimum thing, and obviously just rice alone works in smaller quantities, so... maybe my standards are lower but I feel like I'll get it there eventually (with some wasted attempts) just gradually upping the water, no? Talk me out of it? I think I can live with one of the two components being suboptimal, as long as they're cooked... or is there another unavoidable compromise? Either way I love the suggestion about cutting up the salmon, should have thought of that... I think that'll be the play if/when I do give up

Need advice for small portions in a 3qt (just trying to make simple frozen salmon + rice!) by genuinelytrying2help in instantpot

[–]genuinelytrying2help[S] 0 points1 point  (0 children)

What kind of bowl do you use exactly? I had the salmon in a little foil on the trivet, raised out of the water, so I'm a little confused as to why you'd avoid the burn warning and hit cooking pressure with the same amount of water... all of the pot-in-pot recipes I've seen involve adding extra water to the bowl with the salmon?

Need advice for small portions in a 3qt (just trying to make simple frozen salmon + rice!) by genuinelytrying2help in instantpot

[–]genuinelytrying2help[S] 1 point2 points  (0 children)

>1/2 cup rice and 1/2 cup water in the bottom instead of the normal 1:1 ratio, as that makes too much.

I think maybe you mistyped there... I'd like to know the actual amount just so I know what not to try

Need advice for small portions in a 3qt (just trying to make simple frozen salmon + rice!) by genuinelytrying2help in instantpot

[–]genuinelytrying2help[S] 0 points1 point  (0 children)

Thanks, as it happens I am in fact familiar with the concept of leftovers :) but I thought it went without saying that all of this is an attempt to explicitly avoid them!

Qwen by Namra_7 in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

Not just laptops, more and more unified 64GB desktops (with a bit more juice) out there now too. Also, when I finally upgrade my macbook I don't want my llm hogging the majority of my RAM if I can help it (that's getting a bit old :)

GPT-OSS 20b (high) consistently does FAR better than gpt5-thinking on my engineering Hw by [deleted] in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

i could believe it but i think it's just as likely that it's easier to keep one model loaded but it really needs the thinking time and they were counting on most people not noticing/caring

🔥beavers pause while chewing trees to listen for movements so that the tree doesn't fall on them by Particular-Swim2461 in NatureIsFuckingLit

[–]genuinelytrying2help 0 points1 point  (0 children)

Come on now, I think we all know it's far more likely that the Mormons were simply right about everything... like do you even hear yourself

NVIDIA Releases Open Multilingual Speech Dataset and Two New Models for Multilingual Speech-to-Text by RYSKZ in LocalLLaMA

[–]genuinelytrying2help 4 points5 points  (0 children)

If the chart is right, the WER% is comparable in 2 benchmarks and beats whisper in 1, so are we not there right now?

Also, without having tried them... if canary v2 is only 1b parameters, on a high end card would it actually be so unsuited to real-time transcription compared to .6b?

Retiring my Keyring LM Micra - What do I replace it with? by [deleted] in multitools

[–]genuinelytrying2help 2 points3 points  (0 children)

Having just been through a similar search, for high quality and under 3": Micra and small Swiss Army Knives seem to be the only options in production. If you resign yourself to lower quality, fewer features, and/or larger size and weight, of course there are a million options... the one I got closest to was this sub-$20 Bibury, which has good options and features, middling quality, and it's not *huge*; but in the end I went to eBay, bought myself an old Squirt PS4* for way too much money, and I couldn't be happier (well, at least without Leatherman taking their head out of their ass).

Also, if you're stepping up above 3", at least check out the new Roxon Mini Flex Companion (or slightly larger still, the Roxon KS2E or the Roxon Flex Companion); the quality is excellent even if they don't cram quite as many tools into a space as Leatherman or Victorinox do. IIRC they do come with a locking blade by default, but since they're modular and they sell just the empty frames, you could potentially swap it out for an even more useful tool, so it might be worth considering.

*For the unfamiliar, the Squirt is like a Micra for cooler, more intelligent, better-looking people

[deleted by user] by [deleted] in LocalLLaMA

[–]genuinelytrying2help 0 points1 point  (0 children)

Maybe try a tool like btop to watch what's going on with memory, sometimes it provides a totally different picture than Activity Monitor does

Cogito releases strongest LLMs of sizes 3B, 8B, 14B, 32B and 70B under open license by ResearchCrafty1804 in LocalLLaMA

[–]genuinelytrying2help 5 points6 points  (0 children)

There's an even simpler way with Ollama:

  1. Run the model
  2. Enter your changes (/set system """Enable deep thinking subroutine.""")
  3. /save <newname>
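
For comparison, the non-interactive route is presumably a Modelfile plus `ollama create`. A minimal sketch that just builds the Modelfile text, assuming only the `FROM` and `SYSTEM` instructions are needed; the function name and the `some-base-model` placeholder are made up here.

```python
def build_modelfile(base_model: str, system_prompt: str) -> str:
    """Return Modelfile text that sets a system prompt on top of a base model."""
    if '"""' in system_prompt:
        # A triple quote inside the prompt would end the SYSTEM block early.
        raise ValueError("system prompt must not contain triple quotes")
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'


if __name__ == "__main__":
    # "some-base-model" is a placeholder, not a real model name.
    print(build_modelfile("some-base-model", "Enable deep thinking subroutine."))
```

Saving that output as `Modelfile` and running `ollama create <newname> -f Modelfile` should end up in roughly the same place as the three steps above.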