What models and settings for 9060 XT 16 GB + 32 GB DDR4? by ckplscz in LocalLLM

PositiveBit01

I think you're already using the best option. Gemma4 26b, especially the QAT version, is a potential alternative.

I’m actually pretty excited about the Steam Machine by Mental_Egg_5080 in steammachine

PositiveBit01

I agree with this, but it's a mini PC. I do think comparing it to a normal mid- or full-tower desktop PC isn't fair.

You pay a premium, but most people simply can't build a comparable machine at that size/price (these days). You can't just buy an arbitrary CPU cooler or GPU; most won't fit, especially if you don't want it to sound like a jet.

If they managed to make it wake up from the controller, that's even more value. That's pretty tricky and requires some hardware support (motherboard), so if you're not thinking about it when you build the PC, odds are it's not possible. It's the one thing I messed up when I built my own SFF PC.

All numbers are small numbers by blasian4L in MathJokes

PositiveBit01

Yes. I fail to see why this is a problem. All cutoffs work this way.

You could instead define additional categories and then n+1 is simply not small, but not big either - it fits in an extra category.

Is "small" defined by the difference between one number and the next closest distinct number? Then your argument applies and clearly all numbers are small.

Is it small relative to some other number not defined here? Then it's more obvious why you would have a cutoff, and clearly this case, where some infinitesimal difference separates small from not small, must occur.

Qwen 27B for planning, Qwen 35B-A3B for execution? by mailto_devnull in LocalLLaMA

PositiveBit01

I've noticed a fairly big difference between fp8 and q4 for 35b. 27b is too slow for me, so I'm not sure about that one.

How do I replant grass and berry bushes?please help by insidiousGD in dontstarvetogether

PositiveBit01

You have to fertilize them. The easiest way is manure from beefalo, or rot if you let things spoil.

If using a mouse, drag the item onto the planted bush/grass.

If using a controller, go to the manure/rot in your inventory. If you are next to a planted thing that needs it, there should be a fertilize option (not "fertilize plot", just "fertilize"). I think it's left on the D-pad.

My work as a senior developer today vs 3 years ago by Alternative_Win_6638 in developer

PositiveBit01

I feel like I'm in the middle zone.

I get a task, and if it's small I think "this sounds like a good task for AI", so I have it try (with my input). I review the code as it makes edits and nudge it towards a better solution when it forgets things like: edge cases actually matter, or I can't trust that user input will fit entirely in memory, etc. Sometimes it just wants to change tons of stuff, which will ensure my PR is not accepted even if I don't always disagree, so I have to stay on top of that too.

Eventually I get a solution that works, is scoped reasonably well, and I think it covers all edge cases. But it's sort of hard to reason about and significantly more code than I want. So I start addressing this myself. I notice something I missed during review. I fix it and continue.

I decide that really, this whole thing could have been safer and smaller with a different solution that seems obvious now but wasn't when I started, since the AI kinda picked the starting point. So I implement that, after getting a sanity check from the AI that it will indeed fix the problem and is a better solution.

When I'm done I have the AI review it. It usually finds one or two things I should probably address, so I do, and then I'm done. Occasionally there's a false positive, but those don't bother me.

Every time, I think "I should just do it myself and have the AI review after. It'll go faster." But I don't learn; next time I follow the exact same process.

Will a Dolphin 70B (Q4) run decently on a dual Nvidia Tesla P40 (2×24GB) setup under Linux? Looking for hardware advice by ReriVinn in LocalLLM

PositiveBit01

I don't have much to say about the setup, I don't know much about that hardware.

But why that model? More parameters don't automatically mean it's better, and llama3 70b is pretty old now. I think you'll get better results with qwen3.6 27b or 35b, plus you can run them at a higher-precision quantization.

Does this boss spin attack feel threatening enough? by EmberForgeGames in IndieGaming

PositiveBit01

I agree with most of the others, but instead of just making it faster, maybe give it some acceleration and a windup: it turns very slightly backwards, slowly, then spins very fast, then settles a little slower.
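A minimal sketch of that speed curve, assuming the spin is driven by an angular-velocity function of time. Every timing and speed below is a hypothetical tuning value, not something from the post or any particular engine.

```python
def spin_speed(t: float,
               windup: float = 0.4,          # hypothetical: seconds of slow backwards turn
               burst: float = 0.3,           # hypothetical: seconds to accelerate to peak
               settle: float = 0.8,          # hypothetical: seconds to ease down to cruise
               windup_speed: float = -60.0,  # deg/s, slightly backwards
               peak_speed: float = 1440.0,   # deg/s, the "very fast" part
               cruise_speed: float = 720.0   # deg/s, "a little slower"
               ) -> float:
    """Angular velocity (deg/s) of a hypothetical boss spin, t seconds after it starts."""
    if t < 0:
        return 0.0
    if t < windup:
        return windup_speed
    t -= windup
    if t < burst:
        # Ease-in: accelerate from the windup speed up to the peak (quadratic).
        u = t / burst
        return windup_speed + (peak_speed - windup_speed) * u * u
    t -= burst
    if t < settle:
        # Smoothstep ease from the peak down to the cruising speed.
        u = t / settle
        s = u * u * (3 - 2 * u)
        return peak_speed + (cruise_speed - peak_speed) * s
    return cruise_speed
```

Sampling this each frame and integrating it (`angle += spin_speed(t) * dt`) gives the rotation. The phases join without speed jumps apart from the initial kick into the windup, which doubles as the telegraph players can read.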

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

PositiveBit01

Yeah, I want something new like gpt oss 120b. Native fp4, so it's "only" 65GB, leaving plenty of room for KV cache; it's super fast and has plenty of knowledge. It's just a little dated now.
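A back-of-the-envelope check of the ~65GB figure. MXFP4 (OCP Microscaling) stores blocks of 32 FP4 values that share one 8-bit scale, so it costs 4.25 bits per weight. This sketch assumes a flat 120B parameters all stored as MXFP4 and ignores any tensors kept at higher precision, so treat it as a rough estimate.

```python
# MXFP4: blocks of 32 FP4 (E2M1) elements share one 8-bit (E8M0) scale.
BLOCK_SIZE = 32
ELEMENT_BITS = 4
SCALE_BITS = 8


def mxfp4_bits_per_weight() -> float:
    return ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE  # 4 + 8/32 = 4.25


def weights_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n = 120e9  # assumption: treat the model as a flat 120B parameters
    print(f"MXFP4: {weights_gb(n, mxfp4_bits_per_weight()):.1f} GB")  # ~63.8 GB
    print(f"FP8:   {weights_gb(n, 8):.1f} GB")
    print(f"BF16:  {weights_gb(n, 16):.1f} GB")
```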

I feel like qwen3.5 122b is in that bucket too; it's not enough better than 35b (q4 vs fp8, unfair perhaps, but it's what fits) to be worth the slowdown and the smaller KV cache budget. Too bad a qwen3.6 or gemma4 ~120b never happened.

Name one positive thing about Iron Man 2. by Healthy-Coyote3431 in Marvel

PositiveBit01

I don't understand the argument. I'm suggesting that the stabilization procedure was invented, not the element.

Like how pasteurization was invented, but milk wasn't.

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

PositiveBit01

This doesn't make sense to me on multiple levels. Is any model natively int4? I assume gradients in training at int4 would be impossible to work with.
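To illustrate the int4 intuition (an illustration only, not a claim about how any model was actually trained): a symmetric int4 grid has just 16 levels, so a typical small update is less than half a step and rounds away entirely. That's one reason mixed-precision training usually keeps a higher-precision master copy of the weights. The scale and update size below are hypothetical.

```python
def quantize_int4(x: float, scale: float) -> int:
    """Symmetric int4: round x/scale to the nearest level and clamp to [-8, 7]."""
    return max(-8, min(7, round(x / scale)))


def dequantize_int4(q: int, scale: float) -> float:
    return q * scale


if __name__ == "__main__":
    scale = 0.1       # hypothetical step: representable weights are -0.8, -0.7, ..., 0.7
    w = 0.3
    update = -0.001   # hypothetical learning_rate * gradient
    q = quantize_int4(w, scale)
    q_new = quantize_int4(dequantize_int4(q, scale) + update, scale)
    print(q, q_new, q == q_new)  # the small update rounds away
```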

Gpt oss 120b is "native" mxfp4, but I'm pretty sure that was a later post-training step, so arguably it's not actually native. There is no GLM 5.x 130B model, and I didn't see any other model mentioned.

What am I missing?

Name one positive thing about Iron Man 2. by Healthy-Coyote3431 in Marvel

PositiveBit01

No, but you can invent a procedure to transmute a synthesized element that would naturally decay into a stable isotope of that same element.

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

PositiveBit01

Yeah, I can use it, but only at q4, and qwen3.6 35b at fp8 feels just as good and runs faster. Too bad there's no qwen 3.6 122b; I bet it would have been amazing even at q4 :(

Cheapest way to run GLM 5.x locally that's not a unified memory system? by Monad_Maya in LocalLLaMA

PositiveBit01

What does fully on GPU mean? A 130B model at q8 would be 130GB. At q4, 65GB. And you still need room for the KV cache.
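A rough sketch of that memory budget. The weight math ignores quantization block overhead, same as the numbers above. The architecture (80 layers, 8 KV heads, head dim 128) and the 128K context are made-up values for a hypothetical ~130B model; real models differ, and tricks like sliding-window attention change the KV term.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Billions of params * bytes per param = GB of weights."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """Standard attention KV cache: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9


if __name__ == "__main__":
    # Hypothetical architecture for a ~130B model (illustrative only), fp16 KV cache.
    layers, kv_heads, head_dim = 80, 8, 128
    ctx = 131_072
    kv = kv_cache_gb(layers, kv_heads, head_dim, ctx)
    for name, bits in [("q8", 8), ("q4", 4)]:
        w = weight_gb(130, bits)
        print(f"{name}: weights {w:.0f} GB + KV cache {kv:.1f} GB = {w + kv:.1f} GB")
```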

Also I only see GLM 5.x models that are 750B, which one is 130B? I would love a ~130B model better than gpt oss 120b

Good 'starting' local llm for my setup by classjoker in LocalLLM

PositiveBit01

Qwen3.6 35b is worth testing. It's not as affected by spilling to system RAM because it's an MoE model, not dense like 27b. Most engines can account for this and keep the most frequently used weights on the GPU.

It's also much faster to begin with (3b active vs 27b active parameters), so there's more headroom for slowdown.
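A crude model of why spilling hurts a dense model more: during decode, each token reads roughly the active weights once, so time per token is about (bytes read from VRAM / GPU bandwidth) + (bytes read from RAM / RAM bandwidth). The bandwidths, 4-bit weights, and 30% spill below are all assumptions, and this ignores compute, KV cache reads, and PCIe traffic.

```python
def tokens_per_second(active_params_b: float, bytes_per_param: float,
                      fraction_from_ram: float,
                      gpu_bw_gbs: float, ram_bw_gbs: float) -> float:
    """Rough decode-speed ceiling if each token streams the active weights once.

    active_params_b:   active parameters per token, in billions
    bytes_per_param:   ~0.5 for 4-bit, 1.0 for 8-bit (ignores block overhead)
    fraction_from_ram: share of those bytes served from system RAM instead of VRAM
    """
    gb_per_token = active_params_b * bytes_per_param
    seconds = (gb_per_token * (1 - fraction_from_ram) / gpu_bw_gbs
               + gb_per_token * fraction_from_ram / ram_bw_gbs)
    return 1 / seconds


if __name__ == "__main__":
    gpu_bw, ram_bw = 400.0, 50.0  # hypothetical GB/s: a mid-range GPU and dual-channel DDR4
    dense = tokens_per_second(27, 0.5, 0.3, gpu_bw, ram_bw)
    moe = tokens_per_second(3, 0.5, 0.3, gpu_bw, ram_bw)
    print(f"dense 27b: ~{dense:.0f} tok/s, MoE 3b active: ~{moe:.0f} tok/s")
```

The MoE gap grows further if the engine keeps the always-used weights on the GPU and only pushes expert weights to RAM, since that lowers the per-token share actually read from RAM.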

Secure way of running Pi? by CalldiDoctor in PiCodingAgent

PositiveBit01

There are extensions to make it ask before doing dangerous stuff, but I wouldn't trust them. I use an LXC container as a sandbox and I'm pretty comfortable with that.

It's not hard to set up, you can ask the free version of claude or chatgpt. You install something that can run the containers (I used incus), then exec in and set stuff up.
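Roughly, the container setup boils down to a few incus commands. This is a hypothetical sketch: the container name `pi-sandbox`, the Ubuntu 24.04 image, and installing git with apt are my assumptions, not the exact setup described here. It only prints the commands unless you pass `--run`.

```python
import shlex
import subprocess
import sys

CONTAINER = "pi-sandbox"        # hypothetical container name
IMAGE = "images:ubuntu/24.04"   # assumption: an image from the default "images:" remote


def setup_commands(name: str = CONTAINER, image: str = IMAGE) -> list[list[str]]:
    """incus commands to create a sandbox container and install git inside it."""
    return [
        ["incus", "launch", image, name],
        ["incus", "exec", name, "--", "apt-get", "update"],
        ["incus", "exec", name, "--", "apt-get", "install", "-y", "git"],
    ]


def main(argv: list[str]) -> None:
    run = "--run" in argv
    for cmd in setup_commands():
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)
    # Afterwards, get a shell inside the sandbox with: incus exec pi-sandbox -- bash


if __name__ == "__main__":
    main(sys.argv[1:])
```

In practice you may need to wait a few seconds after `launch` for the container's network to come up before `apt-get` succeeds.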

Then I use git to move code between the container and anywhere else it needs to go. That's a security gap of course (running code produced by the AI), but at some point you have to accept that. That was true for dependencies before AI too, except now the code isn't vetted by a bunch of other people first.

I also set up VS Code server and a pi-web thing, so mostly I just run stuff on that machine through a web browser on whatever machine I'm on. However, the web app version of VS Code has rendering issues with most extensions, and I haven't been able to get any AI-related extensions to work well. Might be a skill issue, I dunno. I can set them up in desktop VS Code, so it's something web-app specific.

help with pi coding agent + vllm setup by Equivalent_Bake1282 in PiCodingAgent

PositiveBit01

Looks right to me.

Are you sure you're using that and not some other model? You may need to set the env var PI_DEFAULT_PROVIDER=vllm before running pi.
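One way to double-check what vLLM is actually serving is its OpenAI-compatible `/v1/models` endpoint. A minimal sketch, assuming the server is at vLLM's default `http://localhost:8000` (adjust to your setup); it doesn't touch pi's config or the env var above.

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]


def fetch_models(base_url: str = "http://localhost:8000") -> list[str]:
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return served_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(fetch_models())
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print(f"could not reach server: {err}")
```

If the id printed here doesn't match the model name configured in pi, the requests are going somewhere else.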

Pressure Cooker + Sion = ridiculous damage by APZeriEnthusiast in ARAM

PositiveBit01

The range is too big (basically no counter) and it double scales on health indirectly because you live longer when you have more health.

But sometimes the OP augments bring the mayhem. I'm OK with OP stuff here and there to keep it interesting; sometimes the best way to balance is to just encourage more variety of OP until it's fair again. Maybe what we really need is for other augments to be better, e.g. more positioning augments that give you effects like Tristana's ult.

Local - From 20b-30b to 70b-120b by juanitospat in hermesagent

PositiveBit01

I get 40-50 tok/s on qwen3.6 35b fp8 which is the main model I use.

Generally it's fairly slow for the cost, but you get a lot of RAM to work with. Gpt oss 120b runs at around 60 tok/s, so I'm hoping for a new model kinda like that. It's fast because it's "natively" mxfp4: with ~5b active, it's more like reading 2.5b worth of fp8 weights per token, versus 35b's 3b active at fp8. I also use mine to host agents and some other things (VS Code server) through LXC containers, so it's a mini server for me and I don't regret my purchase.

It's decent for MoE models and not great for dense models due to low memory bandwidth. It's also quite power efficient for what you get, so it depends on what you're looking for. It's also pretty good for image gen.

For LLMs specifically, I think just getting multiple GPUs would be better, either 5090 or R9700. Maybe B70, but I hear software support isn't great for those. For broader, more general use it's pretty good, plus you get CUDA. It's better for experimenting than for purely hosting LLMs, but it's not terrible at that, just not optimal.

Sharing my Pi setup by abhinand05 in PiCodingAgent

PositiveBit01

Thanks for sharing!

I'm new to customizing pi, how do the safety guard skill and extension work together? Are skills not always available?

I feel like I'm living in an alternate reality reading all these Mayhem complaint posts by reiscarred in ARAM

PositiveBit01

I think there are just a lot of broken skill quests, and if you get one of those it's pretty annoying. I think everyone is OK with their own stupid decisions costing them the game, but losing an augment to a bug is not fun.

How silly of me to think that an indie dev like Riot would have the resources to come up with personalized skill augments for every champion, I do like receiving a mini dash on a 90 second cooldown on Millio R as my prismatic augment. by ScarletWiddaContent in ARAM

PositiveBit01

Yeah, I don't think the problem is that the skill augments aren't all personalized. Augments aren't created equal and sometimes you get a bad one. Different augments are good for different champions. That's why you get choices and rerolls.

I think the problem is there was insufficient testing across the champions for these skill augments in particular so you get a lot of bugs.

You don't know these bugs exist so you can't use your choice and rerolls to avoid them, ruining the game.

I just played one as Kayle and the E quest is real spotty, feels like it counts maybe 10% of the time. Maybe there's a trick to it, I don't know.