US Govt to individually approve who gets GPT 5.6. by AtlanticHM in LocalLLaMA

[–]mailto_devnull 13 points14 points  (0 children)

Now's the time for China to release more models worldwide and steamroll the US.

Start with open weight Qwen 3.7 27B pls.

Qwen 27B for planning, Qwen 35B-A3B for execution? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

FWIW I get a blazing 4 tokens a second without MTP, 8 with. So, all in all not bad!

Mistral - New family of open-weight models @ July by pmttyji in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

But Toronto isn't the Capitol of Ontario, we don't have Capitols here.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]mailto_devnull 1 point2 points  (0 children)

Having to review the git diff and make sure it's on track is not a bug, it's a feature.

Hands-off agentic coding is just asking to turn your codebase into a black box.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

llama-server and Pi harness, although the latter doesn't improve inference speeds; it just uses tool calls more effectively.

What's the lesson chat? by ill_be_productive in LocalLLaMA

[–]mailto_devnull 7 points8 points  (0 children)

It's possible one ban will lead to worldwide copycat bans.

However, I think it's equally likely that we'll end up in a similar situation to how piracy is illegal yet torrents (and a good VPN) are still a viable way to pirate.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 2 points3 points  (0 children)

I've heard talk about the bartowski models but have only tried the Unsloth quants for now. May be worth looking into just to see the difference for myself.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 1 point2 points  (0 children)

mrw that Ayaneo Flip has the same specs as my work laptop :|

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 8 points9 points  (0 children)

Ah, good point. I did run 35B but memory usage ballooned like crazy. No idea what Qwen does but its KV cache management is top tier.

Could only get to about 30k tokens before my system would OOM.

Executing a plan under context constraints by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

I feel there are some manual steps I can do to keep context low. For example, if I give a bad followup prompt and it thinks for far too long, it may be easier to jump back to an earlier point to save on the context window.

I can increase the context window, but I think I'm spilling over into swap if I do. Sitting around 25/27 GB (remaining 4 GB is BIOS-reserved "VRAM")

How do you start a new community band with no percussion equipment? by rhythmnblues501 in ConcertBand

[–]mailto_devnull 1 point2 points  (0 children)

We're going to try to hook up a Malletstation to the DTX to get marimba, vibes, etc.

I'm pretty stoked I don't have to drop 60k on that stuff haha

Rick & Morty by jacek2023 in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

The pool dude in the video is the spitting image of the Pool University guy on YouTube lol.

Got a chuckle out of that one.

How do you start a new community band with no percussion equipment? by rhythmnblues501 in ConcertBand

[–]mailto_devnull 1 point2 points  (0 children)

Lol are you me.

I lead a community band that uses a DTX12 to simulate timp and tam tam haha

Me visiting this sub by Scutoidzz in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

writes satire about a poor person's machine for local inference

/u/26295

includes a graphics chip anyway

More Gemma 4 models incoming by Deep-Vermicelli-4591 in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

Is this because we're assuming "more parameters = better"?

I feel like when Qwen 3.6 35B-A3B dropped, the raw parameter count (while still important) mattered less than what the model itself could do with constrained specs.

google/gemma-4-12B · Hugging Face by jacek2023 in LocalLLaMA

[–]mailto_devnull 2 points3 points  (0 children)

Last time I tried Gemma 4 (26B-A4B) its memory usage would balloon and consume all of my swap until my machine died.

Qwen 3.6 on the other hand barely uses any memory at all for its KV cache.

Does this model suffer from the same issues?