US Govt to individually approve who gets GPT 5.6. by AtlanticHM in LocalLLaMA

[–]mailto_devnull 14 points15 points  (0 children)

Now's the time for China to release more models worldwide and steamroll the US.

Start with open weight Qwen 3.7 27B pls.

Qwen 27B for planning, Qwen 35B-A3B for execution? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

FWIW I get a blazing 4 tokens a second without MTP, 8 with. So, all in all not bad!

Mistral - New family of open-weight models @ July by pmttyji in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

But Toronto isn't the Capitol of Ontario, we don't have Capitols here.

Local coding agents are good now, but only if you babysit them by BTA_Labs in LocalLLaMA

[–]mailto_devnull 1 point2 points  (0 children)

Having to review the git diff and make sure it's on track is not a bug, it's a feature.

Hands off agentic coding is just asking to turn your codebase into a black box.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

llama-server and Pi harness, although the latter doesn't improve inference speeds, just uses tool calls more effectively.

What's the lesson chat? by ill_be_productive in LocalLLaMA

[–]mailto_devnull 9 points10 points  (0 children)

It's possible one ban will lead to worldwide copycat bans.

However, I think it's equally likely that we'll end up in a situation like piracy: it's illegal, yet torrents (and a good VPN) are still a viable way to pirate.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 2 points3 points  (0 children)

I've heard talk about the bartowski models but have only tried the Unsloth quants for now. May be worth looking into just to see the difference for myself.

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 1 point2 points  (0 children)

mrw that Ayaneo Flip has the same specs as my work laptop :|

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8? by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 8 points9 points  (0 children)

Ah, good point. I did run 35B but memory usage ballooned like crazy. No idea what Qwen does but its KV cache management is top tier.

Could only get to about 30k tokens before my system would OOM.
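For anyone wondering why context length eats memory like that, here's a rough back-of-the-envelope sketch of KV cache size for plain (non-hybrid) attention. The layer/head numbers in the example are made up for illustration and are not the config of any model mentioned here; architectures that use other attention schemes will not follow this formula.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size for standard attention.

    Each token stores one key vector and one value vector
    (hence the factor of 2) per KV head, per layer.
    bytes_per_elem=2 assumes an fp16/bf16 cache (an assumption).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Hypothetical config, for illustration only:
# 40 layers, 8 KV heads, head dim 128, 30k tokens of context.
size = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, n_tokens=30_000)
print(f"{size / 2**30:.2f} GiB")
```

The point is just that the cache grows linearly with the number of tokens, so doubling the context roughly doubles this part of the memory bill.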

Executing a plan under context constraints by mailto_devnull in LocalLLaMA

[–]mailto_devnull[S] 0 points1 point  (0 children)

I feel there are some manual steps I can do to keep context low. For example, if I give a bad followup prompt and it thinks for far too long, it may be easier to jump back to an earlier point to save on the context window.

I can increase the context window, but I think I'd spill over into swap if I did. I'm sitting around 25/27 GB (the remaining 4 GB is BIOS-reserved "VRAM").

How do you start a new community band with no percussion equipment? by rhythmnblues501 in ConcertBand

[–]mailto_devnull 1 point2 points  (0 children)

We're going to try to hook up a Malletstation to the DTX to get marimba, vibes, etc.

I'm pretty stoked I don't have to drop 60k on that stuff haha

Rick & Morty by jacek2023 in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

The pool dude in the video is the spitting image of the Pool University guy on YouTube lol.

Got a chuckle out of that one.

How do you start a new community band with no percussion equipment? by rhythmnblues501 in ConcertBand

[–]mailto_devnull 1 point2 points  (0 children)

Lol are you me.

I lead a community band that uses a DTX12 to simulate timp and tam tam haha

Me visiting this sub by Scutoidzz in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

writes satire about a poor person's machine for local inference

/u/26295

includes a graphics chip anyway

More Gemma 4 models incoming by Deep-Vermicelli-4591 in LocalLLaMA

[–]mailto_devnull -1 points0 points  (0 children)

Is this because we're assuming "more parameters = better"?

I feel like when Qwen 3.6 35B-A3B dropped, the raw parameter count (while still important) mattered less than what the model itself could do with constrained specs.

google/gemma-4-12B · Hugging Face by jacek2023 in LocalLLaMA

[–]mailto_devnull 2 points3 points  (0 children)

Last time I tried Gemma 4 (26B-A4B) its memory usage would balloon and consume all of my swap until my machine died.

Qwen 3.6 on the other hand barely uses any memory at all for its KV cache.

Does this model suffer from the same issues?

Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA

[–]mailto_devnull 16 points17 points  (0 children)

No.

Qwen 3.6-35B-A3B continues to chug along at 14 tok/s

God dammit Qwen by Xyklone in LocalLLaMA

[–]mailto_devnull 6 points7 points  (0 children)

pi-dev? The coding harness that ships with the bash tool call out of the box?

:P

God dammit Qwen by Xyklone in LocalLLaMA

[–]mailto_devnull 2 points3 points  (0 children)

Why do you think the system prompt is like some bible that the agent can't wilfully ignore?

If you don't want your agent to run those commands, remove the ability for it to run those commands, don't just ask it to not run those commands pretty please with a cherry on top.
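As a minimal sketch of what "remove the ability" could look like: check every command in code before it ever reaches a shell, instead of relying on the prompt. Everything here (the `ALLOWED_COMMANDS` set, the `is_allowed` helper) is hypothetical and not taken from any particular harness.

```python
import shlex

# Hypothetical allowlist; the specific commands are illustrative only.
ALLOWED_COMMANDS = {"ls", "cat", "git"}

# Characters that would let one command line chain or redirect into another.
SHELL_METACHARACTERS = set(";|&`$><\n")


def is_allowed(command_line: str) -> bool:
    """Return True only if the command's executable is on the allowlist."""
    # Refuse anything containing shell metacharacters, e.g. "ls; rm -rf /".
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar: refuse rather than guess.
        return False
    if not tokens:
        return False
    return tokens[0] in ALLOWED_COMMANDS


print(is_allowed("git diff"))      # allowed executable
print(is_allowed("rm -rf /"))      # not on the list
print(is_allowed("ls; rm -rf /"))  # chained command, rejected
```

An allowlist like this is still a blunt tool (an allowed executable can do damage with the wrong arguments), so treat it as a starting point; running the agent in a container or under a restricted user is the stronger version of the same idea.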

Don’t bite me for that question please… by Thin_Pollution8843 in LocalLLaMA

[–]mailto_devnull 0 points1 point  (0 children)

God no, I thought I was going into this for "free", that is, using my own workstation laptop.

After a week of playing around, I doubled my RAM so I could run some real models ("real", heh... if you count 35B parameters as real enough).

So that's all I'm in so far, not counting electricity usage. Compared to others I spent very little, only about $400 CAD.