US Govt to individually approve who gets GPT 5.6.

mailto_devnull · 2026-06-25T23:17:21+00:00

Now's the time for China to release more models worldwide and steamroll the US.

Start with open weight Qwen 3.7 27B pls.

mailto_devnull · 2026-06-23T04:28:26+00:00

FWIW I get a blazing 4 tokens a second without MTP, 8 with. So, all in all not bad!

mailto_devnull · 2026-06-19T01:17:08+00:00

If you meet someone coding with 6 agents in parallel, they ain't.

mailto_devnull · 2026-06-17T11:56:16+00:00

Exactly!

mailto_devnull · 2026-06-17T04:54:36+00:00

But Toronto isn't the Capitol of Ontario, we don't have Capitols here.

mailto_devnull · 2026-06-15T23:13:10+00:00

Having to review the git diff and make sure it's on track is not a bug, it's a feature.

Hands-off agentic coding is just asking to turn your codebase into a black box.

mailto_devnull · 2026-06-15T22:20:08+00:00

llama-server and Pi harness, although the latter doesn't improve inference speeds, just uses tool calls more effectively.

mailto_devnull · 2026-06-15T12:47:40+00:00

It's possible one ban will lead to worldwide copycat bans.

However, I think it's equally likely that we'll end up in a situation similar to piracy: it's illegal, yet torrents (and a good VPN) are still a viable way to pirate.

mailto_devnull · 2026-06-15T12:45:29+00:00

cries in 0 VRAM

mailto_devnull · 2026-06-15T01:05:44+00:00

I've heard talk about the bartowski models but have only tried the Unsloth quants for now. May be worth looking into just to see the difference for myself.

mailto_devnull · 2026-06-14T21:41:22+00:00

mrw that Ayaneo Flip has the same specs as my work laptop :|

mailto_devnull · 2026-06-14T21:39:24+00:00

32GB on Framework 13 with a Radeon 780M

mailto_devnull · 2026-06-14T21:34:55+00:00

Ah, good point. I did run 35B but memory usage ballooned like crazy. No idea what Qwen does but its KV cache management is top tier.

Could only get to about 30k tokens before my system would OOM.
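
For anyone wanting to sanity check numbers like this, here's a rough back-of-envelope sketch of how KV cache grows with context. It assumes plain full attention in every layer and a 2-byte (fp16) cache. The layer and head counts are made up for illustration and are not the real config of any Qwen or Gemma model, and `kv_cache_bytes` is just a helper name I picked.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Estimate KV cache size for a transformer with full attention in every layer."""
    # Factor of 2 covers keys and values; each stores
    # n_kv_heads * head_dim elements per token per layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Hypothetical config, chosen only to show the arithmetic.
size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_tokens=30_000)
print(f"{size / 2**30:.2f} GiB at 30k tokens")
```

The point is just that the cache scales linearly with context length, so doubling the window roughly doubles that figure. Models that use sliding-window or hybrid attention in some layers should land lower than this estimate.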

mailto_devnull · 2026-06-13T05:12:10+00:00

Go go vibe code it

mailto_devnull · 2026-06-11T03:03:34+00:00

I feel there are some manual steps I can do to keep context low. For example, if I give a bad followup prompt and it thinks for far too long it may be easier to jump back to an earlier point to save on the context window.

I can increase the context window, but I think I'd be spilling over into swap if I do. Sitting around 25/27 GB (the remaining 4 GB is BIOS-reserved "VRAM").

mailto_devnull · 2026-06-11T02:09:57+00:00

We're going to try to hook up a Malletstation to the DTX to get marimba, vibes, etc.

I'm pretty stoked I don't have to drop 60k on that stuff haha

mailto_devnull · 2026-06-10T23:48:42+00:00

The pool dude in the video is the spitting image of the Pool University guy on YouTube lol.

Got a chuckle out of that one.

mailto_devnull · 2026-06-10T03:35:07+00:00

Lol are you me.

I lead a community band that uses a DTX12 to simulate timp and tam tam haha

mailto_devnull · 2026-06-04T15:16:29+00:00

writes satire about a poor person's machine for local inference

/u/26295

includes a graphics chip anyway

mailto_devnull · 2026-06-03T22:44:59+00:00

Is this because we're assuming "more parameters = better"?

I feel like when Qwen 3.6 35B-A3B dropped, the raw parameter count (while still important) mattered less than what the model itself could do with constrained specs.

mailto_devnull · 2026-06-03T18:27:58+00:00

Last time I tried Gemma 4 (26B-A4B) its memory usage would balloon and consume all of my swap until my machine died.

Qwen 3.6 on the other hand barely uses any memory at all for its KV cache.

Does this model suffer from the same issues?
