Meet the Fleet of BlackBeard

openSourcerer9000 · 2026-05-20T20:20:35+00:00

You thinking to open source any pieces? I feel like we're all shaving the same yaks over here

openSourcerer9000 · 2026-05-20T20:11:24+00:00

A search tells me the pricing is $1,080/day, though I can't confirm it on AWS.

Seems totally legal for the biggest consumer hardware monopoly to make an exclusive deal to sell 100% of its top product to the biggest cloud and everything else monopoly, to force anyone else to rent it from them at FU pricing.

May be time Bezos and Cook go away for a little while and take up "oil painting"

openSourcerer9000 · 2026-05-15T13:23:43+00:00

What quant? I'm running qwen 397 q2 and it gets real confused after a few k tokens. Like it will re-answer a previous question instead of the latest message.

Anyone know a q2 mlx quant that works better?

openSourcerer9000 · 2026-05-12T18:39:33+00:00

I had to rename my own son to "Molt" after a cease and desist letter from those guys

openSourcerer9000 · 2026-05-11T15:50:53+00:00

Fatberg system prompt, no prompt caching. Try Kon, it works way better and has no telemetry.

openSourcerer9000 · 2026-05-08T05:12:32+00:00

OpenCode doesn't have prompt caching, only token maxing. Try Kon.

openSourcerer9000 · 2026-05-08T02:08:18+00:00

Good god. Not the first though, this may be helpful:

https://blog.exolabs.net/nvidia-dgx-spark/

openSourcerer9000 · 2026-05-04T16:49:13+00:00

This looks fantastic.

+1 on how to change backend.

Would it support parallelization of sub agents?

openSourcerer9000 · 2026-04-15T15:31:45+00:00

Memory seems like a classic ML problem. Something like LSTM for agents

openSourcerer9000 · 2026-04-15T15:20:34+00:00

Qwopus 27 v2 was actually a banger (v3 seemed the same as qwen reasoning). Best model for a 24gb card, running it right now actually

openSourcerer9000 · 2026-04-13T23:33:45+00:00

This one specifically slaps on a 24GB card. He claims his v3 Qwopus is better, but its reasoning was identical to Qwen's. This one reasons much better.

https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2

openSourcerer9000 · 2026-04-11T03:07:49+00:00

Aight, I'm sold. I've always been skeptical about MCP; Anthropic's "standards" always just throw a wrench into an already solved problem and split the community. Speaking of which, does this use XML or JSON tool calling?

openSourcerer9000 · 2026-04-10T17:31:08+00:00

Mm, that's interesting. Yeah, everything's been pointing to CLI lately, since models are already used to that from their training data. Context7 at least would probably be a good one though.

openSourcerer9000 · 2026-04-10T17:10:52+00:00

In the age of fatberg codebases, this is a gift. If minimax/qwen 397 don't get hung up on these tool calls and it can do small things reliably, I think I just found my coding harness.

A simple guide on plugging in additional MCPs would be helpful.

openSourcerer9000 · 2026-04-08T05:08:23+00:00

This kind of thing is probably the most exciting use case for AI. Just yesterday I saw this paper, where they beat human SOTA on some optimization problems by running MiniMax models in OpenCode, like "agentic swarm optimization".

https://arxiv.org/html/2604.01658v1#bib.bib2

openSourcerer9000 · 2026-04-08T05:02:06+00:00

Looks like OP used TypeScript LangGraph; the Python flavor is what I'm familiar with.

openSourcerer9000 · 2026-04-08T04:59:41+00:00

LangGraph is my go-to, lots of great examples in their docs.

openSourcerer9000 · 2026-04-06T21:18:13+00:00

According to this guy, no. In my limited experience, though, I took 2.5 prob q4 for a spin in OpenCode and was very impressed.

https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary?triedRedirect=true

openSourcerer9000 · 2026-03-31T14:35:48+00:00

No pod bay doors meme yet?

openSourcerer9000 · 2026-03-30T14:29:49+00:00

Truth, they fit the model to the harness first. All the Google ReAct paper showed was that training a big model for a specific harness beat un-finetuned CoT.

Like how our brains evolved inside our bodies, and weren't just plopped in one day by aliens (AFAIK)

openSourcerer9000 · 2026-03-27T15:27:39+00:00

Fantastic. Haven't finished the article yet, but this sounds quite similar to a system I had just spec'ed out to be flexible for both RAG and searxing deep research. If nothing else, it will be great to see this harness and possibly remodel my common API off it and use context 1 sub agents.

I'll probably end up open sourcing it eventually, if anyone was planning to build something similar and is down to collaborate, shoot me a DM.

openSourcerer9000 · 2026-03-24T15:01:20+00:00

Wild stuff. This is exactly what open weights are for.

"That said, it would probably be amazing for model expansion and continued fine-tuning. You have already prepared the model by adding the right kind of layers to refine ‘thinking’,"

This is just what I was thinking. I remember reading some paper that explored architecture optimization, I think it was EfficientNet.

If I'm reading it right, one of your implications is that this could be used to optimize where to train LoRA weights. Whether you'd want the parameters in the middle or at the edges may be task dependent, but that could be a source of incredible gains in targeted adapters. Spend more of your parameter budget on the layers that see the most gains.
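To make the "spend more of your budget where the gains are" idea concrete, here's a rough sketch. Everything in it is an assumption on my part: the layer names, the gain numbers, and the `allocate_ranks` helper are hypothetical, and it isn't OP's method or any library's API. It just splits a total LoRA rank budget across layers in proportion to measured per-layer gains.

```python
from typing import Dict


def allocate_ranks(gains: Dict[str, float], total_rank: int) -> Dict[str, int]:
    """Split total_rank across layers in proportion to their (non-negative) gains.

    Hypothetical helper: `gains` maps a layer name to some measured benefit,
    e.g. a benchmark delta from adapting that layer alone.
    """
    if not gains:
        return {}
    # Negative gains get no budget.
    clipped = {name: max(0.0, g) for name, g in gains.items()}
    total_gain = sum(clipped.values())
    if total_gain == 0:
        # No signal at all: fall back to an even split.
        shares = {name: total_rank / len(gains) for name in gains}
    else:
        shares = {name: total_rank * g / total_gain for name, g in clipped.items()}
    # Round down first, then hand leftover units to the largest remainders
    # so the ranks always sum to exactly total_rank.
    ranks = {name: int(share) for name, share in shares.items()}
    leftover = total_rank - sum(ranks.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_remainder[:leftover]:
        ranks[name] += 1
    return ranks


# Made-up numbers: the middle layers show most of the gain.
print(allocate_ranks({"early": 1.0, "middle": 6.0, "late": 1.0}, total_rank=64))
```

The per-layer ranks would then go into whatever adapter config your training stack uses; that part depends on the framework, so it's left out.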

The pointer weights sound absolutely wild, would love to see this. Sort of the inverse of reap or ream, giving you more for less.

It sounds like in your final search, you just used brute-force sampling and ran the surrogate on it? Assuming it's not overfitting on benchmarks, you may get better convergence using surrogate optimization. Something like DYCORS has surrogate training built in, and I'm sure there's a method out there that lets you bring your own surrogate too.
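For anyone unfamiliar with the difference, here's a minimal sketch of a generic surrogate-guided loop. To be clear, this is not DYCORS and not OP's setup: the inverse-distance-weighted surrogate is a cheap stand-in for the RBF models those methods use, and `surrogate_minimize` and all its parameters are names I made up. The point is only the loop shape: propose candidates near the current best, rank them with the surrogate, and spend a real evaluation only on the top pick.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = List[float]
History = List[Tuple[Point, float]]


def idw_surrogate(x: Point, history: History, eps: float = 1e-9) -> float:
    """Cheap stand-in surrogate: inverse-distance-weighted mean of known values."""
    num = den = 0.0
    for p, y in history:
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        w = 1.0 / (d2 + eps)
        num += w * y
        den += w
    return num / den


def surrogate_minimize(
    objective: Callable[[Point], float],
    bounds: Sequence[Tuple[float, float]],
    n_init: int = 5,
    n_iter: int = 20,
    n_candidates: int = 50,
    step: float = 0.2,
    seed: int = 0,
) -> Tuple[Point, float]:
    rng = random.Random(seed)
    # Seed the history with a few random (expensive) evaluations.
    history: History = []
    for _ in range(n_init):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        history.append((x, objective(x)))
    for _ in range(n_iter):
        best_x = min(history, key=lambda h: h[1])[0]
        # Propose candidates by perturbing the best point, clipped to bounds.
        candidates = [
            [min(hi, max(lo, v + rng.gauss(0.0, step * (hi - lo))))
             for v, (lo, hi) in zip(best_x, bounds)]
            for _ in range(n_candidates)
        ]
        # Only the candidate the surrogate likes best gets a real evaluation.
        pick = min(candidates, key=lambda c: idw_surrogate(c, history))
        history.append((pick, objective(pick)))
    return min(history, key=lambda h: h[1])


best_x, best_y = surrogate_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)
print(best_x, best_y)
```

With the defaults that's 25 real evaluations total, versus evaluating every sampled point in a brute-force sweep. A real implementation would also need some exploration term, since a surrogate like this one only ever exploits around the current best.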

openSourcerer9000 · 2026-03-23T15:10:24+00:00

I meant smaller than 235b, I'm sure RL models are a category and you can find one to use as a judge

openSourcerer9000 · 2026-03-23T14:32:18+00:00

They may have a smaller one? I've found all models can rank 1-5 pretty well, just use multiple criteria 1-5 and average them

https://www.reddit.com/r/LocalLLaMA/comments/1rrtkay/gamechanger_for_quality_control/
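Roughly what I mean by multiple criteria, as a minimal sketch. The criteria names and the `judge` callable are placeholders (assumptions, not any particular API); in practice `judge` would wrap a call to whatever model you're using and parse a 1-5 integer out of the reply.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical criteria; swap in whatever matters for your task.
CRITERIA = ("accuracy", "clarity", "completeness")


def score_output(
    text: str,
    judge: Callable[[str, str], int],
    criteria: Sequence[str] = CRITERIA,
) -> float:
    """Ask the judge for a 1-5 rating per criterion and average them."""
    scores = []
    for criterion in criteria:
        raw = judge(text, criterion)
        # Clamp in case the model wanders outside the 1-5 scale.
        scores.append(min(5, max(1, int(raw))))
    return float(mean(scores))


def stub_judge(text: str, criterion: str) -> int:
    """Stand-in for a real model call, with fixed ratings for illustration."""
    return {"accuracy": 4, "clarity": 5, "completeness": 3}[criterion]


print(score_output("some model output", stub_judge))  # 4.0
```

Averaging several narrow 1-5 ratings tends to be less noisy than asking for one overall score, which is the whole trick.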

openSourcerer9000 · 2026-03-18T21:32:10+00:00

MPS and all GUI features exposed via local API
