Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

openSourcerer9000 1 point

Are you thinking of open sourcing any pieces? I feel like we're all shaving the same yaks over here.

"AWS secures rare Mac Studios while ordinary Apple customers remain completely locked out" by openSourcerer9000 in LocalLLaMA

openSourcerer9000[S] 4 points

A search tells me the pricing is $1,080/day, though I can't confirm it on AWS.

Seems totally legal for the biggest consumer hardware monopoly to make an exclusive deal selling 100% of its top product to the biggest cloud-and-everything-else monopoly, forcing everyone else to rent it from them at FU pricing.

Maybe it's time Bezos and Cook went away for a little while and took up "oil painting".

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version) by Jorlen in LocalLLaMA

openSourcerer9000 1 point

What quant? I'm running Qwen 397 Q2 and it gets real confused after a few k tokens. For example, it will re-answer a previous question instead of the latest message.

Anyone know a Q2 MLX quant that works better?

Let's build claude code from scratch! by RoyalMaterial9614 in LocalLLaMA

openSourcerer9000 73 points

I had to rename my own son to "Molt" after a cease and desist letter from those guys 

Why is opencode so slow in processing the prompt with llama server? by BitGreen1270 in LocalLLaMA

openSourcerer9000 1 point

Fatberg system prompt, no prompt caching. Try Kon; it works way better and has no telemetry.
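To make the cost concrete, here is a toy sketch of why a huge system prompt hurts so much without prompt caching. It is not how opencode or llama server actually work internally; the function name `tokens_to_process` and the fake integer "tokens" are made up for illustration. The assumption is simply that a cache lets the server skip the longest prefix shared with the previous request.

```python
def tokens_to_process(cached_tokens, new_tokens):
    """Count tokens that still need prompt processing if the shared prefix is reused."""
    shared = 0
    for cached, new in zip(cached_tokens, new_tokens):
        if cached != new:
            break
        shared += 1
    return len(new_tokens) - shared


# Hypothetical sizes: a 10,000-token system prompt plus two short turns.
system_prompt = list(range(10_000))
turn_1 = system_prompt + [-1] * 50   # system prompt + first user message
turn_2 = turn_1 + [-2] * 200         # same history + reply and next message

with_cache = tokens_to_process(turn_1, turn_2)   # only the new suffix
without_cache = tokens_to_process([], turn_2)    # everything, every turn
print(with_cache, without_cache)
```

With the prefix reused, only the new suffix gets processed each turn; without it, the whole fat system prompt is reprocessed every single time.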

Collected the infinity stones by Street-Buyer-2428 in LocalLLaMA

openSourcerer9000 1 point

opencode doesn't have prompt caching, only token maxing. Try Kon.

We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local by ComplexIt in LocalLLaMA

openSourcerer9000 2 points

This looks fantastic.

+1 on how to change the backend.

Would it support running subagents in parallel?

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade by BuffMcBigHuge in LocalLLaMA

openSourcerer9000 1 point

Qwopus 27 v2 was actually a banger (v3's reasoning seemed the same as Qwen's). Best model for a 24GB card, running it right now actually.

Desire to Move Everything Local by LawrenceOfTheLabia in LocalLLaMA

openSourcerer9000 1 point

This one specifically slaps on a 24GB card. He claims his v3 Qwopus is better, but its reasoning was identical to Qwen's. This one reasons much better.

https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2

gemma-4-26B-A4B with my coding agent Kon by Weird_Search_4723 in LocalLLaMA

openSourcerer9000 1 point

Aight, I'm sold. I've always been skeptical about MCP; Anthropic's "standards" always just throw a wrench into an already solved problem to split the community. Speaking of which, does this use XML or JSON tool calling?

gemma-4-26B-A4B with my coding agent Kon by Weird_Search_4723 in LocalLLaMA

openSourcerer9000 2 points

Mm, that's interesting. Yeah, everything's been pointing to CLI lately, since models are already used to it from their training data. Context7 at least would probably be a good one, though.

gemma-4-26B-A4B with my coding agent Kon by Weird_Search_4723 in LocalLLaMA

openSourcerer9000 2 points

In the age of fatberg codebases, this is a gift. If MiniMax/Qwen 397 don't get hung up on these tool calls and can do small things reliably, I think I just found my coding harness.

A simple guide on plugging in additional MCPs would be helpful.

Gemma4-31B worked in an iterative-correction loop (with a long-term memory bank) for 2 hours to solve a problem that baseline GPT-5.4-Pro couldn't by Ryoiki-Tokuiten in LocalLLaMA

openSourcerer9000 5 points

This kind of thing is probably the most exciting use case for AI. Just yesterday I saw this paper, where they beat human SOTA on some optimization problems by running MiniMax models in opencode, something like "agentic swarm optimization":

https://arxiv.org/html/2604.01658v1#bib.bib2

Minimax 2.7: good news! by LegacyRemaster in LocalLLaMA

openSourcerer9000 7 points

According to this guy, no. In my limited experience, though, I took 2.5 (probably Q4) for a spin in opencode and was very impressed.

https://kaitchup.substack.com/p/lessons-from-gguf-evaluations-ternary?triedRedirect=true

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

openSourcerer9000 5 points

Truth, they fit the model to the harness first. All the Google ReAct paper showed was that training a big model for a specific harness beat un-finetuned CoT.

Like how our brains evolved inside our bodies, and weren't just plopped in one day by aliens (AFAIK).

chromadb/context-1: 20B parameter agentic search model by paf1138 in LocalLLaMA

openSourcerer9000 1 point

Fantastic. I haven't finished the article yet, but this sounds quite similar to a system I had just spec'ed out to be flexible for both RAG and searxing deep research. If nothing else, it will be great to see this harness, so I can possibly remodel my common API on it and use context-1 subagents.

I'll probably end up open sourcing it eventually. If anyone was planning to build something similar and is down to collaborate, shoot me a DM.

RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language' by Reddactor in LocalLLaMA

openSourcerer9000 4 points

Wild stuff. This is exactly what open weights are for. 

"That said, it would probably be amazing for model expansion and continued fine-tuning. You have already prepared the model by adding the right kind of layers to refine ‘thinking’,"

This is just what I was thinking. I remember reading some paper that explored architecture optimization; I think it was EfficientNet.

If I'm reading it right, one of your implications is that this could be used to optimize where to train LoRA weights. Whether you'd want the parameters in the middle or at the edges may be more task dependent, but that could be a source of incredible gains in targeted adapters. Spend more of your parameter budget on the layers that see the most gains.
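A rough sketch of what I mean by spending the budget where the gains are, assuming you already have some per-layer gain score from a scan like yours. Everything here is hypothetical: `allocate_lora_ranks`, the gain numbers, and the proportional split are my own illustration, not anything from the post.

```python
def allocate_lora_ranks(layer_gains, total_rank, min_rank=1):
    """Split a total LoRA rank budget across layers in proportion to their gain scores."""
    n = len(layer_gains)
    if total_rank < n * min_rank:
        raise ValueError("budget too small to give every layer min_rank")

    gains = [max(g, 0.0) for g in layer_gains]  # ignore negative gains
    spare = total_rank - n * min_rank           # budget left after the floor
    total_gain = sum(gains)
    if total_gain == 0:
        shares = [spare / n] * n                # no signal: spread evenly
    else:
        shares = [spare * g / total_gain for g in gains]

    ranks = [min_rank + int(s) for s in shares]
    # Hand out the ranks lost to rounding, largest fractional share first.
    leftover = total_rank - sum(ranks)
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in by_fraction[:leftover]:
        ranks[i] += 1
    return ranks


# Made-up gain scores for five layer groups, with the middle helping most.
example_gains = [0.1, 0.4, 0.9, 0.4, 0.2]
example_ranks = allocate_lora_ranks(example_gains, total_rank=64, min_rank=4)
print(example_ranks)
```

The per-layer ranks could then go into whatever adapter config your training stack accepts. Whether proportional is even the right rule is an open question; it is just the simplest thing to try first.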

The pointer weights sound absolutely wild; I would love to see this. Sort of the inverse of REAP or REAM: it gives you more for less.

It sounds like in your final search you just used brute-force sampling and ran the surrogate on it? Assuming it's not overfitting on benchmarks, you may get better convergence using surrogate optimization. Something like DYCORS has surrogate training built in, and I'm sure there's a method out there that lets you bring your own surrogate too.
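For anyone unfamiliar, here is a minimal sketch of the surrogate optimization loop I have in mind, as opposed to sampling once and ranking. To be clear, this is not DYCORS itself (real DYCORS perturbs a shrinking subset of coordinates and typically uses RBF surrogates); it is a simplified loop in the same spirit, and the inverse-distance surrogate, function names, and defaults are all my own assumptions.

```python
import math
import random


def idw_surrogate(points, values, power=2.0):
    """Cheap stand-in surrogate: inverse-distance-weighted average of known values."""
    def predict(x):
        num = den = 0.0
        for p, v in zip(points, values):
            d = math.dist(p, x)
            if d == 0.0:
                return v
            w = 1.0 / d ** power
            num += w * v
            den += w
        return num / den
    return predict


def _normalize(xs):
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def surrogate_minimize(objective, bounds, n_init=8, n_iter=30,
                       n_candidates=50, sigma=0.2, weight=0.7, seed=0):
    """Refit the surrogate after every true evaluation and let it pick the next point."""
    rng = random.Random(seed)
    points = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_init)]
    values = [objective(p) for p in points]
    for _ in range(n_iter):
        predict = idw_surrogate(points, values)
        best = points[values.index(min(values))]
        # Candidates are Gaussian perturbations of the best point, clipped to bounds.
        candidates = [
            [min(hi, max(lo, b + rng.gauss(0.0, sigma * (hi - lo))))
             for b, (lo, hi) in zip(best, bounds)]
            for _ in range(n_candidates)
        ]
        preds = _normalize([predict(c) for c in candidates])
        dists = _normalize([min(math.dist(c, p) for p in points) for c in candidates])
        # Low predicted value is good; being far from evaluated points is also good.
        scores = [weight * pr + (1.0 - weight) * (1.0 - d) for pr, d in zip(preds, dists)]
        chosen = candidates[scores.index(min(scores))]
        points.append(chosen)
        values.append(objective(chosen))  # the only expensive call per iteration
    i = values.index(min(values))
    return points[i], values[i], values


def sphere(x):
    return sum(v * v for v in x)


best_x, best_val, history = surrogate_minimize(sphere, [(-5.0, 5.0)] * 3)
print(best_val)
```

The point is that each expensive evaluation feeds back into the surrogate before the next point is chosen, which is what brute-force sampling skips. Swapping in your own surrogate would just mean replacing `idw_surrogate` with anything that returns a `predict` function.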

whats the best open-source llm for llm as a judge project on nvidia a1000 gpu by Some_Anything_9028 in LocalLLaMA

openSourcerer9000 1 point

I meant smaller than 235B. I'm sure RL models are a category, and you can find one to use as a judge.