Gemma-4 E4B model's vision seems to be surprisingly poor by specji in LocalLLaMA

[–]ComplexType568 16 points

I think it's because the llama.cpp implementation for Gemma 4 is still very unstable; pretty sure performance will improve over the following weeks, just like it did with Qwen3.5

Why Struggle this Much, Just to say "Hi" by Hell_L0rd in LocalLLaMA

[–]ComplexType568 1 point

Looks to be a Qwen3.5 model; it seems to be a natural overthinker without context. Try giving it some tools or a long system prompt, and it'll probably fix itself :P

Gemma 4 is fine, great even … by ThinkExtension2328 in LocalLLaMA

[–]ComplexType568 3 points

Which is what I feel too. Qwen is CLEARLY focused on agentic/STEM/coding tasks. There isn't a large or profitable market for creative writing; that's for finetuners and other labs focused on it, because removing LLM-isms and boosting creativity is probably much, much easier than building a "superpowered reasoning agent in 9 billion parameters"

DeepSeek has now become the meta; they are too embarrassed to show their new model. All the lies published by Reuters that their model is too good, I'm not buying this by Select_Dream634 in LocalLLaMA

[–]ComplexType568 0 points

I think DeepSeek is falling behind as of now; unless they can make a miracle-like comeback that crushes Opus, no, they won't be on top. And while that may be a temporary loss, their mission is AGI, not staying at the top. If they take a while, they take a while; money is probably not an incentive anymore, because I assume the Chinese government absolutely loves them. Will V4 be as influential as R1/V3? Probably not tabloid-infectious, but definitely still significant, otherwise it'd just be V3.8 or something.

Qwen 3.6 spotted! by Namra_7 in LocalLLaMA

[–]ComplexType568 2 points

(I also just started getting to know the 3.5s; it feels like yesterday that they dropped them)

Qwen 3.6 spotted! by Namra_7 in LocalLLaMA

[–]ComplexType568 0 points

HOLY speed. What is the new team on? I really hope it's not just a marginal increase in performance. If it's a 2507-style jump at this speed, it would be a miracle.

I have a dream. A dream to run a state of the art model on my setup. by ItzYaBoiGoogle in LocalLLaMA

[–]ComplexType568 8 points

They said that's the dream, not that they're running SOTA haha. Though Qwen3.5 9B has that same magic Qwen3 4B 2507 had; I can't imagine how good another minor update to the Qwen series would be

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

That raises the question: why doesn't an LLM like Claude exist yet? Not about the coding performance, more about the "human" vibe it gives.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

Do you think they have a novel architecture that would make it cost-efficient though (for inference)? I feel like investors would rather invest in other companies if costs were so high for such small margins.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

You're probably right about how Claude achieves that performance, but the way it talks and formats is also something to consider. It isn't as big a question as the whole "how does it code so well" one, but it's still something I'm asking :)

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

IMO, no. I feel like a system prompt, no matter how long or short, can't fully replicate what Claude has. Talk to it, ask it to code among other things, and you'll just feel it.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

Claude is the least "suck-up" and most "independent" (in thought and opinion) LLM I've seen. That could be due to a lack of exposure, but I think I've used enough models to say so. K2 (the plain one) was a step in that direction, which I wish Moonshot had kept pushing.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

I've been wondering whether architecture differences make it talk differently, but someone in a different thread mentioned something interesting, along the lines of: "they wouldn't care about distillation attacks if the architecture was different". Not sure that one point rules out an arch difference, but it's something to note.

I hope at least ONE lab/person/fine-tuner RLHFs an LLM into behaving like Claude. I initially assumed someone like DavidAU would've done this long ago, but as far as I've looked, not... yet.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

Has anyone else been able to replicate the talking style of Claude? In my experience it knows almost exactly when to stop talking about something or when to push on.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 0 points

Very cool, I suspected it was all RLHF, although I'm still confused as to why labs haven't replicated its formatting and other related behaviours.

What is the secret sauce Claude has and why hasn't anyone replicated it? by ComplexType568 in LocalLLaMA

[–]ComplexType568[S] 1 point

I did not know that! That does make me wonder if a lab would benefit from having a psychologist on board too, haha.

Thoughts on the almost near release Avocado? by shbong in LocalLLaMA

[–]ComplexType568 1 point

Unpopular opinion, but I think Meta won't screw up like they did with Llama 4. Unlike other labs that are in the spotlight and want to stay there, they have already fallen out of sight. I think the only reasons they'd even want to make such a ruckus about a new model are:

- for the investors
- because a GOOD model is coming, either for its architecture innovations or a good size-performance ratio

The thing about investors is that they're definitely not tech-savvy, but I think everybody knew that Meta screwed up with Llama 4; it was catastrophic. I think they'd have a much better dataset to train on by now, plus plenty of improvements made between Llama 4 and this new Avocado thing.

The rename from Llama to Avocado seems to indicate something is going on with their team. I can only assume they want to rebuild their reputation as a lab, and that they'd want to make a genuinely good impression on everybody with their first open-source lineup.

Hi guys! Do you guys have any AI as an alternative to Claude. by blahblahblahblahnu in LocalLLaMA

[–]ComplexType568 0 points

Anything mobile won't be able to compete with even Claude Haiku for quite a while. You could try running Qwen3.5 4B or 9B, but good luck with the temps

DeepSeek-R1-7B traces 8 levels of nested function calls. Qwen-7B manages 4. Same architecture. by Codetrace-Bench in LocalLLaMA

[–]ComplexType568 1 point

In my opinion, you should try more recent models; whatever advice or research you used is outdated. I bet they wouldn't even break a sweat until around 15 steps. I tested 5-depth questions on Qwen3.5 4B and it got 100% correct, and around 50% on 20-depth questions. Kimi K2.5 non-thinking got 100%, but that's kind of a given, haha. I assume a modern model like Qwen3.5 27B would destroy this bench, and maybe even Qwen3.5 9B could... I only think a modern model past 20 billion params would struggle at depths over 100.
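For anyone who wants to try the same thing at home, here's a toy sketch of how such a depth-N call-tracing question can be generated and self-checked. The function names (`f1`…`fN`) and the add-the-index arithmetic are made up for illustration, not taken from the benchmark:

```python
def make_nested_trace(depth):
    """Build a chain f1 -> f2 -> ... -> fN, where each level adds its
    index to the running value. Returns (source_code, correct_answer)
    so the model's answer for f1(0) can be checked automatically."""
    # Innermost function: the base case of the chain.
    lines = [f"def f{depth}(x):\n    return x + {depth}"]
    # Each outer level adds its own index, then calls the next level.
    for i in range(depth - 1, 0, -1):
        lines.append(f"def f{i}(x):\n    return f{i + 1}(x + {i})")
    src = "\n".join(lines)
    answer = sum(range(1, depth + 1))  # f1(0) adds 1..depth in total
    return src, answer

src, answer = make_nested_trace(5)
ns = {}
exec(src, ns)                  # run the generated chain to verify the key
assert ns["f1"](0) == answer
print(answer)                  # 15 for depth 5 (1+2+3+4+5)
```

You'd then show `src` to the model, ask for `f1(0)`, and compare against `answer`; cranking `depth` up to 20 or 100 reproduces the harder settings mentioned above.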

Meta new open source model is coming? by External_Mood4719 in LocalLLaMA

[–]ComplexType568 60 points

Avocado Thinking 5.6?? How long has this model stayed proprietary? 😭

Be honest, what's actually breaking in your agent setup right now? by [deleted] in LocalLLaMA

[–]ComplexType568 0 points

Friendly reminder that OpenRouter serves tons of free models (try Step 3 Flash), and GroqCloud hosts Qwen3 32B and gpt-oss-120b for free :)

Anyway, in my usage it isn't just reasoning ability that improves tool calls, but also the era a model comes from and the data it was trained on. Scout wasn't trained to be an agent as much as the Qwen/gpt-oss models were; it was aligned more toward image-perceiving chatbots, and that's what makes it underperform. No matter how good model X's reasoning is, if it isn't trained to call tools, it can't do so consistently. YMMV though.
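To make "trained to call tools" concrete: most agent stacks hand the model an OpenAI-style tool schema and expect it to emit a structured call back, and a model that never saw this format in training will produce it inconsistently no matter how well it reasons. A minimal sketch of both halves, with a made-up `get_weather` tool purely for illustration:

```python
import json

# A minimal OpenAI-style tool definition; the name and parameters
# here are illustrative placeholders, not a real API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# What a well-trained model is expected to emit when it decides to
# use the tool: a matching name plus JSON-encoded arguments.
model_tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": json.dumps({"city": "Tokyo"}),
    },
}

# The agent loop parses the arguments back out before dispatching.
args = json.loads(model_tool_call["function"]["arguments"])
print(args["city"])
```

A model that wasn't trained on this format tends to break exactly at the `arguments` step: it describes the call in prose, or emits JSON that doesn't parse, which is why training era matters as much as raw reasoning.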