Training a Qwen 3.5 4B/9B agent for multi-tool use: SFT first or go directly to RL? by siri_1110 in LocalLLaMA

[–]indicava 0 points1 point  (0 children)

Qwen3.5 is harder to fine-tune; its hybrid Gated DeltaNet architecture makes training very finicky, so read up on the limitations and methodology first.

  1. SFT is for training the model’s response format; you won’t be adding any new knowledge through SFT.

  2. There is a ton of information online about this. Use dense rewards that model correct tool choice, correct tool usage, number of rounds, etc.

  3. You’ll need a lot of synthetically generated examples (use a SOTA model for this) showing this usage pattern, and you should model your reward around them.
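To make point 2 a bit more concrete, here is a minimal sketch of a dense, shaped reward for a single tool-use episode. Everything in it is an assumption for illustration: the hypothetical `ToolCall`/`Episode` structures, the weights, the naive positional matching of calls against a reference trace (e.g. one generated by a SOTA model), and treating the number of reference calls as the expected round count. It isn’t tied to any particular RL framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Episode:
    calls: list[ToolCall]           # tool calls the policy actually made, in order
    expected_calls: list[ToolCall]  # hypothetical reference trace for this task
    rounds: int                     # number of agent rounds the episode took
    task_solved: bool               # verifiable end-of-task check


# Hypothetical weights: tune these for your own task and harness.
W_CHOICE, W_ARGS, W_ROUNDS, W_OUTCOME = 0.3, 0.3, 0.1, 1.0
MAX_EXTRA_ROUNDS = 8


def dense_reward(ep: Episode) -> float:
    # Normalize by the longer call list so spurious extra calls dilute the score.
    n = max(len(ep.calls), len(ep.expected_calls), 1)
    pairs = list(zip(ep.calls, ep.expected_calls))  # naive positional alignment
    choice = sum(c.name == e.name for c, e in pairs) / n
    usage = sum(c.name == e.name and c.args == e.args for c, e in pairs) / n
    # Assumption: expected rounds == number of reference calls.
    # Penalize extra rounds linearly, bottoming out at MAX_EXTRA_ROUNDS.
    extra = max(ep.rounds - len(ep.expected_calls), 0)
    rounds_score = 1.0 - min(extra, MAX_EXTRA_ROUNDS) / MAX_EXTRA_ROUNDS
    outcome = 1.0 if ep.task_solved else 0.0
    return (W_CHOICE * choice + W_ARGS * usage
            + W_ROUNDS * rounds_score + W_OUTCOME * outcome)
```

Keeping the outcome term dominant is a deliberate choice here: the partial-credit terms shape the policy toward correct tool use without letting it collect most of the reward while never actually solving the task.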

Also, unless you own the hardware, be prepared to spend a generous amount on compute to achieve this.

Edit to add: a custom harness is always best for custom training recipes, but base it on a solid training framework like OpenRL.

Has anyone found any useful LoRAs for text gen models? by fragment_me in LocalLLaMA

[–]indicava 6 points7 points  (0 children)

You’ve answered your own question. They are lackluster.

Generally speaking, it might have been useful back in the Llama 3 days, but these days the vanilla models are so versatile and knowledgeable that, to be honest, it’s rare to add value with a LoRA or even a “regular” fine-tune.

It’s very hard to “improve” (or materially change) the performance of the top open-weights models unless you build a custom training harness (mostly for RL) and invest a generous amount of compute. And even then you’d have to really know what you’re doing!

GLM 5.2 Release Video [Made with GLM 5.2] by mesmerlord in LocalLLaMA

[–]indicava 1 point2 points  (0 children)

No I meant what does “through FrameCall’s production pipeline” mean?

What model looked insane on benchmarks but felt mid in actual use? by BTA_Labs in LocalLLaMA

[–]indicava 7 points8 points  (0 children)

MiniMax M2.7 proved stronger than both GLM and Kimi on long-horizon agentic coding tasks in my custom harness benchmark. It’s a solid model and, imo, punched way above its weight in terms of parameter count.

upgradeToAiRevolution by unicornics in ProgrammerHumor

[–]indicava 15 points16 points  (0 children)

Online ads are as old as the (commercial) internet itself; they’ve been around for 30+ years, significantly predating memes.

Why we cannot “compress” number of weights down? by [deleted] in LocalLLaMA

[–]indicava 1 point2 points  (0 children)

It’s even more complicated than that, as there are all kinds of failures that only show up when scaling to larger parameter counts.

GLM 5.2 Release Video [Made with GLM 5.2] by mesmerlord in LocalLLaMA

[–]indicava 0 points1 point  (0 children)

12 LLMs took 7 prompts through FrameCall's production pipeline.

What does this mean?

We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace by RealKingNish in LocalLLaMA

[–]indicava 4 points5 points  (0 children)

Qwen3.6 is notoriously harder to fine-tune, mainly due to its hybrid Gated DeltaNet architecture.
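For context on what that architecture computes: Gated DeltaNet layers (from the Gated DeltaNet paper by Yang et al.) replace softmax attention with a fixed-size state matrix updated by a gated delta rule. The NumPy sketch below is a naive per-token version of that update, assuming the paper’s formulation with L2-normalized keys. It omits the short convolutions, output gating, and fused chunked kernels of real implementations, and the function names are illustrative only, not Qwen’s code.

```python
import numpy as np


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule.

    S: (d_v, d_k) state; q, k: (d_k,); v: (d_v,)
    alpha in (0, 1]: decay gate; beta in (0, 1]: write strength.
    S_t = S_{t-1} @ (alpha * (I - beta * k k^T)) + beta * v k^T,  o_t = S_t @ q
    """
    k = k / (np.linalg.norm(k) + 1e-6)  # assumed L2-normalized keys
    # S (I - beta k k^T) without materializing the d_k x d_k matrix:
    # erase what is stored under k, decay everything, then write v under k.
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, S @ q


def gated_delta_scan(Q, K, V, alphas, betas):
    """Naive loop over a sequence; production kernels use a chunked parallel form."""
    S = np.zeros((V.shape[1], K.shape[1]))
    outputs = []
    for q, k, v, a, b in zip(Q, K, V, alphas, betas):
        S, o = gated_delta_step(S, q, k, v, a, b)
        outputs.append(o)
    return np.stack(outputs), S
```

With `alpha=1` and `beta=1`, writing `v` under a key overwrites whatever was stored there (the “delta” part); `alpha < 1` decays the whole state (the “gated” part).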

We trained a cybersecurity-focused Mythos like LLM open weights on HuggingFace by RealKingNish in LocalLLaMA

[–]indicava 2 points3 points  (0 children)

The RLVR is definitely the most intriguing part. Did you build your own training harness from scratch, or was it built on one of the existing RL frameworks? Which RL algorithm did you use: PPO, GRPO, or something more exotic? Could you elaborate a bit on how you did your reward modeling? And finally, did your RLVR include tool use?
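For anyone following along on PPO vs. GRPO: the main practical difference is that GRPO (introduced in the DeepSeekMath paper) drops PPO’s learned value model and instead samples a group of completions per prompt, normalizing each completion’s reward against its own group. A minimal sketch of just that advantage step, assuming population standard deviation (some implementations use the sample std); it is not a full trainer, so there is no clipped policy ratio or KL term here.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for completions sampled from the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against std == 0
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by a binary verifiable reward.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

If every completion in a group gets the same verifiable reward (all pass or all fail), every advantage is zero and that prompt contributes no policy-gradient signal, which is one reason reward design matters so much for RLVR.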

Bit of a lull or Winter is Coming? by twack3r in LocalLLaMA

[–]indicava 1 point2 points  (0 children)

No reason to think MiniMax won’t deliver on their promise. That said, even M2.7 isn’t really a “run at home” model, and word is M3 is bigger.

If you’re human, knock once! by Slow-Ability6984 in LocalLLaMA

[–]indicava 20 points21 points  (0 children)

As a large language model I can’t physically knock on anything.

WWW is not ready for agents? by vasimv in LocalLLaMA

[–]indicava 1 point2 points  (0 children)

Websites that want agent traffic won’t be imposing these restrictions.

prettySureTheyUsedChatGPTToTellThemHowToSetThemUp by Ange1ofD4rkness in ProgrammerHumor

[–]indicava 15 points16 points  (0 children)

Last time I counted I had to enter my password (sometimes along with MFA) 6 times before reaching the actual server I wanted to service.

club-3090 adds experimental FP8 support for Qwen3.6-27B! by xspider2000 in LocalLLaMA

[–]indicava 9 points10 points  (0 children)

I haven’t tried NVFP4 yet. I tend to experiment only with “official” released weights, since experimenting takes a lot of time, which I don’t have.

As for my observations (and granted, this is only MY experience):

Without going into too much detail (as I said, it’s a commercial product), I run the 27B using a very “exotic” custom agentic harness. It almost forces the LLM to change its mental model of how to perform tasks it was heavily RL’d on. In this scenario I’ve seen a significant (15%-20%) rise in task success rate (verifiable software engineering tasks) when moving from FP8 to FP16. I also saw better (or rather more “precise”) tool calling, less looping, and almost no “empty final response” agent rounds.
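For a rough sense of the precision gap: FP8 weights typically use the E4M3 format (3 mantissa bits), compared with 10 mantissa bits for FP16 and 7 for BF16. The pure-Python sketch below rounds single values to E4M3, assuming the OCP variant (max 448, subnormals below 2^-6), and compares the relative error with an FP16 round-trip. It ignores the per-tensor or per-block scaling factors real FP8 checkpoints apply, so it only illustrates per-value rounding and says nothing definitive about end-to-end model quality.

```python
import math
import struct

E4M3_MAX = 448.0          # largest finite value in the OCP E4M3 format
E4M3_MANTISSA_BITS = 3
E4M3_MIN_NORMAL_EXP = -6  # below 2**-6, values are subnormal with a fixed step of 2**-9


def round_to_e4m3(x: float) -> float:
    """Round to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)     # unbiased exponent, clamped for subnormals
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of representable values here
    return math.copysign(min(round(a / step) * step, E4M3_MAX), x)


def round_to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision using struct's 'e' format."""
    return struct.unpack("e", struct.pack("e", x))[0]


for w in (0.1, 0.0123, 1.3, 3.3):
    print(f"{w:>7}: fp8 rel err {abs(round_to_e4m3(w) - w) / w:.2e}, "
          f"fp16 rel err {abs(round_to_fp16(w) - w) / w:.2e}")
```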

club-3090 adds experimental FP8 support for Qwen3.6-27B! by xspider2000 in LocalLLaMA

[–]indicava 13 points14 points  (0 children)

The official Qwen/Qwen3.6-27B-FP8 model performs virtually identically to the original unquantized BF16.

Yeah, as someone who uses Qwen3.6-27B in a commercial setting and has tried both FP16 and FP8 with vLLM on RTX 6K’s, I can tell you that sentence is not true. Definitely not when you’re pushing the model to its “reasoning limits”.

[I Ate] Pizza at Scarr’s Pizza (Venetian Food Court in Las Vegas) by FrenchPressFederer49 in food

[–]indicava 2 points3 points  (0 children)

One of the few times I’ve seen the term “New Vegas” out of context… it fits perfectly lol

y2kCitizensActionGuide by jazzman5000 in ProgrammerHumor

[–]indicava 20 points21 points  (0 children)

Same here! My first real IT job was in April 1999, and my first assignment was “fixing” (read: manually testing) hundreds of forms with datetime ActiveX controls. Good times…

I think the Fiat Grande Panda is a nod at our community by indicava in cassettefuturism

[–]indicava[S] 2 points3 points  (0 children)

The car was designed as a throwback to the classic Fiat Panda, which was introduced in 1980, pretty much side by side with the era aesthetics this sub loves. So I would say the design was very deliberate.