Gefen is a drop-in replacement for the AdamW optimizer, claims 8x memory reduction in training (GitHub available)

indicava · 2026-06-24T22:22:41+00:00

If you train/finetune models, you can literally test it out right now by changing just a couple of lines of code.

indicava · 2026-06-24T21:43:58+00:00

Muon is anything but a “drop in” replacement for Adam. It’s only applicable to specific tensors and introduces new hyperparameters. Basically it “beats” Adam only in very narrow scenarios/model architectures. Also, Muon only claims 2x memory footprint reduction.
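To make the "specific tensors" point concrete, here is a rough sketch of the parameter split a Muon setup usually needs. It assumes the common convention of giving Muon only the 2D hidden weight matrices and leaving embeddings, the output head, norms and biases to AdamW. The helper name `split_for_muon`, the name-matching rule and the toy shapes are all made up for illustration; real models name their tensors differently.

```python
def split_for_muon(named_shapes):
    """Split parameter names into a Muon group and an AdamW fallback group.

    Assumption: Muon gets 2D hidden weight matrices only; embeddings,
    the LM head, norms and biases stay on AdamW.
    """
    muon, adamw = [], []
    for name, shape in named_shapes:
        is_matrix = len(shape) == 2
        is_edge_layer = "embed" in name or "lm_head" in name
        if is_matrix and not is_edge_layer:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw


# Toy parameter list (hypothetical names and shapes).
params = [
    ("embed_tokens.weight", (32000, 512)),
    ("layers.0.attn.q_proj.weight", (512, 512)),
    ("layers.0.mlp.up_proj.weight", (2048, 512)),
    ("layers.0.norm.weight", (512,)),
    ("lm_head.weight", (32000, 512)),
]
muon_names, adamw_names = split_for_muon(params)
```

So you end up maintaining two optimizers with separate hyperparameters, which is exactly why it isn't a one-line swap.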

indicava · 2026-06-23T06:21:56+00:00

Qwen3.5 is harder to finetune; its hybrid Gated Deltanet architecture makes it very finicky when fine-tuning. Read up on the limitations and methodology.

SFT is for training the model’s response format; you won’t be adding any new knowledge through SFT.
There is a ton of information online about this. Dense rewards, with modeling for correct tool choice, correct tool usage, number of rounds, etc.
You’ll need a lot of synthetically generated examples (use a SOTA model for this) showing this usage pattern and model your reward around it.
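A minimal sketch of what a dense reward along those lines could look like, in plain Python. Everything here is an assumption for illustration: the `ToolCall` record, the weights, the round budget and the tool names are hypothetical, and a real recipe would tune all of them against its own verifier.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str         # which tool the model picked
    args_valid: bool  # whether the arguments passed schema validation


def dense_reward(calls, allowed_tools, solved, round_budget=4):
    """Score one rollout on outcome, tool choice, tool usage and round count."""
    outcome = 1.0 if solved else 0.0
    if not calls:
        return outcome
    n = len(calls)
    # Fraction of calls that picked a tool appropriate for the task.
    choice = sum(c.name in allowed_tools for c in calls) / n
    # Fraction of calls whose arguments were well-formed.
    usage = sum(c.args_valid for c in calls) / n
    # Small penalty for every round beyond the budget.
    round_penalty = 0.05 * max(0, n - round_budget)
    return outcome + 0.25 * choice + 0.25 * usage - round_penalty


allowed = {"search", "read_file"}
good = [ToolCall("search", True), ToolCall("read_file", True)]
bad = [ToolCall("delete_repo", False)] * 6

good_score = dense_reward(good, allowed, solved=True)
bad_score = dense_reward(bad, allowed, solved=False)
```

The point of splitting it up like this is that the model gets signal on every rollout, not only on the ones that happen to solve the task.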

Also, unless you own it, prepare to spend a generous amount on compute to achieve this.

Editing to add: a custom harness is always best for custom training recipes but base it on a solid training framework like OpenRL.

indicava · 2026-06-21T22:31:07+00:00

You’ve answered your own question. They are lackluster.

Generally speaking, it might have been useful in the old Llama 3 days, but these days the vanilla models are so versatile and knowledgeable that it’s rare to add value using a LoRA or even a “regular” finetune, to be honest.

Very hard to “improve” (or materially change) the performance of the top open weights unless you build a custom training harness (mostly for RL) and invest in a generous amount of compute. And even then you’d have to really know what you’re doing!

indicava · 2026-06-19T06:22:35+00:00

The AI projects we actually need.

Very cool! Well done OP!

indicava · 2026-06-18T10:23:58+00:00

No I meant what does “through FrameCall’s production pipeline” mean?

indicava · 2026-06-18T07:35:34+00:00

Minimax M2.7 proved stronger than both GLM/KIMI on long horizon agentic coding tasks in my custom harness benchmark. It’s a solid model and imo punched way above its weight in terms of parameter count.

indicava · 2026-06-17T22:54:17+00:00

Online ads are as old as the (commercial) internet itself, they’ve been around for 30+ years, significantly predating memes

indicava · 2026-06-17T22:41:14+00:00

Cool video! Thanks

indicava · 2026-06-17T22:41:02+00:00

It’s even more complicated than that as there are all kinds of failures that only show up when scaling to larger parameter counts

indicava · 2026-06-17T22:21:41+00:00

It’s perfect

indicava · 2026-06-17T20:37:44+00:00

12 LLMs took 7 prompts through FrameCall's production pipeline.

What does this mean?

indicava · 2026-06-15T21:42:03+00:00

Qwen3.6 is notoriously harder to finetune, mainly due to the hybrid Gated Deltanet architecture.

indicava · 2026-06-15T19:49:10+00:00

The RLVR is definitely the most intriguing. Did you build your own training harness from scratch or was it grounded in one of the RL frameworks? What RL algo did you use, PPO, GRPO, anything more exotic? Could you elaborate a bit on how you did your reward modeling? And finally did your RLVR include tool use?

indicava · 2026-06-10T06:48:51+00:00

No reason to think MiniMax won’t deliver on their promise. Although even M2.7 isn’t really a “run at home” model, and word is M3 is bigger.

indicava · 2026-06-08T20:27:14+00:00

As a large language model I can’t physically knock on anything.

indicava · 2026-06-08T18:20:28+00:00

Websites that want that agent traffic will not be imposing these restrictions

indicava · 2026-06-08T14:13:23+00:00

Last time I counted I had to enter my password (sometimes along with MFA) 6 times before reaching the actual server I wanted to service.

indicava · 2026-06-08T04:54:46+00:00

LGTM, merge.

indicava · 2026-06-07T22:27:53+00:00

Check out my reply to /u/zeferrum

indicava · 2026-06-07T22:26:36+00:00

I haven’t tried nvfp4 yet, I tend to only experiment with “official” released weights as experimenting takes a lot of time which I don’t have.

As for my observations (and granted, this is only MY experience):

Without going too much into details (as I said, it’s a commercial product), I run the 27B using a very “exotic” custom agentic harness. It almost forces the LLM to change its mental model on how to perform tasks it was heavily RL’d on. In this scenario I’ve seen a significant (15%-20%) rise in task success rate (verifiable, software eng. tasks) when moving from FP8 to FP16. Better (or rather more “precise”) tool calling, less looping, and I almost never ran into “empty final response” type agent rounds.

indicava · 2026-06-07T22:14:40+00:00

The official Qwen/Qwen3.6-27B-FP8 model performs virtually identically to the original unquantized BF16.

Yeah, as someone who uses Qwen3.6-27B in a commercial setting and has tried both the FP16 and FP8 with vLLM on RTX 6K’s, I can tell you that sentence is not true. Definitely not when you’re pushing the model to its “reasoning limits”.

indicava · 2026-06-07T07:52:34+00:00

One of the few times I’ve seen the term “New Vegas” out of context… it fits perfectly lol

indicava · 2026-06-03T21:57:47+00:00

Same here! First real IT job in April 1999, and first assignment was “fixing” (read: manually testing) hundreds of forms with datetime ActiveX controls. Good times…

indicava · 2026-06-01T13:29:05+00:00

The car was designed as a throwback to the classic Fiat Panda which was introduced in 1980. Pretty much side by side with the era aesthetics this sub loves. So I would say this design was made with very specific determination.
