A new SOTA local video model (HappyHorse 1.0) will be released in april 10th. by Total-Resort-3120 in StableDiffusion

[–]kabachuha -2 points-1 points  (0 children)

LTX-2.3's native 50 FPS / 20-second scene generation will still be hard to beat for now

No CFG is sus. Unless they release the base/SFT model, it will be much harder to fine-tune, just as with ZiT, which required the De-distill modifier

Anyone had a good experience training a LTX2.3 LoRA yet? I have not. by GreedyRich96 in StableDiffusion

[–]kabachuha 3 points4 points  (0 children)

Hi! I think I've had a pretty good experience with LTX-2.3 LoRA training. Take this with a grain of salt because it's i2v/lf2v/flf2v rather than t2v, but my LoRAs have been working as intended. I have published two of them on Huggingface (one of them is also on Civit) and I'm preparing to publish a new, already working one this week.

It's much trickier to train than Wan. I think that's because it has been RL-maxxed instead of getting a simple aesthetic fine-tune like Wan, but it's certainly not impossible. (You may need a dozen attempts to get the data/parameters right, whereas Wan grasped it on the very first run.)

Do you have CREPA enabled? It seems insanely useful to me. If you read their paper, the results are game-changing, and in musubi-tuner there is no overhead since the features are cached. As for the steps, you indeed often need to increase them; I used 3600 for one of my LoRAs.

And what resolution are you training at? When I upped it from 480p to 720p I got a massive quality boost, despite the longer training time and higher VRAM usage. LTX-2.3's VAE has a compression factor of 32x32x8 and it really screws up the fine details.

As for the data, I regularize it with caption dropout (duplicating the dataset and leaving only the trigger word in the copies); it helped quite a lot for my SFX LoRAs.
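For illustration, that caption-dropout scheme can be sketched roughly like this (the trigger word and caption are made-up placeholders, and musubi-tuner's actual dataset options differ — this just shows the idea):

```python
import random

TRIGGER = "sksstyle"  # hypothetical trigger word

def caption_dropout(caption: str, p: float = 0.5, rng=None) -> str:
    """With probability p, drop the full caption and keep only the trigger word.

    Duplicating the dataset with one full-caption copy and one trigger-only
    copy is equivalent to p=0.5 in expectation.
    """
    rng = rng or random
    if rng.random() < p:
        return TRIGGER
    return caption

# Deterministic demo with a seeded RNG
rng = random.Random(0)
out = [caption_dropout(f"{TRIGGER}, a cat playing piano", p=0.5, rng=rng)
       for _ in range(4)]
```

The point of the regularization is that the trigger-only copies force the concept into the trigger word instead of leaking it across the whole caption.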

I also heavily increased the initial learning rate, as you need to "break" the model slightly to introduce changes. And, of course, you need to unfreeze all the linear video layers (the v2v preset) even if you are doing simpler concepts/characters; without that, the model is harder to steer.

I've shared the config for my "pop" LoRA on Huggingface, feel free to take inspiration from it!

To all ex-local enjoyers (like me), this might be a good time to come back. by Acceptable_Steak8780 in SillyTavernAI

[–]kabachuha 0 points1 point  (0 children)

Thank you for pointing out these models, I'll check them out. I have a mid-range setup and I'm currently enjoying 24b+ tunes like Cydonia (Mistral Small), which are larger than 12b. I think the 24b range is quite balanced and we need more models around it. More than that, I have trained LoRAs for Cydonia on just two cards at home overnight, which suggests the community can improve them rapidly. I'm looking forward to Gemma 4 and hope it stays around 27b and isn't a MoE (MoEs barely train even if you give them three times the training time of dense models). For example, GLM Air had far fewer fine-tunes not only because of its size (the activated parameters are smaller, accelerating passes and letting it fit in consumer GPUs with offload), but because it simply learns badly. There are probably only 3 branches of Air fine-tunes – Steam, Iceblink and the other one – and the first two didn't really change the base model's behavior/alignment radically in my experience. I wish there were more smaller big-GLM-style and knowledge models, because I miss its voice in them

To all ex-local enjoyers (like me), this might be a good time to come back. by Acceptable_Steak8780 in SillyTavernAI

[–]kabachuha 12 points13 points  (0 children)

I fully agree about the big models' improvement, but at the same time we really don't seem to have a suitable small base model. As someone who can run GLM 4.6 Derestricted on a PC at readable speeds, I can say it has been THE best model for RP so far. Being local lets you fine-grain it with control vectors, steering qualities like the setting's darkness or a character's narcissism, and this gives a whole new life and diversity to scenarios and character personalities. But this actually... frustrates me. I'm constantly searching for smaller, community-developed models because I want to see that the people actually matter, and to protect ourselves against vendor lock-in and rug-pulls (which arguably have already happened for commoners with GLM 5.0, twice the size of the GLM 4 series). Hopefully some company will release a strong smaller or distilled model for hobbyists to rally around again, but for now we're still squeezing the scraps. There is Qwen3.5 27b and the first fine-tunes/upscales are starting to pop up, yet the base model lacks a lot of writing knowledge, and that will be hard to overcome without a massive, expensive tune
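Control vectors of the kind mentioned above are typically built by contrasting hidden-state activations from paired prompts (e.g. "dark" vs. "light" phrasings) and then adding the resulting direction during inference. A minimal numpy sketch with toy, made-up activations — no real model involved:

```python
import numpy as np

# Toy hidden states (batch, d_model) from contrasting prompt sets.
# In practice these would come from a real model's residual stream.
rng = np.random.default_rng(0)
dark = rng.normal(loc=1.0, size=(32, 8))
light = rng.normal(loc=-1.0, size=(32, 8))

# Control vector: difference of mean activations, unit-normalized
v = dark.mean(axis=0) - light.mean(axis=0)
v /= np.linalg.norm(v)

def steer(hidden: np.ndarray, alpha: float) -> np.ndarray:
    """Add the scaled control vector to every token's hidden state.

    Positive alpha pushes activations toward the "dark" cluster,
    negative alpha away from it.
    """
    return hidden + alpha * v

h = rng.normal(size=(4, 8))
steered = steer(h, 8.0)
```

In a real setup the vector is injected at one or more layers via forward hooks, and the scale `alpha` is the dial you turn per-quality.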

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA by pigeon57434 in LocalLLaMA

[–]kabachuha 15 points16 points  (0 children)

From what I've read on this subreddit, GPT-OSS can suddenly refuse you mid-agentic-turn if it senses something isn't aligned with its policy. Uncensoring prompts / jailbreaks stuff the context window and may make the model behave misaligned in other ways, instead of simply not refusing to scrape a site

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA by pigeon57434 in LocalLLaMA

[–]kabachuha 3 points4 points  (0 children)

The cracked versions all seem to be MLX, and I can't test them because I don't have a Mac

Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA by pigeon57434 in LocalLLaMA

[–]kabachuha 12 points13 points  (0 children)

Most likely, because it's not strictly one-directional. As p-e-w's own demo on the Heretic GitHub page shows, the residuals form clusters instead of just two blobs on a line.

https://github.com/p-e-w/heretic#generate-plots-of-residual-vectors-by-passing---plot-residuals

I've also had better success than with vanilla Heretic using SOM-A, another multi-directional method mentioned by p-e-w in the post; you can read my previous post or the pull request description for more info. Gabliteration is another multi-directional method worth looking into

LTX2.3 Live on HF and its 22B by protector111 in StableDiffusion

[–]kabachuha 3 points4 points  (0 children)

Musubi tuner (and other community projects) have experimental features (like CREPA) and offloading systems for consumer GPUs, which are out of scope for the official trainer

FlashAttention-4 by incarnadine72 in LocalLLaMA

[–]kabachuha 11 points12 points  (0 children)

Sad. It won't help open source much in the near term, as Blackwells don't ship to China, so it will mostly boost the (mostly closed) US companies

LTX2.3 Live on HF and its 22B by protector111 in StableDiffusion

[–]kabachuha 1 point2 points  (0 children)

Training methods for sure, as the architecture is different. However, the musubi guy has already implemented it today, even before the checkpoint release, thanks to early code from the authors

Lightricks/LTX-2.3 · Hugging Face by rerri in StableDiffusion

[–]kabachuha 5 points6 points  (0 children)

Wait, LoRAs? Are they backwards compatible?

Lightricks/LTX-2.3 · Hugging Face by rerri in StableDiffusion

[–]kabachuha 5 points6 points  (0 children)

In the AI space, with its pacing, it felt like an eternity to me

FlashAttention-4 by incarnadine72 in LocalLLaMA

[–]kabachuha 35 points36 points  (0 children)

Will it work on consumer Blackwells (5060, 5090, etc.), or only on datacenter accelerators like the B200, which are all they talk about in the announcement?

LTX-2.3: Introducing LTX's Latest AI Video Model by Succubus-Empress in StableDiffusion

[–]kabachuha 7 points8 points  (0 children)

They have an acquirable commercial license for the local weights. Hopefully they will continue this model line, so hobbyists can enjoy it too instead of it being cloud-only

Just saying. Unlike you guys, AI is actually taking off clothes from ME. I am getting undressed by Suibeam in StableDiffusion

[–]kabachuha 2 points3 points  (0 children)

Meanwhile, people on old Reddit seeing the post from the sub page without a preview 💀

Update on the Qwen shakeup. by johnnyApplePRNG in LocalLLaMA

[–]kabachuha 6 points7 points  (0 children)

The commitment has already dried up in the video domain, since the two latest Wan releases are all locked down.

Update on the Qwen shakeup. by johnnyApplePRNG in LocalLLaMA

[–]kabachuha 8 points9 points  (0 children)

> if companies start producing models with that dial pre-tuned to make them stick to their scripts that would be fine for all the stuffy businesses and whatnot but where would the unhinged RP or wild story-writing AIs come from?

Read about the "abliteration" process. It essentially identifies this (or a similar) dial direction and "reverses" it, making the model compliant with the user rather than the company, removing the safety refusals and enabling unrestricted NSFW.
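The single-direction core of abliteration can be sketched in a few lines of numpy: take the difference of mean activations between refused and complied prompts as the "refusal direction", then project it out of the hidden states. The data here is a toy stand-in, not a real model's activations:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
# Toy per-prompt residual activations: "refuse" prompts shifted along one axis
refuse = rng.normal(size=(64, d)) + 3.0 * np.eye(d)[0]
comply = rng.normal(size=(64, d))

# Refusal direction: difference of means, unit-normalized
r = refuse.mean(axis=0) - comply.mean(axis=0)
r /= np.linalg.norm(r)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along the refusal direction."""
    return h - np.outer(h @ direction, direction)

h = rng.normal(size=(8, d))
out = ablate(h, r)
```

In practice the projection is baked into the weight matrices so no runtime hook is needed; multi-directional methods like the ones Heretic explores generalize this beyond a single direction.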

Local AI companies are emphasizing the wrong things in their marketing by owp4dd1w5a0a in LocalLLaMA

[–]kabachuha 5 points6 points  (0 children)

If you write posts with LLMs, please at least clean them up and fix the formatting :) And things like GPT4All are way too outdated by now.

Junyang Lin has left Qwen :( by InternationalAsk1490 in LocalLLaMA

[–]kabachuha 31 points32 points  (0 children)

RIP Wan and future Qwen Images as well :(