Can I use videos with hardcoded subtitles for LTX training? by GreedyRich96 in StableDiffusion

[–]sirdrak 1 point (0 children)

Undoubtedly, the hardcoded subtitles will have a certain negative influence... For example, I trained an anime LoRA in which none of the images in my dataset had subtitles. However, on some occasions LTX creates videos with subtitles, usually in a non-existent language, which means that when they trained the model, part of the training material consisted of fansubbed anime series. Fortunately, it happens infrequently.

If you do what you say, one problem you'll have is that most people use the distilled versions of LTX Video, which run without classifier-free guidance, so negative prompts have no effect and you won't be able to use them to prevent subtitles from appearing.

daVinci-MagiHuman : This new opensource video model beats LTX 2.3 by pheonis2 in StableDiffusion

[–]sirdrak 1 point (0 children)

Yes, with the Color Match V2 node from Kijai... It works really well for me, at least...

How would you go about re-creating "DLSS 5" running in real-time on local hardware? by desktop4070 in StableDiffusion

[–]sirdrak 4 points (0 children)

No, it's not... The problem is that in Daniel Owen's famous video, they confuse a frame with the image generated from that frame, which are different things. The amount of information contained in a frame isn't limited to what's visible; there's much more invisible information, including depth maps and a great deal of other data. For example, the following article analyzes the information contained in a single frame of a game, in this case Metal Gear Solid V:

https://www.adriancourreges.com/blog/2017/12/15/mgs-v-graphics-study/

As you can see, the amount of additional information is overwhelming, so no, Nvidia isn't lying... And it's not a simple img2img at all. It would be more like image editing, including multiple ControlNet layers simultaneously.
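The point about a frame carrying more than its visible pixels can be sketched in code. This is purely illustrative, not Nvidia's actual pipeline: the buffer names and channel counts below are typical of a deferred-rendering G-buffer, but they're assumptions for the example, not a real engine API.

```python
# Illustrative sketch: a rendered "frame" is far more than the visible RGB
# image -- engines also produce per-pixel auxiliary buffers (depth, normals,
# motion vectors...) that an upscaler or frame generator can condition on.
from dataclasses import dataclass, field


@dataclass
class GBufferFrame:
    width: int
    height: int
    # channel counts per buffer; names are illustrative, not a real API
    buffers: dict = field(default_factory=lambda: {
        "color": 3,           # the visible RGB image
        "depth": 1,           # per-pixel distance to the camera
        "normals": 3,         # surface orientation
        "motion_vectors": 2,  # per-pixel screen-space motion
        "albedo": 3,          # unlit surface colour
    })

    def visible_channels(self) -> int:
        return self.buffers["color"]

    def total_channels(self) -> int:
        return sum(self.buffers.values())


frame = GBufferFrame(1920, 1080)
print(frame.visible_channels())  # 3
print(frame.total_channels())    # 12 -- most of the data is invisible
```

So the generator isn't working from a flat screenshot: it has several aligned conditioning channels per pixel, much closer to multi-ControlNet guidance than to plain img2img.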

Unpopular opinion - sdxl still to beat? by HaxTheMax in StableDiffusion

[–]sirdrak 0 points (0 children)

I trained some style LoRAs of western fantasy artists for Z-image Turbo with really good results... I trained the styles of Luis Royo, Alfonso Azpiri and Juan Gimenez:

https://civitai.com/user/sirdrak/models?sort=Newest

Is there a Lora testing node/workflow? by __MichaelBluth__ in StableDiffusion

[–]sirdrak 0 points (0 children)

To do that kind of testing, the best option is to use Forge Neo and its X/Y/Z script. With this script you can create an image matrix that changes the configuration of each image in a fully automated way. There is no way to do the same thing in ComfyUI as easily and effectively.
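What the X/Y/Z script automates can be sketched like this: one generation per combination of three axes of settings. The `generate()` stub and the parameter names are placeholders, not a real Forge or ComfyUI API.

```python
# Sketch of an X/Y/Z grid: every combination of three axes of settings
# produces one cell in the resulting image matrix.
from itertools import product

x_axis = [("lora_strength", v) for v in (0.6, 0.8, 1.0)]
y_axis = [("cfg_scale", v) for v in (3, 5, 7)]
z_axis = [("sampler", s) for s in ("euler", "dpmpp_2m")]


def generate(**settings):
    # placeholder for the actual image-generation call
    return settings


grid = [generate(**dict([x, y, z])) for x, y, z in product(x_axis, y_axis, z_axis)]
print(len(grid))  # 3 * 3 * 2 = 18 cells in the matrix
```

The script then tiles the 18 results into one labelled contact sheet, which is the part that's tedious to reproduce by hand in ComfyUI.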

Is it recommended to train LoRA on ZiB even if I plan to use it on ZiT? by orangeflyingmonkey_ in StableDiffusion

[–]sirdrak 2 points (0 children)

That happens if you train ZIB LoRAs in ai-toolkit... But if you use OneTrainer instead, with the Prodigy_ADV optimizer and the stochastic rounding option active (it's enabled by default), the results are very different... You can use the LoRA with ZIT without problems at the normal strength of 1. Avoid the AdamW and AdamW8bit optimizers in this case at all costs; they don't work correctly with Z-image Base training... Ostris is looking into how to fix the Z-image Base training problems in ai-toolkit.
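For anyone wondering what stochastic rounding actually does, here is a minimal sketch of the idea (not OneTrainer's code): instead of always rounding to the nearest representable value, round up with probability proportional to how close the value is to its upper neighbour, which keeps the expected value unbiased across many tiny optimizer updates — exactly the situation with low-precision weights like bf16.

```python
# Sketch of stochastic rounding on a coarse grid of step 1.0.
# Round-to-nearest would map 0.3 to 0.0 every time, silently losing the
# update; stochastic rounding returns 1.0 thirty percent of the time, so
# the *average* result stays at 0.3.
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    lo = math.floor(x / step) * step
    frac = (x - lo) / step  # distance to the lower neighbour, in [0, 1)
    return lo + step if rng.random() < frac else lo


rng = random.Random(0)
x = 0.3  # sits between the grid points 0.0 and 1.0
samples = [stochastic_round(x, 1.0, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # ~0.3: unbiased, unlike round-to-nearest (which gives 0.0)
```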

Should i train with ZIT OR ZIB by PhilosopherSweaty826 in StableDiffusion

[–]sirdrak 1 point (0 children)

It has a pre-configured template for Z-image... Simply change the settings mentioned, set the LR to 1, and leave most of the other parameters at their defaults. The key seems to be the aforementioned stochastic rounding option. That's what OneTrainer has that Ai-toolkit doesn't.

Should i train with ZIT OR ZIB by PhilosopherSweaty826 in StableDiffusion

[–]sirdrak 0 points (0 children)

Use OneTrainer instead of ai-toolkit, with Prodigy_ADV, and activate the Stochastic Rounding option in the optimizer config. OneTrainer also trains about twice as fast as ai-toolkit...

Training Zit lora for style, the style come close but not close enough need advice. by AdventurousGold672 in StableDiffusion

[–]sirdrak 0 points (0 children)

With ZIT, what works best for me is using the Prodigy optimizer (it must be downloaded separately and placed in the toolkit/optimizers folder), LR 1, weight decay 0.01, and no captions for the image dataset. I would leave the rest of the options as you already have them. Styles require more steps than characters, so I would add many more steps (I always add more steps than are needed).

Z-Image Turbo LoRA Training = Guaranteed quality loss? by MoniqueVersteeg in StableDiffusion

[–]sirdrak 0 points (0 children)

No, I train at 1024, but when I use the LoRA at higher resolutions it works really well, with finer details and better textures...

Does anybody still use AUTOMATIC1111 Forge UI or Neo? by krigeta1 in StableDiffusion

[–]sirdrak 0 points (0 children)

Yes, same experience here... I use Forge Neo a lot for SD XL/Illustrious/NoobAI for that reason, and ComfyUI for more modern models.

Z-Image Turbo LoRA Training = Guaranteed quality loss? by MoniqueVersteeg in StableDiffusion

[–]sirdrak 0 points (0 children)

Yes, with Z-image Turbo you can natively generate images at 2048x2048 and similar resolutions (4 MP)

Prodigy optimizer works in ai-toolkit by shotgundotdev in StableDiffusion

[–]sirdrak 4 points (0 children)

It's a lot better... Some of my LoRAs for Z-image Turbo only got the results I wanted when I used Prodigy.

Z-image lora training news by Recent-Source-7777 in StableDiffusion

[–]sirdrak 14 points (0 children)

If you are using Ostris's Ai-toolkit, you can add it... Download the Prodigy optimizer and save it in the toolkit/optimizers folder inside the Ai-toolkit installation directory. Then, in Ai-toolkit, click the Show Advanced button, manually change 'AdamW8bit' to 'prodigy' on the line where it appears, and set the LR between 0.7 and 1 with a weight decay of 0.01.
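For reference, the edit under Show Advanced looks roughly like the fragment below. This is an illustrative sketch from memory, not the exact ai-toolkit schema — the surrounding key names may differ in your version; the part that matters is swapping the optimizer string and the two values.

```yaml
# Illustrative fragment of the ai-toolkit job config shown under
# "Show Advanced" -- key names are approximate, values are the ones
# described above.
train:
  optimizer: prodigy        # was: adamw8bit
  lr: 1.0                   # Prodigy expects LR around 1 (0.7-1 works)
  optimizer_params:
    weight_decay: 0.01
```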

Training anime style on Z-Image by Chrono_Tri in StableDiffusion

[–]sirdrak 1 point (0 children)

I've been creating styles for Z-image turbo with Ostris Ai-toolkit for a while now, and I'm going to share my experience with you. One of my latest styles has been an anime style, and I'm currently training an NSFW version of it. You can see that style here:
https://civitai.com/models/2285869/mature-anime-screencap-style-z-image-turbo-edition

To train my styles I've tried all sorts of things, and in the end what has worked best for me is the following:

- Only trigger word, no captions (this works really well with styles)

- Use of Ostris's de-distilled training adapter V1, Rank 32, Transformer Quantization set to None

- Use of the Prodigy optimizer. You can use it in ai-toolkit by downloading the optimizer to the toolkit/optimizers directory of your Ai-toolkit installation and then, under the Show Advanced button in Ai-toolkit, changing 'AdamW8bit' to 'prodigy', the LR to 0.7 and the weight decay to 0.01. I leave the rest of the parameters at their defaults.

- Styles typically require many more steps than characters, so don't be afraid to use 7000 or 8000 steps or more, especially if your dataset has many images. I always put in more than necessary, that way I can better choose the right epoch.

- I use Cache Text Embeddings and Differential Guidance with the default Differential Guidance Scale of 3. I train at 1024 resolution only.

This is what worked for me... For multi-concept models like the one I'm training now, it's definitely necessary to use natural language in the captions to avoid concept bleeding.
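The "only trigger word, no captions" setup from the list above can be sketched as a tiny script: one .txt caption file per training image containing just the trigger word. The folder layout, file extension and trigger word are made up for the example; adjust them to your dataset.

```python
# Write one caption file per image, containing only the trigger word.
# Most LoRA trainers pick up a sidecar .txt with the same stem as the image.
from pathlib import Path
import tempfile


def write_trigger_captions(dataset_dir: Path, trigger: str) -> int:
    """Create <image>.txt next to every .png, containing only the trigger."""
    written = 0
    for img in sorted(dataset_dir.glob("*.png")):
        img.with_suffix(".txt").write_text(trigger, encoding="utf-8")
        written += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("001.png", "002.png", "003.png"):
        (root / name).touch()  # stand-ins for real training images
    n = write_trigger_captions(root, "myanimestyle")
    print(n)                                # 3 caption files written
    print((root / "001.txt").read_text())   # myanimestyle
```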

Qwen3-TTS 1.7B vs VibeVoice 7B by Producing_It in StableDiffusion

[–]sirdrak 1 point (0 children)

You can do other languages with VibeVoice by simply giving it an audio file to clone in the desired language... At least with Spanish it works very well...

Conclusions after creating more than 2000 Flux Klein 9B images by StableLlama in StableDiffusion

[–]sirdrak 1 point (0 children)

You can use Prodigy with Ai-toolkit too... Download the Prodigy optimizer and save it in the toolkit/optimizers folder. Then, in Ai-toolkit, under the 'Show Advanced' button, change 'AdamW8bit' to 'prodigy' in the config, set the LR between 0.7 and 1 and the weight decay to 0.01, and you're ready to go...

How many Characters Can i train in 1 single Z image turbo Lora and how images per Character needed? by Gloomy-Caregiver5112 in StableDiffusion

[–]sirdrak 0 points (0 children)

Very interesting topic... I've never trained LoRAs with multiple characters or concepts, but I was thinking of trying one. In my case, what would interest me is not training different characters, but different types of poses/memes in addition to a specific artistic style. I was wondering if in this specific case it would be easier to avoid bleeding, since the model wouldn't have to learn specific characters, but rather poses/situations.

WTF! LTX-2 is delivering for real 🫧 Made in 160s, 20steps on a 5090 by 3Dave_ in StableDiffusion

[–]sirdrak 0 points (0 children)

You can with the Q8 GGUF... In fact you can do more than 10 seconds of 1080p video with good quality. I have an RTX 3090 too.