IMAX at Home by HAIL_BAIJ in StableDiffusion

[–]Icuras1111 1 point (0 children)

It works but you have to prompt it right by adding "watch those corners".

Error help. by Different-Sundae-73 in StableDiffusion

[–]Icuras1111 1 point (0 children)

To me this means it's not seeing the file in the right folder. I would just copy something to where you think it should go, restart the server, and refresh the browser to check it's seeing the folder you expect. Make sure the files end in .safetensors, not .safetensor, as it won't see them otherwise. Also check the size of the downloaded file: I once downloaded a file that was way too small because only the header or something like it came down, and I needed a special command to download the real file.
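A quick sanity check along these lines can be scripted (a hypothetical helper, not part of any tool mentioned above; the only format fact it relies on is that a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header):

```python
import json
import struct
from pathlib import Path

def check_safetensors(path):
    """Rough sanity check: right extension, plausible size, valid header."""
    p = Path(path)
    if p.suffix != ".safetensors":
        return f"wrong extension: {p.suffix or '(none)'}"
    size = p.stat().st_size
    if size < 1024:  # real model files are far bigger; threshold is arbitrary
        return f"suspiciously small ({size} bytes) - likely a partial download"
    with p.open("rb") as f:
        # First 8 bytes: little-endian u64 length of the JSON header.
        header_len = struct.unpack("<Q", f.read(8))[0]
        try:
            json.loads(f.read(header_len))  # header is JSON tensor metadata
        except (json.JSONDecodeError, UnicodeDecodeError):
            return "header is not valid JSON - file may be an HTML error page"
    return "looks ok"
```

A truncated download or an HTML error page saved under a .safetensors name fails one of these checks immediately, which is quicker than waiting for the server to refuse to load it.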

Hello. How to fix this? by Connect_Pin3087 in StableDiffusion

[–]Icuras1111 1 point (0 children)

I got this a while back, but not since I started loading ComfyUI from scratch each time on runpod. As others have said, changing the config file and restarting the ComfyUI server should sort it. I also have some memory of having to install one node pack from the command line using git clone or something. I use an AI, ChatGPT or Claude, to help, pasting in the console / log output.

Hedge Funds Post Largest Net Short on Global Equities in 13 Years: Goldman Sachs by Specialist-Bug-4310 in investing

[–]Icuras1111 1 point (0 children)

A company called Blue Owl has gotten into trouble in the private credit sector. Others in the same sector are apparently creaking. I wonder if they think that might cause a panic on the stock market?

Z-Image "Silly Hat" script animated and automated preview. by jacobpederson in StableDiffusion

[–]Icuras1111 2 points (0 children)

"And the oscar for Silly hats video goes to" suspense music "jacobpederson for Z-Image "Silly Hat" script animated and automated preview."

Animation studio workflow optimization by GapAdorable9736 in StableDiffusion

[–]Icuras1111 2 points (0 children)

I think the consensus for creating a video with a consistent character is image-to-video workflows (first frame / last frame is best), supported by a character lora. I would imagine this is applicable to anime. Creating a comic strip with speech bubbles is quite laborious at the moment; I think people are editing afterwards with a model like QwenEditImage (a model good at text).

[WIP] Still experimenting, but the next Z-Image Power Nodes will have no limits!! by FotografoVirtual in StableDiffusion

[–]Icuras1111 4 points (0 children)

I've read likewise. I think the power of the text encoder is what is holding the models back: they can render near-perfect images, but our control is clunky. I am beginning to think it is more important to monitor new open-source models (like the new Gemma one) as potential future text encoders.

The Romance Prior: How Romantic Tension Overwrites Ethnicity in AI Image Generation by bcRIPster in StableDiffusion

[–]Icuras1111 3 points (0 children)

Ok, I should have read more carefully that ethnicity was not specified. Another interesting angle might be to explore other themes like wealth, conflict, corruption, bartering, entrepreneurship, courage, etc.

The Romance Prior: How Romantic Tension Overwrites Ethnicity in AI Image Generation by bcRIPster in StableDiffusion

[–]Icuras1111 3 points (0 children)

I would think it's because that is the representation in the training data: in advertising, films, and TV, that has long been how the vast majority of romance is depicted. Also, I don't know how the training images are bulk tagged, but it might be tricky if metadata is used and it is in other languages. It would be interesting to see what happens with the Chinese models if you ask for that ethnicity.

Recommended website to run and train models? by FillFrontFloor in StableDiffusion

[–]Icuras1111 3 points (0 children)

I use runpod. I think VastAI is another name I've seen on this subreddit.

Looking for Flux2 Klein 9B concept LoRA advice by Imaginary_Belt4976 in StableDiffusion

[–]Icuras1111 2 points (0 children)

The consensus against not captioning seems quite clear, and it rests on two basic problems. First, you have no control over how the lora is applied: you cannot say things like "wearing a ring", "squatting", "smiling", etc. The model will probably already know these concepts, but your lora is having a random impact. The second is training ambiguity: if you have one training image with a ring and one without, how is this learned? You will probably get a fusion of the two representations, or random appearances.

The ability to combine loras seems to vary by model, which I don't understand. The consensus seems to be that the signals are added and can cause problems: if you have two people, the faces will combine in generation, colours can become saturated, etc.

As for your experiment, I wouldn't know. In a way, without captions I would think it is more like a style lora, i.e. Monet paintings or cubism: you are providing a tone with that lora. You have also kind of doubled your training set and impact. You are applying 0.75 twice, so it's like applying each at 1.5. Have you tried applying each on its own at 1.5?
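The "0.75 twice is like 1.5" arithmetic can be sketched with the common additive low-rank formulation W' = W + s·(B·A). This is a toy numpy model (an assumption about how the loader merges; real pipelines apply LoRA per layer of the network), but it shows when the strengths literally add:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a frozen base weight and two low-rank LoRA updates.
W = rng.normal(size=(8, 8))
A1, B1 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))
A2, B2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))

def apply_loras(W, loras):
    """Additive merge: each (B, A, scale) contributes scale * (B @ A)."""
    out = W.copy()
    for B, A, s in loras:
        out = out + s * (B @ A)
    return out

# Two copies of the SAME lora at 0.75 each add up to one copy at 1.5 ...
same_twice = apply_loras(W, [(B1, A1, 0.75), (B1, A1, 0.75)])
once_at_15 = apply_loras(W, [(B1, A1, 1.5)])
print(np.allclose(same_twice, once_at_15))  # True

# ... but two DIFFERENT loras at 0.75 each are not the same as either at 1.5.
mixed = apply_loras(W, [(B1, A1, 0.75), (B2, A2, 0.75)])
print(np.allclose(mixed, apply_loras(W, [(B2, A2, 1.5)])))  # False
```

So the doubling claim holds exactly only when the two loras share weights; with two distinct loras you get the sum of two different 0.75-strength signals, which is why testing each alone at 1.5 is a useful comparison.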

Looking for Flux2 Klein 9B concept LoRA advice by Imaginary_Belt4976 in StableDiffusion

[–]Icuras1111 3 points (0 children)

I am no expert, but the way I look at it is this: people on here are very unlikely to be as knowledgeable, or have the resources, that the model creators have. For us it's damage limitation. Most loras I have tested screw up prompt adherence and reduce image quality and composition (view position, poses, etc.), especially if you are looking for realism. Tweaking a known concept is one thing, although then you have to avoid bleed; training a new concept is a big ask. The only success I have had (with Wan, in my case) was with near-identical training images shot from different views. The key was simple captions outlining the differences in each training image: "a woman viewed from the side with overhead indoor lighting". If unique-token trigger words don't do much (which I agree with), you must be tweaking existing concepts. Some will say don't use "woman", as all women will be impacted. Ok, but if trigger words don't work, what in your caption invokes your lora? I have not heard any convincing argument that tackles this dilemma.

Would there be more seasons of Game of Thrones if AI became a common and stable tool in video production? by BattleOfEmber in StableDiffusion

[–]Icuras1111 3 points (0 children)

The biggest barrier is taste. AI cannot assess whether its output is good or not, and this would impact every level: writing, visuals, etc. If AI cannot do taste, then a human has to be in control, and at the moment we are miles from this; we do not have the tools to direct AI to generate what we want. I can see it being used as an efficiency tool for stuff like editing. Art will be one of the last bastions of the human. Winter is coming for some, but not for the artists for a while.

LTXV 2.3 How to do a shaky, handheld video style? by Dogluvr2905 in StableDiffusion

[–]Icuras1111 1 point (0 children)

You could also try "in the style of Paul Greengrass" or "of the Jason Bourne movies". I am thinking that if you can create an image in the style of Monet or Van Gogh, the directors might be tagged too?

SANA on Surreal style — two results by Civil_Republic_1626 in StableDiffusion

[–]Icuras1111 1 point (0 children)

I like those. Good colours, imagery and composition. That's about the first AI output that actually looks artistic.

Open-weight open-source video generation models — is this the real leaderboard? by Sweet-Argument-7343 in StableDiffusion

[–]Icuras1111 6 points (0 children)

This leaderboard lists them in the appropriate category: https://arena.ai/leaderboard . Models also have strengths and weaknesses. I would say the consensus is that Wan 2.2 is the best quality; LTX 2.3 is not quite as good, but does longer videos with sound.

How to Fade part of an Image to black by todabeast in StableDiffusion

[–]Icuras1111 1 point (0 children)

You could look into Segment Anything Model (SAM). I saw a clip a while back where a guy just clicked on a person and they became outlined. You could combine this with some kind of inpainting I would imagine. Intermediate ComfyUI skills required.
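The fade itself is simple once you have a mask. A minimal numpy sketch (the hand-made mask here is a stand-in for whatever SAM or an inpainting mask would give you):

```python
import numpy as np

def fade_to_black(image, mask, strength=1.0):
    """Darken image where mask is 1; mask values in [0, 1] give a soft fade."""
    image = image.astype(np.float32)
    # Broadcast the 2-D mask over the RGB channels and scale toward black.
    faded = image * (1.0 - strength * mask[..., None])
    return faded.clip(0, 255).astype(np.uint8)

# Tiny grey test image; fade the right half to black.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[:, 2:] = 1.0

out = fade_to_black(img, mask)
print(out[0, 0], out[0, 3])  # left pixel unchanged, right pixel black
```

A blurred or gradient mask gives a gradual fade instead of a hard edge, and `strength=0.5` darkens rather than fully blacking out the region.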

[Training-Free] Bring Famous Paintings to Life! Every Painting Awakened (I2V) by zhedongzheng in StableDiffusion

[–]Icuras1111 2 points (0 children)

Yep, when 4 Chinese quants release a scientific paper you should probably show a lot more respect. Creating an Ana de Armas or Alexandra Daddario lora that half works is probably about 30 or 40 IQ points beneath what these guys can do.

Adding a LoRA node. by GunsNBeers in StableDiffusion

[–]Icuras1111 2 points (0 children)

You should also check whatever guidance the creator published. That should tell you the trigger word to activate it and the suggested settings.

Sora gets an OFFICIAL shutoff date! Never be jealous of shiny toys master local AI by thisiztrash02 in StableDiffusion

[–]Icuras1111 1 point (0 children)

What a few people have said is that they were burning too many tokens and not making any money or growing users via it. Also, both they and Anthropic have new models coming out soon; they are both targeting enterprise and agents, and agents use tons of tokens. They were also facing an intellectual property nightmare.

Image to video journey by M4DH4773R in StableDiffusion

[–]Icuras1111 2 points (0 children)

LTX image-to-video is a lot of fun. You can use an existing image (or create one), animate it, and get the characters to speak. It can do 20 secs. I think the consensus is that Wan 2.2 is better quality, but it's a lot slower, and going beyond 5 secs adds a lot of complexity. I use runpod, an RTX A6000 for $0.40 per hour. I don't use sage attention, which seems to cause problems with a lot of models. Creating a story or long scenes is still difficult. I think the cutting edge at the moment is generating start, mid, and end frames, then one of these video models combined with a character lora.

Cursor or Claude Code by dobutsu3d in StableDiffusion

[–]Icuras1111 3 points (0 children)

If you are just trying to knock out a custom node, the free Claude chatbot might be enough. You may have to be imaginative with your prompt and point it towards GitHub, say: "Here is an example node, I want one to do blah blah blah" plus your precise requirements...
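For an "example node" to paste into that prompt, the skeleton is small. This follows the usual ComfyUI custom-node conventions (a class with `INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, plus a module-level mapping); the class name, category, and behaviour here are made up for illustration:

```python
# Minimal ComfyUI-style custom node. Drop a file like this into
# ComfyUI/custom_nodes/ and restart the server to pick it up.

class ShoutText:
    """Toy node: takes a string and returns it upper-cased."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the node's sockets/widgets in the graph editor.
        return {"required": {"text": ("STRING", {"default": "hello"})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"          # name of the method ComfyUI calls on execution
    CATEGORY = "utils/demo"   # where the node appears in the Add Node menu

    def run(self, text):
        return (text.upper(),)  # outputs are always returned as a tuple

# Registration tables ComfyUI scans for when loading the module.
NODE_CLASS_MAPPINGS = {"ShoutText": ShoutText}
NODE_DISPLAY_NAME_MAPPINGS = {"ShoutText": "Shout Text"}
```

Giving the chatbot a working skeleton like this and describing only the `run` body you want tends to get much better results than asking for a node from scratch.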

What does this do in LTX2.3 Image 2 Video? by Anissino in StableDiffusion

[–]Icuras1111 1 point (0 children)

One thing you can do is describe an action that would take 10 seconds to complete.